Most Stable Raspberry Pi? 81% Better NTP with Thermal Management

(austinsnerdythings.com)

114 points | by todsacerdoti 4 hours ago

14 comments

jsolson 3 hours ago
You might have even better precision if you stay away from CPU0 and also set idle=poll in your kernel command line. Lots of things (including other interrupts) often land on CPU0. It would not be my first choice for something where I wanted high timing precision.
- yc-kraln 2 hours ago
  I came here to post this. We make a lot of the same sorts of optimizations for our OS distro (Debian-based) -- disabling frequency scaling, core pinning, etc. Critically, CPU0 has a bunch of stuff you cannot move off it, and you're better off using one of the other cores as an isolated island.
  This is what the scheduler latency looks like on our isolated core:
  # Total: 000300000
  # Min Latencies: 00001
  # Avg Latencies: 00005
  # Max Latencies: 00059
  # Histogram Overflows: 00000
  (those are µs!)
  - msephton 44 minutes ago
    Very cool. What are you running on it? What's your use case?
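A minimal sketch of the "stay away from CPU0" advice above, in Python on Linux. The helper names `pick_timing_cpu` and `pin_current_process` are made up for illustration and are not from the article or from either commenter's setup. Note that affinity alone only keeps this one process off CPU0; it does not stop other tasks or interrupts from landing on the chosen core, which needs separate kernel-level isolation.

```python
import os


def pick_timing_cpu(available):
    """Return the highest-numbered CPU other than CPU0.

    Falls back to CPU0 only when it is the sole CPU available.
    """
    candidates = sorted(cpu for cpu in available if cpu != 0)
    return candidates[-1] if candidates else 0


def pin_current_process():
    """Pin the calling process to the chosen CPU (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    cpu = pick_timing_cpu(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})  # pid 0 means "this process"
    return cpu
```

Calling `pin_current_process()` at the start of a timing-sensitive script would move it to the last core the process is allowed to use. Picking the highest-numbered core is an arbitrary choice here; any core other than CPU0 fits the advice.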
avidiax 1 hour ago
Why not put a resistor (for heating) and a bit of foam insulation on the crystal?
This is way more direct than spacebar heating.
You could also add a transistor attached to the resistor and a GPIO and use the clock drift as a proxy for temperature. PID is probably enough, but since you have a 24-hour cycle you could calculate a baseline heating schedule.
- ErroneousBosh 27 minutes ago
  This is a technique that's been used for crystal oscillators for almost a century by now. I have some 1950s crystal ovens that are a little metal box that fits over the crystal (quite a large crystal, about the size of two or three SD cards stacked) and heats up to around 75°C. The crystals were supposed to be specially cut to have close to zero temperature coefficient around that temperature so the slight up and down drift caused by the thermostat wouldn't affect it.
  I have test equipment made as recently as the early 2000s that uses a crystal oscillator in an oven as a frequency standard. It takes a good five minutes to fully stabilise.
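For the "PID is probably enough" idea above, here is a rough sketch of a PID loop driving a heater duty cycle, run against a toy first-order thermal model. Everything numeric is an assumption picked for illustration (the gains, the 50 °C setpoint, the 40 °C rise at full duty, the 60 s time constant); none of it is measured from a Pi or taken from the article's script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    setpoint: float
    out_min: float = 0.0   # heater fully off
    out_max: float = 1.0   # heater fully on
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float, dt: float) -> float:
        """Return a heater duty cycle in [out_min, out_max]."""
        error = self.setpoint - measured
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        out = min(max(raw, self.out_min), self.out_max)
        # Simple anti-windup: only accumulate the integral while unsaturated.
        if out == raw:
            self.integral = candidate
        return out


def simulate(pid, ambient=25.0, gain=40.0, tau=60.0, dt=1.0, steps=3600):
    """Toy model: temperature relaxes toward ambient + gain * duty with time constant tau."""
    temp = ambient
    for _ in range(steps):
        duty = pid.update(temp, dt)
        temp += dt * ((ambient + gain * duty) - temp) / tau
    return temp
```

With `PID(kp=0.2, ki=0.01, kd=0.0, setpoint=50.0)`, the simulated temperature settles at the setpoint within the simulated hour. A real heater would need gains tuned against the actual board, and the baseline 24-hour schedule mentioned above could be added as a feed-forward term.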
geerlingguy 3 hours ago
It's an SBC-scale OCXO. I half wonder if adding a larger heatsink, or even putting thermal mass around the existing oscillator could also help, or if the heating is more localized in the PCB itself.
Always fun new things to learn when doing something "simple" like setting up an NTP server!
- MayeulC 15 minutes ago
  > or even putting thermal mass around the existing oscillator
  I was thinking along these lines as well. Put a metal block on the CPU and oscillator for thermal mass (not sure if separate blocks would be better). Ideally, with a large enough thermal capacity, the block should reach an average temperature and remain there.
  Inertia is also good even if the temperature is not constant: clock drift can be measured and compensated. If the temperature rises slowly, the clock speed will increase slowly: the rate can be measured and compensated for. Jitter is the issue here, and thermal inertia should dampen it.
  It may also be worth preventing convection from happening on the board. Putting the Pi in a wool sock may not be the best idea depending on its temperature, but an electrically insulating thermal conductor (or an electrical insulation layer + steel wool) may do it.
  Heatsinks may also be counter-productive (if they have a small thermal capacity), as their temperature depends on room temperature, which changes during the day.
- LeoPanthera 2 hours ago
  Flirc makes a metal Pi case where the CPU is pressed against the metal body of the case, resulting in a huge thermal mass for passive cooling. I have a bunch of them and it works very well. No fan necessary.
- jauntywundrkind 2 hours ago
  I was thinking it might be nice to add some insulation around the Pi's enclosure, to reduce its cooling significantly. A little bit to tamp down any potential rapid fluctuations in the room's temperature (if someone opens a window, steps out of the bath, whatever). But more so that it could save a watt or two of power, by having the time-burner cores working much less.
  You're right that this is an oven-controlled oscillator. The goal generally with ovens is to keep heat! (To an extent of course.)
- IlikeKitties 3 hours ago
  > I half wonder if adding a larger heatsink, or even putting thermal mass around the existing oscillator could also help, or if the heating is more localized in the PCB itself.
  That would likely make it worse. The trick here is that the other cores are running at essentially their maximum temperature and will dynamically reduce their clock speed if required to keep from going above that limit. In essence, the environment becomes actively temperature controlled. If the ambient heat goes higher, the cores clock lower; if it gets colder, the cores clock higher (up to a point).
  If you add too much heat dissipation, the total power used by those cores might not be enough to keep well above ambient.
  - mytailorisrich 2 hours ago
    Extreme power dissipation would keep temperature stable so that this whole setup might not be needed, though.
    Author should experiment with liquid nitrogen ;) [1]
    [1] https://www.xda-developers.com/liquid-nitrogen-cooling-raspb...
    - IlikeKitties 2 hours ago
      Timing crystals don't work better when colder, they work worse. That's why they are heated in high-end time appliances, not cooled.
      - mytailorisrich 1 hour ago
        Isn't the issue here temperature stability? (Also, humour)
        - cap11235 50 minutes ago
          Right, and they are heated because a hot wire is much simpler than a fridge.
HPsquared 23 minutes ago
A microsecond is still quite a lot if GPS is involved, that's about 1000 light-feet!
- speedgoose 14 minutes ago
  How much is it in eagle wingspans?
- jojomodding 16 minutes ago
  or 300 light-meters
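The arithmetic behind those two figures, as a quick check. The speed of light and the foot-to-metre conversion are exact defined values; the function name is just for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the metre
METRES_PER_FOOT = 0.3048              # exact by definition of the international foot


def light_distance_m(seconds: float) -> float:
    """Distance light travels in a vacuum in the given time, in metres."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


metres = light_distance_m(1e-6)   # one microsecond
feet = metres / METRES_PER_FOOT
print(f"{metres:.1f} m, {feet:.1f} ft")
```

So one microsecond of clock error corresponds to roughly 300 m, or a little under 1000 ft, of light travel, which matches both estimates above.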
anonymousDan 1 hour ago
Couldn't you model the effect of temperature on clock drift and try to factor that in dynamically (e.g. using a temperature sensor) instead of burning CPU unnecessarily?
- mlichvar 14 minutes ago
  That's what the chrony tempcomp directive is for. But you would have to figure out the coefficients, it's not automatic.
  An advantage of constantly loading at least one core of the CPU might be preventing the deeper power states from kicking in, which should make the RX timestamping latency more stable and improve stability of synchronization of NTP clients.
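On "you would have to figure out the coefficients": one way to get them is to log temperature alongside measured frequency error for a while and fit a curve. The sketch below assumes the quadratic form `comp = k0 + (T - T0) * k1 + (T - T0)^2 * k2` and does a plain least-squares fit in pure Python. The function name and the sample format are made up for illustration; check the units your sensor file reports, and the sign convention, against the chrony documentation before using any fitted values.

```python
def fit_tempcomp(samples, t0):
    """Least-squares fit of ppm = k0 + k1*(T - t0) + k2*(T - t0)**2.

    samples: iterable of (temperature, frequency_error_ppm) pairs.
    Returns (k0, k1, k2).
    """
    samples = list(samples)
    if len(samples) < 3:
        raise ValueError("need at least three samples to fit three coefficients")

    power_sums = [0.0] * 5   # sums of d**0 .. d**4, where d = T - t0
    rhs = [0.0] * 3          # sums of d**i * ppm for i = 0..2
    for temp, ppm in samples:
        d = temp - t0
        p = 1.0
        for i in range(5):
            power_sums[i] += p
            if i < 3:
                rhs[i] += p * ppm
            p *= d

    # Augmented 3x3 normal-equation matrix, solved by Gauss-Jordan elimination.
    m = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * c for a, c in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

Choosing `t0` near the middle of the logged temperature range keeps the fit well conditioned. The samples need to cover at least three distinct temperatures, and real data will be noisy, so a longer log spanning a full day of temperature swing should give more trustworthy coefficients.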
ACCount37 3 hours ago
It's the good old OCXO - Oven Controlled Crystal Oscillator. But the heating element is the CPU. Fucking hilarious.
throwaway81523 1 hour ago
I wonder about using an RPI Pico for this instead, using the Pico's synchronous PIO gizmo to intercept the PPS pulses.
mort96 1 hour ago
What's the point in reading posts like this when the solution "they" came up with is basically, "tell Claude to make a script which does whatever"? I read blog posts to read thoughts from people, not computers.
- stavros 1 hour ago
  Are you similarly frustrated that he didn't sit there 24/7, heating the oscillator with a small lighter when needed, but automated it instead? Why would this be more interesting for you if he'd written the script himself?
  - mort96 57 minutes ago
    > Are you similarly frustrated that he didn't sit there 24/7, heating the oscillator with a small lighter when needed, but automated it instead?
    No
    > Why would this be more interesting for you if he'd written the script himself?
    Was I unclear? I read blog posts to read thoughts from humans, not from computers.
    - stavros 54 minutes ago
      Well, I guess your era of reading is over, sadly.
- tensegrist 28 minutes ago
  But here's the key insight: what's the point in reading posts where the post itself is "tell Claude to write a post about…"
hnchm 2 hours ago
There was a paper on this in 2022. Not sure if it's used in production or not.
https://www.usenix.org/conference/nsdi22/presentation/najafi
Kerbonut 2 hours ago
Wouldn't a temperature compensating algorithm be just as effective?
nottorp 2 hours ago
The related question is:
Is the Pi going the Pentium 4 route?
- mort96 1 hour ago
  What is this even supposed to mean? What's "the Pentium 4 route"?
  - nottorp 1 hour ago
    I'm an old fart :)
    Intel tried to scale frequency up with the Pentium 4 in the name of performance, and it ended up extremely hot and power hungry. Just like some high end CPUs now, but then it applied to every model from Intel.
    I suppose you don't remember when a Raspberry Pi could run fine even without a heatsink, let alone active cooling. That's more recent than the Pentium 4.
    - esskay 22 minutes ago
      It's already there, really. The heat output of the 4, and more so the 5, means they benefit from active cooling. The good news is the Pi is practically pointless as a product for most people these days, and vastly better options are available cheaper, so unless you genuinely need the GPIO there's little reason to buy one. That's very much their own fault for focusing on commercial applications, but the Pi 5 as a product is practically pointless for consumer use at this point. An old Pi 2 or 3, which doesn't need any cooling, is still very useful for a range of applications, but the newer ones are in a bit of a weird niche where they're overpriced compared to most options.
irjustin 3 hours ago
I love this. Chasing perfection for perfection alone.
ckocagil 1 hour ago
That's a neat software solution. My first inclination would be to grab a soldering iron and replace the crystal with either a TCXO or a socket to provide an external clock disciplined to the 1PPS.
jauntywundrkind 3 hours ago
Amazing project, great write-up. Would love to see a temperature graph as well! I'm wondering how well the PID controller here is working.
For future improvements, a cheap but effective win might be to put a temperature sensor on the oscillator (or two or three in various places). And use that to drive the PID loop.
Even if just experimental & not long term, it would be nice to have data on how strong the correlation is between the CPU & oscillator temperatures. To see their difference and how much that changes over time. Another graph! CPU vs TCXO (vs ambient?) temperatures over time.
- Kerbonut 2 hours ago
  > put a temperature sensor on the oscillator
  At that point, couldn't we just use the temperature value to compensate for the drift?
  - HPsquared 26 minutes ago
    You can do that as well, but (in theory) the correction will be smaller than it otherwise would need to be if the temperature is regulated within a narrower range.