Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs

(github.com)

135 points | by hauntsaninja 4 days ago

6 comments

hauntsaninja 4 days ago
git bisect works great for tracking down regressions, but relies on the bug presenting deterministically. But what if the bug is non-deterministic? Or worse, your behaviour was always non-deterministic, but something has changed, e.g. your tests went from somewhat flaky to very flaky.
In addition to the repo linked in the title, I also wrote up a little bit of the math behind it here: https://hauntsaninja.github.io/git_bayesect.html
[-]
- ajb 51 minutes ago
  Nice! I implemented a similar thing a while back: https://github.com/Ealdwulf/BBChop
  I'm going to have to check out how you got linear time with Shannon entropy, because I used Renyi entropy to do that, to make the algebra easier.
  It's also possible to do it over the DAG, rather than a linear history - although that makes the code a lot more complicated. Unfortunately there doesn't seem to be a linear time cumulative sum algorithm over dags, so it's super linear in some cases.
- Myrmornis 2 hours ago
  This is really cool! Is there an alternative way of thinking about it involving a hidden markov model, looking for a change in value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?
supermdguy 3 days ago
Okay this is really fun and mathematically satisfying. Could even be useful for tough bugs that are technically deterministic, but you might not have precise reproduction steps.
Does it support running a test multiple times to get a probability for a single commit instead of just pass/fail? I guess you’d also need to take into account the number of trials to update the Beta properly.
[-]
- hauntsaninja 3 days ago
  Yay, I had fun with it too!
  IIUC the way you'd do that right now is just repeatedly recording the individual observations on a single commit, which effectively gives it a probability + the number of trials to do the Beta update. I don't yet have a CLI entrypoint to record a batch observation of (probability, num_trials), but it would be easy to add one
  But ofc part of the magic is that git_bayesect's commit selection tells you how to be maximally sample efficient, so you'd only want to do a batch record if your test has high constant overhead
  [-]
  - __s 1 hour ago
    recompiling can be high constant overhead
    [-]
    - ajb 43 minutes ago
      In theory, the algorithm could deal with that by choosing the commit at each step, which gives the best expected information gain; divided by expected test time. In most cases it would be more efficient just to cache the compiled output though.
      [-]
      - sfink 2 minutes ago
        This doesn't sound quite right, but I'm not sure why.
        Perhaps: a reasonable objective would be to say that for N bits of information, I would like to pick the test schedule that requires the least total elapsed time. If you have two candidate commits and a slow recompile time, it seems like your algorithm would do many repeats of commit A until the gain in information per run drops below the expected gain from B divided by the recompile time, then it would do many repeats of B, then go back to A, etc. So there are long runs, but you're still switching back and forth. You would get the same number of bits by doing the same number of test runs for each commit, but batching all of the A runs before all of the B runs.
        Then again: you wouldn't know how many times to run each in advance, and "run A an infinite number of times, then run B an infinite number of times" is clearly not a winning strategy. Even with a fixed N, I don't think you could figure it out without knowing the results of the runs in advance. So perhaps your algorithm is optimal?
        It still feels off. You're normalizing everything to bits/sec and choosing the maximum. But comparing an initial test run divided by the rebuild time vs a subsequent test run divided by a much faster time seems like you're pretending a discrete thing is continuous.
        I wish I could math good.
Retr0id 2 hours ago
Super cool!
A related situation I was in recently was where I was trying to bisect a perf regression, but the benchmarks themselves were quite noisy, making it hard to tell whether I was looking at a "good" vs "bad" commit without repeated trials (in practice I just did repeats).
I could pick a threshold and use bayesect as described, but that involves throwing away information. How hard would it be to generalize this to let me plug in a raw benchmark score at each step?
[-]
- ajb 32 minutes ago
  At a guess, you can reuse the entropy part, but you'd need to plug in a new probability distribution.
SugarReflex 1 hour ago
I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.
edit: thanks for the responses! I was not even familiar with `git bisect` before this, so I've got some new things to learn.
[-]
- Retr0id 51 minutes ago
  If you're git bisecting a flakey test, normally your only option is to run the test many times until you're ~certain it's either flakey or not flakey. If your test suite is slow, this can take a long time.
  One way to think about the tool presented is that it minimizes the number of times you'd need to run your test suite, to locate the bad commit.
  [-]
  - ajb 8 minutes ago
    It's worth noting that the analysis (although not this specific algorithm) applies in cases where there is a deterministic approach, but a nondeterministic algorithm is faster.
    For example, suppose you have some piece of hardware which you can interrogate, but not after it crashes. It crashes at a deterministic point. You can step it forward by any amount of steps, but only examine it's state if it did not crash. If it crashed, you have to go back to the start. (I call this situation "Finnegan Search", after the nursery rhyme which prominently features the line "poor old Finnegan had to begin again").
    The deterministic algorithm has you do an examination after every step. The nondeterministic algorithm has you choose some number of steps, accepting the risk that you have to go back to the start. The optimal number of steps (and thus the choice of algorithm) depends on the ratio of the cost of examination to the cost of a step. It can be found analytically as the expected information gain per unit time.
    (Either way the process is pretty annoying and considerable effort in hardware and software design has gone into providing ways to render it unnecessary, but it still crops up sometimes in embedded systems).
- augusto-moura 1 hour ago
  The writeup [1] linked on the README has examples and a better explanation
  [1]: https://hauntsaninja.github.io/git_bayesect.html
- curuinor 1 hour ago
  Bayesian inference is, to be overly simple, a way to write probabilistic if-statements and fit them from data. The "if" statement in this case is "if the bug is there...", and of course it's often the case that in actual software that if statement is probabilistic in nature. This thing does git bisect with a flaky bug with bayesian inference handling the flakiness to get you a decent estimate of where the bug is in the git history. It seems to be usable, or at least as usable as a Show HN thingy is expected to be.
davidkunz 2 hours ago
Useful for tests with LLM interactions.
skrun_dev 53 minutes ago
[dead]