What Is a Centipawn Advantage?

(win-vector.com)

63 points | by jmount 4 days ago

5 comments

knuckleheads 14 hours ago
I'm not a chess engine guy, but I've talked to some, and, from what I recall, there is a very interesting difference between an engine like Leela Chess Zero (lc0) and Stockfish. Stockfish internally calculates in centipawns while lc0 calculates in WDL (win/draw/loss probabilities). Stockfish has a model that converts its centipawn calculation to WDL, but it's not _really_ the WDL of the position, it's just an estimate of it according to a probabilistic model. The same applies in reverse to lc0. Why I find this interesting is that it shows how they come from different generations, with Stockfish representing the old deterministic style with deep search, and lc0 being directly inspired by AlphaZero and the new generation of engines based on neural nets. Stockfish has by now adopted the best of both worlds (deep search with a small neural net) and is the better for it, but I still think the developers of both engines banter over who is really producing the True WDL numbers for a given position.
For my part, I find that WDL is more amenable to interpretation. Being up 5 pawns worth of material sort of makes sense, but being told you have a 95% chance of winning makes more sense to me at first blush.
- n_e 12 hours ago
  > but I still think the developers of both engines banter over who is really producing the True WDL numbers for a given position
  In fact, stockfish's WDL is very rudimentary: it is a function of the centipawn evaluation of the position and the value of the remaining material.
  See https://github.com/official-stockfish/Stockfish/blob/a6d055d...
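
  A sketch of the general shape such a model can take, assuming a logistic curve in the evaluation whose midpoint and width depend on material. The symbols $a(m)$ and $b(m)$ are illustrative names, not Stockfish identifiers; the fitted coefficients and exact scaling live in the linked source and are not reproduced here.

  ```latex
  W(v, m) = \frac{1}{1 + \exp\!\left(\dfrac{a(m) - v}{b(m)}\right)}, \qquad
  L(v, m) = W(-v, m), \qquad
  D(v, m) = 1 - W(v, m) - L(v, m)
  ```

  Here $v$ is the engine's internal evaluation and $m$ the remaining material. With $a(m) > 0$ and $b(m) > 0$ the draw share $D$ stays non-negative, and it shrinks as $|v|$ grows.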
  - knuckleheads 9 hours ago
    Yeah, it's still unclear to me why Stockfish produces WDL numbers at all, beyond the fact that people sometimes ask for them.
- tarentel 13 hours ago
  To your last point, the centipawns thing doesn't make a whole lot of sense from an interpretation perspective because it is so shallow. WDL can give you much more insight into how tame or chaotic things are. A 1 pawn evaluated advantage with a 95% chance to win is wildly different from a similar evaluation and a 50% chance to win. The first position likely has an obvious tactic that leads to a win, the latter may require perfect play for 15 moves that only a computer can calculate.
  Also, from a computer perspective, an advantage of >= 1 pawn is usually sufficient for a computer to win 100% of the time, so it's not really interesting and says very little about whether a person could win 100% of the time.
  - knuckleheads 9 hours ago
    Yep, exactly. I spent a lot of time trying to figure out better ways for interpreting the evaluations of engines for https://www.schachzeit.com/en/openings/barnes-opening-with-d... and I ended up liking WDL much better than centipawns. A blunder defined in terms of decreasing your chance of winning by such and such percentage is, to me, a much better definition than a blunder losing such and such material. What does that mean? It makes sense to me now, but it took a long while.
    Relatedly, there is an interesting thing that lc0 has been doing as well, where it takes the contempt concept even further and can beat you at queen odds: https://lczero.org/blog/2024/12/the-leela-piece-odds-challen... It assumes it is better than you and that it shouldn't just give up because you might be up a knight, rook or even queen.
- deklesen 13 hours ago
  FWIW, as an avid chess player, I find the "up 5 pawns" has more intuitive signal.
  - knuckleheads 9 hours ago
    Was it intuitive when you were first learning to play? Or have you gotten used to understanding positions via centipawns?
    - recursivecaveat 41 minutes ago
      It's always made as much sense to me as being up or down money in Monopoly, or points in basketball. Stating the W/L value of a position feels like a weird mixing of the present and future to me. Of course the centipawn value holds an implicit prediction of the future, but the indirection makes it more palatable.
    - deklesen 9 hours ago
      I learned chess when I was 5 and didn't have a chess computer for the first 5 years or so, and by then I had progressed quite far, so I cannot really tell.
      - knuckleheads 9 hours ago
        Makes sense. I started learning how to play Chess when I was ~30 and my tutors were just chess engines, game reviews on chess.com and whatever books I found interesting enough to get through. I have fun, and that's all I'll ever have, no titles or anything. The centipawn stuff makes sense now, but it took a while.
salamo 6 hours ago
You'll also have some fun pinning down the difference between an "inaccuracy", a "mistake", and a "blunder". These are meaningful delineations for humans but not for a chess algorithm. Objectively, any amount of centipawn loss either changes the best possible outcome for the player or it does not.
So in practice, a drop in win probability greater than 14% is considered a blunder on Lichess.
For reference, lichess uses the following function to map centipawn advantage to the probability bar, derived from observed outcomes: https://github.com/lichess-org/lila/pull/11148
From an ML perspective, this is basically logistic regression with a single feature. However, once we leave the realm of theoretical centipawn value and begin to optimize predictive power, we could imagine adding in other things like the players' ELOs or time remaining per player, etc.
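
A minimal sketch of that single-feature logistic map and the blunder rule mentioned above. The coefficient `K` is an assumption (the value recalled from Lichess's published accuracy notes; check the linked PR before relying on it), the clamp to +/-1000 centipawns is an assumption added to avoid overflow, and the 14-point threshold is taken from this comment and read as percentage points. The function names are hypothetical.

```python
import math

# Assumed coefficient, recalled from Lichess's accuracy notes; verify against the PR.
K = 0.00368208


def win_percent(centipawns: float, k: float = K) -> float:
    """Logistic map from a centipawn evaluation to a win percentage in (0, 100)."""
    # Assumption: clamp extreme evaluations so exp() cannot overflow.
    cp = max(-1000.0, min(1000.0, centipawns))
    return 100.0 / (1.0 + math.exp(-k * cp))


def is_blunder(cp_before: float, cp_after: float, threshold: float = 14.0) -> bool:
    """True if the mover's win percentage fell by more than `threshold` points.

    Both evaluations are from the perspective of the player who just moved.
    """
    return win_percent(cp_before) - win_percent(cp_after) > threshold
```

For instance, `is_blunder(0, -300)` compares an equal position with one where the mover is three pawns down. Adding features such as ratings or clock time would turn `win_percent` into a multi-feature logistic regression.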
I think there are some interesting theoretical differences between predicted win probability derived from Stockfish CP and actual outcomes. As in, you could even imagine predicting positions where certain players struggle and steering them towards those positions. [0]
[0] https://www.youtube.com/watch?v=KgOC1D8wkyE
ramses0 16 hours ago
"""under perfect play all chess games be a the same single one outcome of the following (we just currently don’t know which one, “A” playing the white pieces):
Mr. A says, “I resign” or Mr. B says, “I resign” or Mr. A says, “I offer a draw,” and Mr. B replies, “I accept.” That is, under perfect play, each chess position is either a forced win, forced draw, or forced loss. The domain of a perfect chess position evaluation function is these three cases as symbols."""
There's an interesting point I've heard of in Backgammon, somewhat related to this statement. Modern Backgammon offers "the doubling cube" as a play option. https://en.wikipedia.org/wiki/Backgammon#Doubling_cube
...basically if you think you're going to win (aka: you have a 200 centi-pawn advantage), you can offer the doubling cube to your opponent (doubling the stakes of losing). If you're playing to win $5, and halfway through you think "yep, 90% chance I'm going to win this one...", you push the doubling cube to 2x (aka: $10 consequence), and kind of like poker your opponent has to evaluate whether it's "worth it" for them to stay in the game.
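
A small sketch of the "worth it" calculation for the player being doubled, under simplifying assumptions that real backgammon does not satisfy: a money game, no gammons or backgammons, and no further redoubles. The function names are hypothetical.

```python
def take_equity(p_win: float, cube: int = 2) -> float:
    """Expected result of accepting a double, measured in units of the original stake.

    p_win is the accepting player's own probability of winning the game.
    """
    return cube * p_win - cube * (1.0 - p_win)


def should_take(p_win: float, cube: int = 2) -> bool:
    """Accept when playing on at the doubled stake beats forfeiting the current stake."""
    drop_equity = -(cube / 2)  # declining loses the stake as it stood before the double
    return take_equity(p_win, cube) >= drop_equity
```

Under these assumptions the break-even point is a 25% chance of winning, so the opponent of someone who really is a 90% favorite should decline and pay the original $5.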
You might imagine a "2xELO penalty" where White takes a Queen with a Pawn, and then offers "2x, or I'm gonna beat 'ya!". If Black says "Naaah, you just activated my trap card!" and then either accepts "2x" or pushes back at "4x", then it becomes a little more like poker... you think you can beat me, then prove it!
Not that I'm suggesting changing the rules of Chess, but overall I'm really fascinated by the concept of formalized semi-out-of-band risk-taking to potentially end games early.
- qsort 16 hours ago
  The doubling cube works well in Backgammon because it is a rare example of a popular game with randomness, without hidden information (every information set contains exactly one node of the decision tree, if you want to get extremely technical,) and, critically, with "different endings" (normal win, gammon, backgammon.) Doubling decisions are especially interesting because while they're always objective (it could never be the case that perfect players disagree on the correct move, that requires nontrivial information sets,) it could be the case that:
  - it's correct for a player to double and for the other to accept;
  - it's correct for a player to double and for the other not to accept;
  - the position is "too good to double," because the equity from the probability of a double or triple game exceeds the advantage you'd get from a double;
  - all of the above being influenced by the match score, e.g. if I'm 3 points away from winning and you're 5 points away from winning, I could make different decisions than if it were the opposite.
  Chess has none of these features, so the doubling cube would be exclusively a psychological power play, something like "it's theoretically drawn but I don't think you can defend it," which is not a great game dynamic.
  In general, transplanting the doubling mechanic without a similarly rich context doesn't tend to work well.
- fernandopj 10 hours ago
  I'd like to point out that some online chess tournaments, mostly using rapid and bullet times, have a "berserk" option pre-start, where the player taking it halves their allotted time bank, for double the winning points.
  It's not a bluff, since information is still 100% open to both players, but it changes the dynamic a lot.
- jmount 16 hours ago
  This is an important point. Thank you.
  Games like backgammon (that have betting and the doubling cube to continue), Go (which is calculated in stones), and bridge (again having points) have more natural intermediate scoring systems than chess.
  In my opinion the "winner takes all" aspect of chess is similar to what makes analyzing voting systems difficult. In a non game context: Aspnes, Beigel, Furst, and Rudich had some amazing work on how all or nothing calculation really changes things: https://www.cs.yale.edu/homes/aspnes/papers/stoc91voting.pdf .
  - ramses0 15 hours ago
    For a while I really dug in to multiple player (and teams-of-players) ELO calculations. I got into an argument with my friend about whether second place was any better than last place... specifically in poker, but applicable to multiple games (imagine chinese checkers [race to finish], or carcassonne/ticket-to-ride [semi-hidden scoring until the end]).
    His POV was that "if you don't win, you lose" and my POV was "second place is better than last place". His response was: "if I play poker to get first place it's wildly different than playing for second or third place [and I may end up in last place wildly more often due to risk % or bad beats]"
    I've been more used to "climbing" type performance games (ie: last place => mid-field => second place => first place) and in my gut I wanted my ELO to reflect that (top-half players are better than bottom-half players), however his very valid point was that different games have different payout matrices (eg: poker is often "top-3 payout", and first may be 10x second or third).
    I think in my mind I've settled on EV-payout for multiplayer games should match the "game payout", and that maybe my gut is telling me the difference between "Casual ELO" (aka: top-half > bottom-half), and "Competitive ELO" (aka: only the winner gets paid).
  - hyperpape 15 hours ago
    Go is also winner take all. It's psychologically satisfying to have a big win, in the same way that it's psychologically satisfying to achieve a brilliant checkmate, but in any ordinary game or tournament (outside of certain gambling setups), a win by 1/2 point is the same as a win by 20+ points.
    - aidenn0 13 hours ago
      Yes and no. One could say this of any game with points where the margin of victory doesn't affect long-term outcomes (e.g. most ball games).
      A win by 1/2 point versus a win by 20 points suggests a very different relative skill between the two players. Similarly, the custom of the stronger player playing white without komi suggests that the point differential matters.
      - kuboble 11 hours ago
        Not necessarily. In Go you often calculate the score and come to the conclusion that by playing proper moves you will lose by a small margin.
        So instead you launch a desperate maneuver in a hope to either turn the game around or lose by 30 points.
        - aidenn0 6 hours ago
          I see what you're saying; this is true for any game scored win/loss. Even in gridiron football, if you're down by 4 points with time almost out, you won't kick a field goal (worth 3 points).
paulddraper 16 hours ago
Surprised it didn’t mention until the very end, but since chess is deterministic, there is no objective probability.
Every position is objectively plus infinity, minus infinity, or zero.
The “advantage” is an engine-specific notion that helps prune search paths.
Some chess engines don’t even evaluate an advantage.
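
To make the "three objective values" point concrete, here is a sketch on a game small enough to solve outright. Tic-tac-toe stands in for chess purely as an assumption of convenience; the board encoding (a 9-character string of "X", "O", and ".") and the function names are hypothetical.

```python
from functools import lru_cache

# All eight three-in-a-row lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return "X" or "O" if that side has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board: str, to_move: str) -> int:
    """Game-theoretic value for the side to move: +1 win, 0 draw, -1 loss.

    Assumes the position arose from legal play, so a completed line
    always belongs to the player who moved last.
    """
    if winner(board) is not None:
        return -1
    if "." not in board:
        return 0
    other = "O" if to_move == "X" else "X"
    # Negamax: my best result is the worst result I can force on my opponent.
    return max(-value(board[:i] + to_move + board[i + 1:], other)
               for i, sq in enumerate(board) if sq == ".")
```

There is no probability anywhere in `value`; every position maps to exactly one of three symbols. Chess has the same structure in principle, but its game tree is far too large to search this way, which is where heuristic evaluations come in.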
- kuboble 16 hours ago
  There are also objective measures for finer-grained position evaluation.
  For winning/drawn positions: "What is the smallest program that can guarantee your side a win/draw?", probably with some time constraint added.
  - jmount 15 hours ago
    That is a neat variation.
  - paulddraper 12 hours ago
    Measuring the size of a model that produces a win?
    Theoretically valid, but that's not going to be very useful/viable.
    - kuboble 11 hours ago
      No, but in practice centipawns reported by the imperfect engine are good.
      But I want to point out that in theory there is also something more than pure win/lose/draw with perfect play.
  - im3w1l 15 hours ago
    I think program size is probably not a good measure since any heuristic you can put in could be discovered at runtime with a metaheuristic that searches for good heuristics. Time and memory make more sense.
- janalsncm 5 hours ago
  Yeah it’s confusing because there are really three “evaluations” you could have for a position
  1) god-mode 1/0/-1 which you could argue is the “true” position
  2) engine centipawns which help the search algorithm
  3) human evaluation which would distinguish between two positions in terms of a subjective difficulty
  For example, two positions might be 0.0 on the eval bar but one position is an obvious draw and in the other position one player has to walk a tightrope of precise moves to draw. Just because that’s obvious to a computer doesn’t mean a human can easily draw the second position.
- monktastic1 12 hours ago
  Yes, this is a huge omission, because it means that as engines improve, the stated advantage becomes increasingly meaningless to humans (which is the opposite of what we may intuitively expect).
  What I really want to know as a player is how easy it will be for me to win from this position against someone of my opponent's strength, which is admittedly a very hard thing to define, let alone compute.
  - TurdF3rguson 8 hours ago
    How likely, you mean. It's the same effort to win a game as to lose a game.
- TZubiri 15 hours ago
  Not only is it mentioned, but the article also notes that it was pointed out as early as 1950, by none other than Claude Shannon:
  >""under perfect play all chess games be a the same single one outcome of the following (we just currently don’t know which one, “A” playing the white pieces): Mr. A says, “I resign” or Mr. B says, “I resign” or Mr. A says, “I offer a draw,” and Mr. B replies, “I accept"
throwaway198846 15 hours ago
fractional tempi sounds fascinating