SQLite's Durability Settings Are a Mess

(agwa.name)

89 points | by ciconia 4 hours ago

8 comments

  • layer8 1 hour ago
    Durability also requires the file system implementation and the disk to do the right thing on fsync, which, if I recall past discussions correctly, isn’t a given.
  • NewsaHackO 2 hours ago
    > By default, SQLite is not durable, because the default value of journal_mode is DELETE, and the default value of synchronous is FULL, which doesn't provide durability in DELETE mode.

    From the documentation, it seems like synchronous being FULL does provide durability of the database in DELETE mode, as FULL means it calls fsync after the transaction is completed. I think you may be confusing durability of the journal file with durability of the database. I don't think WAL can ever really have a durable transaction; it is essentially based on leaving a transaction open until it gets "check-pointed" or actually committed to the database file.

    • mlyle 2 hours ago
      > I don't think WAL can ever really have a durable transaction; it is essentially based on leaving a transaction open until it gets "check-pointed" or actually committed to the database file.

      In general: WAL means you write the transaction to the WAL, fsync (in sqlite, this depends upon the sync mode), and then tell the application that it's done. The transaction is then durable: even if the database crashes, the contents of the WAL will be applied to the database file.

      Checkpointing later just lets you throw away that part of the WAL and not have to replay as much to the database file.
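
A minimal sketch of the WAL setup described above, using Python's standard `sqlite3` module. The helper name `open_wal_database` is hypothetical. The sketch relies on the SQLite documentation's statement that `synchronous=FULL` in WAL mode adds a sync of the WAL file after each commit, and, as noted elsewhere in this thread, it still depends on the OS and disk honoring fsync.

```python
import os
import sqlite3
import tempfile


def open_wal_database(path):
    """Open a database in WAL mode and request a WAL sync on every commit.

    Hypothetical helper for illustration; returns the connection plus the
    journal mode and synchronous level that SQLite reports as in effect.
    """
    conn = sqlite3.connect(path)
    # PRAGMA journal_mode returns the mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Per the SQLite docs, synchronous=NORMAL in WAL mode skips the sync on
    # most commits, while FULL syncs the WAL after each transaction commit.
    conn.execute("PRAGMA synchronous=FULL")
    # PRAGMA synchronous reports an integer: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
    level = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, mode, level


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        conn, mode, level = open_wal_database(os.path.join(tmpdir, "test.db"))
        conn.execute("CREATE TABLE test (col INT)")
        conn.execute("INSERT INTO test VALUES (123)")
        conn.commit()  # the transaction now lives in the WAL until a checkpoint
        print(mode, level)
        conn.close()
```

The `synchronous` setting applies per connection, so a helper like this would have to run on every new connection, not once per database file.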

    • agwa 2 hours ago
      My understanding of DELETE mode is that the transaction is not committed until the rollback journal file is deleted - if the rollback journal is present when sqlite opens a database, it applies the rollback journal to undo the changes that the transaction made. See https://www.sqlite.org/atomiccommit.html#1_deleting_the_roll...

      If the directory containing the rollback journal is not fsynced after the journal file is deleted, then the journal file might rematerialize after a power failure, causing sqlite to roll back a committed transaction. And fsyncing the directory doesn't seem to happen unless you set synchronous to EXTRA, per the docs cited in the blog post.
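
To make "fsyncing the directory" concrete, here is a rough POSIX-only sketch of the general technique in Python. It is not SQLite's code. The helper names `fsync_directory` and `unlink_durably` are made up for illustration, and the durability claim assumes the filesystem and disk honor fsync.

```python
import os
import tempfile


def fsync_directory(dirpath):
    """Ask the OS to flush a directory's entries (creations, deletions, renames).

    POSIX only: opening a directory with os.open() is not supported on Windows.
    """
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def unlink_durably(path):
    """Delete a file, then fsync its parent directory.

    Without the directory fsync, the unlink may exist only in the OS cache,
    so the file could reappear after a power failure.
    """
    os.unlink(path)
    fsync_directory(os.path.dirname(os.path.abspath(path)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        journal = os.path.join(tmpdir, "testdb-journal")
        with open(journal, "wb") as f:
            f.write(b"placeholder journal contents")
        unlink_durably(journal)
        print(os.path.exists(journal))
```

The second step in `unlink_durably` is the one that, per the docs cited in the blog post, SQLite only performs in DELETE mode when synchronous is EXTRA.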

      • NewsaHackO 2 hours ago
        >if the rollback journal is present when sqlite opens a database, it applies the rollback journal to undo the changes that the transaction made. See https://www.sqlite.org/atomiccommit.html#1_deleting_the_roll...

        I don't follow. How would fsyncing the rollback journal affect the durability of the actual database? Do you actually think that the database would reapply an already committed journal whose ID in the header already indicates that the transaction was committed, when the database is already consistent? I really think you should re-review the definition of durability of a database, especially before saying the creator of SQLite is incorrect about its implementation.

        • agwa 1 hour ago
          I'm pretty sure I understand what durability means; the definition is not hard - https://en.wikipedia.org/wiki/Durability_(database_systems)

          It's possible I've misunderstood how DELETE mode works. But here's the thing - I shouldn't have to understand how DELETE mode works to know what SQLite setting I need to use to get durability. Unfortunately, the SQLite docs don't clearly say what guarantees each setting provides - instead they talk about what SQLite does when you choose the setting, leaving the reader to try to figure out if those actions provide durability. And the docs really make it seem like you need synchronous=EXTRA in DELETE mode to get durability, for the reasons explained above.

          This is a docs problem; I'm not saying SQLite is buggy.

          • NewsaHackO 1 hour ago
            This may just be an expectations difference then. I would fully expect a developer to read the docs and know how a setting works to know what guarantees it has.
        • int_19h 1 hour ago
          It's specifically about fsyncing journal deletion. The problem isn't that it would reapply it if it was already used to roll back. Rather, the problem is that if you commit, and that commit has succeeded (and so your app believes that it has written the data and might e.g. perform some other actions on it), the deletion of the now-unneeded journal might not be flushed to disk in the event of power loss or similar. So when you start the app again, SQLite sees said rollback journal, and - since it would be considered "hot" - applies it, effectively reverting the transaction that was supposedly already committed.

          FWIW I don't think it's wrong per se. The article links to a HN comment in which Richard Hipp explains why this is the default behavior, and it does make sense: https://news.ycombinator.com/item?id=45014296. At the same time, clearly, the definition of "durable" here could use some clarification.

          • agwa 1 hour ago
            Yes, that's exactly right.

            Note that the comment by Richard Hipp is justifying why WAL mode is not durable by default. It's a completely reasonable explanation, and would be for DELETE mode too, yet his comment claims that DELETE mode is durable by default, which I can't reconcile with the docs.

          • NewsaHackO 1 hour ago
            >So when you start the app again, SQLite sees said rollback journal, and - since it would be considered "hot" - applies it, effectively reverting the transaction that was supposedly already committed.

            Guys. The journal would not be a hot journal though, as the hot journal selection only applies if the database is in an inconsistent state. Otherwise, the database knows from the ID of the journal not to reapply an already applied rollback journal. The process you are talking about ONLY happens when the database is in a corrupted state, and it has to try and find a file to help recover the database.

            • agwa 26 minutes ago
              OK, I just tested it:

              In terminal 1, I created a database and added a table to it:

                $ sqlite3 testdb
                sqlite> create table test (col int);
              
              In terminal 2, I attached gdb to sqlite3 and set a breakpoint on unlink:

                $ gdb sqlite3 `pidof sqlite3`
                (gdb) b unlink
                (gdb) c
              
              Back in terminal 1, I inserted data into the table:

                sqlite> insert into test values(123);
              
              In terminal 3, I saved a copy of testdb-journal:

                $ cp testdb-journal testdb-journal.save
              
              Then in terminal 2, I resumed executing sqlite3:

                (gdb) c
              
              In terminal 1, the INSERT completed without error.

              Back in terminal 3, I sent SIGKILL to sqlite3, simulating a power failure:

                $ killall -9 sqlite3
              
              I then restored testdb-journal, simulating what could happen after a power failure when the parent directory is not fsynced:

                $ mv testdb-journal.save testdb-journal
              
              I then opened testdb again and ran `SELECT * FROM test` and it returned zero rows.

              This proves int_19h and I are right - if the journal file comes back, SQLite will apply it and roll back a committed transaction.

              I then confirmed with strace that, as the documentation says, the directory is only fsynced after unlink when synchronous=EXTRA. It doesn't happen with synchronous=FULL. So you need synchronous=EXTRA to get durability in DELETE mode.
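
The setting this test arrives at can be applied from application code roughly as follows. This is a sketch using Python's standard `sqlite3` module. The helper name `open_delete_mode_extra` is hypothetical, and it assumes the linked SQLite library is recent enough to recognize `synchronous=EXTRA`.

```python
import os
import sqlite3
import tempfile


def open_delete_mode_extra(path):
    """Open a database in DELETE journal mode with synchronous=EXTRA.

    Hypothetical helper for illustration; returns the connection plus the
    journal mode and synchronous level that SQLite reports as in effect.
    """
    conn = sqlite3.connect(path)
    # DELETE is the usual default, but set it explicitly in case the file
    # was previously switched to another mode such as WAL.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    # EXTRA is the level that, per the docs discussed above, also syncs the
    # containing directory after the rollback journal is unlinked.
    conn.execute("PRAGMA synchronous=EXTRA")
    # PRAGMA synchronous reports an integer: 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
    level = conn.execute("PRAGMA synchronous").fetchone()[0]
    return conn, mode, level


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        conn, mode, level = open_delete_mode_extra(os.path.join(tmpdir, "testdb"))
        conn.execute("CREATE TABLE test (col INT)")
        conn.execute("INSERT INTO test VALUES (123)")
        conn.commit()
        print(mode, level)
        conn.close()
```

Reading the pragma back after setting it is a cheap way to confirm that the connection actually accepted the requested level.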

            • int_19h 1 hour ago
              > Otherwise, the database knows from the ID of the journal not to reapply an already applied rollback journal.

              But it's not "already applied", that's the whole point. The transaction was committed, not rolled back, so the changes in the transaction were persisted to disk and the journal was just thrown away. If it magically reappears again, how is SQLite supposed to know that it needs to be discarded again rather than applied to revert the change?

            • agwa 1 hour ago
              The docs list 5 conditions that all must be satisfied for the journal to be considered hot: https://www.sqlite.org/atomiccommit.html#_hot_rollback_journ...

              I believe they would all be satisfied.

              I don't see any mention of checking IDs. Not saying you're wrong - I think the docs could very well be wrong - but could you provide a citation for that behavior?

              • NewsaHackO 1 hour ago
                Please read the entire document instead of just picking out sections; you will then be able to see where your misconceptions are occurring. You have attempted to make this same point three times, so I will say it for a third time: that section is about CORRUPTED databases, not databases that are consistent after fsync.
                • agwa 1 hour ago
                  I read the whole document. It doesn't mention IDs anywhere.

                  If you're not going to provide citations for your claims, yet criticize me for "picking out sections" when I provide citations, then continuing this conversation won't be productive.

  • diekhans 1 hour ago
    It seems like a bug report on what is not clear in the documentation would be highly useful.
  • tiffanyh 2 hours ago
    SQLite is an incredible piece of software, and its commitment to backward compatibility is deeply admirable. But that same promise has also become a limitation.

    v3.0 was first released in 2004—over 20 years ago—and the industry has changed dramatically since then.

    I can’t help but wish for a “v4.0” release: one that deliberately breaks backward compatibility and outdated defaults, in order to offer a cleaner, more modern foundation.

    Note: I'm not asking for new functionality per se. But just a version of SQLite that defaulted to how it should be used, deployed in 2025.

  • d1l 2 hours ago
    This is disingenuous and probably was written this way for HN cred and clicks. Sqlite's test suite simulates just about every kind of failure you can imagine - this document is worth reading if you have any doubts: https://www.sqlite.org/atomiccommit.html
    • eatonphil 56 minutes ago
      > Sqlite's test suite simulates just about every kind of failure you can imagine

      The page you link even mentions scenarios they know about that do happen and that they still assume won't happen. So even sqlite doesn't make anywhere near as strong a claim as you make.

      > SQLite assumes that the operating system will buffer writes and that a write request will return before data has actually been stored in the mass storage device. SQLite further assumes that write operations will be reordered by the operating system. For this reason, SQLite does a "flush" or "fsync" operation at key points. SQLite assumes that the flush or fsync will not return until all pending write operations for the file that is being flushed have completed. We are told that the flush and fsync primitives are broken on some versions of Windows and Linux. This is unfortunate. It opens SQLite up to the possibility of database corruption following a power loss in the middle of a commit. However, there is nothing that SQLite can do to test for or remedy the situation. SQLite assumes that the operating system that it is running on works as advertised. If that is not quite the case, well then hopefully you will not lose power too often.

    • agwa 2 hours ago
      That document addresses atomicity, not durability, and is thus non-responsive to my concerns.
  • topspin 3 hours ago
    [flagged]
    • 3eb7988a1663 3 hours ago
      It is the weekend. Most people use this site to avoid doing work.

      Edit: whoops, it is Friday! Gave myself a long weekend, and was just default thinking it is Saturday.

      • lyjackal 3 hours ago
        it's Friday, and I'm avoiding doing work
    • meindnoch 3 hours ago
      [flagged]
      • anotherhue 3 hours ago
        I'm laughing but mostly crying.

        Reliability is a dirty word, because it almost always comes at the cost of 'growth'.

  • bawolff 2 hours ago
    I can't help but feel that the difference to other DBs is that they just don't have these knobs or tell you at all.
  • cenamus 3 hours ago
    So this article asks exactly the same thing as the reply to Dr. Hipp's comment, just in 1000 words instead of 10? Whether the docs are out of sync?
    • topspin 2 hours ago
      > Whether the docs are out of sync?

      Were this a one off, you would have a point. It isn't, however. My experience over many years has been that you can't ever be certain about what is actually going on, based on the documentation alone, and that you wind up in Reddit and Stack Overflow and a plethora of blog posts attempting to figure it out. With LLMs, we have only more sources of contradictory and chronically obsolescent input.

      There is an actual problem here. However I can see that, based on the contributions from the SQLite downmod mafia, this talk isn't welcome, so I'm off to some other thing. Have a nice weekend, I suppose.

      • mtmail 2 hours ago
        > the SQLite downmod mafia

        Oh, come on. There's no open or secret attempt at censoring talk about sqlite on HN. (The story is #11 on the frontpage the minute the comment was made.)