Test Postgres in Python Like SQLite

(github.com)

143 points | by wey-gu 21 hours ago

21 comments

  • WhyNotHugo 11 hours ago
    I can't imagine WASM running inside Node.js being faster than native code that's been optimised for decades.

    > No PostgreSQL install needed—just Node.js

    Postgres is 32MB, Node.js is 63MB. I know 31MB isn't a huge deal for most folks, but it's hard to see "doesn't require postgres" as a selling point when you need something else that's twice the size instead.

    • TheTaytay 8 hours ago
      This isn't because of the size. Frankly, it's appealing because we already use both Python and Node package managers, so not needing to reach for yet another binary install mechanism is the real draw.
    • wey-gu 9 hours ago
      Haha yeah, I put an "(I know)" in the README:

      > Effortless Setup: No PostgreSQL install needed—just Node.js(I know)!

      The goal was just to get a SQLite-like DX within an hour, so that's the route I took.

      And then I thought, why not open source it?

      Maybe in v2 I could abstract over an actual binary with the same DX.

  • jauco 16 hours ago
    There's also https://testcontainers.com/. Not sure about the speed difference, but testcontainers has never felt slow to me when using it from Node.js unit tests.
    • hardwaresofton 16 hours ago
      Ding ding ding. Testcontainers is a fantastic way to write the (arguably) most important tests for your app. Don't test E2E with a mock; just use a real database.

      If that feels hard to you (setting up your app to point at another DB, running a single E2E-testable part of your app with its own DB connection, etc.), fix that.

      • 9rx 4 hours ago
        If you feel you need E2E tests, that means you have implementation details leaking across different areas of your application. That's what you really need to fix. Bandaids may be useful, but better still is not to cut yourself in the first place.
        • glookler 3 hours ago
          If you just know when you aren't cutting yourself, then why do you need tests?
          • 9rx 2 hours ago
            Normally you would write tests at the user boundaries (like a set of APIs) for the purpose of documenting the behavioural intent for the user, so as not to leave unspecified behaviour. While you could theoretically do that in Word instead, testing, as we call it, offers a couple of advantages:

            1. It is written in code, which is what coders want to read. There is nothing worse than having to read documentation in English (or insert your natural language of choice).

            2. It is able to automatically prove that what is documented is true. Programmers tend to have a penchant for lying if you let them, so this is a surprisingly big advantage.

            You might not need them per se, but unless you are trying to be an asshole, it is socially expected that you will be kind to your users by filling them in on important details, not leaving them guessing.

            Now, if your application only has one user boundary (a.k.a. spaghetti code), then any tests you have will technically end up being E2E tests, there being only one end (although calling that E2E would be rather silly). But if that's what your code looks like, you're going to have trouble connecting to a test database anyway, and per the earlier comment you need to fix that for the sake of E2E tests. Once you reach that point, you may as well go all the way and fix it properly.
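
            To make that concrete, a toy sketch of a boundary test that doubles as documentation (names here are hypothetical, not from any real project):

              # A boundary test documenting behavioural intent.
              # `create_user` is a hypothetical public API; the body is a
              # stand-in implementation so the example runs.
              import pytest

              def create_user(email: str) -> dict:
                  if "@" not in email:
                      raise ValueError("invalid email")
                  return {"email": email.lower(), "active": True}

              def test_create_user_normalises_email_and_activates_account():
                  # Documents the promise made to callers: emails are
                  # lower-cased and new accounts start active.
                  user = create_user("Alice@Example.COM")
                  assert user == {"email": "alice@example.com", "active": True}

              def test_create_user_rejects_malformed_email():
                  # Documents that malformed input is an error, not silently accepted.
                  with pytest.raises(ValueError):
                      create_user("not-an-email")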

      • maartenh 16 hours ago
        Yep. I create fresh DBs from a fixture DB using Postgres's ability to create a database from a template. Very quick, always correct.
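
        Roughly (a sketch with psycopg2, assuming a pre-migrated fixture_db with no open connections to it; CREATE DATABASE can't run inside a transaction, hence autocommit):

          import psycopg2

          # Connect to the maintenance database to issue CREATE DATABASE.
          conn = psycopg2.connect("dbname=postgres user=postgres")
          conn.autocommit = True
          with conn.cursor() as cur:
              # TEMPLATE clones are file-level copies, so this is much
              # faster than re-running migrations for every test database.
              cur.execute("CREATE DATABASE test_run_1 TEMPLATE fixture_db")
          conn.close()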
    • kinow 11 hours ago
      That's exactly what we're using for our tests in a new pull request adding Postgres support: https://github.com/BSC-ES/autosubmit/pull/2187

      The last GH Actions jobs with SQLite and Python 3.9 took 3m 41s, and the same tests with Postgres took 4m 11s. Running a single test locally in PyCharm also executes in less than 1 second. You notice some bootstrap happening, but once the container image is downloaded locally, it's really quite fast.

    • wey-gu 16 hours ago
      Yeah, without the py-pglite attempt, this would be the only approach. pglite could ideally make unit-test cases more flexible/lightweight, but as you mentioned it has never felt slow, so it should be fine to keep working with.

      And actually, for more e2e-style cases I think it's way better not to use the lite backend.

      The non-container solutions would handle more of the lifecycle management / isolated env prep / tear-down with elegantly designed abstractions, though I think similar abstractions could be built on top of containers.

      Maybe ideally we could have unified abstractions over both container-based and, eventually, WASM backends to boost DX, with different expectations of speed vs compatibility.

    • whalesalad 4 hours ago
      It's remarkably easy to integrate. We have a tests/db module with a single conftest.py file that does everything. Every test suite run spins up a fresh container (no state), runs all migrations against it to prepare it for duty, and then the test suite continues. You could do this on a per-test basis for ultra isolation, but this works for us. That change would simply be to modify the `scope` value.

      Also a convenient `conn` fixture at the end, which allows any test in the module to get a new DB connection.

      linky: https://gist.github.com/whalesalad/6ecd284460ac3836a6c2b9ca8...
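
      For anyone not clicking through, the shape is roughly this (a sketch with testcontainers-python; the run_migrations helper is illustrative, not the gist's actual code):

        # conftest.py (sketch, not the gist's actual code)
        import pytest
        import sqlalchemy
        from testcontainers.postgres import PostgresContainer

        @pytest.fixture(scope="session")  # change scope for per-test isolation
        def engine():
            # Fresh container per suite run: no state leaks between runs.
            with PostgresContainer("postgres:16") as pg:
                engine = sqlalchemy.create_engine(pg.get_connection_url())
                run_migrations(engine)  # hypothetical: apply schema migrations
                yield engine
                engine.dispose()

        @pytest.fixture
        def conn(engine):
            # Any test in the module can ask for a live connection.
            with engine.connect() as connection:
                yield connection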

    • globular-toast 15 hours ago
      For Python specifically I use pytest-docker: https://pypi.org/project/pytest-docker/
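
      Typical usage, for the curious (a sketch assuming a docker-compose.yml with a postgres service; the readiness check is illustrative):

        import psycopg2
        import pytest

        @pytest.fixture(scope="session")
        def pg_dsn(docker_ip, docker_services):
            # docker_ip / docker_services are fixtures provided by pytest-docker.
            port = docker_services.port_for("postgres", 5432)
            dsn = f"host={docker_ip} port={port} user=postgres dbname=postgres"

            def ready():
                try:
                    psycopg2.connect(dsn).close()
                    return True
                except psycopg2.OperationalError:
                    return False

            # Block until Postgres accepts connections.
            docker_services.wait_until_responsive(timeout=30.0, pause=0.5, check=ready)
            return dsn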
  • benpacker 10 hours ago
    I have this set up and integrated for Node/Bun.

    This is an example of a unit test of an API route on a fully isolated, WASM-backed Postgres - very few lines of code, and all your API unit tests can run fully in parallel without any shared state: https://github.com/ben-pr-p/bprp-react-router-starter/blob/m...

    This is all of the code needed to use Postgres in prod and PGLite in test/dev: https://github.com/ben-pr-p/bprp-react-router-starter/blob/m...

    • peterldowns 3 hours ago
      Thanks for sharing this, I'm not a Node/Bun dev but it's quite understandable. One question: does each test/database need to wait for migrations to run, or are you somehow using template databases to run them once and then quickly clone a new database for each test?
    • wey-gu 9 hours ago
      Wow, thanks! Should I use Bun instead of Node now?
      • benpacker 7 hours ago
        It’s the same - just saying it works in both.

        I like it because I can do full stack development, including the database, with a single system level dependency (Bun).

        • wey-gu 6 hours ago
          Ha, thanks, makes sense.
  • samwillis 15 hours ago
    Awesome work Wey! Love that you're building this!

    I work on PGlite; we have an experimental WASI build that can run in a WASI runtime, which should enable dropping the Node requirement. It lacks error handling at the moment (WASM has no longjmp, and Postgres uses that for error handling; Emscripten has hacks that fix this via JS), so we haven't pushed it very far yet.

    Do ping me on our Discord and I can point you towards it.

    Happy to answer any PGlite questions while I'm here!

  • ptx 11 hours ago
    How does running PostgreSQL compiled to WebAssembly reduce "the overhead of a full PostgreSQL installation"? Couldn't a native version be configured similarly and avoid the additional overhead of WebAssembly, Node.js and npm?
    • CamouflagedKiwi 10 hours ago
      Yes, it can. It's not especially hard to start up a Postgres instance, and with a couple of config tweaks you can improve the startup time. I had this working nicely at a previous job; it started in under a couple of seconds.
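
      Something like this (a sketch; the tweaks trade durability for speed, which is fine for throwaway test clusters):

        import subprocess, tempfile

        # Throwaway cluster in a tempdir; --no-sync skips fsync during initdb.
        datadir = tempfile.mkdtemp()
        subprocess.run(["initdb", "-D", datadir, "--no-sync",
                        "--auth=trust", "-U", "postgres"], check=True)

        # fsync/synchronous_commit/full_page_writes off: the data is
        # disposable, so skip the durability costs.
        server = subprocess.Popen([
            "postgres", "-D", datadir,
            "-c", "fsync=off",
            "-c", "synchronous_commit=off",
            "-c", "full_page_writes=off",
            "-c", "listen_addresses=",  # unix socket only, no TCP port clashes
            "-k", datadir,              # put the socket in the tempdir
        ])
        # then e.g.: psycopg2.connect(host=datadir, user="postgres", dbname="postgres")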
  • veggieroll 7 hours ago
    Yo, this installs npm packages at runtime. Very not cool IMO. You should disclose this prominently in the README.

    This is a nice project idea, but you should use a Python WASM interpreter to run the PostgreSQL WASM.

  • laurencerowe 17 hours ago
    This is running pglite in a node subprocess. Why not just run Postgres itself as a subprocess with a data directory in a tempdir?
    • wey-gu 16 hours ago
      I think the ultimate version for this use case would be to carefully wire up the bare-metal binary with an ad-hoc in-memory disk or tempdir :). This could be a future backend of py-pglite (planned for v2).

      For now, it was more accessible for me to hack it together in hours, and it works.

    • isoprophlex 16 hours ago
      Exactly; in the past I've had reasonable success with test fixtures based on temp dirs and a templated docker compose file. It just needs Docker in the environment, which is not too far-fetched.
  • selimnairb 12 hours ago
    I just use pytest-docker-compose, then I don’t need to bother with NPM. I usually don’t like “magic”, but pytest’s fixtures are so powerful I’m okay with a little bit of “magic”.
  • perrygeo 9 hours ago
    For Clojure and Java apps, check out Zonky (https://github.com/zonkyio/embedded-postgres). It provides a similar experience on the JVM, but instead of containers or WASM, you're running an embedded native binary.
    • wey-gu 9 hours ago
      Thanks! This is the ultimate shape I am going to pursue!
  • murkt 18 hours ago
    I wonder if it’s possible to compile Postgres directly into a Python extension instead of WASM. Just import it and go forth, no node dependency, nothing.
    • samwillis 15 hours ago
      This is something I've explored as part of my work on PGlite. It's possible but needs quite a bit of work, and would come with some limitations until Postgres upstream make some changes.

      You will need an unrolled main loop similar to what we have in PGlite, using "single user mode" (you likely don't want to fork subprocesses like a normal Postgres). The problems come when trying to run multiple instances in a single process: Postgres makes heavy use of global vars for state (it can, since it forks for each session), and these would clash if you had multiple instances. There is work happening to possibly make Postgres multi-threaded, which would solve that problem.

      The long-term ambition of the PGlite project is to create libpglite, a low-level embedded Postgres with a C API, which will enable all this. We're quite far off though; happy to have people join the project to help make it happen!

    • wey-gu 18 hours ago
      Wow, thanks! It should be feasible! As I recall, some databases in the Python-first communities (chromadb, milvus-lite) already do something like this.

      We could think big and someday do that within the py-pglite project.

      Let me put it on the v2 roadmap (much more work to do!)

      • heinrichhartman 16 hours ago
        Can you explain this? What is the compilation target? What is the compiler? How does being a "python extension" help?
        • wey-gu 16 hours ago
          The target would just be a bare-metal binary, with some changes needed; the purpose was simply to optimize for Python DX.
  • lucideer 12 hours ago
    This seems great but:

    > The library automatically manages PGlite npm dependencies.

    I'm sorry what? Please don't do this. There is no way you can do this in a way that:

    (a) is compatible with every pre-existing in-repo NodeJS setup

    (b) plays nicely with all SCA automation configurations (Dependabot, etc.)

    ---

    Edit:

    After closer inspection there's a config[1] to disable this, which defaults to True, but there's no documentation on how to manage the required Node deps when setting that value to False.

    I would strongly suggest the authors default this to False, with accompanying docs in the README.

    Overall though I do like the simplicity & readability of the codebase - it makes these gotchas really easy to find & verify even without docs - will definitely be using this.

    [1] https://github.com/wey-gu/py-pglite/blob/b821cf2cfeb2e4bc58b...

  • buremba 8 hours ago
    Amazing, this is what I've been trying to find for the last few weeks! I wonder if it's possible to run the WASM directly from Python instead of via a subprocess, though?
  • wg0 14 hours ago
    What a library! Just five years ago I couldn't have imagined Postgres running in the browser.

    And now this.

    Going to use it right away.

  • jodiug 15 hours ago
    This is great: running tests against a database covers a ton of behaviors that are otherwise hard to test, and having the database in memory allows CI/CD to run those tests without downloading and building containers.

    It does depend on SQLAlchemy. Can this also be used with asyncpg? Is it on the roadmap?

  • ArcaneMoose 16 hours ago
    I recently struggled with PGlite for a while and ended up using full postgres via https://github.com/leinelissen/embedded-postgres which worked like a charm!
    • hardwaresofton 16 hours ago
      TIL this exists, thank you! It seems obvious in retrospect.

      Would love to see a benchmark between this, pglite and testcontainers.

    • wey-gu 16 hours ago
      Thanks, didn't know about this before; it looks like the ultimate shape of py-pglite in the JS world~~
  • ewhauser421 17 hours ago
    Nice! I've been wanting to try this ever since PGlite came out.

    Anything that won't work if you tried this as a drop-in replacement for a full PG instance in tests? Maybe extensions? Any benchmarks?

  • wey-gu 21 hours ago
    I needed this for my use case, so I built it after verifying it works e2e downstream!
  • gleenn 18 hours ago
    I would love something like this for databases like Snowflake, where there isn't an easy way to test without hitting a prod SF server.
  • ForHackernews 11 hours ago
    Is this about testing Python that uses Postgres?

    Because if you're really interested in testing Postgres itself, you can test PG in PG: https://pgtap.org/

    • jacurtis 5 hours ago
      It's for testing Python projects that connect to Postgres.

      Often when unit testing you need to exercise CRUD-type operations between the application and the database, so you generally have to spin up a temporary database just to delete it at the end of the tests. Commonly this is done with SQLite, even if the application targets Postgres in production, because it's fast and easy and you can interact with it as a file, with no connection configuration or added bloat.

      But then sometimes your app gets complicated enough that you're using Postgres features SQLite has no equivalent for. So now you need a PG instance just for testing, which is a headache.

      So this project bridges the gap, giving you the feature completeness and cross-environment consistency of Postgres with the conveniences of SQLite.
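
      For contrast, the classic SQLite shortcut looks something like this (a sketch with SQLAlchemy; the model is a stand-in):

        import pytest
        from sqlalchemy import Column, Integer, String, create_engine
        from sqlalchemy.orm import declarative_base

        Base = declarative_base()

        class User(Base):  # stand-in model so the example runs
            __tablename__ = "users"
            id = Column(Integer, primary_key=True)
            email = Column(String, nullable=False)

        @pytest.fixture
        def engine():
            # Throwaway in-memory SQLite DB per test: fast and zero-config,
            # but it diverges from production Postgres (no JSONB, arrays,
            # etc.), which is exactly the gap py-pglite targets.
            engine = create_engine("sqlite://")
            Base.metadata.create_all(engine)
            yield engine
            engine.dispose()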

  • anikozxx 16 hours ago
    [dead]