I can't imagine WASM running inside Node.js being faster than native code that's been optimised for decades.
> No PostgreSQL install needed—just Node.js
Postgres is 32MB, Node.js is 63MB. I know that 31MB isn't a huge deal for most folks, but it's hard to see "doesn't require Postgres" as a selling point when you need something else that's twice the size instead.
The appeal isn’t the size. Frankly, it’s that we already use both the Python and Node package managers, so not needing to reach for yet another binary install mechanism is really appealing.
There’s also https://testcontainers.com/. Not sure about the speed difference, but Testcontainers has never felt slow to me when using it from Node.js unit tests.
Ding ding ding. Testcontainers is a fantastic way to write the most important tests (arguably) for your app. Don't test E2E with a mock, just use a real database.
If that feels hard to you (to set up your app pointing to another DB, run a single E2E-testable part of your app with its own DB connection, etc.), fix that.
If you feel you need E2E tests, that means you have implementation details leaking across different areas of your application. That's what you really need to fix. Band-aids may prove useful, but it's even better not to cut yourself in the first place.
Normally you would write tests at the user boundaries (like a set of APIs) to document behavioural intent for the user, so as not to leave behaviour unspecified. While you could theoretically do that in Word instead, testing, as we call it, offers a couple of advantages:
1. It is written in code, which is what coders want to read. There is nothing worse than having to read documentation in English (or insert your natural language of choice).
2. It is able to automatically prove that what is documented is true. Programmers tend to have a penchant for lying if you let them, so this is a surprisingly big advantage.
You might not need them per se, but unless you are trying to be an asshole, it is socially expected that you will be kind to your users by filling them in on important details rather than leaving them guessing.
Now, if your application only has one user boundary (a.k.a. spaghetti code), then any tests you have will technically end up being E2E tests, since there is only one end (although calling that E2E would be rather silly). But if that's what your code looks like, you're going to have trouble connecting to a test database anyway, and per the earlier comment you need to fix that for the sake of E2E tests. And if you reach that point, you may as well go all the way and fix it properly.
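To make the "user boundary" idea concrete, here is a toy sketch (assuming a FastAPI app; the route and names are invented for illustration) of a test written at the API boundary rather than against internals:

```python
# Toy example: a test at the user boundary (an HTTP API), documenting
# the contract a user can rely on instead of implementation details.
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()

@app.get("/items/{item_id}")
def read_item(item_id: int):
    return {"item_id": item_id}

client = TestClient(app)

def test_get_item_echoes_requested_id():
    response = client.get("/items/42")
    assert response.status_code == 200
    assert response.json() == {"item_id": 42}
```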
The last GH Actions job with SQLite and Python 3.9 took 3m 41s, and the same tests with Postgres took 4m 11s. A single test run locally in PyCharm also executes in less than 1 second. You notice some bootstrapping happening, but once the container image has been downloaded locally, it's really quite fast.
Yeah, without the py-pglite attempt, this would be the only approach. PGlite could ideally make unit-test cases more flexible and lightweight, but as you mentioned, it's never felt slow, so it should be fine to keep working with it.
And actually, for more E2E-style cases, I think it's way better not to use the lite backend.
The non-container solutions tend to handle lifecycle management, isolated environment preparation, and tear-down with elegantly designed abstractions, though I think similar abstractions could be built on top of containers.
Ideally, maybe we could eventually have unified abstractions over both container-based and WASM backends to boost DX, with different expectations around speed vs. compatibility.
It's remarkably easy to integrate. We have a tests/db module with a single conftest.py file that does everything. Every test suite run spins up a fresh container (no state), runs all migrations against it to prepare it for duty, and then the test suite continues. You could do this on a per-test basis for ultra isolation, but this works for us. That change would simply be to modify the `scope` value.
Also a convenient `conn` fixture at the end which allows any test in the module to get a new db connection.
linky: https://gist.github.com/whalesalad/6ecd284460ac3836a6c2b9ca8...
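For anyone wanting a starting point, here is a minimal sketch of that kind of conftest.py using the testcontainers-python package (fixture names and the migration hook are illustrative, not the gist's actual code):

```python
# conftest.py - one Postgres container per test-suite run (scope="session").
# Sketch only: run_migrations() stands in for your migration tool of choice.
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")  # use scope="function" for per-test isolation
def pg_engine():
    with PostgresContainer("postgres:16") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        # run_migrations(engine)  # hypothetical: apply all migrations once
        yield engine
        engine.dispose()

@pytest.fixture
def conn(pg_engine):
    # A fresh connection per test; database state persists for the session.
    with pg_engine.connect() as connection:
        yield connection
```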
This is an example of a unit test of an API route on a fully isolated WASM-backed Postgres - very few lines of code, and all your API unit tests can run fully in parallel without any shared state: https://github.com/ben-pr-p/bprp-react-router-starter/blob/m...
This is all of the code needed to use Postgres in prod and PGlite in test/dev: https://github.com/ben-pr-p/bprp-react-router-starter/blob/m...
I like it because I can do full stack development, including the database, with a single system-level dependency (Bun).
Thanks for sharing this, I'm not a Node/Bun dev but it's quite understandable. One question: does each test/database need to wait for migrations to run, or are you somehow using template databases to run them once and then quickly clone a new database for each test?
I work on PGlite. We have an experimental WASI build that can run in a WASI runtime; it should enable dropping the Node requirement. It lacks error handling at the moment (WASM has no longjmp, and Postgres uses that for error handling - Emscripten has hacks that fix this via JS), so we haven't yet pushed it far.
Do ping me on our Discord and I can point you towards it.
Happy to answer any PGlite questions while I'm here!
How does running PostgreSQL compiled to WebAssembly reduce "the overhead of a full PostgreSQL installation"? Couldn't a native version be configured similarly and avoid the additional overhead of WebAssembly, Node.js and npm?
Yes, it can. It's not especially hard to start up a Postgres instance, and with a couple of config tweaks you can improve the startup time. I had this working nicely at a previous job; it took under a couple of seconds to start.
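As a rough illustration of those config tweaks (a sketch assuming `initdb` and `postgres` are on PATH; the flags are real Postgres settings, but the helper itself is invented):

```python
# Sketch: start a throwaway native Postgres in a temp dir with durability
# features disabled, which is safe for tests and speeds things up.
import subprocess
import tempfile
import time

def start_test_postgres(port: int = 54329) -> subprocess.Popen:
    datadir = tempfile.mkdtemp(prefix="pgtest_")
    subprocess.run(["initdb", "-D", datadir, "-A", "trust"], check=True)
    proc = subprocess.Popen([
        "postgres", "-D", datadir,
        "-p", str(port),
        "-c", "fsync=off",              # skip disk flushes
        "-c", "synchronous_commit=off",
        "-c", "full_page_writes=off",
        "-c", "listen_addresses=localhost",
    ])
    time.sleep(1.0)  # crude readiness wait; use pg_isready in real code
    return proc
```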
I think the ultimate version for this use case would be to carefully wire up the bare-metal binary with an ad-hoc in-memory disk or tempdir :) - this could be a future backend of py-pglite (planned for v2).
For now, it's more accessible for me to hack it together in hours, and it works.
Exactly; in the past I've had reasonable success with test fixtures based on temp dirs and a templated docker compose file. It just needs Docker in the environment, which is not too far-fetched.
I just use pytest-docker-compose, so I don’t need to bother with NPM. I usually don’t like “magic”, but pytest’s fixtures are so powerful that I’m okay with a little bit of “magic”.
For Clojure and Java apps, check out Zonky (https://github.com/zonkyio/embedded-postgres). It provides a similar experience on the JVM, but instead of containers or WASM, you're running an embedded native binary.
I wonder if it’s possible to compile Postgres directly into a Python extension instead of WASM. Just import it and go forth, no node dependency, nothing.
This is something I've explored as part of my work on PGlite. It's possible but needs quite a bit of work, and would come with some limitations until Postgres upstream make some changes.
You would need to use an unrolled main loop similar to what we have in PGlite, using "single user mode" (you likely don't want to fork subprocesses like a normal Postgres). The problems come when you then try to run multiple instances in a single process: Postgres makes heavy use of global vars for state (it can, as it forks for each session), and these would clash if you had multiple instances. There is work happening to possibly make Postgres multi-threaded, which would solve that problem.
The long-term ambition of the PGlite project is to create libpglite, a low-level embedded Postgres with a C API, which will enable all this. We're quite far off, though - happy to have people join the project to help make it happen!
> The library automatically manages PGlite npm dependencies.
I'm sorry what? Please don't do this. There is no way you can do this in a way that:
(a) is compatible with every pre-existing in-repo NodeJS setup
(b) plays nicely with all SCA automation configurations (Dependabot, etc.)
---
Edit:
After closer inspection, there's a config option[0] to disable this, which defaults to True, but there's no documentation on how to manage the required Node deps when setting that value to False.
I would strongly suggest the authors default this to False, with accompanying docs in the README.
Overall though I do like the simplicity & readability of the codebase - it makes these gotchas really easy to find & verify even without docs - will definitely be using this.
[0] https://github.com/wey-gu/py-pglite/blob/b821cf2cfeb2e4bc58b...
Amazing, this is what I've been trying to find for the last few weeks! I wonder if it's possible to run the WASM directly from Python instead of the subprocess approach, though?
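In principle, yes, via a WASM runtime with Python bindings. A rough sketch with the wasmtime package (note: "pglite.wasm" is a hypothetical artifact; per the PGlite comment above, the WASI build is still experimental):

```python
# Sketch: the general shape of running a WASI module in-process from
# Python using wasmtime-py, instead of shelling out to Node.
from wasmtime import Engine, Linker, Module, Store, WasiConfig

engine = Engine()
linker = Linker(engine)
linker.define_wasi()  # provide the WASI imports the module expects

store = Store(engine)
wasi = WasiConfig()
wasi.inherit_stdout()
wasi.inherit_stderr()
store.set_wasi(wasi)

module = Module.from_file(engine, "pglite.wasm")  # hypothetical artifact
instance = linker.instantiate(store, module)
instance.exports(store)["_start"](store)  # run the module's WASI entry point
```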
This is great. Running tests with a database covers a ton of behaviors that are otherwise hard to test, and having the database in memory allows CI/CD to run those tests without downloading and building containers.
It does depend on SQLAlchemy. Can this also be used with asyncpg? Is it on the roadmap?
Aha, SF is a purely managed service, so a 100% local (or even lite-local) SF doesn't seem feasible. Maybe ClickHouse/Databend could be considered to enable this kind of lightweight-testing flexibility?
It is for testing Python projects that connect to Postgres.
Often when unit testing, you need to test CRUD-type operations between the application and the database, so you generally have to spin up a temporary database just to delete it at the end of the tests. Commonly this is done with SQLite during testing, even if you built the application for Postgres in production, because it's fast and easy and you can interact with it as a file, not requiring connection configuration and added bloat.
But then sometimes your app gets complicated enough that you are using Postgres features that SQLite has no equivalent for. So now you need a PG instance just for testing, which is a headache.
So this project bridges the gap, giving you the feature-completeness and cross-environment consistency of Postgres with the convenience of SQLite.
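The pattern being described, in miniature (a sketch; the env var names are arbitrary):

```python
# Common pattern: same SQLAlchemy code, different engine per environment.
# Works until the app relies on Postgres-only features (arrays, JSONB
# operators, etc.), at which point test and prod databases diverge.
import os
from sqlalchemy import create_engine

def make_engine():
    if os.environ.get("TESTING"):
        return create_engine("sqlite:///:memory:")  # fast, zero setup
    return create_engine(os.environ["DATABASE_URL"])  # Postgres in prod
```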
> Effortless Setup: No PostgreSQL install needed—just Node.js(I know)!
It was just to get a SQLite-like DX in about an hour, so I did it this way.
And then I thought: why not open source it?
Maybe in v2 I could abstract over an actual Postgres binary with the same DX.
This is a nice project idea. But you should use a Python WASM interpreter to run the PostgreSQL WASM.
We could think big and someday do that within the py-pglite project, actually.
Let me put it on the roadmap for v2 (much more work to do!):
https://github.com/wey-gu/py-pglite/issues/4
And now this.
Going to use it right away.
Would love to see a benchmark between this, PGlite, and Testcontainers.
Anything that won't work if you tried this as a drop-in replacement for a full PG instance in tests? Maybe extensions? Any benchmarks?
https://www.reddit.com/r/SQL/comments/10s4608/comment/j722e1...
Because if you're really interested in testing Postgres, you can test PG in PG: https://pgtap.org/