How well does this support random-access queries to the file names and content at a certain revision? Like:
- "Checking out" a specific branch (which can be reasonably slow)
- Query all files and folders in path `/src`
- Query all files and folders in path `/src/*` (and maybe with extra pattern matches)
- Be able to read contents of a file from a certain offset for a certain length
These are similar to file system queries to a working directory.
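The queries above map pretty naturally onto SQL once the tree at a revision is addressable by path. The project's actual schema isn't shown in this thread, so here is a minimal sketch against a made-up `files(revision, path, is_dir, content)` table, using SQLite as a stand-in:

```python
import sqlite3

# Hypothetical schema -- the real project's tables are not shown here.
# One row per (revision, path), file content stored as a blob.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    revision TEXT NOT NULL,   -- commit hash or branch tip
    path     TEXT NOT NULL,   -- repo-relative path, e.g. 'src/main.c'
    is_dir   INTEGER NOT NULL,
    content  BLOB,
    PRIMARY KEY (revision, path)
);
""")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("abc123", "src", 1, None),
        ("abc123", "src/main.c", 0, b"int main(void) { return 0; }\n"),
        ("abc123", "src/util.c", 0, b"/* helpers */\n"),
    ],
)

# "Query all files and folders in path /src/*" at a given revision
# (GLOB also matches deeper paths; a real schema might store depth):
rows = conn.execute(
    "SELECT path FROM files WHERE revision = ? AND path GLOB 'src/*'",
    ("abc123",),
).fetchall()

# "Read contents of a file from a certain offset for a certain length":
# substr() on a blob is byte-indexed (1-based), so this is random access
# without pulling the whole file into the client.
chunk = conn.execute(
    "SELECT substr(content, ?, ?) FROM files WHERE revision = ? AND path = ?",
    (5, 4, "abc123", "src/main.c"),
).fetchone()[0]
```

Whether this is fast depends entirely on how revisions are materialized; "checking out" a branch into such a table is the reasonably-slow step, after which the path queries are ordinary index lookups.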
Still halfway through reading, but what you've made can unlock a lot of use cases.
> I tried SQLite first, but its extension API is limited and write performance with custom storage was painfully slow
For many use cases, write performance does not matter much. Other than the initial import, in many cases we don't change text that fast. But the simpler logistics of having a SQLite database, with dual (git+SQL) access to the text, are huge.
That said, for the specific use case I have in mind, postgres is perfectly fine
SQLite is fine right up until you want concurrent writers. Once you need multiple users, cross-host access, or anything that looks like shared infra instead of a local cache, the file-locking model stops being cute and starts setting the rules for the whole design. For collaborative versioning, Postgres makes more sense.
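The single-writer model is easy to see in a few lines. This is just a generic demonstration of SQLite's locking, nothing specific to this project:

```python
import sqlite3, tempfile, os

# While one connection holds a write transaction, a second writer is
# locked out -- SQLite allows only one writer per database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

w1 = sqlite3.connect(path)
w1.execute("CREATE TABLE t (x)")
w1.execute("BEGIN IMMEDIATE")          # take the write lock
w1.execute("INSERT INTO t VALUES (1)")

w2 = sqlite3.connect(path, timeout=0)  # fail fast instead of waiting
try:
    w2.execute("BEGIN IMMEDIATE")      # second writer: refused
    locked = False
except sqlite3.OperationalError:       # "database is locked"
    locked = True

w1.commit()
w2.execute("BEGIN IMMEDIATE")          # lock released: now it succeeds
w2.rollback()
```

WAL mode lets readers proceed alongside the writer, but writers still serialize, which is why shared multi-user infrastructure tends to push you toward Postgres.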
I did actually look into writing the extension for duckdb. But similar to SQLite the extension possibilities are not great for what I needed. Though duckdb is a great database.
I love it. I love having agents write SQL. It's a very efficient use of context, and it doesn't try to reinvent the information-retrieval part of following the context.
Did you find you needed to give agents the schema produced by this or they just query it themselves from postgres?
So most analyses already have a CLI command you can just call with parameters. For those that don't, in my case the agent just looked at the --help output of the commands and was able to perform the queries.
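On the schema question: an agent can also just introspect the database catalog itself. Against Postgres that would be `information_schema.columns` (or `\d` via psql); here's the SQLite analogue as a runnable sketch, with a made-up `commits` table for illustration:

```python
import sqlite3

# Sketch of schema self-discovery, using SQLite's catalog as a stand-in
# for Postgres's information_schema. The table here is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commits (hash TEXT PRIMARY KEY, author TEXT, message TEXT)"
)

# Everything an agent needs to start writing queries is in the catalog:
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(commits)")]
```

In practice handing the agent the DDL up front saves a round trip, but it isn't strictly required.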
https://fossil-scm.org/
Even humans don't do this unless there's a crazy bug causing them to search from every possible angle.
That said, this sounds like a great and fun project to work on.