  • san0n 1 day ago
    Hi HN, author here.

    I started KektorDB as a personal challenge to learn Go and database internals. Soon, however, I got hooked: I wanted the project to be more than a simple "toy project".

    I didn’t follow a rigid roadmap; I iterated based on what felt right. I started by implementing caching and a semantic firewall, and from there, moving toward an integrated RAG pipeline felt natural.

    To be honest, the choice to integrate RAG comes from my laziness. I tried building a system using Python and LangChain, but I hated managing external scripts and dependencies just to make data talk to the LLM. I wanted a "batteries-included" solution.

    However, the first results of my "naive" RAG were disappointing. That’s why I decided to integrate a lightweight graph (to semantically link chunks) and techniques like HyDE directly into the engine, all while keeping one fixed constraint: it must remain a single binary, easily embeddable as a Go library.
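    For readers unfamiliar with HyDE (Hypothetical Document Embeddings): instead of embedding the user's question, you ask the LLM for a plausible answer and search with the embedding of that answer, on the intuition that an answer-shaped text sits closer to real answer passages than a question does. Below is a minimal sketch of the idea under stated assumptions, not KektorDB's code: the `Generator` and `Embedder` types and `hydeSearch` are hypothetical, a canned string replaces the LLM call, and a toy bag-of-words cosine replaces a real embedding model.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Generator stands in for an LLM call (hypothetical type, not KektorDB's API).
type Generator func(question string) string

// Embedder maps text to a vector (hypothetical; a real system calls an embedding model).
type Embedder func(text string) map[string]float64

// bagOfWords is a toy embedder: term counts over lowercased, punctuation-trimmed tokens.
func bagOfWords(text string) map[string]float64 {
	v := map[string]float64{}
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		v[strings.Trim(tok, ".,?!")]++
	}
	return v
}

// cosine returns the cosine similarity of two sparse vectors.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, x := range a {
		dot += x * b[k]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// hydeSearch embeds a hypothetical answer instead of the raw question and
// returns chunk indices ordered by similarity to that answer.
func hydeSearch(question string, chunks []string, gen Generator, embed Embedder) []int {
	query := embed(gen(question))
	scores := make([]float64, len(chunks))
	order := make([]int, len(chunks))
	for i, c := range chunks {
		order[i] = i
		scores[i] = cosine(query, embed(c))
	}
	sort.SliceStable(order, func(a, b int) bool { return scores[order[a]] > scores[order[b]] })
	return order
}

func main() {
	chunks := []string{
		"The snapshot is rewritten periodically to compact the append-only file.",
		"Int8 quantization stores each vector component in one byte.",
	}
	// Canned "LLM" output; a real pipeline would call a chat-completion endpoint here.
	gen := func(string) string {
		return "The database compacts its append-only log by writing a snapshot."
	}
	fmt.Println(hydeSearch("How does the database shrink its log?", chunks, gen, bagOfWords))
}
```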

    While KektorDB is a general-purpose embeddable Vector + Graph database, its RAG pipeline is intentionally designed as a practical default. It's not a replacement for complex, heavily customized RAG infrastructures, but a way to get a local system working quickly.

    Here is a quick overview of the features:

    - HNSW Indexing: Supports Float32 vectors, with optional Float16 and Int8 quantization.

    - Hybrid Search: Combines vector similarity with BM25 keyword scoring to improve retrieval accuracy.

    - Graph Layer: Maintains a generic adjacency graph alongside vectors. Although the RAG pipeline uses it to link chunks, the system exposes APIs to define arbitrary relationships, enabling semantic traversal.

    - Persistence: AOF (Append-Only File) + Snapshot.

    - RAG Features: Background worker for document ingestion + integrated proxy for query rewriting and Grounded HyDE (OpenAI-compatible).
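    On the Int8 option: there are several ways to quantize, and the post doesn't specify which one KektorDB uses. One common scheme, assumed here purely for illustration, is symmetric per-vector scalar quantization: divide each component by maxAbs/127, round to int8, and keep the scale so dot products can run on the integer codes and be rescaled once at the end. `QuantizedVec`, `QuantizeInt8`, and `DotInt8` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// QuantizedVec holds int8 codes plus the per-vector scale needed to decode them.
type QuantizedVec struct {
	Codes []int8
	Scale float32
}

// QuantizeInt8 applies symmetric scalar quantization: every component is
// divided by maxAbs/127 and rounded, so codes fall in [-127, 127].
func QuantizeInt8(v []float32) QuantizedVec {
	var maxAbs float32
	for _, x := range v {
		if a := float32(math.Abs(float64(x))); a > maxAbs {
			maxAbs = a
		}
	}
	q := QuantizedVec{Codes: make([]int8, len(v)), Scale: 1}
	if maxAbs == 0 {
		return q // all-zero vector: codes stay zero
	}
	q.Scale = maxAbs / 127
	for i, x := range v {
		q.Codes[i] = int8(math.Round(float64(x / q.Scale)))
	}
	return q
}

// Dequantize reconstructs an approximation of the original vector.
func (q QuantizedVec) Dequantize() []float32 {
	out := make([]float32, len(q.Codes))
	for i, c := range q.Codes {
		out[i] = float32(c) * q.Scale
	}
	return out
}

// DotInt8 approximates the float dot product of two equal-length vectors using
// integer multiply-adds on the codes, then rescales once. With codes in
// [-127, 127] the int32 accumulator cannot overflow below ~133,000 dimensions.
func DotInt8(a, b QuantizedVec) float32 {
	var acc int32
	for i := range a.Codes {
		acc += int32(a.Codes[i]) * int32(b.Codes[i])
	}
	return float32(acc) * a.Scale * b.Scale
}

func main() {
	v := []float32{0.5, -1.0, 0.25}
	q := QuantizeInt8(v)
	fmt.Println(q.Codes, q.Dequantize())
	fmt.Printf("exact dot %.4f, int8 dot %.4f\n", float32(1.3125), DotInt8(q, q))
}
```

    The integer inner loop is the part that SIMD kernels typically accelerate; storage drops to one byte per component plus one float per vector.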
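    On hybrid search: the post doesn't say how the two signals are combined, so the sketch below assumes one simple option. Documents are scored with Okapi BM25 (k1 = 1.2 and b = 0.75 are common defaults, assumed here), both score lists are min-max normalized to [0, 1], and they are blended with a weight `alpha`. The function names are hypothetical, and the vector similarities are made-up inputs standing in for HNSW results.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// bm25 scores every document against the query with the Okapi BM25 formula.
func bm25(query string, docs []string) []float64 {
	const k1, b = 1.2, 0.75
	tokenized := make([][]string, len(docs))
	df := map[string]int{} // number of documents containing each term
	var totalLen float64
	for i, d := range docs {
		toks := strings.Fields(strings.ToLower(d))
		tokenized[i] = toks
		totalLen += float64(len(toks))
		seen := map[string]bool{}
		for _, t := range toks {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(docs))
	avgLen := totalLen / n
	scores := make([]float64, len(docs))
	for i, toks := range tokenized {
		tf := map[string]float64{}
		for _, t := range toks {
			tf[t]++
		}
		for _, q := range strings.Fields(strings.ToLower(query)) {
			f := tf[q]
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := 1 - b + b*float64(len(toks))/avgLen // document-length normalization
			scores[i] += idf * f * (k1 + 1) / (f + k1*norm)
		}
	}
	return scores
}

// minMax rescales scores to [0, 1] so BM25 and cosine values are comparable.
func minMax(s []float64) []float64 {
	out := make([]float64, len(s))
	if len(s) == 0 {
		return out
	}
	lo, hi := s[0], s[0]
	for _, x := range s {
		lo, hi = math.Min(lo, x), math.Max(hi, x)
	}
	if hi == lo {
		return out // no spread, no signal: contribute zeros
	}
	for i, x := range s {
		out[i] = (x - lo) / (hi - lo)
	}
	return out
}

// hybridScores blends normalized vector and keyword scores for the same documents:
// alpha = 1 is pure vector search, alpha = 0 is pure BM25.
func hybridScores(vectorSims, keyword []float64, alpha float64) []float64 {
	v, k := minMax(vectorSims), minMax(keyword)
	out := make([]float64, len(v))
	for i := range v {
		out[i] = alpha*v[i] + (1-alpha)*k[i]
	}
	return out
}

func main() {
	docs := []string{"go vector database", "python notebook tips", "embedded go database with bm25"}
	// Cosine similarities would come from the HNSW search; these values are made up.
	vectorSims := []float64{0.9, 0.2, 0.4}
	fmt.Println(hybridScores(vectorSims, bm25("go database", docs), 0.5))
}
```

    Min-max normalization is sensitive to outliers within a single result set; Reciprocal Rank Fusion, which combines only ranks, is a common alternative.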

    Current Limitations:

    1. It is currently RAM-bound (graph and vectors live in memory). I am working on a hybrid disk-storage engine.

    2. Ingestion parsing can be improved (especially regarding tables in PDFs).

    The code is pure Go (with optional Rust kernels for specific SIMD operations), all contained in a single binary.

    The project started out of a desire to learn, but I would like to continue developing it seriously. For this reason, I would appreciate any kind of technical advice or feedback.

    Thanks for reading.

    Repository: https://github.com/sanonone/kektordb