I built a small OCR tool in C that lets you select a region of an image and send it to an LLM for text recognition.
It uses SDL2 for rendering and libcurl for networking, and works with local servers (llama.cpp-style) and, in theory, with remote APIs.
The workflow is:
open image -> zoom/pan -> draw rectangle -> send -> get text
I wanted something lightweight and easy to understand, without large frameworks, and also a way to experiment with vision-capable models in a simple pipeline.
Some features:
rectangle selection UI
zoom and pan
cancel running requests
minimal dependencies
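For anyone wondering how rectangle selection fits together with zoom and pan: the rectangle is dragged in screen coordinates, so it has to be mapped back to image pixels before the region can be cropped and sent. Below is a minimal sketch of that mapping. It assumes a uniform zoom factor and a pan offset given in screen pixels; the names `View`, `Rect` and `screen_to_image` are illustrative and not necessarily what the repo uses.

```c
/* Hypothetical view state (an assumption, not taken from the ocr_tool source). */
typedef struct {
    double zoom;   /* screen pixels per image pixel, must be > 0 */
    double pan_x;  /* screen x of the image's left edge */
    double pan_y;  /* screen y of the image's top edge */
} View;

typedef struct { int x, y, w, h; } Rect;

static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map a rectangle dragged on screen (corners in any order) to image
 * pixel coordinates, clamped to the image bounds. */
static Rect screen_to_image(View v, int sx0, int sy0, int sx1, int sy1,
                            int img_w, int img_h) {
    /* Undo pan first, then zoom. */
    double ix0 = (sx0 - v.pan_x) / v.zoom;
    double iy0 = (sy0 - v.pan_y) / v.zoom;
    double ix1 = (sx1 - v.pan_x) / v.zoom;
    double iy1 = (sy1 - v.pan_y) / v.zoom;

    /* Order the corners so the drag direction does not matter. */
    double lo_x = ix0 < ix1 ? ix0 : ix1, hi_x = ix0 < ix1 ? ix1 : ix0;
    double lo_y = iy0 < iy1 ? iy0 : iy1, hi_y = iy0 < iy1 ? iy1 : iy0;

    /* The cast truncates toward zero; negative values are clamped to 0. */
    int left   = clamp_int((int)lo_x, 0, img_w);
    int right  = clamp_int((int)hi_x, 0, img_w);
    int top    = clamp_int((int)lo_y, 0, img_h);
    int bottom = clamp_int((int)hi_y, 0, img_h);

    Rect r = { left, top, right - left, bottom - top };
    return r;
}
```

With this in place, a selection that extends past the image edge is simply cut off at the border, and a result with zero width or height can be treated as "nothing selected".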
It’s still pretty early, but usable. https://github.com/haschka/ocr_tool