I fine-tuned a 0.5B LLM to classify support tickets for $10/month

(silentworks.tech)

1 points | by molchanovartem 2 hours ago

1 comments

molchanovartem 2 hours ago
Hi HN,
I built a support ticket classifier using a fine-tuned Qwen2.5-0.5B model. It determines intent, category, urgency, sentiment, and routing — all in a single inference.
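One way a single inference can yield all five fields is to have the model emit one JSON object and validate it before routing. The sketch below is an assumption about the output shape, not the actual implementation: the field names mirror the list above, while the allowed urgency values, `parse_ticket_labels`, and the sample completion string are hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the post does not state the model's urgency labels.
ALLOWED_URGENCY = {"low", "medium", "high"}
REQUIRED_FIELDS = ("intent", "category", "urgency", "sentiment", "routing")


@dataclass(frozen=True)
class TicketLabels:
    intent: str
    category: str
    urgency: str
    sentiment: str
    routing: str


def parse_ticket_labels(raw_output: str) -> TicketLabels:
    """Parse one model completion (assumed to be a JSON object) into labels."""
    data = json.loads(raw_output)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Normalize casing and whitespace so "High" and "high " compare equal.
    labels = TicketLabels(**{f: str(data[f]).strip().lower() for f in REQUIRED_FIELDS})
    if labels.urgency not in ALLOWED_URGENCY:
        raise ValueError(f"unexpected urgency: {labels.urgency!r}")
    return labels


# Made-up completion text, standing in for what the model might return.
sample = (
    '{"intent": "refund_request", "category": "billing", "urgency": "High", '
    '"sentiment": "negative", "routing": "billing_team"}'
)
labels = parse_ticket_labels(sample)
print(labels.routing)
```

Rejecting incomplete or off-vocabulary output up front means a malformed completion can be retried or sent to a human queue instead of being routed on a guess.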
*Why I built this:* A company needed to automate ticket routing but couldn't use cloud LLM APIs due to data privacy requirements. Self-hosted was the only option.
*Stack:*
- Qwen2.5-0.5B-Instruct (fine-tuned, not LoRA)
- GGUF Q4_K_M quantization (350MB)
- llama-cpp-python + FastAPI
- Docker on a $10/mo VPS
*Results:*
- ~90% accuracy on intent/category, measured on a synthetic dataset of ~4K examples; I expect accuracy to improve with real data and 5-10K examples
- Latency: 150ms on Apple Silicon, 3-5s on a budget VPS (old Xeon without AVX2)
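To put the latencies above in context, here is a rough capacity estimate. It assumes a single worker processing tickets back to back with no queueing or network overhead, which is a simplification rather than a measured figure; `tickets_per_day` is a hypothetical helper.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def tickets_per_day(latency_ms: int) -> int:
    """Upper bound on tickets/day for one worker handling requests sequentially."""
    return MS_PER_DAY // latency_ms


# Latencies taken from the results above, in milliseconds.
for label, latency_ms in [
    ("Apple Silicon", 150),
    ("Budget VPS, best case", 3000),
    ("Budget VPS, worst case", 5000),
]:
    print(f"{label}: up to {tickets_per_day(latency_ms):,} tickets/day")
```

Even at 5s per ticket, one worker tops out around 17,280 tickets/day under these assumptions, while 10K tickets/month averages roughly 333 per day, so the slow VPS still has headroom for that volume.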
*When this makes sense vs cloud APIs:*
- Data must stay on-premise
- High volume (>10K tickets/month) where API costs add up
- Narrow classification task (not general chat)
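The high-volume condition above can be turned into a break-even estimate. This is a sketch under stated assumptions: the per-ticket API prices below are placeholders rather than quotes from any provider, `break_even_tickets` is a hypothetical helper, and the calculation ignores engineering and maintenance time.

```python
import math
from decimal import Decimal


def break_even_tickets(fixed_monthly_usd: str, api_usd_per_ticket: str) -> int:
    """Monthly ticket count at which per-ticket API spend reaches the fixed hosting cost."""
    # Decimal avoids float rounding surprises before taking the ceiling.
    return math.ceil(Decimal(fixed_monthly_usd) / Decimal(api_usd_per_ticket))


# $10/month VPS from the post; the API prices are placeholder assumptions.
for price in ("0.001", "0.003"):
    volume = break_even_tickets("10", price)
    print(f"At ${price}/ticket, break-even is {volume:,} tickets/month")
```

Plug in your own provider's per-ticket cost (tokens per ticket times the token price) to see where the crossover falls for your workload.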
*Try it:*
- Demo: https://silentworks.tech/test
- API docs: https://silentworks.tech/docs
Happy to discuss the implementation details, training approach, or deployment setup.
---
Contact: https://t.me/var_molchanov