1 comments

  • molchanovartem 2 hours ago
    Hi HN,

    I built a support ticket classifier using a fine-tuned Qwen2.5-0.5B model. It determines intent, category, urgency, sentiment, and routing — all in a single inference.

    *Why I built this:* A company needed to automate ticket routing but couldn't use cloud LLM APIs due to data privacy requirements. Self-hosted was the only option.

    *Stack:* - Qwen2.5-0.5B-Instruct (fine-tuned, not LoRA) - GGUF Q4_K_M quantization (350MB) - llama-cpp-python + FastAPI - Docker on a $10/mo VPS

    *Results:* - ~90% accuracy on intent/category (on synthetic ~4K dataset — with real data and 5-10K examples, accuracy improves) - 150ms on Apple Silicon, 3-5s on budget VPS (old Xeon without AVX2)

    *When this makes sense vs cloud APIs:* - Data must stay on-premise - High volume (>10K/month) where API costs add up - Narrow classification task (not general chat)

    *Try it:* - Demo: https://silentworks.tech/test - API docs: https://silentworks.tech/docs

    Happy to discuss the implementation details, training approach, or deployment setup.

    ---

    Contact: https://t.me/var_molchanov