Thanks. Yeah, the problem is just handling scale: we don't have the infra ready to go, but anyone can do that. It's easy for people to run on their laptops straight up. Will try the VPS route.
I know we all think of bad things when we hear "short form video", but short demos can do a LOT for any project: they show the user how it's used, what it looks like, what it solves, etc., all in anywhere from 15 seconds to a couple of minutes. It doesn't need to be ultra fancy; a screen recording is fine. :)
Since there is no GUI here, I feel like a simple plaintext chat transcript would be both 100x smaller and 100x easier to read. (Not to mention accessible.)
Hmm.. this might make it feasible to build something like a command line program where you can optionally just specify the arguments in natural language. Although I know people will object to including an extra 14 MB and the computation for "parsing" and it could be pretty bad if everyone started doing that.
But it's really interesting to me that that may be possible now. You can include a fine-tuned model that understands how to use your program.
E.g. `> toolcli what can you do` runs `toolcli --help summary`, `toolcli add tom to teamfutz group` = `toolcli --gadd teamfutz tom`
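A rough sketch of how that mapping could be wired up, in Python. Everything here is made up for illustration: `fake_model` is a keyword-matching stand-in for the fine-tuned model, the `{"name": ..., "arguments": ...}` tool-call shape is an assumption rather than Needle's actual output format, and `toolcli` with its flags is just the hypothetical program from the example above.

```python
from typing import Callable, Dict, List

ToolCall = Dict[str, object]  # assumed shape: {"name": str, "arguments": dict}


def fake_model(request: str) -> ToolCall:
    """Stand-in for the small model. A real one would be fine-tuned
    on the program's own commands instead of matching keywords."""
    words = request.lower().split()
    if words[:1] == ["add"] and "to" in words:
        i = words.index("to")
        # Need a user before "to" and a group name after it.
        if i >= 2 and i + 1 < len(words):
            return {"name": "gadd",
                    "arguments": {"user": words[1], "group": words[i + 1]}}
    # Anything unrecognized falls back to the help summary.
    return {"name": "help", "arguments": {"topic": "summary"}}


def to_argv(call: ToolCall) -> List[str]:
    """Turn a structured tool call into the hypothetical toolcli argv."""
    args = call["arguments"]
    if call["name"] == "gadd":
        return ["toolcli", "--gadd", args["group"], args["user"]]
    if call["name"] == "help":
        return ["toolcli", "--help", args["topic"]]
    raise ValueError(f"unknown tool: {call['name']!r}")


def translate(request: str,
              model: Callable[[str], ToolCall] = fake_model) -> List[str]:
    """Natural-language request in, argv list out."""
    return to_argv(model(request))


if __name__ == "__main__":
    print(translate("add tom to teamfutz group"))
    print(translate("what can you do"))
```

The point of going through a structured call and then `to_argv` is that the model never emits a shell string directly, so a bad prediction can only pick among commands the program already exposes.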
Come to think of it, this could be a nice model to have as the first pass in a more complex agent system where Needle hands off the results of a tool call to a larger model.
I source old, defective high-end radios with timeless designs from brands like Grundig or Braun, and replace the original hardware with a Raspberry Pi while using the original audio parts to build custom smart speakers. Reliable hotword detection and voice command recognition have been a persistent challenge over the years, but Whisper and other small models have helped enormously. At the moment I have Ollama running on my server with qwen 9b, which works fine, but a 26M model that could be deployed on the Pi itself would be amazing.
This is pretty much exactly what I want for Home Assistant. I yell out, "Computer! Lights!" and it toggles the lamp in the room on or off. (I mean I can do that now, I think, but probably with a much larger model.)
I haven't played with it yet, but does it ever return anything other than a tool call? What are the failure modes? What if it doesn't understand the request? Does it ever say it can't find a tool? Does it get confused if there are two similar (but different) tools? Can it chain tools together (e.g. one tool to look up an address and another to get directions to the address)?
I mean, I plan on downloading the model later tonight and finding out for myself, but since I'm stuck at work right now, I figured I'd ask anyway...
I find this stuff super fascinating and have been thinking about it myself. Maybe one could bootstrap tiny models on a rather 'pure' procedural data set. Neglecting [0] of course...
Got a bunch of errors trying to run it on CPU though. Very likely connected to me running this in a container (unpriv LXC), but figured for 26M CPU would suffice.
Can this be a Siri-like core? Set me a timer, tell me what’s the weather, etc. You'd give the model the transcribed text and the list of available tools to call, then voice the output.
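Something like the loop below is what I have in mind, sketched in Python. All of it is assumption: `stub_pick_tool` is a keyword stand-in for the model, the tool names and the `{"name": ..., "arguments": ...}` shape are invented, and speech-to-text and text-to-speech are left out, so the function just takes a transcript and returns the sentence to speak.

```python
from typing import Callable, Dict, List, Optional


def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes."


def get_weather(city: str) -> str:
    # Placeholder: a real tool would call a weather service here.
    return f"(weather lookup for {city} would go here)"


# Hypothetical tool registry handed to the model alongside the transcript.
TOOLS: Dict[str, Callable[..., str]] = {
    "set_timer": set_timer,
    "get_weather": get_weather,
}


def stub_pick_tool(transcript: str, tool_names: List[str]) -> Optional[dict]:
    """Stand-in for the model: returns a tool call, or None if nothing fits."""
    text = transcript.lower()
    if "timer" in text and "set_timer" in tool_names:
        numbers = [int(w) for w in text.split() if w.isdigit()]
        if numbers:
            return {"name": "set_timer", "arguments": {"minutes": numbers[0]}}
    return None


def handle(transcript: str,
           pick_tool: Callable[[str, List[str]], Optional[dict]] = stub_pick_tool
           ) -> str:
    """Transcript in, sentence to speak out."""
    call = pick_tool(transcript, list(TOOLS))
    # Never trust the model's output blindly: unknown tool means a polite refusal.
    if call is None or call.get("name") not in TOOLS:
        return "Sorry, I can't do that."
    try:
        return TOOLS[call["name"]](**call.get("arguments", {}))
    except TypeError:
        # Wrong or missing argument names from the model.
        return "Sorry, I didn't catch the details."


if __name__ == "__main__":
    print(handle("set me a timer for 5 minutes"))
    print(handle("tell me a joke"))
```

The checks around the call matter more than the happy path: whatever the model returns, the wrapper decides what happens when there is no matching tool or the arguments don't fit.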
Many people tease that they will, and start... but then kinda stop. But mostly just been building my own bespoke thing on my own bespoke platform, and kinda running out of steam because I need to make $$ instead.
FYI, distilling Gemini is explicitly against the ToS:
"You may not use the Services to develop models that compete with the Services (e.g., Gemini API or Google AI Studio). You also may not attempt to reverse engineer, extract or replicate any component of the Services, including the underlying data or models (e.g., parameter weights)."
Yeah I think Google should shove that somewhere. They effectively distilled all the internet's knowledge into these models...without asking & without permission
You can check the very simple docker file there.
I will definitely play around with this!
[0]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
> Repository Not Found for url: https://huggingface.co/api/datasets/Cactus-Compute/needle-tokenizer/revision/main.
Got a bunch of errors trying to run it on CPU though. Very likely connected to me running this in a container (unpriv LXC), but figured for 26M CPU would suffice.
https://pastebin.com/PYZJKTNk
"You may not use the Services to develop models that compete with the Services (e.g., Gemini API or Google AI Studio). You also may not attempt to reverse engineer, extract or replicate any component of the Services, including the underlying data or models (e.g., parameter weights)."
That said, we need more people distilling models IMO, just be ready for a C&D and a ban