Building voice agents with Nvidia open models

(daily.co)

116 points | by kwindla 1 day ago

8 comments

smusamashah 13 hours ago
Do any of the top models let you pause and think while speaking? I have to speak non-stop to Gemini assistant and ChatGPT, which makes voice mode feel unnatural and close to useless. Especially for non-English speakers, probably. I sometimes have to think more to translate my thoughts into English.
- fragmede 12 hours ago
  Have you tried talking to ChatGPT in your native tongue? I was blown away by my mother speaking her native tongue to ChatGPT and having it respond in that language. (It's ever so slightly not a mainstream one.)
  - smusamashah 2 hours ago
    Even in my own language I can't talk without any pauses.
amelius 1 day ago
I've been using festival under Linux.
https://manpages.ubuntu.com/manpages/trusty/man1/festival.1....
But it is quite old now and pre-dates the DL/AI era.
Does anybody know of a good modern replacement that I can "apt install"?
- sigmonsays 1 day ago
  I used piper with a model I found online. It's _A LOT_ better than festival afaik. I'm not sure you can apt install it though.
  echo "hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file - | aplay
  - gunalx 1 day ago
    You can in fact apt install piper.
    - amelius 1 day ago
      That's a different piper.
      piper - GTK application to configure gaming devices
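For anyone who wants to call the piper one-liner quoted in this thread from a script, here is a minimal Python sketch. It assumes the TTS `piper` binary (not the GTK one) and `aplay` are on your PATH, and it uses only the flags shown in the comment above (`--model`, `--output_file -`). The helper names `build_piper_command` and `speak` and the default model path are illustrative, not part of piper.

```python
import subprocess
from pathlib import Path

# Assumption: same model location as in the comment above; change to match your setup.
DEFAULT_MODEL = Path.home() / ".local/share/piper/en_US-lessac-medium.onnx"


def build_piper_command(model: Path = DEFAULT_MODEL) -> list[str]:
    """Return the piper argv, using only the flags quoted in the thread."""
    return ["piper", "--model", str(model), "--output_file", "-"]


def speak(text: str, model: Path = DEFAULT_MODEL) -> None:
    """Hypothetical helper: pipe text into piper and piper's audio into aplay."""
    piper = subprocess.Popen(
        build_piper_command(model),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    player = subprocess.Popen(["aplay"], stdin=piper.stdout)
    # Close our copy of the pipe so aplay sees EOF when piper exits.
    piper.stdout.close()
    piper.stdin.write(text.encode("utf-8") + b"\n")
    piper.stdin.close()
    player.wait()
    piper.wait()
```

Calling `speak("hello")` should then behave like the shell pipeline, provided the model file exists at the given path.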
rickydroll 18 hours ago
<pedantic>Voice recognition identifies who you are, speech recognition identifies what you say. </pedantic>
Example:
Voice recognition: arrrrrrgh! (Oh, I know that guy. He always gets irritated when someone uses the terms speech and voice recognition wrong.)
Speech Recognition: "Why can't you guys keep it straight? It is as simple as knowing the difference between hypothesis and theory."
jjcm 1 day ago
These have gotten good enough to really make command-by-voice interactions pleasant. I'd love to try this with Cursor - just use it fully with voice.
nowittyusername 1 day ago
This is perfect for me. I just started working on the voice related stuff for my agent framework and this will be of real use. Thanks.
atonse 17 hours ago
Can't wait for this to land in MacWhisper. I like the idea of the streaming dictation especially when dictating long prompts to Claude Code.
deckar01 1 day ago
It supports Turing T4, but not Ampere…
- nsbk 22 hours ago
  Any ideas on how to add Ampere support? I have a use case in mind that I would love to try on my 3090 rig
  - deckar01 13 hours ago
    Magpie-TTS needs a kernel compiled targeting Ampere, but it appears to be closed source. It was compiled for the 2018 T4, but not 2020-2024 consumer cards, just 2025 consumer cards.
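To make the "compiled for T4 but not Ampere" point concrete, here is a rough Python sketch of how a precompiled kernel's targets relate to a card's compute capability. The architecture table follows NVIDIA's published compute capability list (T4 is 7.5, a 3090 is 8.6). The target set in the usage note is only a guess at what the comment describes, not a confirmed list for Magpie-TTS, and the function names are made up for illustration. The check applies the CUDA binary compatibility rule (same major version, device minor at least the target minor) and ignores PTX JIT fallback.

```python
# Compute capability (major, minor) -> architecture name.
ARCHITECTURES = {
    (7, 0): "Volta",
    (7, 2): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 7): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
    (10, 0): "Blackwell",
    (12, 0): "Blackwell",
}


def arch_name(capability: tuple[int, int]) -> str:
    """Look up the architecture name, or 'unknown' if it is not in the table."""
    return ARCHITECTURES.get(capability, "unknown")


def can_run(device: tuple[int, int], targets: set[tuple[int, int]]) -> bool:
    """True if a binary built for any target can execute on the device.

    Simplification: only precompiled binaries are considered, no PTX JIT.
    """
    dev_major, dev_minor = device
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in targets
    )
```

With a hypothetical target set of `{(7, 5), (12, 0)}`, `can_run((7, 5), ...)` is true for the T4 and `can_run((8, 6), ...)` is false for the 3090, which matches the behavior described above. On a machine with PyTorch, `torch.cuda.get_device_capability()` returns the tuple for the installed card.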
jauntywundrkind 1 day ago
There's also the excellent, also open source unmute.sh, which alas is also Nvidia-only at this point. https://unmute.sh/
- vikboyechko 1 day ago
  The game show is pretty good. Have a feeling this project will consume all my attention this week, thanks for the tip.