1 comment

  • marques576 21 hours ago
    Some features:

    169M parameters

    Streaming support

    Zero-shot voice cloning

    0.25 RTF (real-time factor) on CPU, meaning it generates 30 seconds of audio in 7.5 seconds

    Requires 3-12 seconds of reference audio for voice cloning

    Apache 2.0 license
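
To make the RTF and reference-audio figures in the list above concrete, here is a rough sketch of the arithmetic. It assumes RTF is defined as generation time divided by audio duration, which matches the 30 s / 7.5 s example. The function names are hypothetical helpers for illustration only and are not part of the model's API.

```python
# Hypothetical helpers; not part of the model's actual API.

def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds to generate `audio_seconds` of audio.

    Assumes RTF = generation time / audio duration.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than playback speed the generation runs."""
    if rtf <= 0:
        raise ValueError("rtf must be > 0")
    return 1.0 / rtf


def reference_length_ok(clip_seconds: float,
                        min_seconds: float = 3.0,
                        max_seconds: float = 12.0) -> bool:
    """Check a reference clip against the 3-12 second range quoted above."""
    return min_seconds <= clip_seconds <= max_seconds


if __name__ == "__main__":
    print(synthesis_time(30.0, 0.25))      # 7.5
    print(speedup_over_real_time(0.25))    # 4.0
    print(reference_length_ok(2.0))        # False
    print(reference_length_ok(8.0))        # True
```

In other words, an RTF below 1 means generation is faster than playback, which is what makes streaming on CPU plausible at 0.25.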

    The model was trained on a single L40S GPU. It’s not SOTA in most cases, can be a bit unstable, and sometimes fails to capture voice likeness.