[https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-base](https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-base)

The model demonstrates strong performance in automatic prosody adjustment and in generating natural multi-speaker dialogues across languages.

Notably, it achieved a 75.7% win rate over GPT-4o-mini-tts in emotional expression on the EmergentTTS-Eval benchmark. The model has approximately 5.8 billion parameters in total (3.6B for the LLM and 2.2B for the Audio Dual FFN).
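As a quick sanity check on those figures, the sketch below sums the two quoted component sizes and gives a rough estimate of the memory needed for the weights alone. The 2-bytes-per-parameter figure (16-bit weights) is an assumption for illustration, not something stated for this model, and the helper names are hypothetical. Activations, caches, and the audio tokenizer are not counted.

```python
# Parameter counts quoted in the text, in billions.
LLM_PARAMS_B = 3.6
AUDIO_DUAL_FFN_PARAMS_B = 2.2


def total_params_b(*components_b: float) -> float:
    """Sum component sizes (in billions), rounded to one decimal place."""
    return round(sum(components_b), 1)


def weight_memory_gb(params_b: float, bytes_per_param: int) -> float:
    """Rough weight-only footprint in GB (1 GB = 1e9 bytes).

    params_b is in billions, so billions * bytes-per-parameter gives GB directly.
    """
    return round(params_b * bytes_per_param, 1)


total = total_params_b(LLM_PARAMS_B, AUDIO_DUAL_FFN_PARAMS_B)
print(f"total parameters: {total}B")

# Assumption: 16-bit weights (2 bytes per parameter).
print(f"approx. weight memory at 16-bit: {weight_memory_gb(total, 2)} GB")
```

The total matches the 5.8B figure quoted above; the memory number is only a lower bound on what inference would actually need.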
