Open-Source Text-to-Speech (TTS) Models
1. ChatTTS
- Description: A text-to-speech model built for conversational (dialogue) use. It can mix Chinese and English within a single utterance and supports multiple speakers.
2. ToucanTTS
- Description: An open-source text-to-speech model that supports speech synthesis in more than 7,000 languages, with multi-speaker support and the ability to reproduce prosody (rhythm, stress, and intonation).
3. Fish Speech
- Description: An open-source TTS model supporting Chinese, English, and Japanese, trained on about 150,000 hours of trilingual data, with speech quality approaching human level.
4. FunAudioLLM
- Description: An open-source family of voice models from Alibaba, comprising SenseVoice for speech understanding and CosyVoice for speech generation, designed to enable natural voice interaction between humans and large language models (LLMs).
5. Parler-TTS
- Description: A lightweight text-to-speech model that generates high-quality, natural-sounding speech whose speaker characteristics (gender, pitch, speaking style, etc.) are controlled by a text description.
6. F5-TTS
- Description: An open-source TTS model from Shanghai Jiao Tong University and the University of Cambridge, offering zero-shot voice cloning, real-time inference, speech-rate control, and seamless switching between languages and dialects.
7. MaskGCT
- Description: A zero-shot, fully non-autoregressive TTS model supporting cross-lingual dubbing, voice cloning, language conversion, and emotion control.
8. Smol TTS
- Description: An open-source TTS model based on the LLaMA architecture that offers zero-shot voice cloning.
9. Kokoro
- Description: An open-source TTS model with 82 million parameters, trained on less than 100 hours of audio, supporting multiple languages.
10. OuteTTS
- Description: An open-source TTS model supporting six languages: English, Japanese, Korean, Chinese, French, and German, with enhanced naturalness and coherence through punctuation support.
11. Llasa
- Description: A zero-shot voice-cloning TTS model that can synthesize speech from input text alone or in the voice of a given speech prompt.