Qwen3-TTS: The Next-Generation Open-Source AI Speech Model

Generate realistic speech, clone a voice from a few seconds of reference audio, and design unique audio personas with Qwen3-TTS, an open-source text-to-speech model now available online.

Our Technology

Why Choose Qwen3-TTS?

Our platform is powered by the Qwen3-TTS-Tokenizer-12Hz multi-codebook speech encoder, which underpins the model's state-of-the-art speech quality.

High-Fidelity Reconstruction: Qwen3-TTS pairs a 12Hz tokenizer with a lightweight non-DiT (non-diffusion-transformer) architecture. This compresses the acoustic signal efficiently while preserving paralinguistic information such as breaths and ambient tone.
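To make the compression figure concrete, here is a rough back-of-the-envelope sketch in Python. It takes the nominal 12 Hz frame rate from the tokenizer's name at face value, and the number of codebooks is not stated on this page, so `num_codebooks` is a purely illustrative parameter rather than the model's real configuration.

```python
import math

# Nominal rate taken from the tokenizer name; the exact value may differ.
FRAME_RATE_HZ = 12


def frames_for(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


def tokens_for(duration_s: float, num_codebooks: int) -> int:
    """Total discrete tokens if every frame carries one token per codebook.

    `num_codebooks` is an assumption for illustration only.
    """
    return frames_for(duration_s) * num_codebooks


if __name__ == "__main__":
    print(frames_for(10.0))     # frames for 10 s of audio at the nominal rate
    print(tokens_for(10.0, 4))  # total tokens with a hypothetical 4 codebooks
```

At this nominal rate, ten seconds of speech is covered by 120 frames, which suggests why a low frame rate keeps sequences short for the language model.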

Dual-Track Modeling: The dual-track design supports low-latency streaming. The model can emit the first audio packet after processing just a single character, with end-to-end latency as low as 97ms.
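If you want to check first-packet latency in your own setup, a small timing helper like the one below may be useful. This is a minimal sketch: `fake_audio_stream` is a hypothetical stand-in for a real streaming response, not part of Qwen3-TTS, and the helper only assumes that audio arrives as an iterable of byte chunks.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_audio_stream(n_packets: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response (hypothetical, not the real service)."""
    for _ in range(n_packets):
        time.sleep(delay_s)   # simulate synthesis and network time per packet
        yield b"\x00" * 320   # dummy audio payload


def time_to_first_packet(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first packet arrived, total packets received)."""
    start = time.perf_counter()
    first_latency = None
    count = 0
    for _packet in stream:
        if first_latency is None:
            first_latency = time.perf_counter() - start
        count += 1
    if first_latency is None:
        raise ValueError("stream produced no packets")
    return first_latency, count


if __name__ == "__main__":
    latency, packets = time_to_first_packet(fake_audio_stream())
    print(f"first packet after {latency * 1000:.1f} ms, {packets} packets total")
```

Measured this way, the number includes whatever network and client overhead sits between you and the model, so it will not necessarily match the 97ms figure quoted above.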

Intelligent Understanding: The model deeply integrates text semantic understanding, allowing it to adapt tone, rhythm, and emotion based on natural language instructions.

10
Languages Supported
97ms
Ultra-low Latency
1.7B
Parameter Model
12Hz
Tokenizer Frequency

Key Features

Multilingual Support

Native support for Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian.

Zero-Shot Cloning

Clone a voice from as little as 3 seconds of reference audio, with a reported speaker-similarity score of 0.789.
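Before uploading a reference clip, it can help to confirm that it meets the 3-second figure. The sketch below uses only Python's standard `wave` module and assumes the clip is an uncompressed WAV file; `make_silent_wav` is a hypothetical helper that exists only to produce test input.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 3.0  # figure quoted on this page


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of an uncompressed WAV clip, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_long_enough(wav_bytes: bytes, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(wav_bytes) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Hypothetical helper: build a silent mono 16-bit WAV for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(is_long_enough(make_silent_wav(3.0)))  # clip meets the minimum
    print(is_long_enough(make_silent_wav(1.5)))  # clip is too short
```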

Voice Design

Create new voices from scratch using natural language prompts describing age, gender, and personality.

Instruction Control

Control prosody, emotion, and style via text instructions (e.g., "speak sadly", "whisper").

Real-Time Streaming

Dual-Track hybrid architecture enables instant playback with minimal buffering.

Robust Text Handling

Handles complex text, special symbols, and mixed languages effortlessly.

How to Use Qwen3-TTS Online

01

Choose Your Mode

Select between Custom Voice for standard TTS, Voice Design to create a persona, or Voice Clone to replicate a voice.

02

Input Text & Config

Type the text you want the AI to speak. Optionally, add style instructions or upload reference audio.

03

Generate & Download

Click generate to process the audio in the cloud. Listen to the high-fidelity result and download it.
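The three steps above can be sketched as a small request object. Everything here is hypothetical: the mode identifiers, field names, and validation rules are illustrative assumptions about what each mode needs, not the actual interface of the online demo.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the three modes described in step 01.
MODES = {"custom_voice", "voice_design", "voice_clone"}


@dataclass
class SynthesisRequest:
    mode: str
    text: str
    instruction: Optional[str] = None        # optional style hint, e.g. "speak sadly"
    voice_description: Optional[str] = None  # persona prompt for voice design
    reference_audio: Optional[bytes] = None  # clip to imitate for voice cloning

    def validate(self) -> None:
        """Raise ValueError if the request is missing what its mode needs."""
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.mode == "voice_design" and not self.voice_description:
            raise ValueError("voice_design needs a voice_description")
        if self.mode == "voice_clone" and not self.reference_audio:
            raise ValueError("voice_clone needs reference_audio")


if __name__ == "__main__":
    request = SynthesisRequest(
        mode="voice_design",
        text="Welcome aboard.",
        voice_description="a calm, middle-aged narrator",
    )
    request.validate()  # passes: the mode has the input it requires
    print(request.mode)
```

Checking inputs before you click generate avoids a wasted round trip when, for example, a cloning request has no reference clip attached.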

Frequently Asked Questions

Is Qwen3-TTS Online free to use?

Yes. Qwen3-TTS is an open-source project, and our online demo lets you try its capabilities for free.

What languages does Qwen3-TTS support?

It supports 10 languages: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian.

What is the difference between the 1.7B and 0.6B models?

The 1.7B model offers peak performance and complex instruction following, while the 0.6B model is optimized for efficiency and speed.
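For a rough sense of the size difference, the sketch below estimates the memory needed just to hold each model's weights. It assumes 2 bytes per parameter (16-bit weights), which is a common choice but not something this page confirms, and it ignores activations, caches, and the tokenizer.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter is stored at `bytes_per_param` bytes; real
    deployments may quantize or add overhead, so treat this as a lower bound.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    for size in (1.7, 0.6):
        print(f"{size}B parameters: about {weight_memory_gb(size):.1f} GB of weights")
```

Under that assumption the 1.7B model needs roughly 3.4 GB for weights and the 0.6B model roughly 1.2 GB.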

Can I use the generated audio for commercial purposes?

The models are open-sourced by the Qwen Team. Please refer to the specific license on their GitHub repository for commercial usage terms.

How does Qwen3-TTS compare to other models?

In reported benchmarks, it outperforms MiniMax on voice design and achieves higher speaker similarity than ElevenLabs on multilingual cloning tasks.