JP TTS — Preview

Listening interfaces from the ongoing research. Pick a session below.

Blind A/B — 24 voices

All current candidates rendering the same JP passage. Engines · blind reveal · pipeline tabs.

Latest · Irodori v3 ×3 · VoxCPM2 ×4 · Qwen3-TTS ×2 · Google Chirp3-HD ×5 · Supertonic 3 ×6 · Fish S2 Pro ×2

Emotion sweep — instruct mode

Qwen3-TTS vs VoxCPM2 across 5 emotions (calm / sad / happy / angry / anxious).

Tests soft instruct steering · same speaker · same passage

New this round: Supertonic 3 (6 voices; MIT/OpenRAIL, 99M ONNX, no diffusion, so no MLX-diffusion gap) and Fish Audio S2 Pro (Dual-AR + RL, NC-licensed, i.e. non-commercial use only; the ceiling reference).

Listening order from RESEARCH.md: VoxCPM2 Voice Design → Irodori-TTS v3 → Qwen3-TTS → Google Chirp3-HD (cloud) → Supertonic 3 → Fish S2 Pro (NC-licensed ceiling). Round-3 consensus says Irodori-TTS v3 fixes the fluent-foreigner accent that bit earlier rounds.