Large Speech Models: From Cascaded to End-to-End
Yang Xuerui (杨学锐)

Contents
01 How LLMs Reshape Speech Technology
02 Building End-to-End Speech Models: Representation
03 Building End-to-End Speech Models: Architecture
04 Building End-to-End Speech Models: Training and Inference
05 Building End-to-End Speech Models: Tasks
06 Model Evaluation: What Makes a Good Model

01 How LLMs Reshape Speech Technology

Traditional speech technology
Pipeline architecture: errors propagate from stage to stage, and information is lost along the way.
Understanding: handles only simple commands; cannot carry out multi-turn, multimodal deep reasoning.
Expression: TTS voices sound mechanical, with templated prosody and no real "human feel".

ASR
Non-LLM: Paraformer
Open-source: Whisper, SenseVoice / SenseVoice-L, FireredASR
Closed-source/API: SeedASR, StepASR* (diagram labels: Text token output; Context/Hotwords)
*https:/

Whisper: Radford A, Kim J W, Xu T, et al. Robust speech recognition via large-scale weak supervision[C].
Paraformer: Gao Z, Zhang S, McLoughlin I, et al. Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition[J].

References (ASR)
[1] Yang X, Li J, Zhou X. A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition[J]. arXiv preprint arXiv:1810.11352, 2018.
[2] Gulati A, Qin J, Chiu C C, et al. Conformer: Convolution-augmented transformer for speech recognition[J]. arXiv preprint arXiv:2005.08100, 2020.
[3] Bai Y, Chen J, Chen J, et al. Seed-ASR: Understanding diverse speech and contexts with LLM-based speech recognition[J]. arXiv preprint arXiv:2407.04675, 2024.
[4] Wu B, Yan C, Hu C, et al. Step-Audio 2 technical report[J]. arXiv preprint arXiv:2507.16632, 2025.

*Xie T, Rong Y, Zhang P, et al. Towards controllable speech synthesis in the era of large language models: A survey[J]. arXiv e-prints, 2024: arXiv:2412.06602.