Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models

Preprint

Original Research Article

Shared by Gary on Oct 31, 2025 • 04:02 AM UTC

Abstract

Recent advances in speech foundation models (SFMs) have enabled the direct processing of spoken language from raw audio, bypassing intermediate textual representations. This capability allows SFMs to be exposed to, and potentially respond to, rich paralinguistic variations embedded in the input spee...

Subject

Speech Foundation Models

Voice Quality

Phonation Types

Highlights

- First systematic study of SFM behaviour under controlled phonation variation.

- Introduces VQ-Bench, a corpus of parallel prompts with modal, breathy, creaky, and end-creak phonation variants.

- Uses open-ended generation tasks and speech emotion recognition (SER) to probe paralinguistic sensitivity in speech foundation models.
