Speech Processing

Text-to-speech, voice conversion and voice transformation, automatic speech recognition, classifiers and other pattern recognition technologies

Speech processing is the analysis of human speech using digital signal processing techniques. Depending on the focus of the analysis, it covers several areas: speech synthesis, voice or speech recognition, speaker recognition, voice analysis, speech coding and compression, speech enhancement, speaker diarization, and others.

Vicomtech's research focuses on the following lines:

  • Text-to-Speech (TTS) technologies take written text as input and synthesize computer-generated speech that resembles human speech, producing an artificial human voice.
  • Voice Conversion and Voice Transformation technologies digitally transform any given voice (the source voice) so that it perceptually mimics the voice of a specific speaker (the target voice).
  • Automatic Speech Recognition (ASR) deals with the automatic, computer-based conversion of spoken human language into the corresponding written text.
  • Classifiers and other Pattern Recognition technologies classify information according to predefined criteria. The information can be of different kinds: visual, textual, acoustic, etc.


Outstanding projects