Time-frequency reassignment for acoustic signal processing: from speech to singing voice applications

PhD Candidate Georgia Tryfou

June 19, 2017

Time: June 19, 2017, 10:00 am
Location: Room Ofek, Polo scientifico e tecnologico "Fabio Ferrari", Building Povo 1, Via Sommarive 5, Povo (Trento)

Abstract of Dissertation

The various time-frequency (TF) representations of acoustic signals share the common objective of describing the temporal evolution of the spectral content of the signal, i.e., how the energy, or intensity, of the signal changes over time. Many TF representations have been proposed in the past, and among them the short-time Fourier transform (STFT) is the one most commonly found at the core of acoustic signal processing techniques. However, certain problems that arise from the use of the STFT have been extensively discussed in the literature. These problems concern the unavoidable trade-off between time and frequency resolution, and the fact that the selected resolution is fixed over the whole spectrum.
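The trade-off mentioned above can be illustrated with a minimal sketch. It is not taken from the dissertation: the function name, the 16 kHz sample rate, and the window lengths are assumptions chosen for illustration, and the frequency resolution is approximated by the spacing between DFT bins (the effective resolution also depends on the window shape).

```python
def stft_resolution(sample_rate_hz, window_length):
    """Rough STFT resolution for an analysis window of `window_length` samples.

    Returns (time_resolution_s, frequency_resolution_hz), where the time
    resolution is taken as the window duration and the frequency resolution
    as the DFT bin spacing. Both are simplifying assumptions.
    """
    time_resolution_s = window_length / sample_rate_hz
    frequency_resolution_hz = sample_rate_hz / window_length
    return time_resolution_s, frequency_resolution_hz


SAMPLE_RATE_HZ = 16000  # assumed sample rate, typical for speech

for window_length in (256, 1024, 4096):  # assumed window lengths
    dt, df = stft_resolution(SAMPLE_RATE_HZ, window_length)
    # A longer window narrows the frequency bins but widens the time span,
    # and the same pair of values applies to every frequency in the spectrum.
    print(f"N={window_length:5d}  dt={dt * 1000:7.1f} ms  df={df:7.2f} Hz")
```

Under these assumptions the product of the two values stays constant, so improving one resolution necessarily degrades the other, and a single choice of window fixes both for the whole spectrum.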

In order to improve upon the spectrogram, several variations have been proposed over time. One of these variations stems from a promising method called reassignment. According to this method, the traditional spectrogram, as obtained from the STFT, is reassigned to a sharper representation called the Reassigned Spectrogram (RS). In this thesis we elaborate on approaches that utilize the RS as the TF representation of acoustic signals, and we exploit this representation in the context of different applications, such as speech recognition and melody extraction.

The first contribution of this work is a method for speech parametrization, which results in a set of acoustic features called time-frequency reassigned cepstral coefficients (TFRCC). Experimental results show the ability of TFRCC features to represent higher-level characteristics of speech, a fact that leads to advantages in phone-level speech segmentation and speech recognition. The second contribution is the use of the RS as the basis for extracting objective quality measures, in particular the reassigned cepstral distance and the reassigned point-wise distance. Both measures are used for channel selection (CS), following our proposal to perform CS based on objective quality measures in order to improve the accuracy of speech recognition in a multi-microphone reverberant environment. The final contribution of this work is a method to detect harmonic pitch contours from singing voice signals, using a dominance weighting of the RS. This method has been exploited in the context of melody extraction from polyphonic music signals.