The Centre for Speech Technology Research, The University of Edinburgh

Publications by Michael Berger

[1] Michael A. Berger, Gregor Hofer, and Hiroshi Shimodaira. Carnival - combining speech technology and computer animation. IEEE Computer Graphics and Applications, 31:80-89, 2011. [ bib | DOI ]
[2] Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, and Ricardo Gutierrez-Osuna. Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. In Proc. Interspeech, pages 1990-1993, September 2010. [ bib | .pdf ]
We present a new phone-dependent feature weighting scheme that can be used to map articulatory configurations (e.g. EMA) onto vocal tract spectra (e.g. MFCC) through table lookup. The approach consists of assigning feature weights according to a feature's ability to predict the acoustic distance between frames. Since an articulator's predictive accuracy is phone-dependent (e.g., lip location is a better predictor for bilabial sounds than for palatal sounds), a unique weight vector is found for each phone. Inspection of the weights reveals a correspondence with the expected critical articulators for many phones. The proposed method reduces overall cepstral error by 6% when compared to a uniform weighting scheme. Vowels show the greatest benefit, though improvements occur for 80% of the tested phones.

Keywords: speech production, speech synthesis
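The weighting scheme summarised in the abstract above lends itself to a compact illustration. The Python sketch below is an assumption-laden reading, not the authors' implementation: it weights each articulatory dimension by how well its frame-to-frame differences correlate with cepstral distance, then performs phone-dependent weighted nearest-neighbour table lookup. All function names, the random-pair sampling, and the correlation-based weighting are hypothetical details added here for illustration.

import numpy as np

def phone_weights(ema, mfcc, n_pairs=2000, seed=0):
    # Weight each EMA dimension by how well |delta feature| predicts acoustic
    # distance, estimated from randomly sampled frame pairs within one phone.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(ema), n_pairs)
    j = rng.integers(0, len(ema), n_pairs)
    art_diff = np.abs(ema[i] - ema[j])                   # per-dimension articulatory gaps
    ac_dist = np.linalg.norm(mfcc[i] - mfcc[j], axis=1)  # overall cepstral distance
    corr = [np.corrcoef(art_diff[:, d], ac_dist)[0, 1] for d in range(ema.shape[1])]
    w = np.clip(np.nan_to_num(corr), 0.0, None)          # keep only positive predictors
    return w / w.sum() if w.sum() > 0 else np.full(ema.shape[1], 1.0 / ema.shape[1])

def lookup(query_ema, phone, table):
    # table[phone] holds stored EMA frames, their MFCC frames, and the phone's weights.
    ema, mfcc, w = table[phone]
    d = ((ema - query_ema) ** 2 * w).sum(axis=1)         # weighted squared distance
    return mfcc[np.argmin(d)]

In this reading, a uniform weighting would simply set w to a constant vector; the 6% reduction in cepstral error reported in the abstract comes from replacing that constant with the phone-specific weights.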
[3] Michael Berger, Gregor Hofer, and Hiroshi Shimodaira. Carnival: a modular framework for automated facial animation. Poster at SIGGRAPH 2010, 2010. Bronze award winner, ACM Student Research Competition. [ bib | .pdf ]
[4] Gregor Hofer, Korin Richmond, and Michael Berger. Lip synchronization by acoustic inversion. Poster at SIGGRAPH 2010, 2010. [ bib | .pdf ]
[5] Richard S. McGowan and Michael A. Berger. Acoustic-articulatory mapping in vowels by locally weighted regression. Journal of the Acoustical Society of America, 126(4):2011-2032, 2009. [ bib | .pdf ]
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and maps between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979) J. Am. Stat. Assoc. 74, 829-836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties from the inverse mappings, indicating that this method is better suited to the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented.
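As a rough illustration of the mapping machinery described in this abstract, the Python sketch below fits a locally weighted linear regression (loess-style) from articulatory principal components to formant frequencies around a single query point. The tricube kernel and nearest-fraction neighbourhood follow standard loess practice; the variable names, the 30% span, and the data layout are assumptions for illustration, not details taken from the paper.

import numpy as np

def loess_predict(X, Y, x_query, frac=0.3):
    # X: (n, d) articulatory principal components; Y: (n, m) formant frequencies.
    # Fit a weighted linear model on the nearest `frac` of the data, evaluate at x_query.
    d = np.linalg.norm(X - x_query, axis=1)
    k = max(int(frac * len(X)), X.shape[1] + 2)       # local neighbourhood size
    idx = np.argsort(d)[:k]
    h = d[idx].max() + 1e-12                          # bandwidth = distance to furthest neighbour
    w = (1.0 - (d[idx] / h) ** 3) ** 3                # tricube weights
    A = np.c_[np.ones(k), X[idx]]                     # design matrix with intercept
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(sw * A, sw * Y[idx], rcond=None)
    return np.r_[1.0, x_query] @ beta, beta[1:]       # local prediction and local slopes

The second return value, the local slopes, corresponds to the quantities whose distributions and correlations the abstract examines: each entry gives the sensitivity of a formant frequency to a change along one articulatory principal component in the neighbourhood of the query point.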