Lau Chee Yong, Oliver Watts, and Simon King. Combining lightly-supervised learning and user feedback to construct and improve a statistical parametric speech synthesizer for Malay. Research Journal of Applied Sciences, Engineering and Technology, 11(11):1227-1232, December 2015. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, and Junichi Yamagishi. Intelligibility of time-compressed synthetic speech: Compression method and speaking style. Speech Communication, October 2015. [ bib | DOI | Abstract ]

P. Swietojanski, P. Bell, and S. Renals. Structured output layer with auxiliary targets for context-dependent acoustic modelling. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | DOI | .pdf | Abstract ]

C. Valentini-Botinhao, Z. Wu, and S. King. Towards minimum perceptual error training for DNN-based speech synthesis. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

M. Pucher, M. Toman, D. Schabus, C. Valentini-Botinhao, J. Yamagishi, B. Zillinger, and E. Schmid. Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, and Simon King. Deep neural network context embeddings for model selection in rich-context HMM synthesis. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Manuel Sam Ribeiro, Junichi Yamagishi, and Robert A. J. Clark. A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Peter Bell and Steve Renals. Complementary tasks for context-dependent deep neural network acoustic models. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, and Mark Sinclair. A system for automatic broadcast news summarisation, geolocation and translation. In Proc. Interspeech (demo session), Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Alessandra Cervone, Catherine Lai, Silvia Pareti, and Peter Bell. Towards automatic detection of reported speech in dialogue using prosodic cues. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Cassia Valentini-Botinhao, and Gustav Eje Henter. Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations. In Proc. Interspeech, pages 3476-3480, Dresden, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Matthew Aylett, Marcus Tomalin, and Rasmus Dall. Artificial personality and disfluency. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Human vs machine spoofing detection on wideband and narrowband data. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, and Ranniery Maia. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, and Akinobu Lee. Prosodically-enhanced recurrent neural network language models. In Proc. Interspeech, pages 2390-2394, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, and Antti Suni. The NST-GlottHMM entry to the Blizzard Challenge 2015. In Proc. Blizzard Challenge Workshop (Interspeech Satellite), Berlin, Germany, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, and A. Suni. The NST-GlottHMM entry to the Blizzard Challenge 2015. In Proceedings of Blizzard Challenge 2015, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Zhizheng Wu, and Simon King. Sentence-level control vectors for deep neural network speech synthesis. In INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association, pages 2217-2221. International Speech Communication Association, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, M. Luisa Garcia Lecumberri, and Martin Cooke. /u/-fronting in English speakers' L1 but not in their L2. In Proc. ICPhS, Glasgow, August 2015. [ bib | .pdf | Abstract ]

Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King. A lattice-based approach to automatic filled pause insertion. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Martin Corley, and Rasmus Dall. The temporal delay hypothesis: Natural, vocoded and synthetic speech. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Rasmus Dall, Mirjam Wester, and Martin Corley. Disfluencies in change detection in natural, vocoded and synthetic speech. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, and Korin Richmond. A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract. In The Scottish Consortium for ICPhS 2015, editor, Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, United Kingdom, August 2015. Retrieved from http://www.icphs2015.info/pdfs/Papers/ICPHS0724.pdf. [ bib | .pdf | Abstract ]

Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King. Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis. In Proc. ICASSP, pages 4460-4464, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

B. Uria, I. Murray, S. Renals, C. Valentini-Botinhao, and J. Bridle. Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE. In Proc. ICASSP, pages 4465-4469, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

Thomas Merritt, Javier Latorre, and Simon King. Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4220-4224, Brisbane, April 2015. [ bib | .pdf | Abstract ]

Manuel Sam Ribeiro and Robert A. J. Clark. A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

P. Bell and S. Renals. Regularization of context-dependent deep neural networks with context-independent multi-task training. In Proc. ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, and Junichi Yamagishi. Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. Differentiable pooling for unsupervised speaker adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf | Abstract ]

Ling-Hui Chen, T. Raitio, C. Valentini-Botinhao, Z. Ling, and J. Yamagishi. A deep generative architecture for postfiltering in statistical parametric speech synthesis. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 23(11):2003-2014, 2015. [ bib | DOI | Abstract ]

H. Kamper, M. Elsner, A. Jansen, and S. J. Goldwater. Unsupervised neural network based feature extraction using weak top-down constraints. In Proc. ICASSP, 2015. [ bib | .pdf | Abstract ]

Herman Kamper, S. J. Goldwater, and Aren Jansen. Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Aleksandr Sizov, Elie Khoury, Tomi Kinnunen, Zhizheng Wu, and Sebastien Marcel. Joint speaker verification and antispoofing in the i-vector space. IEEE Transactions on Information Forensics and Security, 10(4):821-832, 2015. [ bib | .pdf ]

Zhizheng Wu and Simon King. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for DNN-based speech synthesis. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilci, Md Sahidullah, and Aleksandr Sizov. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In Interspeech, 2015. [ bib | .pdf ]

Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Quy Hy Nguyen, Minghui Dong, and Eng Siong Chng. System fusion for high-performance voice conversion. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, and Simon King. Deep neural network employing multi-task learning and stacked bottleneck features for speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, and Simon King. SAS: A speaker verification spoofing database containing diverse attacks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Quy Hy Nguyen, Eng Siong Chng, and Minghui Dong. Sparse representation for frequency warping based voice conversion. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Liang Lu, Xingxing Zhang, KyungHyun Cho, and Steve Renals. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Liang Lu and Steve Renals. Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Liang Lu and Steve Renals. Multi-frame factorisation for long-span acoustic modelling. In Proc. ICASSP, 2015. [ bib | .pdf | Abstract ]

Leimin Tian, Catherine Lai, and Johanna D. Moore. Recognizing emotions in dialogue with disfluencies and non-verbal vocalisations. In Proceedings of the 4th Interdisciplinary Workshop on Laughter and Other Non-verbal Vocalisations in Speech, volume 14, page 15, 2015. [ bib | .pdf | Abstract ]

Leimin Tian, Johanna D. Moore, and Catherine Lai. Emotion Recognition in Spontaneous and Acted Dialogues. In Proceedings of ACII 2015, Xi'an, China, 2015. [ bib | .pdf | Abstract ]

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Rubén San-Segundo, Javier Ferreiros, Junichi Yamagishi, and Juan M. Montero. Emotion transplantation through adaptation in HMM-based speech synthesis. Computer Speech & Language, 34(1):292-307, 2015. [ bib | DOI | http | Abstract ]

Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, and Korin Richmond. Tongue mesh extraction from 3D MRI data of the human vocal tract. In Michael Breuß, Alfred M. Bruckstein, Petros Maragos, and Stefanie Wuhrer, editors, Perspectives in Shape Analysis, Mathematics and Visualization. Springer, 2015. (in press). [ bib ]

Korin Richmond, Zhen-Hua Ling, and Junichi Yamagishi. The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms [invited review]. Acoustical Science and Technology, 36(6):467-477, 2015. [ bib | DOI ]

Korin Richmond, Junichi Yamagishi, and Zhen-Hua Ling. Applications of articulatory movements based on machine learning. Journal of the Acoustical Society of Japan, 70(10):539-545, 2015. [ bib ]

Peter Bell and Steve Renals. A system for automatic alignment of broadcast media captions using weighted finite-state transducers. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Ahmed Ali, Walid Magdy, Peter Bell, and Steve Renals. Multi-reference WER for evaluating ASR for languages with no orthographic rules. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Peter Bell, Mark Gales, Thomas Hain, Jonathan Kilgour, Pierre Lanchantin, Xunying Liu, Andrew McParland, Steve Renals, Oscar Saz, Mirjam Wester, and Phil Woodland. The MGB challenge: Evaluating multi-genre broadcast media recognition. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Victor Poblete, Felipe Espic, Simon King, Richard M. Stern, Fernando Huenupan, Josue Fredes, and Nestor Becerra Yoma. A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification. Computer Speech & Language, 31(1):1-27, 2015. [ bib | DOI | http | .pdf | Abstract ]

Rosie Kay, Oliver Watts, Roberto Barra-Chicote, and Cassie Mayo. Knowledge versus data in TTS: evaluation of a continuum of synthesis systems. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pages 3335-3339, 2015. [ bib | .pdf | Abstract ]