Modelling Speech Dynamics with Trajectory-HMMs
This page contains information on a PhD project completed at Edinburgh, with
downloadable papers, source code, and sample programs.
Thesis
Thesis: Modelling Speech Dynamics with Trajectory-HMMs (pdf).
References
Le Zhang and Steve Renals.
Acoustic-Articulatory Modelling with the Trajectory HMM.
IEEE Signal Processing Letters, 15:245-248, 2008.
pdf
Le Zhang and Steve Renals.
Phone Recognition Analysis for Trajectory HMM.
In Proc. Interspeech 2006, Pittsburgh, USA, September 2006.
pdf
Source Code
The source code (written in C) for training, decoding and scoring Trajectory-HMMs can be
obtained from trajectory-20090427.tar.bz2. The programs were
used in my PhD project and are now made available to the general public
under the BSD license. The code and programs are provided AS IS, with
no support.
Binary and Sample Programs
Statically-linked Binary
Pre-built statically-linked binaries for Linux are included in the source
tarball: trajectory_train, trajectory_score and trajectory_decode, for
training, scoring and decoding with Trajectory-HMMs in HTK's model
format. These tools can handle monophone HMMs built with HTK. The
training program can also perform the simple triphone update used in Chapter 5 of the
thesis. The decoder can handle bigram networks built by HBuild,
although only a phone-loop network was used in the experiments. In addition, hmm_decode is provided for
standard HMM token-passing inference (compatible with HVite, albeit slower).
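
For readers unfamiliar with token passing, the idea can be sketched in a few
lines of Python. This is a generic illustration for a single small HMM, not the
code shipped in the tarball; the function name viterbi and the argument layout
(lists of log-probabilities) are assumptions made for the example. Each state
holds one token carrying a log score and a path history, and at every frame
each state keeps only the best incoming token, which yields the same best path
as the Viterbi recursion.

```python
def viterbi(log_init, log_trans, log_obs):
    """Token-passing style Viterbi search over a single HMM.

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log likelihood of frame t given state j
    Returns (best_log_score, best_state_path).
    """
    n_states = len(log_init)
    # One token per state: (log score, path history so far).
    tokens = [(log_init[j] + log_obs[0][j], [j]) for j in range(n_states)]
    for t in range(1, len(log_obs)):
        new_tokens = []
        for j in range(n_states):
            # Pass every token into state j and keep only the best one.
            score, path = max(
                ((tokens[i][0] + log_trans[i][j], tokens[i][1])
                 for i in range(n_states)),
                key=lambda tok: tok[0],
            )
            new_tokens.append((score + log_obs[t][j], path + [j]))
        tokens = new_tokens
    # The best surviving token gives the decoded state sequence.
    return max(tokens, key=lambda tok: tok[0])
```

A decoder working on a recognition network would additionally pass tokens
between models and record word or phone boundaries; that part is omitted here.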
Sample Programs
A sample script and data for training, scoring, forced alignment or decoding with a
Trajectory-HMM can be obtained from trajectory_example.tar.bz2. The data
are 14-channel EMA data derived from the MOCHA-TIMIT
corpus, with delta and delta-delta features appended using a 3-frame
dynamic window. The files are in HTK format and can be examined using HList.
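
If HTK is not installed, an uncompressed HTK parameter file can also be
inspected with a short script. The sketch below assumes the standard HTK
layout: a 12-byte big-endian header (number of frames, sample period in 100 ns
units, bytes per frame, parameter kind) followed by 4-byte floats. The function
name read_htk is made up for this example, and compressed files (the _C
qualifier) are deliberately rejected rather than decoded.

```python
import struct

def read_htk(data):
    """Parse an uncompressed HTK parameter file given as a bytes object."""
    # Header: nSamples (int32), sampPeriod (int32, 100 ns units),
    # sampSize (int16, bytes per frame), parmKind (16-bit code).
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihH", data[:12])
    if parm_kind & 0o2000:  # _C qualifier: compressed data
        raise ValueError("compressed HTK files are not handled by this sketch")
    dim = samp_size // 4  # assumes 4-byte float components
    frames = []
    for i in range(n_samples):
        start = 12 + i * samp_size
        frames.append(struct.unpack(">%df" % dim,
                                    data[start:start + samp_size]))
    return {
        "n_samples": n_samples,
        "samp_period": samp_period,
        "parm_kind": parm_kind,
        "dim": dim,
        "frames": frames,
    }
```

A file would be loaded with something like read_htk(open(path, "rb").read()),
where path points at one of the feature files in the example archive. If the
static, delta and delta-delta streams are simply concatenated, one would expect
42 values per frame for 14 channels, but that is an inference rather than
something stated on this page.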
Have fun!
April, 2009.