Comp 14112 Lab 3: Hidden Markov Models

Tim Morris

Academic session: 2014-2015

1 Introduction

This lab involves the implementation of a hidden Markov model (HMM) for the task of differentiating between utterances of the words “yes” and “no”, as discussed in the lecture notes. You will train simple HMMs on data and investigate the Forward and Viterbi algorithms for inference.

The code and data for this lab can be found in the following directory:

/opt/info/courses/COMP14112/labs/lab3

Make a copy of this directory in ~/COMP14112/ex3 in your file space, and make sure your CLASSPATH variable is set to include it.

The data in this lab comes from exactly the same speech examples used in lab 2. This time, however, the speech signals have not been cropped to remove the silence before and after the spoken word. That cropping was carried out using an HMM, and in this lab you will see how it was done. Without cropping, the simple naive Bayes classifier approach does not work as well, and an HMM-based approach is therefore more appropriate.

2 Getting started

Some classes in the markov package have main methods you can run to plot the data. The command

> java markov.PlotSound 12

will plot the sound wave for the 12th example from the data set. This is exactly as in the previous lab, except that the data is now uncropped, so you will observe a longer region of silence before and after the spoken word. The other plot produced by this method shows the 1st MFCC for each segment of the same signal. As before, examples 1-82 are “yes” and examples 83-165 are “no”.

The command

> java markov.PlotLabels 12

will plot the state labels identifying the different segments of the signal. State 0 corresponds to silence before the word, state 1 to the word itself, and state 2 to silence after the word. You will see later how this labelling can be carried out automatically using the Viterbi algorithm.

3 Guide to the code

Look at the html documentation for the markov package to see what the various classes do. In this lab you will be adapting two classes, HiddenMarkovModel and Classifier, so you should look at the code for these in particular.

4 The tasks

You have four tasks. Use the main method of the Answer class to write code demonstrating your results for each task. Illustrative sketches for each of the four tasks are given after the task list.

1. Training (5 marks)

It is possible to learn the parameters of a hidden Markov model from training data. In the class HiddenMarkovModel, look at the body of the following constructor method:

public HiddenMarkovModel(int n, ArrayList dataList, ArrayList labelsList)

The transition probability parameters of an HMM are trained in a similar way to those of a Markov chain model, as described in Section 3.9 of the lecture notes. In the code you will see that the number of times each state transition appears in the training data has already been summed up. The code after this point is incomplete: you should complete it so that the array transitionProbability[i][j] contains the probability of making a transition from state i to state j.

Use your modified constructor to build two HMMs similar to the one in Section 4.2 of the lecture notes (Figure 16). The HMMs have three states: 0 for leading silence, 1 for yes or no, and 2 for trailing silence. Construct one HMM for the word “yes” and one HMM for the word “no”. Print the matrix of transition probabilities on the screen for each model.
Verify that the transition probabilities are properly normalised (note: they will not be exactly the same as those shown in Figure 16). A sketch of the normalisation step is given after the task list.

2. Classification using the Forward Algorithm (5 marks)

The Forward Algorithm is given by equation (11) in Section 4.2 of the notes. It is implemented by the forward method of the HiddenMarkovModel class. Have a look at this method and see if you can understand how it implements the recursion relation in equation (11). Notice that the probabilities are represented using the BigDecimal class: had we simply used doubles, the values would have been too small to represent, resulting in underflow.

The output of the Forward Algorithm can be used to construct a classifier via Bayes’ theorem, as explained in Section 4.2 of the notes. Complete the classify method of the Classifier class so that it returns the probability that the input sequence of MFCC vectors corresponds to a “yes”. You will have to use methods from the BigDecimal class to carry out arithmetic on the results of the Forward Algorithm (note: you can use the doubleValue method of the BigDecimal class to return the final result as a double); a sketch follows the task list. Evaluate the performance of your classifier on the training data by computing the percentage of incorrect classifications.

3. Building a combined yes/no model (5 marks)

It is possible to combine the two single-word models into a model of both words, like the one shown in Figure 15 of the lecture notes. To do this you will have to work out sensible choices for the transition and emission probabilities of the new model from the parameters of the previous “yes” and “no” models (these can be obtained using the get methods of the HiddenMarkovModel class).

First construct an HMM with four states: 0 for leading silence, 1 for yes, 2 for no, and 3 for trailing silence. You can do that using the constructor method:

public HiddenMarkovModel(int n)

Then use the set methods in the HiddenMarkovModel class to set the appropriate transition and emission probabilities; a sketch of this step follows the task list. The transition probabilities must remain properly normalised. Print the matrix of transition probabilities on the screen. For the emission probabilities you may find it helpful to use the combine method of the Normal class for the two silent states.

4. Viterbi algorithm for cropping and classification (5 marks)

The Viterbi algorithm is implemented by the viterbi method of the HiddenMarkovModel class. Use the combined model constructed in task 3 to show how the Viterbi algorithm can be used to (1) crop the data and (2) classify the data (see the final sketch below). Evaluate the performance of the method for both of these tasks. You may assume that the labelling given for training is a “gold standard” for the purposes of evaluation, although it may not be perfect in reality.
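Sketch for task 1. The normalisation step amounts to dividing each row of transition counts by its row total, so that each row of transitionProbability sums to 1. The following is a minimal illustration of that idea, not the lab's actual code; the method and variable names (normaliseCounts, count) are made up for the example.

static double[][] normaliseCounts(double[][] count) {
    int n = count.length;
    double[][] transitionProbability = new double[n][n];
    for (int i = 0; i < n; i++) {
        double rowTotal = 0.0;
        for (int j = 0; j < n; j++) {
            rowTotal += count[i][j];   // total number of transitions out of state i
        }
        for (int j = 0; j < n; j++) {
            // P(next state = j | current state = i); guard against states
            // that never occur in the training data.
            transitionProbability[i][j] = rowTotal > 0 ? count[i][j] / rowTotal : 0.0;
        }
    }
    return transitionProbability;
}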
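Sketch for task 2. Bayes’ theorem gives P(yes|O) = P(O|yes)P(yes) / (P(O|yes)P(yes) + P(O|no)P(no)), where P(O|yes) and P(O|no) are the BigDecimal likelihoods returned by the forward method of each model. The helper below is one way to carry out this arithmetic; the method name, the uniform prior and the precision setting are assumptions, not part of the lab code.

import java.math.BigDecimal;
import java.math.MathContext;

static double probabilityOfYes(BigDecimal likelihoodYes, BigDecimal likelihoodNo) {
    BigDecimal prior = new BigDecimal("0.5");            // assume P(yes) = P(no)
    BigDecimal yesTerm = likelihoodYes.multiply(prior);  // P(O|yes) P(yes)
    BigDecimal noTerm = likelihoodNo.multiply(prior);    // P(O|no) P(no)
    // Divide with an explicit MathContext, since the quotient may be
    // non-terminating in decimal.
    BigDecimal posterior = yesTerm.divide(yesTerm.add(noTerm), MathContext.DECIMAL64);
    return posterior.doubleValue();  // safe to convert once normalised to [0, 1]
}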
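Sketch for task 3. One plausible layout for the combined model's transitions is shown below: from leading silence the model either stays silent or enters one of the two word states, each word state either persists or moves on to trailing silence, and trailing silence is absorbing. The setter name setTransitionProbability is a placeholder for whichever set method HiddenMarkovModel actually provides, and pStaySilent and pStayWord stand for values read off your trained single-word models with the corresponding get methods (in practice the two word states may need different values).

static HiddenMarkovModel buildCombined(double pStaySilent, double pStayWord) {
    HiddenMarkovModel combined = new HiddenMarkovModel(4);
    double pLeaveSilence = 1.0 - pStaySilent;

    // State 0 (leading silence): stay silent, or enter one of the two words.
    // Splitting the silence-to-word mass equally keeps row 0 normalised.
    combined.setTransitionProbability(0, 0, pStaySilent);
    combined.setTransitionProbability(0, 1, pLeaveSilence / 2);   // into "yes"
    combined.setTransitionProbability(0, 2, pLeaveSilence / 2);   // into "no"

    // States 1 and 2 (the words): persist, or move on to trailing silence.
    combined.setTransitionProbability(1, 1, pStayWord);
    combined.setTransitionProbability(1, 3, 1.0 - pStayWord);
    combined.setTransitionProbability(2, 2, pStayWord);
    combined.setTransitionProbability(2, 3, 1.0 - pStayWord);

    // State 3 (trailing silence) is absorbing.
    combined.setTransitionProbability(3, 3, 1.0);

    // The emission probabilities would be set similarly, using the combine
    // method of the Normal class for the two silence states.
    return combined;
}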
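Sketch for task 4. If viterbi returns the most probable state for each segment as an int array (states numbered as in task 3), then the word's extent and its identity can both be read off the best path. The helper below is illustrative only; the assumed return type of viterbi and all the names here are assumptions.

static void cropAndClassify(int[] bestPath) {
    int start = -1, end = -1;    // first and last segments labelled as a word
    boolean sawYes = false;
    for (int t = 0; t < bestPath.length; t++) {
        if (bestPath[t] == 1 || bestPath[t] == 2) {   // a word state
            if (start < 0) start = t;
            end = t;
            if (bestPath[t] == 1) sawYes = true;      // the path visited "yes"
        }
    }
    // Cropping: keep only segments start..end of the original signal, and
    // compare against the given labels to evaluate.
    System.out.println("word spans segments " + start + " to " + end);
    // Classification: with the layout sketched for task 3, a path can pass
    // through at most one of the two word states.
    System.out.println(sawYes ? "yes" : "no");
}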
5 Evaluation

For each question you will get 3 marks for a correct working implementation, 1 mark for sensible, well-documented code and 1 mark for a full understanding of what you have done. You should submit your modified files HiddenMarkovModel.java, Classifier.java and Answer.java. You should also run labprint.

6 Discussion

The HMM you have implemented is a little simpler than the ones used in real applications, mainly because the emission probabilities are assumed to come from a univariate normal distribution for each word. Looking at the MFCCs for each word, it is clear that this is not always a reasonable assumption. Much better results can be achieved by using more states (e.g. phoneme states) and by using more flexible emission probability distributions. A popular choice is a distribution known as a mixture of normal distributions; this requires a more complex training algorithm, known as the EM algorithm, which you will see in the second year if you take COMP20411.