Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources
Classically, automatic speech recognition (ASR) models are decomposed into acoustic models and language models (LMs). LMs usually exploit linguistic structure on a purely textual level and contribute strongly to an ASR system's performance. LMs are estimated on large amounts of textual data covering the target domain. However, most utterances cover more specific topics, which influences, e.g., the vocabulary used. Therefore, it is desirable to have the LM adjusted to an utterance's topic. Previous work achieves this by crawling extra data from the web or by using significant amounts of previous speech data to train topic-specific LMs. We propose a way of adapting the LM directly using the target utterance to be recognized. The corresponding adaptation needs to be done in an unsupervised or automatically supervised way based on the speech input. To deal robustly with the resulting errors, we employ topic encodings from the recently proposed Subspace Multinomial Model. This model also avoids any need for explicit topic labelling during training or recognition, making the proposed method straightforward to use. We demonstrate the performance of the method on the LibriSpeech corpus, which consists of read fiction books, and we discuss its behaviour qualitatively.