Nonuniform Markov Models

Report ID: TR-536-96
Authors: Ristad, Eric Sven; Thomas, Robert G.
Date: 1996-11
Pages: 17
Download Formats: Postscript
Abstract:

A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model.
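
For orientation, the classic interpolated Markov model named above as the baseline can be sketched roughly as follows. This is only a minimal illustration of a bigram model with a single fixed interpolation weight. It is not the authors' implementation and it is not the nonuniform model itself. The function names, the toy token sequence, and the weight `lam` are all assumptions made for the example.

```python
from collections import Counter


def train_counts(tokens):
    """Count unigrams and bigrams in a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def interpolated_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Estimate P(word | prev) by mixing bigram and unigram estimates.

    lam is a hypothetical fixed weight chosen for illustration; in practice
    such weights are usually tuned on held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0

    # Number of times prev was observed as a context (first bigram element).
    context_count = sum(c for (first, _), c in bigrams.items() if first == prev)
    if context_count == 0:
        # Unseen context: rely on the shorter (empty) context alone.
        return p_uni

    p_bi = bigrams[(prev, word)] / context_count
    return lam * p_bi + (1 - lam) * p_uni


# Toy data, purely illustrative.
tokens = "a b a b a c".split()
unigrams, bigrams = train_counts(tokens)
print(interpolated_prob("b", "a", unigrams, bigrams))
```

In this sketch every prediction is one symbol long and the mixture always combines the same two context lengths. According to the abstract, the nonuniform model differs in that both the prediction length and the context length vary.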

This technical report has been published as:
Nonuniform Markov Models. Eric Sven Ristad and Robert G. Thomas, International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, April 20-24, 1997.