Exploiting Syntactic, Semantic and Lexical Regularities in Statistical Language Modeling

This project aims to build a statistical language model that captures several kinds of regularities in natural language, chiefly local lexical regularities and long-range syntactic or semantic regularities, in order to improve the performance of various natural language applications. The work is conducted under the directed Markov random field paradigm, which is used to sequentially embed more advanced syntactic structure and/or semantic topic components and thereby form complex distributions for natural language. By exploiting the particular structure of each composite language model, the seemingly complex statistical representations are decomposed into simpler ones. The estimation and inference algorithms for the simpler composite language models then become internal building blocks for estimating the more complex composite language models, ultimately solving the estimation problem for extremely complex, high-dimensional distributions.

  • Funding: Project 1. NSF IIS RI-Small
  • PI/Contact: Prof. Shaojun Wang