Peter Venable
Modeling Syntax for Parsing and Translation

Degree Type: Ph.D. in Computer Science
Advisor(s): John Lafferty
Graduated: December 2003
Keywords: Statistical, syntax, parsing, translation

Abstract

Syntactic structure is an important component of natural language utterances, for both form and content. Therefore, a variety of applications can benefit from the integration of syntax into their statistical models of language. In this thesis, two new syntax-based models are presented, along with their training algorithms: a monolingual generative model of sentence structure, and a model of the relationship between the structure of a sentence in one language and the structure of its translation into another language. After these models are trained and tested on the respective tasks of monolingual parsing and word-level bilingual corpus alignment, they are demonstrated in two additional applications. First, a new statistical parser is automatically induced for a language in which none was available, using a bilingual corpus. Second, a statistical translation system is augmented with syntax-based models. Thus the contributions of this thesis include: a statistical parsing system; a bilingual parsing system, which infers a structural relationship between two languages using a bilingual corpus; a method for automatically building a parser for a language where no parser is available; and a translation model that incorporates phrase structure.

Thesis Committee
John Lafferty (Chair)
Daniel Sleator
Jaime Carbonell
Michael Collins (MIT)

Randy Bryant, Head, Computer Science Department
James Morris, Dean, School of Computer Science

Thesis Document
CMU-CS-03-216.pdf (2.55 MB) (130 pages)