SCS Undergraduate Thesis Topics
|Malcolm Greaves||William W. Cohen (MLD,LTI)||Leveraging Dependency Parse Structures for Large-Scale Noun Phrase Classification|
How do you construct a program that can read the web and learn about the world? How could a program process billions of sentences, learn the semantic meaning of hundreds of thousands of noun phrases, and use this knowledge to classify previously unseen noun phrases? We developed a novel three-stage algorithm for noun phrase classification and present empirical results on a 3 TB corpus of parsed English text. Each sentence is represented as a directed acyclic graph in which tokens are vertices and labeled edges represent the syntactic dependency relationships between tokens. The algorithm's first stage learns graph walk strategies for associating semantically related noun phrases. The second stage uses these strategies to build a rich noun phrase feature space. The third stage learns a logistic regression model on this feature space that is highly effective at noun phrase classification. We describe the algorithm in detail and report its performance on several noun phrase classification experiments.
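The sentence-as-graph representation and the idea of a graph walk can be illustrated with a minimal Python sketch. It is not the thesis implementation: the example sentence, the edge labels, the `^-1` notation for walking an edge in reverse, and the function names `build_graph`, `walk`, and `features` are all assumptions made for illustration. The walk paths are also hard-coded here, whereas the abstract describes learning them in the first stage.

```python
from collections import defaultdict

# Hypothetical dependency parse of "Pittsburgh is a city in Pennsylvania",
# stored as (head, label, dependent) triples. Labels are illustrative only.
EDGES = [
    ("city", "nsubj", "Pittsburgh"),
    ("city", "cop", "is"),
    ("city", "det", "a"),
    ("city", "prep_in", "Pennsylvania"),
]

def build_graph(edges):
    """Index each edge in both directions so a walk can move up or down the parse."""
    graph = defaultdict(list)
    for head, label, dep in edges:
        graph[head].append((label, dep))          # head -> dependent
        graph[dep].append((label + "^-1", head))  # dependent -> head (inverse edge)
    return graph

def walk(graph, start, path):
    """Follow a sequence of edge labels from `start` and return the tokens reached."""
    frontier = {start}
    for label in path:
        frontier = {
            nxt
            for tok in frontier
            for (lab, nxt) in graph.get(tok, ())
            if lab == label
        }
    return frontier

def features(graph, noun_phrase, paths):
    """Emit one string feature per (walk path, reached token) pair."""
    feats = set()
    for path in paths:
        for tok in walk(graph, noun_phrase, path):
            feats.add("/".join(path) + "=" + tok)
    return feats

graph = build_graph(EDGES)
# Two hand-picked walk strategies (assumed, not learned).
paths = [("nsubj^-1",), ("nsubj^-1", "prep_in")]
feats = features(graph, "Pittsburgh", paths)
```

In this sketch, `feats` holds sparse string features for the noun phrase "Pittsburgh". Feature sets of this kind, collected over many sentences, are the sort of input a logistic regression classifier could be trained on in the third stage.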