SCS Undergraduate Thesis Topics

2009-2010
Student: Rishav Bhowmick
Advisors: Noah Smith/Kemal Oflazer
Title: Rich Named Entity Recognition

Identifying named entities in free text, including references to people, organizations, and locations, is a common need. The standard datasets for building named entity recognizers are limited to these three coarse-grained categories, but I am interested in a larger set of more specific labels (e.g., languages, geopolitical entities, nationalities, diseases). Proprietary named entity recognition tools provide such labels, though with some level of error. The goal of this project is to build a named entity recognizer that provides rich labels, using a large unlabeled dataset together with the output of a proprietary tool. The work will involve some machine learning, along with parallel processing on cloud computing resources.
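As a rough illustration of training on a tool's output, the sketch below treats a few sentences tagged by an existing tool as noisy training data for a small per-token classifier. Everything in it is an assumption made for illustration: the abstract does not name the tool, its output format, the label names (LANGUAGE, DISEASE, GPE are hypothetical), or the learning method. A plain multiclass perceptron is used here only because it is short, and the four "silver" sentences are made up.

```python
from collections import defaultdict


def features(tokens, i):
    """Per-token features: the word, a short suffix, capitalization, left neighbor."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "bias",
        "word=" + word.lower(),
        "suffix3=" + word.lower()[-3:],
        "is_cap=" + str(word[:1].isupper()),
        "prev=" + prev,
    ]


class PerceptronTagger:
    """Multiclass perceptron that labels each token independently."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def score(self, feats, label):
        return sum(self.weights[(f, label)] for f in feats)

    def predict_token(self, feats):
        return max(self.labels, key=lambda lab: self.score(feats, lab))

    def train(self, silver_sentences, max_epochs=200):
        """Train on (tokens, labels) pairs; stop after an epoch with no mistakes."""
        for _ in range(max_epochs):
            mistakes = 0
            for tokens, silver_labels in silver_sentences:
                for i, target in enumerate(silver_labels):
                    feats = features(tokens, i)
                    guess = self.predict_token(feats)
                    if guess != target:
                        mistakes += 1
                        for f in feats:
                            self.weights[(f, target)] += 1.0
                            self.weights[(f, guess)] -= 1.0
            if mistakes == 0:
                break

    def tag(self, tokens):
        return [self.predict_token(features(tokens, i)) for i in range(len(tokens))]


# Hypothetical "silver" data: sentences with labels as an existing tool might
# emit them. These examples are invented and may contain errors in practice.
SILVER = [
    (["She", "speaks", "Turkish", "and", "Arabic"], ["O", "O", "LANGUAGE", "O", "O"]),
    (["He", "speaks", "Bengali"], ["O", "O", "LANGUAGE"]),
    (["Malaria", "spread", "in", "Qatar"], ["DISEASE", "O", "O", "GPE"]),
    (["Cholera", "spread", "in", "India"], ["DISEASE", "O", "O", "GPE"]),
]

label_set = {label for _, labels in SILVER for label in labels}
tagger = PerceptronTagger(label_set)
tagger.train(SILVER)

# Tag a sentence that was not in the silver data.
print(tagger.tag(["They", "speak", "Hindi", "in", "Delhi"]))
```

With real data, the silver sentences would come from running the tool over the unlabeled corpus, and the quality of the learned tagger would depend on how noisy those labels are. A sequence model that also looks at neighboring labels would be a natural next step, but that goes beyond what the abstract specifies.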

