SCS Undergraduate Thesis Topics

Student: Jon Chu
Advisors: Luis von Ahn / Anthony Tomasic
Title: Generating Giant Word Corpora with Human Computation to Solve Word Sense Disambiguation

Given a word within a sentence, how can a computer determine the meaning of that word? If the word has only one definition, the answer is trivial. If the word has multiple definitions, however, the problem becomes far more difficult. This problem, Word Sense Disambiguation (hereafter WSD), remains open and has yet to be adequately solved. Machine learning is a natural fit for it, but the major obstacle to such an approach is the lack of data on which to train a learning algorithm. Our solution is to apply Human Computation. We will create a game that will generate, from previously untagged sentences, a giant corpus of (word, sentence) pairs, each tagged with the disambiguated definition of that word. With this corpus, we aim to train a machine learning algorithm robust enough to solve WSD on a domain as large as an entire language, and thereby arrive at an effective solution to WSD.

