My interests span several areas: Artificial Intelligence, Language Technologies such as Machine Translation, Machine Learning, and Computational Biology. In particular, my current research focuses on text mining (extraction, categorization, novelty detection) and on new theoretical frameworks, such as a unified utility-based theory bridging information retrieval, summarization, question answering, personalized search, and related tasks. My research style typically combines theory, experimentation, and system building, often in collaboration with students, research staff, and/or faculty colleagues. The following are some of my recent research areas.

Active Learning: Supervised machine learning methods require labeled training instances, but in many practical applications the difficulty or cost of obtaining such labels is a major barrier. Active Learning seeks to identify the fewest and most informative instances to label, conditioned on the learning method, the estimated data distribution, the decision function learned thus far, and other factors. I am interested in ensemble-based active learning, in active learning for highly skewed class distributions (a common phenomenon), and in models of differential labeling cost.

Machine Translation: I am working on learning-based methods for Machine Translation, including generalized example-based MT (which learns from parallel text), Context-Based MT (which requires only monolingual training text), and rule learning for rare languages that may lack sufficient quantities of electronic text, parallel or otherwise.

Data and Text Mining: I am working on detecting the onset of novel patterns in both text streams and structured data streams. Novelty detection goes beyond discovering outliers: it seeks to identify emergent, coherent groupings of data (e.g., clusters or temporally related patterns) that extend, change, or are unrelated to historical data patterns.
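To make the ensemble-based Active Learning idea concrete, here is a minimal query-by-committee sketch in Python. It is an illustration under stated assumptions, not the method used in this research: the base learner (a 1-D nearest-centroid classifier), the bagging-style committee, the vote-entropy disagreement score, and the `cost` hook for differential labeling costs are all hypothetical choices, and every function name is invented for this example.

```python
import math
import random
from collections import Counter


def vote_entropy(votes):
    """Disagreement of a committee on one instance: 0 when unanimous, higher when split."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def train_centroid(labeled):
    """Hypothetical base learner: nearest-centroid classifier on 1-D (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def build_committee(labeled, size, rng):
    """Ensemble of base learners, each trained on a bootstrap resample (bagging)."""
    return [train_centroid([rng.choice(labeled) for _ in labeled]) for _ in range(size)]


def select_query(committee, pool, cost=lambda x: 1.0):
    """Pick the pool instance with the highest disagreement per unit labeling cost.

    `cost` is an assumed hook for differential labeling costs; by default every
    label costs the same, so this reduces to plain vote-entropy sampling.
    """
    return max(pool, key=lambda x: vote_entropy([h(x) for h in committee]) / cost(x))


# Usage: a tiny labeled seed set and an unlabeled pool.
rng = random.Random(0)
labeled = [(0.0, "neg"), (1.0, "neg"), (9.0, "pos"), (10.0, "pos")]
pool = [0.5, 4.8, 5.2, 9.5]
committee = build_committee(labeled, size=5, rng=rng)
query = select_query(committee, pool)  # the instance to send to a human annotator
```

In a full loop, one would obtain the label for `query`, move it from `pool` to `labeled`, rebuild the committee, and repeat until the labeling budget runs out. Instances near the class boundary tend to split the committee and are queried first, while instances the committee already agrees on are skipped.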
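The distinction drawn under Data and Text Mining, between isolated outliers and emergent coherent groupings, can be sketched with single-pass clustering over a text stream. The Python below is a simplified illustration under assumptions, not the detection method used in this work: it uses bag-of-words cosine similarity, a hypothetical similarity `threshold`, and a hypothetical `min_size` for what counts as a grouping rather than an outlier. It covers only the "unrelated to historical patterns" case, not patterns that extend or change historical ones. `StreamClusterer` and its methods are invented names.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class StreamClusterer:
    """Single-pass clustering of a document stream.

    Clusters founded by historical documents describe known patterns; clusters
    founded after the historical period are candidate novel groupings.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed minimum cosine similarity to join a cluster
        self.clusters = []

    def add(self, text, historical):
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for cluster in self.clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= self.threshold:
            # Summed term counts; cosine is scale-invariant, so this acts as the centroid.
            best["centroid"].update(vec)
            best["size"] += 1
            return best
        cluster = {"centroid": Counter(vec), "size": 1, "novel": not historical}
        self.clusters.append(cluster)
        return cluster

    def emergent(self, min_size=2):
        """Novel clusters that have grown into coherent groupings, not lone outliers."""
        return [c for c in self.clusters if c["novel"] and c["size"] >= min_size]


# Usage: a short history followed by new stream items.
clusterer = StreamClusterer(threshold=0.3)
for doc in ["stock market falls", "stock market rises"]:
    clusterer.add(doc, historical=True)
for doc in ["volcano erupts near city", "volcano eruption city evacuated", "stock market steady"]:
    clusterer.add(doc, historical=False)
emerging = clusterer.emergent(min_size=2)
```

A single unmatched document only founds a size-1 cluster and is treated as an outlier. A pattern is reported as emergent only once several new documents cohere around it, which mirrors the difference between outlier discovery and novelty detection described above.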