SCS Undergraduate Thesis Topics
|Natalie Castellana||Russell Schwartz||Haplotype Motif Partitioning for Association Studies|
Since the first full genome was sequenced in 1995, the amount of available genomic data has grown exponentially. By exploiting patterns of variation, called haplotypes, scientists have begun to draw correlations between an organism's genetic code and the characteristics it manifests, e.g. hair color, height, or a tendency towards depression.
While several methods for finding haplotypes have been explored, one model, the haplotype motif model, is especially promising. Motifs are intended to capture conserved variation while relaxing some of the constraints imposed by previous methods. The model is designed to test whether correlation information in haplotypes is lost by the more rigid models.
Finding the minimum number of motifs is an APX-hard problem, so the focus of my research has been to find an approximate solution. One approach is to use an integer programming formulation of the problem. To test how well the approximation algorithm finds useful motifs, this paper looks at its compression ability and its performance in association testing. The results are compared to those of two haplotype block models.