Multiple Sequence Alignment for DNA Storage Clusters

Student[s]:
Duna Mazzawi, Sima Qudsi, Butross Dallah

Multiple Sequence Alignment (MSA) refers to the process of aligning a cluster of strands, usually protein or DNA, to achieve maximal regions of similarity. More specifically, given a set S of m erroneous DNA strands of different lengths that are assumed to have originated from one common reference, the output of an MSA algorithm is m strands with gaps inserted into each strand, such that all strands have the same length L ≥ max{n_i : n_i = |S_i|}, no column index 0 ≤ j < L yields a column consisting only of gaps, and the alignment maximizes the common substrings of the strands. MSA has been shown to be an NP-complete problem. This work presents a new sequence alignment method that leverages existing sequencing algorithms to achieve an 80x speedup over state-of-the-art Multiple Sequence Alignment algorithms. The method uses an existing pairwise alignment algorithm called FOGSAA [1] as the base of an iterative algorithm, along with two other phases. Further applications and enhancements of the suggested method could help modify the sequencing and alignment phases in order to achieve a robust and efficient DNA data storage system.
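To make the structural conditions in the definition concrete, here is a minimal Python sketch of a validity check for a candidate alignment. It is not the project's method and does not implement FOGSAA. The function name `is_valid_alignment`, the gap symbol `-`, and the example strands are assumptions chosen for illustration. The check covers only the formal constraints (equal length L, L at least the longest strand, gaps as the only insertions, no all-gap column) and does not measure how well the alignment maximizes similarity.

```python
GAP = "-"  # assumed gap symbol; the source does not specify one


def is_valid_alignment(strands, aligned):
    """Return True if `aligned` satisfies the structural MSA conditions for `strands`."""
    if not aligned or len(strands) != len(aligned):
        return False

    # All aligned rows must share a single length L.
    L = len(aligned[0])
    if any(len(row) != L for row in aligned):
        return False

    # L must be at least the length of the longest original strand.
    if L < max(len(s) for s in strands):
        return False

    # Only gaps may be inserted: removing them must give back each strand.
    if any(row.replace(GAP, "") != s for s, row in zip(strands, aligned)):
        return False

    # No column may consist only of gaps.
    return all(any(row[j] != GAP for row in aligned) for j in range(L))


# Hypothetical cluster of three erroneous copies of one reference strand.
strands = ["ACGT", "AGT", "ACGGT"]
good = ["ACG-T", "A-G-T", "ACGGT"]
bad = ["ACG--T", "A-G--T", "ACG-GT"]  # fourth column is all gaps

print(is_valid_alignment(strands, good))
print(is_valid_alignment(strands, bad))
```

The first candidate passes every condition, while the second fails only because one of its columns contains nothing but gaps.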