You are here: ABI » ThesesHome » BScBlastInSeqAn

BScBlastInSeqAn

A study comparing the classic NCBI BLAST implementation with a straightforward implementation in SeqAn.

Background

BLAST [1] is the best known approach for finding local similarities between sequences. It works by rapidly identifying k-mers shared by two sequences. Group the k-mers into high-scoring sequence pairs (HSPs), i.e. similarities that are close to each other. These pairs are used as seeds and then extended.

The NCBI hosts the current "state of the art" BLAST implementation. However, the implementation is highly complex and evolved over a long time with multiple extensions in the implementation, algorithmics, and statistics involved.

Topic

The task for this thesis is to compare the NCBI BLAST implementation with a simpler implementation using SeqAn. NCBI BLAST includes many heuristics and optimizations but is hard to understand. The SeqAn library should allow a more straightforward implementation at a small cost of performance (compared to an implementation with years of optimization work).

The task for this thesis is to implement a variant of the BLAST algorithm using the SeqAn library. The result should be a SeqAn module for identifying local matches using the BLAST algorithm and tests. The practical applicability should be demonstrated by a driver program that uses the BLAST library module. The library module should be written in a generic fashion such that both DNA and Amino Acid sequences can be searched for local similarities.

The resulting BLAST implementation should then be compared to the existing NCBI BLAST implementation in terms of running time and memory usage as well as the complexity of the source code. One aim is a better maintainability (code structure, using a library, documentation, tests) while providing competetive resource usage and good performance.

A benchmark for BLAST program already exists as prior work by Hannes Hauswedell.

Comments

The computation of e-Values is not part of the task since it is too involved for a BSc thesis. An extended version could also become a topic for a MSc thesis.

Literature

  • [1] Altschul, Gish, Miller, Myers, Lipman Basic local alignment search tool, J. Mol. Biol., Band 215, 1990, S. 403–410.
  • [2] Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.
  • [3] NCBI BLAST Source Code ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback