About me

I'm Chris, currently pursuing my PhD at the Freie Universität Berlin and work at the Max-Planck-Institute of molecular genetics in Berlin. I'm a computer scientist by training who is now working in the fields of bioinformatics. My research is focused on enumerating and finding lncRNA (long non-coding RNA) in genomes. Therefore I currently work on string indices, especially bidirectional indices such as the 2FM index. I'm working on SeqAn, an open source C++ library for sequence analysis. You can also find there my latest work, a constant-time bidirectional FM index that outperforms other open source implementations of bidirectional indices in speed. For my professional website, visit www.christopher-pockrandt.de.

Publications

FAMOUS: Fast Approximate string Matching using OptimUm search Schemes

Kiavash Kianfar, Christopher Pockrandt, Bahman Torkamandi, Haochen Luo, Knut Reinert, 2017

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics. Bidirectional indices have opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search schemes (partitioning the pattern into several pieces and searching the pieces in certain orders with bounds on the number of errors in each piece) has shown significant potential in speeding up approximate matching. However, finding the optimal search scheme to maximize the search speed is a difficult combinatorial optimization problem. In this paper, we propose, for the first time, a method to solve the optimal search scheme problem for Hamming distance with given number of pieces. Our method is based on formulating the problem as a mixed integer program (MIP). We show that the optimal solutions found by our MIP significantly improve upon previously published ad-hoc solutions. Our MIP can solve problems of considerable size to optimality in reasonable time and has the attractive property of finding near-optimal solutions for much larger problems in a very short amount of time. In addition, we present FAMOUS (Fast Approximate string Matching using OptimUm search Schemes), a bidirectional search (for Hamming and edit distance) implemented in SeqAn that performs the search based on the optimal search schemes from our MIP. We show that FAMOUS is up to 35 times faster than standard backtracking and anticipate that it will improve many tools as a new core component for approximate matching and NGS data analysis. We exemplify this by searching Illumina reads completely in our index at a speed comparable to or faster than current read mapping tools. Finally, we pose several open problems regarding our MIP formulation and use of its solutions in bidirectional search.

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers

Knut Reinert et al., 2017

Background:
The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms and the development of practical BWT based read mappers have been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008).
Results:
The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools.
Conclusions:
We anticipate that SeqAn will continue to be a valuable resource, especially since it started to actively support various hardware acceleration techniques in a systematic manner.

EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices

Christopher Pockrandt, Marcel Ehrhardt, Knut Reinert, 2016

We introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in \(\mathcal{O}(1)\) time per step while using \(\mathcal{O}(\log \sigma \cdot n) + o(\log \sigma \cdot \sigma \cdot n)\) bits of space. This is done by replacing the wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between \(\approx 2.7-4.6\) times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.

Master Thesis: Generic Implementation of a Bidirectional FM-Index in SeqAn and Applications

Christopher Pockrandt, 2015

Many alignment-tools are using unidirectional string indices, that only allow searching sequences into one direction. In this master thesis we are implementing a bidirectional FM index in SeqAn, a C++ library providing efficient algorithms and data structures for sequence analysis. This index can search a pattern in both directions, i.e. extend it character by character to the left as well as to the right in an arbitrary order. It allows to design more efficient search strategies for exact and approximate string matching, that we will implement and analyze by comparing it to common search strategies used by Lambda, a local alignment tool, optimized for protein alphabets.

Bachelor Thesis: Planarity Testing via PQ-Trees: Then and Now

Christopher Pockrandt, 2013

Graphs that can be drawn on the plane without its edges intersecting, are called planar graphs. They play an important part in computer science: several algorithmic problems can be solved more efficiently on planar graphs than on arbitrary ones, such as finding shortest paths or calculating maximum flows. There are numerous algorithms that decide in linear time, whether a graph is planar or not and if so, return an embedding in the plane. A fundamental milestone was the algorithm by Booth and Lueker based on a new data structure called PQ-trees. Many simpler algorithms were presented, which had different approaches or used a modification of the data structure. This thesis presents PQ-trees and outlines two algorithms: the original algorithm by Booth and Lueker (1976) and a recently developed, simplified one by Haeupler and Tarjan (2008) based on a reinterpretation of PQ-trees.

Teaching

Instructor/lecturer

  • summer 17 - ProInformatik III "Object Oriented Programming"
  • summer 17 - Software project "Usable Security for Smartphones"

Teaching assistant

  • winter 15/16 - Algorithms
  • summer 15 - ProInf: Functional Programming (with lectureship)
  • summer 14 - Software Engineering
  • winter 13/14 - Algorithms and Data Structures
  • summer 13 - Software Engineering
  • summer 12 - Software Engineering
  • winter 11/12 - Functional Programming

Honors

  • 2011 - 2015 Deutschlandstipendium (National Scholarship Program)
  • 2014 Promos Study Abroad Scholarship at National Taiwan University
  • 2012 Promos Study Abroad Scholarship at National University of Singapore