Bioinformatics for High-Throughput Technologies at the Chair of Algorithm Engineering (Ls11), Faculty of Computer Science, TU Dortmund, D-44221 Dortmund, Germany
Abstract: BACKGROUND: Transcriptional regulation of gene activity is essential for any living organism. To exert this control, transcription factors recognize specific binding sites within the DNA and thereby regulate the expression of particular target genes. The genome-scale reconstruction of the emerging regulatory networks is important for biotechnology and human medicine, but cost-intensive, time-consuming, and infeasible to perform for every species separately. Using bioinformatics methods, one can partially transfer networks from well-studied model organisms to closely related species. However, the prediction quality is limited by the low level of evolutionary conservation of transcription factor binding sites, even within organisms of the same genus. RESULTS: Here we present an integrated bioinformatics workflow that ensures the reliability of transferred gene regulatory networks. Our approach combines three methods that can be applied on a large scale: re-assessment of annotated binding sites, subsequent binding site prediction, and homology detection. A gene regulatory interaction is considered conserved if (1) the transcription factor, (2) the adjusted binding site, and (3) the target gene are conserved. The power of the approach is demonstrated by transferring gene regulations from the model organism Corynebacterium glutamicum to the human pathogens C. diphtheriae and C. jeikeium, and to the biotechnologically relevant C. efficiens. For these three organisms we identified reliable transcriptional regulations for 40% of the common transcription factors, compared with the 5% for which knowledge was previously available. CONCLUSION: Our results suggest that trustworthy genome-scale transfer of gene regulatory networks between organisms is feasible in general, but still limited by the level of evolutionary conservation.
Abstract: DNA microarrays are a popular technique for the detection of microorganisms. Several approaches using specific oligomers targeting one or a few marker genes per species have been proposed. Data analysis is usually limited to calling a species present when its oligomer signal exceeds a certain intensity threshold. While this strategy works reasonably well for distantly related species, it does not work well for very closely related species: cross-hybridization of non-target DNA prevents a simple identification based on signal intensity. The majority of species within a genus share a sequence similarity of over 90%. For biodiversity studies down to the species level, it is therefore important to increase the detection power for closely related species. We propose a simple, cost-effective, and robust approach for biodiversity studies using DNA microarray technology and demonstrate it on scenedesmacean green algae. The internal transcribed spacer 2 (ITS2) rDNA sequence was chosen as marker because it is suitable for distinguishing all eukaryotic species, even though parts of it are virtually identical in closely related species. We show that by modelling hybridization behaviour with a matrix algebra approach, we are able to identify closely related species that cannot be distinguished by a threshold on signal intensity. This proof-of-concept study thus shows that adding a simple and robust data analysis step to the evaluation of DNA microarrays can significantly improve species detection for closely related species with high sequence similarity.
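The "matrix algebra approach" mentioned above can be sketched as linear unmixing: observed probe intensities are modelled as y = H·c, where H[i][j] is the hybridization affinity of probe i to species j and c holds the unknown species abundances. The following minimal sketch (all numbers are invented illustration values, not data from the study) solves the least-squares problem via the normal equations:

```python
# Linear unmixing of cross-hybridizing probe signals (toy example).

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def unmix(H, y):
    """Least-squares species abundances: solve the normal equations H^T H c = H^T y."""
    n = len(H[0])
    HtH = [[sum(H[i][a] * H[i][b] for i in range(len(H))) for b in range(n)] for a in range(n)]
    Hty = [sum(H[i][a] * y[i] for i in range(len(H))) for a in range(n)]
    return solve(HtH, Hty)

# Two probes cross-hybridize strongly (two species with ~90% similarity):
H = [[1.0, 0.9],   # probe for species A also picks up species B
     [0.9, 1.0]]   # and vice versa
y = [1.9, 1.9]     # both probes bright -> a threshold alone cannot decide
print(unmix(H, y)) # unmixing attributes the signal to both species (~[1.0, 1.0])
```

With thresholding alone, both bright probes would call both species with equal (in)confidence; the unmixed abundances separate the contributions even under heavy cross-hybridization.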
Abstract: To handle changing environmental surroundings and to manage unfavorable conditions, microbial organisms have evolved complex transcriptional regulatory networks. To analyze these gene regulatory networks comprehensively, several online databases and analysis platforms have been implemented and established. In this article, we address the typical cycle of scientific knowledge exploration and integration in the area of prokaryotic transcriptional gene regulation. We briefly review five popular, publicly available systems that support (i) the integration of existing knowledge, (ii) visualization capabilities, and (iii) computational analyses to predict promising wet-lab targets. We illustrate the benefits of such integrated data analysis platforms by means of four application cases performed with the corynebacterial reference database CoryneRegNet.
Abstract: MOTIVATION: The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but it has still not been solved satisfactorily in an exact and efficient manner. This is partly due to the large number of possibilities for defining the motif search space and the notion of over-representation. Even for well-defined formalizations, the problem is frequently solved in an ad hoc manner with heuristics that do not guarantee finding the best motif. RESULTS: We show how to solve the motif discovery problem (almost) exactly on a practically relevant space of IUPAC generalized string patterns, using the p-value with respect to an i.i.d. model or a Markov model as the measure of over-representation. In particular, (i) we use a highly accurate compound Poisson approximation for the null distribution of the number of motif occurrences. We show how to compute the exact clump size distribution using a recently introduced device called the probabilistic arithmetic automaton (PAA). (ii) We define two p-value scores for over-representation, the first based on the total number of motif occurrences, the second based on the number of sequences in a collection with at least one occurrence. (iii) We describe an algorithm to discover the optimal pattern with respect to either of the scores. The method exploits monotonicity properties of the compound Poisson approximation and is orders of magnitude faster than exhaustive enumeration of IUPAC strings (11.8 h compared with an extrapolated runtime of 4.8 years). (iv) We justify the use of the proposed scores for motif discovery by showing that our method outperforms other motif discovery algorithms (e.g. MEME, Weeder) on benchmark datasets. We also propose new motifs for Mycobacterium tuberculosis. AVAILABILITY AND IMPLEMENTATION: The method has been implemented in Java.
It can be obtained from http://ls11-www.cs.tu-dortmund.de/people/marschal/paa_md/.
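The compound Poisson idea above can be sketched compactly: occurrences of a self-overlapping motif arrive in "clumps" whose starts are approximately Poisson, and each clump contributes a random number of overlapping occurrences. The sketch below assumes a geometric clump-size distribution with overlap parameter a (the paper instead computes the exact clump-size distribution with a probabilistic arithmetic automaton) and uses the Panjer recursion for the compound Poisson probability mass function:

```python
import math

def compound_poisson_pmf(lam, a, n_max):
    """P(N = n) for N = sum of clump sizes, clump starts ~ Poisson(lam),
    clump size ~ Geometric: q_k = (1-a) * a**(k-1). Panjer recursion."""
    q = [0.0] + [(1 - a) * a ** (k - 1) for k in range(1, n_max + 1)]
    p = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        p.append(lam / n * sum(k * q[k] * p[n - k] for k in range(1, n + 1)))
    return p

def pvalue(observed, lam, a):
    """P(N >= observed) under the compound Poisson null model."""
    pmf = compound_poisson_pmf(lam, a, observed - 1)
    return 1.0 - sum(pmf)

# With a = 0 (no self-overlap) this reduces to a plain Poisson tail:
print(pvalue(5, lam=1.0, a=0.0))
```

The monotonicity exploited by the discovery algorithm is visible here: increasing the expected clump count lam can only increase the tail probability, which allows pruning the IUPAC pattern search space.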
Abstract: Methylation of CpG islands (CGIs) plays an important role in gene silencing. For genome-wide methylation analysis of CGIs in female white blood cells and in sperm, we used four restriction enzymes and a size selection step to prepare DNA libraries enriched with CGIs. The DNA libraries were treated with sodium bisulfite and subjected to a modified 454/Roche Genome Sequencer protocol. We obtained 163 034 and 129 620 reads from blood and sperm, respectively, with an average read length of 133 bp. Bioinformatic analysis revealed that 12 358 (7.6%) blood library reads and 10 216 (7.9%) sperm library reads map to 6167 and 5796 different CGIs, respectively. In blood and sperm DNA, we identified 824 (13.7%) and 482 (8.5%) fully methylated autosomal CGIs, respectively. Differential methylation, which is characterized by the presence of methylated and unmethylated reads of the same CGI, was observed in 53 and 52 autosomal CGIs in blood and sperm DNA, respectively. Remarkably, methylation of X-chromosomal CGIs in female blood cells was most often incomplete (25-75%). Such incomplete methylation was mainly found on the X-chromosome, suggesting that it is linked to X-chromosome inactivation.
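The per-read methylation calls described above rest on a simple principle: bisulfite converts unmethylated cytosines to uracil (sequenced as T), while methylated CpG cytosines stay C. A minimal sketch, assuming a read already aligned to its reference segment on the forward strand (the example sequences are invented):

```python
def cpg_methylation(reference, read):
    """Return (methylated, unmethylated) CpG counts for an aligned bisulfite read."""
    meth = unmeth = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":      # CpG site in the reference
            if read[i] == "C":              # cytosine retained -> methylated
                meth += 1
            elif read[i] == "T":            # converted -> unmethylated
                unmeth += 1
    return meth, unmeth

ref  = "ACGTTCGGACGA"
read = "ATGTTCGGATGA"   # first and third CpG converted, second retained
print(cpg_methylation(ref, read))  # -> (1, 2)
```

Aggregating such counts per CGI over all mapped reads yields the fully methylated, unmethylated, and differentially methylated categories reported in the abstract.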
Abstract: MOTIVATION: Transcription factors (TFs) play a key role in gene regulation by binding to target sequences. In silico prediction of potential binding of a TF to a binding site is a well-studied problem in computational biology. The binding sites for one TF are represented by a position frequency matrix (PFM). The discovery of new PFMs requires comparison to known PFMs to avoid redundancies. In general, two PFMs are similar if their hits occur at overlapping positions under a null model. Still, most existing methods compute similarity according to probabilistic distances between the PFMs. Here we propose a natural similarity measure based on the asymptotic covariance between the numbers of PFM hits, incorporating both strands. Furthermore, we introduce a second measure based on the same idea and use it to cluster the Jaspar PFMs. RESULTS: We show that the asymptotic covariance can be efficiently computed by a two-dimensional convolution of the score distributions. The asymptotic covariance approach shows strong correlation with simulated data and outperforms three alternative methods. The Jaspar clustering yields distinct groups of TFs of the same class, and a representative PFM is given for each class. In contrast to most other clustering methods, PFMs with low similarity automatically remain singletons. AVAILABILITY: A website to compute the similarity and to perform clustering, the source code, and Supplementary Material are available at http://mosta.molgen.mpg.de.
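The covariance computation above builds on score distributions of a PFM under a background model. A standard building block, sketched here under an i.i.d. uniform background (the PFM below is invented), is the exact distribution of the integer-scaled log-odds score, obtained by convolving the per-column score distributions:

```python
from collections import Counter
import math

def score_distribution(pfm, background):
    """Exact distribution of the (rounded) log-odds score under an i.i.d. background."""
    dist = Counter({0: 1.0})  # score 0 with probability 1 before any column
    for column in pfm:        # column: dict base -> frequency
        new = Counter()
        for base, bg_p in background.items():
            s = round(10 * math.log2(column[base] / bg_p))  # scaled integer score
            for score, p in dist.items():
                new[score + s] += p * bg_p                  # convolution step
        dist = new
    return dist

background = {b: 0.25 for b in "ACGT"}
pfm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
dist = score_distribution(pfm, background)
# Probabilities sum to one; P(score >= t) is the hit probability for threshold t.
print(sum(dist.values()))
```

The paper's two-dimensional convolution generalizes this idea to the joint score distribution of two PFMs at overlapping positions, which yields the hit-count covariance.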
Abstract: The microarray layout problem is a generalization of the border length minimization problem, and asks to distribute oligonucleotide probes on a microarray and to determine their embeddings in the deposition sequence in such a way that the overall quality of the resulting synthesized probes is maximized. Because of its inherent computational complexity, it is traditionally attacked in several phases: partitioning, placement, and re-embedding. We present the first algorithm, Greedy+, that combines placement and embedding and that results in improved layouts in terms of border length and conflict index (a more realistic measure of probe quality), both on arrays of random probes and on existing Affymetrix GeneChip® arrays. We also present a detailed study on the layouts of the latest GeneChip arrays, and show how Greedy+ can further improve layout quality by as much as 12% in terms of border length and 35% in terms of conflict index.
Abstract: Transcription factors play a key role in gene regulation by interacting with specific binding sites or motifs. Detecting the enrichment of binding motifs is therefore important for genome annotation, and efficient computation of the statistical significance (p-value) of motif enrichment is crucial. We propose an efficient approximation to compute this significance. By incorporating both strands of the DNA molecule and explicitly modeling dependencies between overlapping hits, we achieve accurate results for any DNA motif based on its position frequency matrix (PFM) representation. The accuracy of the p-value approximation is demonstrated by comparison with the simulated count distribution. Furthermore, we compare the approach with a binomial approximation, a (compound) Poisson approximation, and a normal approximation. In general, our approach outperforms these approximations or is equally accurate but significantly faster. An implementation of our approach is available at http://mosta.molgen.mpg.de.
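The binomial baseline that the abstract compares against can be sketched directly: treat each of n scanned positions (both strands) as an independent Bernoulli trial with the motif's hit probability p, and take the upper tail as the enrichment p-value. The numbers below are illustrative; the abstract's own method additionally models dependencies between overlapping hits, which the binomial model ignores.

```python
from math import comb

def binomial_pvalue(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): upper-tail enrichment p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 8 hits in 1000 scanned positions, hit probability 0.002 per position:
print(binomial_pvalue(8, 1000, 0.002))
```

For a self-overlapping motif, hits cluster in clumps, so this independence assumption underestimates the variance of the count; that is the gap the dependency-aware approximation closes.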
Abstract: We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor-joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical.
Abstract: IMS2 is an Integrated Medical Software system for the analysis of Ion Mobility Spectrometry (IMS) data. It assists medical staff with the following IMS data processing steps: acquisition, visualization, classification, and annotation. IMS2 provides data analysis and interpretation features on the one hand, and on the other helps to improve classification by increasing the number of pre-classified datasets. It is designed to facilitate early detection of lung cancer, one of the most common cancer types, with one million deaths worldwide each year. After reviewing the IMS technology, we first describe the software architecture of IMS2 and then the integrated classification module, including the necessary pre-processing steps and different classification methods. The Lung Hospital Hemer (Germany) provided IMS data from 35 patients suffering from lung cancer and 72 samples from healthy persons. IMS2 correctly classifies 99% of the samples, as evaluated by 10-fold cross-validation.
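The evaluation protocol used above can be sketched generically: 10-fold cross-validation of a classifier on labelled samples. The classifier here is a simple nearest-centroid stand-in (IMS2's module offers several classification methods), and the data are synthetic two-dimensional points, not IMS spectra:

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_accuracy(data, labels, folds=10, seed=0):
    """k-fold cross-validated accuracy of a nearest-centroid classifier."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])           # every folds-th shuffled sample
        cents = {}
        for lab in set(labels):             # centroids from training folds only
            rows = [data[i] for i in idx if i not in test and labels[i] == lab]
            cents[lab] = centroid(rows)
        for i in test:                      # classify each held-out sample
            pred = min(cents, key=lambda lab: sum((a - b) ** 2
                                                  for a, b in zip(data[i], cents[lab])))
            correct += (pred == labels[i])
    return correct / len(data)

# Two well-separated synthetic classes:
data = [(i * 0.01, 0.0) for i in range(50)] + [(5.0 + i * 0.01, 5.0) for i in range(50)]
labels = ["healthy"] * 50 + ["cancer"] * 50
print(nearest_centroid_accuracy(data, labels))  # -> 1.0 on separable data
```

Because every sample is held out exactly once, the reported accuracy estimates performance on unseen patients rather than on the training data.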
Abstract: BACKGROUND: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that have to be clustered efficiently, with constant or increased accuracy, at increased speed. RESULTS: We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well suited to protein clustering. We present the FORCE heuristic, which is based on transitive graph projection and clusters arbitrary sets of objects given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (spectral clustering, TribeMCL, GeneRAGE, hierarchical clustering, and affinity propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192,187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. CONCLUSIONS: FORCE is a practical alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets, are available at http://gi.cebitec.uni-bielefeld.de/comet/force/.
Abstract: BACKGROUND: The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. RESULTS: To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE-compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, and CoryneCenter moreover points out additional regulatory interactions. CONCLUSION: CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focuses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data.
The system is publicly available at http://www.CoryneCenter.de.
Abstract: BACKGROUND: The application of DNA microarray technology in post-genomic analysis of bacterial genome sequences has allowed the generation of huge amounts of data related to regulatory networks. These data, along with literature-derived knowledge on the regulation of gene expression, have opened the way for genome-wide reconstruction of transcriptional regulatory networks. These large-scale reconstructions can be converted into in silico models of bacterial cells that allow a systematic analysis of network behavior in response to changing environmental conditions. DESCRIPTION: CoryneRegNet was designed to facilitate the genome-wide reconstruction of transcriptional regulatory networks of corynebacteria relevant in biotechnology and human medicine. During the import and integration of data derived from experimental studies or from the literature, CoryneRegNet generates links to genome annotations, to identified transcription factors, and to the corresponding cis-regulatory elements. CoryneRegNet is based on a multi-layered, hierarchical, and modular concept of transcriptional regulation and was implemented using the relational database management system MySQL and an ontology-based data structure. Reconstructed regulatory networks can be visualized using the yFiles Java graph library. As an application example of CoryneRegNet, we have reconstructed the global transcriptional regulation of a cellular module involved in the SOS and stress response of corynebacteria. CONCLUSIONS: CoryneRegNet is an ontology-based data warehouse that allows pertinent data management of regulatory interactions along with the genome-scale reconstruction of transcriptional regulatory networks. These models can further be combined with metabolic networks to build integrated models of cellular function, including both metabolism and its transcriptional regulation.
Abstract: BACKGROUND: In phylogenetic analysis we face the problem that several subclade topologies are known or easily inferred and well supported by bootstrap analysis, but basal branching patterns cannot be unambiguously estimated by the usual methods (maximum parsimony (MP), neighbor-joining (NJ), or maximum likelihood (ML)), nor are they well supported. We represent each subclade by a sequence profile and estimate evolutionary distances between profiles to obtain a matrix of distances between subclades. RESULTS: Our estimator of profile distances generalizes the maximum likelihood estimator of sequence distances. The basal branching pattern can then be estimated by any distance-based method, such as neighbor-joining. Our method, profile neighbor-joining (PNJ), thus inherits the accuracy and robustness of profiles and the time efficiency of neighbor-joining. CONCLUSIONS: Phylogenetic analysis of the Chlorophyceae with traditional methods (MP, NJ, ML, and MrBayes) reveals seven well-supported subclades, but the methods disagree on the basal branching pattern. The tree reconstructed by our method is better supported and is confirmed by known morphological characters. Moreover, the accuracy is significantly improved, as shown by parametric bootstrap.
Abstract: Given a transmembrane protein, we wish to find related ones by a database search. Due to the strongly hydrophobic amino acid composition of transmembrane domains, suboptimal results are obtained when general-purpose scoring matrices such as BLOSUM are used. Recently, a transmembrane-specific score matrix called PHAT was shown to perform much better than BLOSUM. In this article, we derive a transmembrane score matrix family, called SLIM, which has several distinguishing features. In contrast to currently used matrices, SLIM is non-symmetric. The asymmetry arises because different background compositions are assumed for the transmembrane query and the unknown database sequences. We describe the mathematical model behind SLIM in detail and show that SLIM outperforms PHAT both on simulated data and in a realistic setting. Since non-symmetric score matrices are a new concept in database search methods, we discuss some important theoretical and practical issues.
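The asymmetry behind SLIM can be illustrated with a log-odds score that uses a *different* background composition for the query (transmembrane, hydrophobic-rich) and for the database sequence (average composition). The joint probabilities and compositions below are invented toy values over a two-letter alphabet {h, p} (hydrophobic/polar), not SLIM's actual parameters:

```python
import math

def log_odds(joint, q_query, q_db):
    """s(a, b) = log2( P(a, b) / (q_query(a) * q_db(b)) ). Non-symmetric
    because the two marginal backgrounds differ."""
    return {(a, b): math.log2(joint[a, b] / (q_query[a] * q_db[b]))
            for (a, b) in joint}

joint   = {("h", "h"): 0.5, ("h", "p"): 0.2, ("p", "h"): 0.2, ("p", "p"): 0.1}
q_query = {"h": 0.7, "p": 0.3}   # transmembrane query: mostly hydrophobic
q_db    = {"h": 0.5, "p": 0.5}   # database: average composition

S = log_odds(joint, q_query, q_db)
# Scoring (query='h', db='p') differs from (query='p', db='h'):
print(S[("h", "p")], S[("p", "h")])
```

With a symmetric matrix such as BLOSUM the two printed values would be equal; here the hydrophobic-heavy query background deflates the score of query-side hydrophobic residues, which is exactly the compositional correction the abstract describes.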
Abstract: The goal of phylogenetics is to reconstruct ancestral relationships between different taxa, e.g., different species in the tree of life, by means of certain characters, such as genomic sequences. We consider the prominent problem of reconstructing the basal phylogenetic tree topology when several subclades have already been identified or are well known by other means, such as morphological characteristics. Whereas most available tools attempt to estimate a fully resolved tree from scratch, the profile neighbor-joining (PNJ) method focuses directly on this problem and has proven to be a robust and efficient method for large-scale datasets, especially when used iteratively. We describe an implementation of this idea, the freely available ProfDist software package, and apply the method to estimate the phylogeny of the eukaryotes. Overall, the PNJ approach provides a novel, effective way to mine large sequence datasets for relevant phylogenetic information.
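The profile idea behind PNJ can be sketched as follows: a subclade with a known topology is collapsed into a profile of per-column character frequencies, and distances are then computed between profiles instead of single sequences. The distance below is a naive expected-mismatch fraction, not the maximum-likelihood estimator the method actually uses, and the aligned sequences are invented:

```python
def profile(alignment, alphabet="ACGT"):
    """Per-column relative character frequencies of an aligned subclade."""
    cols = list(zip(*alignment))
    return [{c: col.count(c) / len(col) for c in alphabet} for col in cols]

def profile_distance(p1, p2):
    """Expected fraction of mismatching sites between two profiles."""
    mismatch = 0.0
    for c1, c2 in zip(p1, p2):
        mismatch += 1.0 - sum(c1[a] * c2[a] for a in c1)
    return mismatch / len(p1)

clade_a = profile(["ACGT", "ACGT", "ACGA"])   # three sequences, known subclade
clade_b = profile(["TCGT", "TCGA"])           # two sequences, second subclade
print(profile_distance(clade_a, clade_b))
```

Feeding the resulting subclade-by-subclade distance matrix to neighbor-joining then resolves only the basal branching pattern, which is what makes the approach efficient on large datasets.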
Abstract: Clustering objects according to given similarity or distance values is a ubiquitous problem in computational biology with diverse applications, e.g., in defining families of orthologous genes, or in the analysis of microarray experiments. While a plethora of methods exists, many of them produce clusterings that can be further improved. "Cleaning up" initial clusterings can be formalized as projecting a graph onto the space of transitive graphs; this is also known as the cluster editing or cluster partitioning problem in the literature. In contrast to previous work on cluster editing, we allow arbitrary weights on the similarity graph. To solve the so-defined weighted transitive graph projection problem, we present (1) the first exact fixed-parameter algorithm, (2) a polynomial-time greedy algorithm that returns the optimal result on a well-defined subset of "close-to-transitive" graphs and works heuristically on other graphs, and (3) a fast heuristic that uses ideas similar to those of the Fruchterman-Reingold graph layout algorithm. We compare the quality and running times of these algorithms on both artificial graphs and protein similarity graphs derived from the 66 organisms of the COG dataset.
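Weighted cluster editing can be sketched as a cost on partitions: pairs with positive similarity should share a cluster, pairs with negative similarity should not, and every violated pair costs its absolute weight. The greedy below simply merges clusters while that lowers the cost; it is an illustration of the objective, not one of the three algorithms from the abstract. Weights are toy values:

```python
def editing_cost(sim, clusters):
    """Total weight of similarity-graph edits implied by a partition."""
    where = {v: i for i, cl in enumerate(clusters) for v in cl}
    cost = 0.0
    for (u, v), s in sim.items():
        together = where[u] == where[v]
        if together and s < 0:
            cost += -s          # a 'dissimilar' edge inside a cluster
        elif not together and s > 0:
            cost += s           # a 'similar' edge cut between clusters
    return cost

def greedy_merge(sim, nodes):
    """Start from singletons; merge the best cluster pair while cost drops."""
    clusters = [{v} for v in nodes]
    while True:
        best_cost, best_pair = editing_cost(sim, clusters), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
                trial.append(clusters[i] | clusters[j])
                c = editing_cost(sim, trial)
                if c < best_cost:
                    best_cost, best_pair = c, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

sim = {("a", "b"): 2.0, ("a", "c"): 1.5, ("b", "c"): 1.0,
       ("c", "d"): -1.0, ("a", "d"): -2.0, ("b", "d"): -2.0}
print(sorted(sorted(c) for c in greedy_merge(sim, "abcd")))  # -> [['a','b','c'], ['d']]
```

Note how "d", which is dissimilar to everything, remains a singleton automatically; no cluster-count parameter is needed, which is a key property of the cluster editing formulation.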
Abstract: The microarray layout problem is a generalization of the border length minimization problem and asks to distribute oligonucleotide probes on a microarray and to determine their embeddings in the deposition sequence in such a way that the overall quality of the resulting synthesized probes is maximized. Because of its inherent computational complexity, it is traditionally attacked in several phases: partitioning, placement, and re-embedding. We present the first algorithm, Greedy+, that combines placement and embedding and results in improved layouts in terms of border length and conflict index (a more realistic measure of probe quality), both on arrays of random probes and on existing Affymetrix GeneChip® arrays. We also present a large-scale study on how the layouts of GeneChip arrays have improved over time, and show how Greedy+ can further improve layout quality by as much as 8% in terms of border length and 34% in terms of conflict index.
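The border-length objective minimized above can be sketched concretely: each probe is embedded in the deposition sequence as a binary mask (1 = a nucleotide is synthesized at that step), and the border length between two adjacent spots is the number of steps at which exactly one of the two masks is open, i.e. the Hamming distance between the embeddings. The grid and embeddings below are toy values:

```python
def border_length(grid):
    """Total border length of a 2D grid of probe embeddings (bit strings)."""
    total = 0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    # steps where exactly one mask is open contribute a border
                    total += sum(a != b for a, b in zip(grid[r][c], grid[rr][cc]))
    return total

# 2x2 array, deposition sequence of length 4:
grid = [["1010", "1010"],
        ["0101", "1010"]]
print(border_length(grid))  # -> 8
```

Placing probes with similar embeddings next to each other (and re-embedding probes to match their neighbours) reduces this sum, which is exactly what combining placement and embedding in one phase exploits.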