Tobias Müller

Dr. Tobias Müller
Senior Lecturer (Akademischer Oberrat)
Department of Bioinformatics
Biocenter (Theodor-Boveri-Institute)
Bayerische Julius-Maximilians University of Würzburg
Am Hubland, D-97074 Würzburg, Germany
Tobias.Mueller@biozentrum.uni-wuerzburg.de
Short Curriculum
2011 - present Senior Lecturer, Department of Bioinformatics, University of Würzburg, Germany
2003 - 2011 Lecturer, Department of Bioinformatics, University of Würzburg, Germany
2001 - 2003 Postdoc, Max Planck Institute for Molecular Genetics, Computational Biology, Berlin, Germany
1999 - 2000 Predoctoral Fellow, Max Planck Institute for Molecular Genetics, Computational Biology, Berlin, Germany
1996 - 1999 Predoctoral Fellow, German Cancer Research Center (DKFZ), Division of Theoretical Bioinformatics, Heidelberg, Germany
1991 - 1995 Teaching Assistant at the Mathematical Institute of the University of Bonn

Journal articles

2012
Benjamin Merget, Christian Koetschan, Thomas Hackl, Frank Förster, Thomas Dandekar, Tobias Müller, Jörg Schultz, Matthias Wolf (2012)  The ITS2 Database.   J Vis Exp 61. 03  
Abstract: The internal transcribed spacer 2 (ITS2) has been used as a phylogenetic marker for more than two decades. As ITS2 research mainly focused on the very variable ITS2 sequence, it confined this marker to low-level phylogenetics only. However, the combination of the ITS2 sequence and its highly conserved secondary structure improves the phylogenetic resolution(1) and allows phylogenetic inference at multiple taxonomic ranks, including species delimitation(2-8). The ITS2 Database(9) presents an exhaustive dataset of internal transcribed spacer 2 sequences from NCBI GenBank(11) accurately reannotated(10). Following an annotation by profile Hidden Markov Models (HMMs), the secondary structure of each sequence is predicted. First, it is tested whether a minimum energy based fold(12) (direct fold) results in a correct, four-helix conformation. If this is not the case, the structure is predicted by homology modeling(13). In homology modeling, an already known secondary structure is transferred to another ITS2 sequence that could not be folded correctly by direct folding. The ITS2 Database is not only a database for storage and retrieval of ITS2 sequence-structures. It also provides several tools to process your own ITS2 sequences, including annotation, structural prediction, motif detection and BLAST(14) search on the combined sequence-structure information. Moreover, it integrates trimmed versions of 4SALE(15,16) and ProfDistS(17) for multiple sequence-structure alignment calculation and Neighbor Joining(18) tree reconstruction. Together they form a coherent analysis pipeline from an initial set of sequences to a phylogeny based on sequence and secondary structure. In a nutshell, this workbench simplifies first phylogenetic analyses to only a few mouse-clicks, while additionally providing tools and data for comprehensive large-scale analyses.
Christian Koetschan, Thomas Hackl, Tobias Müller, Matthias Wolf, Frank Förster, Jörg Schultz (2012)  ITS2 Database IV: Interactive taxon sampling for internal transcribed spacer 2 based phylogenies.   Mol Phylogenet Evol Feb  
Abstract: The first step of any molecular phylogenetic analysis is the selection of the species and sequences to be included, the taxon sampling. Several pitfalls exist even at this stage. Sequences can contain errors, annotations in databases can be inaccurate and even the taxonomic classification of a species can be wrong. Usually, these artefacts become evident only after calculation of the phylogenetic tree. The taxon sampling then has to be corrected iteratively. This can become tedious and time-consuming, as in most cases the taxon sampling is de-coupled from the further steps of the phylogenetic analysis. Here, we present the ITS2 Workbench (http://its2.bioapps.biozentrum.uni-wuerzburg.de/), which eliminates this problem by a tight integration of taxon sampling, secondary structure prediction, multiple alignment and phylogenetic tree calculation. The ITS2 Workbench has access to more than 280,000 ITS2 sequences and their structures provided by the ITS2 Database, enabling sequence-structure based alignment and tree reconstruction. This allows the interactive improvement of the taxon sampling throughout the whole phylogenetic tree reconstruction process. Thus, the ITS2 Workbench enables a fast, interactive and iterative taxon sampling leading to more accurate ITS2-based phylogenies.
2011
Biju Joseph, Roland F Schwarz, Burkhard Linke, Jochen Blom, Anke Becker, Heike Claus, Alexander Goesmann, Matthias Frosch, Tobias Müller, Ulrich Vogel, Christoph Schoen (2011)  Virulence evolution of the human pathogen Neisseria meningitidis by recombination in the core and accessory genome.   PLoS One 6: 4. e18441  
Abstract: Neisseria meningitidis is a naturally transformable, facultative pathogen colonizing the human nasopharynx. Here, we analyze on a genome-wide level the impact of recombination on gene-complement diversity and virulence evolution in N. meningitidis. We combined comparative genome hybridization using microarrays (mCGH) and multilocus sequence typing (MLST) of 29 meningococcal isolates with computational comparison of a subset of seven meningococcal genome sequences.
Marco Albrecht, Cynthia M Sharma, Marcus T Dittrich, Tobias Müller, Richard Reinhardt, Jörg Vogel, Thomas Rudel (2011)  The Transcriptional Landscape of Chlamydia pneumoniae   Genome Biology 12: R98  
Abstract: Background: Gene function analysis of the obligate intracellular bacterium Chlamydia pneumoniae is hampered by the facts that this organism is inaccessible to genetic manipulations and not cultivable outside the host. The genomes of several strains have been sequenced; however, very little information is available on the gene structure and transcriptome of C. pneumoniae. Results: Using a differential RNA-sequencing approach with specific enrichment of primary transcripts, we defined the transcriptome of purified elementary bodies and reticulate bodies of C. pneumoniae strain CWL-029. 565 transcriptional start sites of annotated genes and novel transcripts were mapped. Analysis of adjacent genes for co-transcription revealed 246 polycistronic transcripts. In total, a distinct transcription start site or an affiliation to an operon could be assigned to 862 out of 1074 annotated protein coding genes. Semi-quantitative analysis of mapped cDNA reads revealed significant differences for 288 genes in the RNA levels of genes isolated from elementary bodies and reticulate bodies. We have identified and in part confirmed 75 novel putative non-coding RNAs. The detailed map of transcription start sites at single nucleotide resolution allowed for the first time a comprehensive and saturating analysis of promoter consensus sequences in Chlamydia. Conclusions: The precise transcriptional landscape as a complement to the genome sequence will provide new insights into the organization, control and function of genes. Novel non-coding RNAs and identified common promoter motifs will help to understand gene regulation of this important human pathogen.
Andreas Floren, Tobias Müller, Christa Deeleman-Reinhold, Karl Eduard Linsenmair (2011)  Effects of forest fragmentation on canopy spider communities in SE-Asian rain forests   ECOTROPICA 7: 15-26  
Abstract: Species diversity is by far highest in arthropods, and is in the tropics exceptionally high in the tree canopy. Until now, however, canopy diversity has been neglected in research, so that the real impact of anthropogenic forest destruction on species diversity, as well as its functional importance, can hardly be assessed. We collected canopy arthropods by insecticidal knockdown fogging in SE-Asian lowland rain forests between 1992 and 2001, and here we use spiders to investigate the consequences of slash-and-burn cultivation. We measured species diversity in a primary forest and in six forest fragments of different age and degree of isolation. Our statistical analysis suggests that neither year of sampling nor tree species significantly affected spider communities. By contrast, spider communities were clearly determined by forest isolation followed by forest age, both resulting in spider communities specific to different forest types. With respect to guild composition, spider communities in the isolated forests were most clearly affected, with the result that orb-web weavers had increased at the expense of sheet-web weavers, agile hunters, and cursorial hunters. Species richness was positively correlated with forest fragment age only under conditions where colonization was possible. In those gradient forests which adjoined primary forests, communities approximated those in the primary forest within 40 years. In contrast, a distance of 10 km effectively prevented re-immigration, resulting in low-diversity communities that showed hardly any development in 50 years. Our data suggest that most primary forest spiders are habitat specialists with restricted dispersal ability, indicating that the erosion of biodiversity can only be stopped by a high degree of habitat connectivity.
2010
Alexander Keller, Frank Förster, Tobias Müller, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2010)  Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees.   Biol Direct 5: 1. 01  
Abstract: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking.
Torben Friedrich, Christian Koetschan, Tobias Müller (2010)  Optimisation of HMM Topologies Enhances DNA and Protein Sequence Modelling   Statistical Applications in Genetics and Molecular Biology 9: 1. Article 6  
Abstract: Hidden Markov models (HMMs) play a major role in applications to unravel biomolecular functionality. Though HMMs are technically mature and widely applied in computational biology, there is potential for methodological optimisation concerning their modelling of biological data sources with varying sequence lengths. Single building blocks of these models, the states, are associated with a certain holding time, being the link to the length distribution of represented sequence motifs. An adaptation of regular HMM topologies to bell-shaped sequence lengths is achieved by a serial chain-linking of hidden states, while residing in the class of conventional hidden Markov models. The factor of the repetition of states (r) and the parameter for state-specific duration of stay (p) are determined by fitting the distribution of sequence lengths with the method of moments (MM) and maximum likelihood (ML). Performance evaluations of differently adjusted HMM topologies underline the impact of an optimisation for HMMs based on sequence lengths. Secondary structure prediction on internal transcribed spacer 2 sequences demonstrates exemplarily the general impact of topological optimisations. In summary, we propose a general methodology to improve the modelling behaviour of HMMs by topological optimisation with ML and a fast and easily implementable moment estimator.
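The moment-based fit of r and p mentioned in the abstract above can be illustrated with a small sketch. It rests on an assumption that the abstract does not spell out: each of the r chained states emits at least one symbol and is left with probability p after every symbol, so the motif length is a sum of r geometric holding times with mean r/p and variance r(1-p)/p^2. The function name fit_chain_by_moments is hypothetical, and the published estimator may differ in detail.

```python
from statistics import fmean, pvariance


def fit_chain_by_moments(lengths):
    """Method-of-moments fit of (r, p) for a chain of r identical states.

    Assumption (not taken from the paper): each state is left with
    probability p after every emitted symbol, so that
        mean = r / p   and   variance = r * (1 - p) / p**2.
    """
    mean = fmean(lengths)
    var = pvariance(lengths)
    p = mean / (mean + var)  # follows from variance / mean = (1 - p) / p
    r = mean * p             # follows from mean = r / p
    return r, p


# Toy length sample (invented values, for illustration only).
r_hat, p_hat = fit_chain_by_moments([18, 22])
```

In practice r would be rounded to a positive integer before building the state chain; a narrow length distribution (small variance relative to the mean) yields a long chain with p close to 1.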
Daniela Beisser, Gunnar W Klau, Thomas Dandekar, Tobias Müller, Marcus T Dittrich (2010)  BioNet: an R-Package for the functional analysis of biological networks.   Bioinformatics 26: 8. 1129-1130 Apr  
Abstract: MOTIVATION: Increasing quantity and quality of data in transcriptomics and interactomics create the need for integrative approaches to network analysis. Here, we present a comprehensive R-package for the analysis of biological networks including an exact and a heuristic approach to identify functional modules. RESULTS: The BioNet package provides an extensive framework for integrated network analysis in R. This includes the statistics for the integration of transcriptomic and functional data with biological networks, the scoring of nodes as well as methods for network search and visualization. AVAILABILITY: The BioNet package and a tutorial are available from http://bionet.bioapps.biozentrum.uni-wuerzburg.de.
Christian Koetschan, Frank Förster, Alexander Keller, Tina Schleicher, Benjamin Ruderisch, Roland Schwarz, Tobias Müller, Matthias Wolf, Jörg Schultz (2010)  The ITS2 Database III--sequences and structures for phylogeny.   Nucleic Acids Res 38: Database issue. D275-D279 Jan  
Abstract: The internal transcribed spacer 2 (ITS2) is a widely used phylogenetic marker. In the past, it has mainly been used for species level classifications. Nowadays, a wider applicability becomes apparent. Here, the conserved structure of the RNA molecule plays a vital role. We have developed the ITS2 Database (http://its2.bioapps.biozentrum.uni-wuerzburg.de) which holds information about sequence, structure and taxonomic classification of all ITS2 in GenBank. In the new version, we use Hidden Markov models (HMMs) for the identification and delineation of the ITS2 resulting in a major redesign of the annotation pipeline. This allowed the identification of more than 160,000 correct full length and more than 50,000 partial structures. In the web interface, these can now be searched with a modified BLAST considering both sequence and structure, enabling rapid taxon sampling. Novel sequences can be annotated using the HMM based approach and modelled according to multiple template structures. Sequences can be searched for known and newly identified motifs. Together, the database and the web server build an exhaustive resource for ITS2 based phylogenetic analyses.
Torben Friedrich, Sven Rahmann, Wilfried Weigel, Wolfgang Rabsch, Angelika Fruth, Eliora Ron, Florian Gunzer, Thomas Dandekar, Jörg Hacker, Tobias Müller, Ulrich Dobrindt (2010)  High-throughput microarray technology in diagnostics of enterobacteria based on genome-wide probe selection and regression analysis.   BMC Genomics 11: 1. 10  
Abstract: The Enterobacteriaceae comprise a large number of clinically relevant species with several individual subspecies. Overlapping virulence-associated gene pools and the high overall genome plasticity often interfere with correct enterobacterial strain typing and risk assessment. Array technology offers a fast, reproducible and standardisable means for bacterial typing and thus provides many advantages for bacterial diagnostics, risk assessment and surveillance. The development of highly discriminative broad-range microbial diagnostic microarrays remains a challenge, because of marked genome plasticity of many bacterial pathogens.
Roland Schwarz, Biju Joseph, Gabriele Gerlach, Anja Schramm-Glück, Kathrin Engelhard, Matthias Frosch, Tobias Müller, Christoph Schoen (2010)  Evaluation of one- and two-color gene expression arrays for microbial comparative genome hybridization analyses in routine applications.   J Clin Microbiol 48: 9. 3105-3110 Sep  
Abstract: DNA microarray technology has already revolutionized basic research in infectious diseases, and whole-genome sequencing efforts have allowed for the fabrication of tailor-made spotted microarrays for an increasing number of bacterial pathogens. However, the application of microarrays in diagnostic microbiology is currently hampered by the high costs associated with microarray experiments and the specialized equipment needed. Here, we show that a thorough bioinformatic postprocessing of the microarray design to reduce the amount of unspecific noise also allows the reliable use of spotted gene expression microarrays for gene content analyses. We further demonstrate that the use of only single-color labeling to halve the costs for dye-labeled nucleotides results in only a moderate decrease in overall specificity and sensitivity. Therefore, gene expression microarrays using only single-color labeling can also reliably be used for gene content analyses, thus reducing the costs for potential routine applications such as genome-based pathogen detection or strain typing.
2009
Roland Schwarz, Matthias Wolf, Tobias Müller (2009)  A probabilistic model of cell size reduction in Pseudo-nitzschia delicatissima (Bacillariophyta).   J Theor Biol 258: 2. 316-322 May  
Abstract: The pennate planktonic diatom Pseudo-nitzschia delicatissima is very common in temperate marine waters and often responsible for blooms. Due to its surrounding rigid silicate frustule the diatom undergoes successive size reduction as its vegetative reproduction cycle proceeds. The life cycle of diatoms has long attracted scientific interest, and some years ago extensive samples of Pseudo-nitzschia were taken from coastal waters. Mating and cell size reduction experiments were carried out and served us as a data basis for a probabilistic model of cell size reduction. We applied a homogeneous non-stationary continuous-time Markov chain to model the development of individual diatoms from an initial size of about 80 µm until cell death, which occurred when the size reached its minimum of about 18 µm. In contrast to conventional curve fitting models we are able to calculate confidence intervals for estimates of the population ages and to integrate the process of auxospore formation into the model. We thus propose a unique way to describe the stationary size distribution in a diatom population in terms of cell division and auxospore formation probabilities of its individuals.
Alexander Keller, Tina Schleicher, Jörg Schultz, Tobias Müller, Thomas Dandekar, Matthias Wolf (2009)  5.8S-28S rRNA interaction and HMM-based ITS2 annotation.   Gene 430: 1-2. 50-57 Feb  
Abstract: The internal transcribed spacer 2 (ITS2) of the nuclear ribosomal repeat unit is one of the most commonly applied phylogenetic markers. It is a fast evolving locus, which makes it appropriate for studies at low taxonomic levels, whereas its secondary structure is well conserved, and tree reconstructions are possible at higher taxonomic levels. However, annotation of start and end positions of the ITS2 differs markedly between studies. This is a severe shortcoming, as prediction of a correct secondary structure by standard ab initio folding programs requires accurate identification of the marker in question. Furthermore, the correct structure is essential for multiple sequence alignments based on individual structural features. The present study describes a new tool for the delimitation and identification of the ITS2. It is based on hidden Markov models (HMMs) and verifies annotations by comparison to a conserved structural motif in the 5.8S/28S rRNA regions. Our method was able to identify and delimit the ITS2 in more than 30000 entries lacking start and end annotations in GenBank. Furthermore, 45000 ITS2 sequences with a questionable annotation were re-annotated. Approximately 30000 entries from the ITS2-DB, that uses a homology-based method for structure prediction, were re-annotated. We show that the method is able to correctly annotate an ITS2 as small as 58 nt from Giardia lamblia and an ITS2 as large as 1160 nt from humans. Thus, our method should be a valuable guide during the first and crucial step in any ITS2-based phylogenetic analysis: the delineation of the correct sequence. Sequences can be submitted to the following website for HMM-based ITS2 delineation: http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Roland Schwarz, Philipp N Seibel, Sven Rahmann, Christoph Schoen, Mirja Huenerberg, Clemens Müller-Reible, Thomas Dandekar, Rachel Karchin, Jörg Schultz, Tobias Müller (2009)  Detecting species-site dependencies in large multiple sequence alignments.   Nucleic Acids Res 37: 18. 5959-5968 Oct  
Abstract: Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize site conservations and sequence-based methods like clustering approaches detect relationships between sequences, both types of methods fail at revealing informational elements of MSAs at the level of sequence-site interactions, i.e. finding clusters of sequences and sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present here a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our methods are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer and on the vitamin K epoxide reductase family to distinguish between evolutionary and functional signals.
Frank Förster, Chunguang Liang, Alexander Shkumatov, Daniela Beisser, Julia C Engelmann, Martina Schnölzer, Marcus Frohme, Tobias Müller, Ralph O Schill, Thomas Dandekar (2009)  Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.   BMC Genomics 10: 1. 10  
Abstract: BACKGROUND: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. RESULTS: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaptation are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promoter and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. CONCLUSION: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
Julia C Engelmann, Sven Rahmann, Matthias Wolf, Jörg Schultz, Epameinondas Fritzilas, Susanne Kneitz, Thomas Dandekar, Tobias Müller (2009)  Modelling cross-hybridization on phylogenetic DNA microarrays increases the detection power of closely related species.   Mol Ecol Resour 9: 1. 83-93 Jan  
Abstract: DNA microarrays are a popular technique for the detection of microorganisms. Several approaches using specific oligomers targeting one or a few marker genes for each species have been proposed. Data analysis is usually limited to calling a species present when its oligomer exceeds a certain intensity threshold. While this strategy works reasonably well for distantly related species, it does not work well for very closely related species: Cross-hybridization of nontarget DNA prevents a simple identification based on signal intensity. The majority of species of the same genus have a sequence similarity of over 90%. For biodiversity studies down to the species level, it is therefore important to increase the detection power of closely related species. We propose a simple, cost-effective and robust approach for biodiversity studies using DNA microarray technology and demonstrate it on scenedesmacean green algae. The internal transcribed spacer 2 (ITS2) rDNA sequence was chosen as marker because it is suitable to distinguish all eukaryotic species even though parts of it are virtually identical in closely related species. We show that by modelling hybridization behaviour with a matrix algebra approach, we are able to identify closely related species that cannot be distinguished with a threshold on signal intensity. Thus, this proof-of-concept study shows that by adding a simple and robust data analysis step to the evaluation of DNA microarrays, species detection can be significantly improved for closely related species with a high sequence similarity.
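The idea of modelling cross-hybridization with matrix algebra, as described in the abstract above, can be sketched as a generic linear unmixing problem. This is an illustration only, not the model from the paper: it assumes that the observed probe intensities y are approximately A x, where A[i][j] is the expected signal of probe i per unit amount of species j and x holds the species amounts. The function name solve_abundances and all numeric values are invented for the example.

```python
def solve_abundances(A, y):
    """Least-squares solution of y ~ A x via the normal equations."""
    n = len(A[0])
    # Build the normal equations (A^T A) x = A^T y.
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for j in range(col, n):
                M[k][j] -= f * M[col][j]
            b[k] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / M[i][i]
    return x


# Hypothetical hybridization matrix for two closely related species:
# each probe also binds the other species' DNA at 80% strength.
A = [[1.0, 0.8],
     [0.8, 1.0]]
# Only species 2 is present, yet both probes give a strong signal.
y = [0.8, 1.0]
x = solve_abundances(A, y)
```

With a simple intensity threshold, both species in this toy example would be called present; solving the linear system attributes the signal of the first probe to cross-hybridization instead.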
2008
Alexander Keller, Tina Schleicher, Frank Förster, Benjamin Ruderisch, Thomas Dandekar, Tobias Müller, Matthias Wolf (2008)  ITS2 data corroborate a monophyletic chlorophycean DO-group (Sphaeropleales).   BMC Evol Biol 8: 07  
Abstract: Within Chlorophyceae the ITS2 secondary structure shows an unbranched helix I, except for the 'Hydrodictyon' and the 'Scenedesmus' clade having a ramified first helix. The latter two are classified within the Sphaeropleales, characterised by directly opposed basal bodies in their flagellar apparatuses (DO-group). Previous studies could not resolve the taxonomic position of the 'Sphaeroplea' clade within the Chlorophyceae without ambiguity and two pivotal questions remain open: (1) Is the DO-group monophyletic and (2) is a branched helix I an apomorphic feature of the DO-group? In the present study we analysed the secondary structure of three newly obtained ITS2 sequences classified within the 'Sphaeroplea' clade and resolved sphaeroplealean relationships by applying different phylogenetic approaches based on a combined sequence-structure alignment.
Matthias Wolf, Benjamin Ruderisch, Thomas Dandekar, Jörg Schultz, Tobias Müller (2008)  ProfDistS: (profile-) distance based phylogeny on sequence--structure alignments.   Bioinformatics 24: 20. 2401-2402 Oct  
Abstract: MOTIVATION: The Profile Neighbor Joining (PNJ) algorithm as implemented in the software ProfDist is computationally efficient in reconstructing very large trees. Besides the huge amount of sequence data, the structure is important in RNA alignment analysis and phylogenetic reconstruction. RESULTS: For this, ProfDistS provides a phylogenetic workflow that uses individual RNA secondary structures in reconstructing phylogenies based on sequence-structure alignments, using PNJ with manual or iterative and automatic profile definition. Moreover, ProfDistS can also deal with protein sequences.
Julia C Engelmann, Rosalia Deeken, Tobias Müller, Günter Nimtz, M Rob G Roelfsema, Rainer Hedrich (2008)  Is gene activity in plant cells affected by UMTS-irradiation? A whole genome approach.   Computational Biology and Chemistry: Advances and Applications 1. 71 - 83  
Abstract: Mobile phone technology makes use of radio frequency (RF) electromagnetic fields transmitted through a dense network of base stations in Europe. Possible harmful effects of RF fields on humans and animals are discussed, but their effect on plants has received little attention. In search of physiological processes of plant cells sensitive to RF fields, cell suspension cultures of Arabidopsis thaliana were exposed for 24 h to an RF field protocol representing typical microwave exposure in an urban environment. mRNA of exposed cultures and controls was used to hybridize Affymetrix-ATH1 whole genome microarrays. Differential expression analysis revealed significant changes in transcription of 10 genes, but they did not exceed a fold change of 2.5. Apart from the fact that three of them are dark-inducible, their functions do not point to any known responses of plants to environmental stimuli. The changes in transcription of these genes were compared with published microarray datasets and revealed a weak similarity of the microwave to light treatment experiments. Considering the large changes described in published experiments, it is questionable whether the small alterations caused by a 24 h continuous microwave exposure would have any impact on the growth and reproduction of whole plants.
Philipp N Seibel, Tobias Müller, Thomas Dandekar, Matthias Wolf (2008)  Synchronous visual analysis and editing of RNA sequence and secondary structure alignments using 4SALE.   BMC Res Notes 1: 1. 10  
Abstract: The function of a noncoding RNA sequence is mainly determined by its secondary structure and therefore a family of noncoding RNA sequences is much more conserved on the structural level than on the sequence level. Understanding the function of noncoding RNA sequence families requires two things: a hand-crafted or hand-improved alignment and detailed analyses of the secondary structures. There are several tools available that help perform these tasks, but all of them are specialized and focus on only one aspect, editing the alignment or plotting the secondary structure. The problem is that both tasks need to be performed simultaneously.
Marcus T Dittrich, Gunnar W Klau, Andreas Rosenwald, Thomas Dandekar, Tobias Müller (2008)  Identifying functional modules in protein-protein interaction networks: an integrated exact approach.   Bioinformatics 24: 13. i223-i231 Jul  
Abstract: MOTIVATION: With the exponential growth of expression and protein-protein interaction (PPI) data, the frontier of research in systems biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sharing common cellular function beyond the scope of classical pathways, by means of detecting differentially expressed regions in PPI networks. This requires on the one hand an adequate scoring of the nodes in the network to be identified and on the other hand the availability of an effective algorithm to find the maximally scoring network regions. Various heuristic approaches have been proposed in the literature. RESULTS: Here we present the first exact solution for this problem, which is based on integer-linear programming and its connection to the well-known prize-collecting Steiner tree problem from Operations Research. Despite the NP-hardness of the underlying combinatorial problem, our method typically computes provably optimal subnetworks in large PPI networks in a few minutes. An essential ingredient of our approach is a scoring function defined on network nodes. We propose a new additive score with two desirable properties: (i) it is scalable by a statistically interpretable parameter and (ii) it allows a smooth integration of data from various sources. We apply our method to a well-established lymphoma microarray dataset in combination with associated survival data and the large interaction network of HPRD to identify functional modules by computing optimal-scoring subnetworks. In particular, we find a functional interaction module associated with proliferation over-expressed in the aggressive ABC subtype as well as modules derived from non-malignant by-stander cells. AVAILABILITY: Our software is available freely for non-commercial purposes at http://www.planet-lisa.net.
Christoph Schoen, Jochen Blom, Heike Claus, Anja Schramm-Glück, Petra Brandt, Tobias Müller, Alexander Goesmann, Biju Joseph, Sebastian Konietzny, Oliver Kurzai, Corinna Schmitt, Torben Friedrich, Burkhard Linke, Ulrich Vogel, Matthias Frosch (2008)  Whole-genome comparison of disease and carriage strains provides insights into virulence evolution in Neisseria meningitidis.   Proc Natl Acad Sci U S A 105: 9. 3473-3478 Mar  
Abstract: Neisseria meningitidis is a leading cause of infectious childhood mortality worldwide. Most research efforts have hitherto focused on disease isolates belonging to only a few hypervirulent clonal lineages. However, up to 10% of the healthy human population is temporarily colonized by genetically diverse strains mostly with little or no pathogenic potential. Currently, little is known about the biology of carriage strains and their evolutionary relationship with disease isolates. The expression of a polysaccharide capsule is the only trait that has been convincingly linked to the pathogenic potential of N. meningitidis. To gain insight into the evolution of virulence traits in this species, whole-genome sequences of three meningococcal carriage isolates were obtained. Gene content comparisons with the available genome sequences from three disease isolates indicate that there is no core pathogenome in N. meningitidis. A comparison of the chromosome structure suggests that a filamentous prophage has mediated large chromosomal rearrangements and the translocation of some candidate virulence genes. Interspecific comparison of the available Neisseria genome sequences and dot blot hybridizations further indicate that the insertion sequence IS1655 is restricted only to N. meningitidis; its low sequence diversity is an indicator of an evolutionarily recent population bottleneck. A genome-based phylogenetic reconstruction provides evidence that N. meningitidis has emerged as an unencapsulated human commensal from a common ancestor with Neisseria gonorrhoeae and Neisseria lactamica and consecutively acquired the genes responsible for capsule synthesis via horizontal gene transfer.
Julia C Engelmann, Roland Schwarz, Steffen Blenk, Torben Friedrich, Philipp N Seibel, Thomas Dandekar, Tobias Müller (2008)  Unsupervised meta-analysis on diverse gene expression datasets allows insight into gene function and regulation.   Bioinformatics and Biology Insights 2. 271-286 May  
Abstract: Over the past years, microarray databases have increased rapidly in size. While they offer a wealth of data, it remains challenging to integrate data arising from different studies. Here we propose an unsupervised approach of a large-scale meta-analysis on Arabidopsis thaliana whole genome expression datasets to gain additional insights into the function and regulation of genes. Applying kernel principal component analysis and hierarchical clustering we found three major groups of experimental contrasts sharing a common biological trait. Genes associated to two of these clusters are known to play an important role in indole-3-acetic acid (IAA) mediated plant growth and development or pathogen defense. Novel functions could be assigned to genes including a cluster of serine/threonine kinases that carry two uncharacterized domains (DUF26) in their receptor part implicated in host defence. With the approach shown here, hidden interrelations between genes regulated under different conditions can be unraveled.
Christian Selig, Matthias Wolf, Tobias Müller, Thomas Dandekar, Jörg Schultz (2008)  The ITS2 Database II: homology modelling RNA structure for molecular systematics.   Nucleic Acids Res 36: Database issue. D377-D380 Jan  
Abstract: An increasing number of phylogenetic analyses are based on the internal transcribed spacer 2 (ITS2). They mainly use the fast evolving sequence for low-level analyses. When considering the highly conserved structure, the same marker could also be used for higher level phylogenies. Furthermore, structural features of the ITS2 allow distinguishing different species from each other. Despite its importance, the correct structure is only rarely found by standard RNA folding algorithms. To overcome this hindrance for a wider application of the ITS2, we have developed a homology modelling approach to predict the structure of RNA and present the results of modelling the ITS2 in the ITS2 Database. Here, we describe the database and the underlying algorithms which allowed us to predict the structure for 86 784 sequences, which is more than 55% of all GenBank entries concerning the ITS2. These are not equally distributed over all genera. There is a substantial amount of genera where the structure of nearly all sequences is predicted whereas for others no structure at all was found despite high sequence coverage. These genera might have evolved an ITS2 structure diverging from the standard one. The current version of the ITS2 Database can be accessed via http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Steffen Blenk, Julia C Engelmann, Stefan Pinkert, Markus Weniger, Jörg Schultz, Andreas Rosenwald, Hans K Müller-Hermelink, Tobias Müller, Thomas Dandekar (2008)  Explorative data analysis of MCL reveals gene expression networks implicated in survival and prognosis supported by explorative CGH analysis.   BMC Cancer 8: 1. 04  
Abstract: BACKGROUND: Mantle cell lymphoma (MCL) is an incurable B cell lymphoma and accounts for 6% of all non-Hodgkin's lymphomas. On the genetic level, MCL is characterized by the hallmark translocation t(11;14) that is present in most cases with few exceptions. Both gene expression and comparative genomic hybridization (CGH) data vary considerably between patients with implications for their prognosis. METHODS: We compare patients over and below the median of survival. Exploratory principal component analysis of gene expression data showed that the second principal component correlates well with patient survival. Explorative analysis of CGH data shows the same correlation. RESULTS: On chromosomes 7 and 9, specific genes and bands are delineated which improve prognosis prediction independent of the previously described proliferation signature. We identify a compact survival predictor of seven genes for MCL patients. After extensive re-annotation using GEPAT, we established protein networks correlating with prognosis. Well known genes (CDC2, CCND1) and further proliferation markers (WEE1, CDC25, aurora kinases, BUB1, PCNA, E2F1) form a tight interaction network, but also non-proliferative genes (SOCS1, TUBA1B, CEBPB) are shown to be associated with prognosis. Furthermore we show that aggressive MCL implicates a gene network shift to higher expressed genes in late cell cycle states and refine the set of non-proliferative genes implicated with bad prognosis in MCL. CONCLUSION: The results from explorative data analysis of gene expression and CGH data are complementary to each other. Including further tests such as the Wilcoxon rank test, we point both to proliferative and non-proliferative gene networks implicated in inferior prognosis of MCL and identify suitable markers both in gene expression and CGH data.
2007
Matthias Wolf, Christian Selig, Tobias Müller, Nicole Philippi, Thomas Dandekar, Jörg Schultz (2007)  Placozoa: at least two.   Biologia 62: 6. 641-645 Dec  
Abstract: It was shown that compensatory base changes (CBCs) in internal transcribed spacer 2 (ITS2) sequence-structure alignments can be used for distinguishing species. Using the ITS2 Database in combination with 4SALE, a tool for synchronous RNA sequence and secondary structure alignment and editing, in this study we present an in-depth CBC analysis for placozoan ITS2 sequences and their respective secondary structures. This analysis indicates at least two distinct species in Trichoplax (Placozoa) supporting a recently suggested hypothesis, that Placozoa is "no longer a phylum of one".
S Blenk, J Engelmann, M Weniger, J Schultz, M Dittrich, A Rosenwald, H K Müller-Hermelink, T Müller, T Dandekar (2007)  Germinal center B cell-like (GCB) and activated B cell-like (ABC) type of diffuse large B cell lymphoma (DLBCL): analysis of molecular predictors, signatures, cell cycle state and patient survival.   Cancer Inform 3: 399-420 12  
Abstract: Aiming to find key genes and events, we analyze a large data set on diffuse large B-cell lymphoma (DLBCL) gene-expression (248 patients, 12196 spots). Applying the loess normalization method on these raw data yields improved survival predictions, in particular for the clinically important group of patients with medium survival time. Furthermore, we identify a simplified prognosis predictor, which stratifies different risk groups as well as complex signatures do. We identify specific, activated B cell-like (ABC) and germinal center B cell-like (GCB) distinguishing genes. These include early (e.g. CDKN3) and late (e.g. CDKN2C) cell cycle genes. Independently from previous classification by marker genes we confirm a clear binary class distinction between the ABC and GCB subgroups. An earlier suggested third entity is not supported. A key regulatory network, distinguishing marked over-expression in ABC from that in GCB, is built by: ASB13, BCL2, BCL6, BCL7A, CCND2, COL3A1, CTGF, FN1, FOXP1, IGHM, IRF4, LMO2, LRMP, MAPK10, MME, MYBL1, NEIL1 and SH3BP5. It predicts and supports the aggressive behaviour of the ABC subgroup. These results help to understand target interactions, improve subgroup diagnosis, risk prognosis as well as therapy in the ABC and GCB DLBCL subgroups.
Tobias Müller, Nicole Philippi, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2007)  Distinguishing species.   RNA 13: 9. 1469-1472 Sep  
Abstract: Given two organisms, how can one distinguish whether they belong to the same species or not? This might be straightforward for two divergent organisms, but can be extremely difficult and laborious for closely related ones. A molecular marker giving a clear distinction would therefore be of immense benefit. The internal transcribed spacer 2 (ITS2) has been widely used for low-level phylogenetic analyses. Case studies revealed that a compensatory base change (CBC) in the helix II or helix III ITS2 secondary structure between two organisms correlated with sexual incompatibility. We analyzed more than 1300 closely related species to test whether this correlation is generally applicable. In 93%, where a CBC was found between organisms classified within the same genus, they belong to different species. Thus, a CBC in an ITS2 sequence-structure alignment is a sufficient condition to distinguish even closely related species.
Daniel Gerlach, Matthias Wolf, Thomas Dandekar, Tobias Müller, Andreas Pokorny, Sven Rahmann (2007)  Deep metazoan phylogeny.   In Silico Biol 7: 2. 151-154  
Abstract: We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical.
2006
Jörg Schultz, Tobias Müller, Marco Achtziger, Philipp N Seibel, Thomas Dandekar, Matthias Wolf (2006)  The internal transcribed spacer 2 database--a web server for (not only) low level phylogenetic analyses.   Nucleic Acids Res 34: Web Server issue. W704-W707 Jul  
Abstract: The internal transcribed spacer 2 (ITS2) is a phylogenetic marker which has been of broad use in generic and infrageneric level classifications, as its sequence evolves comparably fast. Only recently, it became clear, that the ITS2 might be useful even for higher level systematic analyses. As the secondary structure is highly conserved within all eukaryotes it serves as a valuable template for the construction of highly reliable sequence-structure alignments, which build a fundament for subsequent analyses. Thus, any phylogenetic study using ITS2 has to consider both sequence and structure. We have integrated a homology based RNA structure prediction algorithm into a web server, which allows the detection and secondary structure prediction for ITS2 in any given sequence. Furthermore, the resource contains more than 25,000 pre-calculated secondary structures for the currently known ITS2 sequences. These can be taxonomically searched and browsed. Thus, our resource could become a starting point for ITS2-based phylogenetic analyses and is therefore complementary to databases of other phylogenetic markers, which focus on higher level analyses. The current version of the ITS2 database can be accessed via http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Torben Friedrich, Birgit Pils, Thomas Dandekar, Jörg Schultz, Tobias Müller (2006)  Modelling interaction sites in protein domains with interaction profile hidden Markov models.   Bioinformatics 22: 23. 2851-2857 Dec  
Abstract: MOTIVATION: Due to the growing number of completely sequenced genomes, functional annotation of proteins becomes a more and more important issue. Here, we describe a method for the prediction of sites within protein domains, which are part of protein-ligand interactions. As recently demonstrated, these sites are not trivial to detect because of a varying degree of conservation of their location and type within a domain family. RESULTS: The developed method for the prediction of protein-ligand interaction sites is based on a newly defined interaction profile hidden Markov model (ipHMM) topology that takes structural and sequence data into account. It is based on a homology search via a posterior decoding algorithm that yields probabilities for interacting sequence positions and inherits the efficiency and the power of the profile hidden Markov model (pHMM) methodology. The algorithm enhances the quality of interaction site predictions and is a suitable tool for large scale studies, which was already demonstrated for pHMMs. AVAILABILITY: The MATLAB-files are available on request from the first author.
Rosalia Deeken, Julia C Engelmann, Marina Efetova, Tina Czirjak, Tobias Müller, Werner M Kaiser, Olaf Tietz, Markus Krischke, Martin J Mueller, Klaus Palme, Thomas Dandekar, Rainer Hedrich (2006)  An integrated view of gene expression and solute profiles of Arabidopsis tumors: a genome-wide approach.   Plant Cell 18: 12. 3617-3634 Dec  
Abstract: Transformation of plant cells with T-DNA of virulent agrobacteria is one of the most extreme triggers of developmental changes in higher plants. For rapid growth and development of resulting tumors, specific changes in the gene expression profile and metabolic adaptations are required. Increased transport and metabolic fluxes are critical preconditions for growth and tumor development. A functional genomics approach, using the Affymetrix whole genome microarray (approximately 22,800 genes), was applied to measure changes in gene expression. The solute pattern of Arabidopsis thaliana tumors and uninfected plant tissues was compared with the respective gene expression profile. Increased levels of anions, sugars, and amino acids were correlated with changes in the gene expression of specific enzymes and solute transporters. The expression profile of genes pivotal for energy metabolism, such as those involved in photosynthesis, mitochondrial electron transport, and fermentation, suggested that tumors produce C and N compounds heterotrophically and gain energy mainly anaerobically. Thus, understanding of gene-to-metabolite networks in plant tumors promotes the identification of mechanisms that control tumor development.
Philipp N Seibel, Tobias Müller, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2006)  4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing.   BMC Bioinformatics 7: 11  
Abstract: BACKGROUND: In sequence analysis the multiple alignment forms the foundation of all subsequent analyses. Errors in an alignment could strongly influence all succeeding analyses and therefore could lead to wrong predictions. Hand-crafted and hand-improved alignments are necessary and meanwhile good common practice. For RNA sequences often the primary sequence as well as a secondary structure consensus is well known, e.g., the cloverleaf structure of the t-RNA. Recently, some alignment editors have been proposed that are able to include and model both kinds of information. However, with the advent of a large amount of reliable RNA sequences together with their solved secondary structures (available from e.g. the ITS2 Database), we are faced with the problem of handling sequences and their associated secondary structures synchronously. RESULTS: 4SALE fills this gap. The application allows a fast sequence and synchronous secondary structure alignment for large data sets and for the first time synchronous manual editing of aligned sequences and their secondary structures. This study describes an algorithm for the synchronous alignment of sequences and their associated secondary structures as well as the main features of 4SALE used for further analyses and editing. 4SALE builds an optimal and unique starting point for every RNA sequence and structure analysis. CONCLUSION: 4SALE, which provides a user-friendly and intuitive interface, is a comprehensive toolbox for RNA analysis based on sequence and secondary structure information. The program connects sequence and structure databases like the ITS2 Database to phylogeny programs as for example the CBCAnalyzer. 4SALE is written in JAVA and therefore platform independent. The software is freely available and distributed from the website at http://4sale.bioapps.biozentrum.uni-wuerzburg.de.
2005
Matthias Wolf, Joachim Friedrich, Thomas Dandekar, Tobias Müller (2005)  CBCAnalyzer: inferring phylogenies based on compensatory base changes in RNA secondary structures.   In Silico Biol 5: 3. 291-294 03  
Abstract: The CBCAnalyzer (CBC=compensatory base change) is a custom written software toolbox consisting of three parts, CTTransform, CBCDetect, and CBCTree. CTTransform reads several ct-file formats, and generates a so called "bracket-dot-bracket" format that typically is used as input for other tools such as RNAforester, RNAmovie or MARNA. The latter one creates a multiple alignment based on primary sequences and secondary structures that now can be used as input for CBCDetect. CBCDetect counts CBCs in all against all of the aligned sequences. This is important in detecting species that are discriminated by their sexual incompatibility. The count (distance) matrix obtained by CBCDetect is used as input for CBCTree that reconstructs a phylogram by using the algorithm of BIONJ. In this note we describe the features of the toolbox as well as application examples. The toolbox provides a graphical user interface. It is written in C++ and freely available at: http://cbcanalyzer.bioapps.biozentrum.uni-wuerzburg.de.
Matthias Wolf, Marco Achtziger, Jörg Schultz, Thomas Dandekar, Tobias Müller (2005)  Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures.   RNA 11: 11. 1616-1623 Nov  
Abstract: Structural genomics meets phylogenetics and vice versa: Knowing rRNA secondary structures is a prerequisite for constructing rRNA alignments for inferring phylogenies, and inferring phylogenies is a precondition to understand the evolution of such rRNA secondary structures. Here, both scientific worlds go together. The rRNA internal transcribed spacer 2 (ITS2) region is a widely used phylogenetic marker. Because of its high variability at the sequence level, correct alignments have to take into account structural information. In this study, we examine the extent of the conservation in structure. We present (1) the homology modeled secondary structure of more than 20,000 ITS2 covering about 14,000 species; (2) a computational approach for homology modeling of rRNA structures, which additionally can be applied to other RNA families; and (3) a database providing about 25,000 ITS2 sequences with their associated secondary structures, a refined ITS2 specific general time reversible (GTR) substitution model, and a scoring matrix, available at http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Jörg Schultz, Stefanie Maisel, Daniel Gerlach, Tobias Müller, Matthias Wolf (2005)  A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the Eukaryota.   RNA 11: 4. 361-364 Apr  
Abstract: The ongoing characterization of novel species creates the need for a molecular marker which can be used for species- and, simultaneously, for mega-systematics. Recently, the use of the internal transcribed spacer 2 (ITS2) sequence was suggested, as it shows a high divergence in sequence with an assumed conservation in structure. This hypothesis was mainly based on small-scale analyses, comparing a limited number of sequences. Here, we report a large-scale analysis of more than 54,000 currently known ITS2 sequences with the goal to evaluate the hypothesis of a conserved structural core and to assess its use for automated large-scale phylogenetics. Structure prediction revealed that the previously described core structure can be found for more than 5000 sequences in a wide variety of taxa within the eukaryotes, indicating that the core secondary structure is indeed conserved. This conserved structure allowed an automated alignment of extremely divergent sequences as exemplified for the ITS2 sequences of a ctenophorean eumetazoon and a volvocalean green alga. All classified sequences, together with their structures can be accessed at http://www.biozentrum.uni-wuerzburg.de/bioinformatik/projects/ITS2.html. Furthermore, we found that, although sample sequences are known for most major taxa, there exists a profound divergence in coverage, which might become a hindrance for general usage. In summary, our analysis strengthens the potential of ITS2 as a general phylogenetic marker and provides a data source for further ITS2-based analyses.
Armin Robubi, Tobias Müller, Jochen Fueller, Mirko Hekman, Ulf R Rapp, Thomas Dandekar (2005)  B-Raf and C-Raf signaling investigated in a simplified model of the mitogenic kinase cascade.   Biol Chem 386: 11. 1165-1171 Nov  
Abstract: Signaling pathways based on the reversible phosphorylation of proteins control most aspects of cellular life in higher organisms. Extracellular stimuli can induce growth, differentiation, survival and the stress response through a number of highly conserved signaling pathways. We discuss how the intensity and duration of signals may have dramatic consequences on the way cells respond to stimuli. Picking the central Ras-Raf-MEK-ERK signal cascade, we developed a mathematical model of how stimuli induce different signal patterns and thereby different cellular responses, depending on cell type and the ratio between B-Raf and C-Raf. Based on biochemical data for activation and dephosphorylation, as well as the differential equations of our model, we suggest a different signaling pattern and response result for B-Raf (strong activation, sustained signal) and C-Raf (steep activation, transient signal). We further support the significance of such differential modulatory signaling by showing different Raf isoform expression in various cell lines and experimental testing of the predicted kinase activities in B-Raf, C-Raf and mutated versions.
Joachim Friedrich, Thomas Dandekar, Matthias Wolf, Tobias Müller (2005)  ProfDist: a tool for the construction of large phylogenetic trees based on profile distances.   Bioinformatics 21: 9. 2108-2109 May  
Abstract: SUMMARY: ProfDist is a user-friendly software package using the profile-neighbor-joining method (PNJ) in inferring phylogenies based on profile distances on DNA or RNA sequences. It is a tool for reconstructing and visualizing large phylogenetic trees providing new and standard features with a special focus on time efficiency, robustness and accuracy. AVAILABILITY: A Windows version of ProfDist comes with a graphical user interface and is freely available at http://profdist.bioapps.biozentrum.uni-wuerzburg.de
2004
Katja Rateitschak, Tobias Müller, Martin Vingron (2004)  Annotating significant pairs of transcription factor binding sites in regulatory DNA.   In Silico Biol 4: 4. 479-487  
Abstract: In the presented work we search for transcription factor binding sites (BS) by including additional information about typical BS patterns. The new proposed score combines the ordinary profile score based on TRANSFAC-matrices together with a score based on pairs of BS. The latter score positively weights pairs of BS that tend to occur together in many regulatory DNA-sequences, in contrast to a random background model. The empirical BS pair frequencies result from our evaluation of a large dataset of orthologous genes.
Matthias Wolf, Tobias Müller, Thomas Dandekar, J Dennis Pollack (2004)  Phylogeny of Firmicutes with special reference to Mycoplasma (Mollicutes) as inferred from phosphoglycerate kinase amino acid sequence data.   Int J Syst Evol Microbiol 54: Pt 3. 871-875 May  
Abstract: The phylogenetic position of the Mollicutes has been re-examined by using phosphoglycerate kinase (Pgk) amino acid sequences. Hitherto unpublished sequences from Mycoplasma mycoides subsp. mycoides, Mycoplasma hyopneumoniae and Spiroplasma citri were included in the analysis. Phylogenetic trees based on Pgk data indicated a monophyletic origin for the Mollicutes within the Firmicutes, whereas Bacilli (Firmicutes) and Clostridia (Firmicutes) appeared to be paraphyletic. With two exceptions, i.e. Thermotoga (Thermotogae) and Fusobacterium (Fusobacteria), which clustered within the Firmicutes, comparative analyses show that at a low taxonomic level, the resolved phylogenetic relationships that were inferred from both the Pgk protein and 16S rRNA gene sequence data are congruent.
Tobias Müller, Sven Rahmann, Thomas Dandekar, Matthias Wolf (2004)  Accurate and robust phylogeny estimation based on profile distances: a study of the Chlorophyceae (Chlorophyta).   BMC Evol Biol 4: Jun  
Abstract: BACKGROUND: In phylogenetic analysis we face the problem that several subclade topologies are known or easily inferred and well supported by bootstrap analysis, but basal branching patterns cannot be unambiguously estimated by the usual methods (maximum parsimony (MP), neighbor-joining (NJ), or maximum likelihood (ML)), nor are they well supported. We represent each subclade by a sequence profile and estimate evolutionary distances between profiles to obtain a matrix of distances between subclades. RESULTS: Our estimator of profile distances generalizes the maximum likelihood estimator of sequence distances. The basal branching pattern can be estimated by any distance-based method, such as neighbor-joining. Our method (profile neighbor-joining, PNJ) then inherits the accuracy and robustness of profiles and the time efficiency of neighbor-joining. CONCLUSIONS: Phylogenetic analysis of Chlorophyceae with traditional methods (MP, NJ, ML and MrBayes) reveals seven well supported subclades, but the methods disagree on the basal branching pattern. The tree reconstructed by our method is better supported and can be confirmed by known morphological characters. Moreover the accuracy is significantly improved as shown by parametric bootstrap.
T Crass, I Antes, R Basekow, P Bork, C Buning, M Christensen, H Claussen, C Ebeling, P Ernst, V Gailus-Durner, K - H Glatting, R Gohla, F Gössling, K Grote, K Heidtke, A Herrmann, S O'Keeffe, O Kiesslich, S Kolibal, J O Korbel, T Lengauer, I Liebich, M van der Linden, H Luz, K Meissner, C von Mering, H - T Mevissen, H - W Mewes, H Michael, M Mokrejs, Tobias Müller, H Pospisil, M Rarey, J G Reich, R Schneider, D Schomburg, S Schulze-Kremer, K Schwarzer, I Sommer, S Springstubbe, S Suhai, G Thoppae, M Vingron, J Warfsmann, T Werner, D Wetzler, E Wingender, R Zimmer (2004)  The Helmholtz Network for Bioinformatics: an integrative web portal for bioinformatics resources.   Bioinformatics 20: 2. 268-270 Jan  
Abstract: SUMMARY: The Helmholtz Network for Bioinformatics (HNB) is a joint venture of eleven German bioinformatics research groups that offers convenient access to numerous bioinformatics resources through a single web portal. The 'Guided Solution Finder' which is available through the HNB portal helps users to locate the appropriate resources to answer their queries by employing a detailed, tree-like questionnaire. Furthermore, automated complex tool cascades ('tasks'), involving resources located on different servers, have been implemented, allowing users to perform comprehensive data analyses without the requirement of further manual intervention for data transfer and re-formatting. Currently, automated cascades for the analysis of regulatory DNA segments as well as for the prediction of protein functional properties are provided. AVAILABILITY: The HNB portal is available at http://www.hnbioinfo.de
2003
Sven Rahmann, Tobias Müller, Martin Vingron (2003)  On the power of profiles for transcription factor binding site detection.   Stat Appl Genet Mol Biol 2: 11  
Abstract: Transcription factor binding site (TFBS) detection plays an important role in computational biology, with applications in gene finding and gene regulation. The sites are often modeled by gapless profiles, also known as position-weight matrices. Past research has focused on the significance of profile scores (the ability to avoid false positives), but this alone is not enough: The profile must also possess the power to detect the true positive signals. Several completed genomes are now available, and the search for TFBSs is moving to a large scale; so discriminating signal from noise becomes even more challenging. Since TFBS profiles are usually estimated from only a few experimentally confirmed instances, careful regularization is an important issue. We present a novel method that is well suited for this situation. We further develop measures that help in judging profile quality, based on both sensitivity and selectivity of a profile. It is shown that these quality measures can be efficiently computed, and we propose statistically well-founded methods to choose score thresholds. Our findings are applied to the TRANSFAC database of transcription factor binding sites. The results are disturbing: If we insist on a significance level of 5% in sequences of length 500, only 19% of the profiles detect a true signal instance with 95% success probability under varying background sequence compositions.
Christine Steinhoff, Tobias Müller, Ulrike A Nuber, Martin Vingron (2003)  Gaussian Mixture Density Estimation applied to Microarray Data.   LNCS (Lecture Notes in Computer Science) 2810. 418-429  
Abstract: Several publications have focused on fitting a specific distribution to overall microarray data. Due to a number of biological features the distribution of overall spot intensities can take various shapes. It appears to be impossible to find a specific distribution fitting all experiments even if they are carried out perfectly. Therefore, a probabilistic representation that models a mixture of various effects would be suitable. We use a Gaussian mixture model to represent signal intensity profiles. The advantage of this approach is the derivation of a probabilistic criterion for expressed and non-expressed genes. Furthermore our approach does not involve any prior decision on the number of model parameters. We properly fit microarray data of various shapes by a mixture of Gaussians using the EM algorithm and determine the complexity of the mixture model by the Bayesian Information Criterion (BIC). Finally, we apply our method to simulated data and to biological data.
2002
Tobias Müller, Rainer Spang, Martin Vingron (2002)  Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method.   Mol Biol Evol 19: 1. 8-13 Jan  
Abstract: Evolution of proteins is generally modeled as a Markov process acting on each site of the sequence. Replacement frequencies need to be estimated based on sequence alignments. Here we compare three approaches: First, the original method by Dayhoff, Schwartz, and Orcutt (1978) Atlas Protein Seq. Struc. 5:345-352, secondly, the resolvent method (RV) by Müller and Vingron (2000) J. Comput. Biol. 7(6):761-776, and finally a maximum likelihood approach (ML) developed in this paper. We evaluate the methods using a highly divergent and inhomogeneous set of sequence alignments as an input to the estimation procedure. ML is the method of choice for small sets of input data. Although the RV method is computationally much less demanding it performs only slightly worse than ML. Therefore, it is perfectly appropriate for large-scale applications.
2001
Tobias Müller, Sven Rahmann, Marc Rehmsmeier (2001)  Non-symmetric score matrices and the detection of homologous transmembrane proteins.   Bioinformatics 17 Suppl 1: S182-S189  
Abstract: Given a transmembrane protein, we wish to find related ones by a database search. Due to the strongly hydrophobic amino acid composition of transmembrane domains, suboptimal results are obtained when general-purpose scoring matrices such as BLOSUM are used. Recently, a transmembrane-specific score matrix called PHAT was shown to perform much better than BLOSUM. In this article, we derive a transmembrane score matrix family, called SLIM, which has several distinguishing features. In contrast to currently used matrices, SLIM is non-symmetric. The asymmetry arises because different background compositions are assumed for the transmembrane query and the unknown database sequences. We describe the mathematical model behind SLIM in detail and show that SLIM outperforms PHAT both on simulated data and in a realistic setting. Since non-symmetric score matrices are a new concept in database search methods, we discuss some important theoretical and practical issues.
2000
Tobias Müller, Martin Vingron (2000)  Modeling amino acid replacement.   J Comput Biol 7: 6. 761-776  
Abstract: The estimation of amino acid replacement frequencies during molecular evolution is crucial for many applications in sequence analysis. Score matrices for database search programs or phylogenetic analysis rely on such models of protein evolution. Pioneering work was done by Dayhoff et al. (1978) who formulated a Markov model of evolution and derived the famous PAM score matrices. Her estimation procedure for amino acid exchange frequencies is restricted to pairs of proteins that have a constant and small degree of divergence. Here we present an improved estimator, called the resolvent method, that is not subject to these limitations. This extension of Dayhoff's approach enables us to estimate an amino acid substitution model from alignments of varying degree of divergence. Extensive simulations show the capability of the new estimator to recover accurately the exchange frequencies among amino acids. Based on the SYSTERS database of aligned protein families (Krause and Vingron, 1998) we recompute a series of score matrices.

Book chapters

2010
2006
Sven Rahmann, Tobias Müller, Thomas Dandekar, Matthias Wolf (2006)  Efficient and robust analysis of large phylogenetic datasets   In: Advanced Data Mining Technologies in Bioinformatics Edited by:Hui-Huang Hsu. 104-117 Hershey, PA, USA: Idea Group, Inc.  
Abstract: The goal of phylogenetics is to reconstruct ancestral relationships between different taxa, e.g., different species in the tree of life, by means of certain characters, such as genomic sequences. We consider the prominent problem of reconstructing the basal phylogenetic tree topology when several subclades have already been identified or are well known by other means, such as morphological characteristics. Whereas most available tools attempt to estimate a fully resolved tree from scratch, the profile neighbor-joining (PNJ) method focuses directly on the mentioned problem and has proven a robust and efficient method for large-scale data sets, especially when used in an iterative way. We describe an implementation of this idea, the ProfDist software package, which is freely available, and apply the method to estimate the phylogeny of the eukaryotes. Overall, the PNJ approach provides a novel effective way to mine large sequence datasets for relevant phylogenetic information.

PhD theses

2001
Tobias Müller (2001)  Modellierung von Proteinevolution.   University of Heidelberg, Germany  
Abstract: In a simple model, protein evolution can be interpreted as an accumulation of amino acid mutations over time. Amino acids with similar chemical or physical properties are exchanged for one another more frequently than dissimilar ones. In Dayhoff's model of protein evolution, the similarity of two amino acids is therefore measured by their exchange frequency. These frequencies, however, depend on the evolutionary distance between the homologous sequences under consideration. This time dependence is modeled by a Markov process acting at each position of the protein. From a mathematical point of view, developing adequate estimators for the Markov chain parameters is a main concern of this thesis. Established estimators usually model the evolutionary distance either not at all or only inadequately. In this thesis, two new estimators are developed that are based on Dayhoff's model of protein evolution but take the time parameter into account. This leads to two greatly improved estimators, which were validated in extensive simulations. One is the maximum likelihood estimator, which is the estimator of choice for relatively small data sets. The other is the resolvent method, which handles large amounts of data very efficiently. The estimated Markov chain parameters are fundamental to the computation of phylogenetic trees, sequence database searches, and sequence alignments. With the improved modeling and estimation of these parameters, we expect improved results from the sequence analysis programs that rely on them.

Masters theses

1996

Software
