Benjamin Merget, M.Sc.
PhD student
Institute of Pharmacy and Food Chemistry
University of Wuerzburg
Am Hubland
D-97074 Wuerzburg
benjamin.merget@uni-wuerzburg.de
Short CV
Since Oct 2011: PhD studies at the Institute of Pharmacy and Food Chemistry, University of Wuerzburg
Oct 2010 - Aug 2011: Graduate studies in Biology (M.Sc.), University of Wuerzburg, Germany
Oct 2007 - Apr 2010: Undergraduate studies in Biology (B.Sc.), University of Wuerzburg, Germany
Abstract: Background: Chloroplast-encoded genes (matK and rbcL) have been formally proposed for
use in DNA barcoding efforts targeting embryophytes. Extending such a protocol to chlorophytan
green algae, though, is fraught with problems including non-homology (matK) and heterogeneity that
prevents the creation of a universal PCR toolkit (rbcL). Some have advocated the use of the nuclear-
encoded, internal transcribed spacer two (ITS2) as an alternative to the traditional chloroplast
markers as a barcode for plants. However, the ITS2 is broadly perceived to be insufficiently
conserved or to be confounded by introgression or biparental inheritance patterns, precluding its
broad use in phylogenetic reconstruction or as a DNA barcode. A growing body of evidence has shown
that simultaneous analysis of nucleotide data with secondary structure information can overcome at
least some of the limitations of ITS2. The goal of this investigation was to assess the feasibility of an
automated, sequence-structure approach for analysis of ITS2 data from a large sampling of phylum
Chlorophyta.
Methodology/Principal Findings: Sequences and secondary structures from 591 chlorophycean, 741
trebouxiophycean and 938 ulvophycean algae, all obtained from the ITS2 Database, were aligned using
a sequence-structure-specific scoring matrix. Phylogenetic relationships were reconstructed by Profile
Neighbor-Joining coupled with a sequence-structure-specific, general time-reversible substitution
model. Results from analyses of the ITS2 data were robust at multiple nodes and showed considerable
congruence with results from published phylogenetic analyses.
Conclusions/Significance: Our observations on the power of automated, sequence-structure analyses
of ITS2 to reconstruct phylum-level phylogenies of the green algae validate this approach to assessing
diversity for large sets of chlorophytan taxa.
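The tree-building step mentioned in the abstract above rests on neighbor-joining. As a rough illustration only, the sketch below shows the first joining step of classical neighbor-joining on a plain distance matrix. It is not the Profile Neighbor-Joining procedure used in the study, and it does not use a sequence-structure substitution model; the function name `nj_first_join` and the toy matrix `toy` are hypothetical and chosen for this example.

```python
def nj_first_join(dist):
    """Return (i, j, len_i, len_j) for the first classical neighbor-joining merge.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    This is plain neighbor-joining, not Profile Neighbor-Joining.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")

    # Total distance from each taxon to all others.
    row_sums = [sum(row) for row in dist]

    # Pick the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best

    # Branch lengths from the two joined taxa to their new parent node.
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return i, j, len_i, len_j


# Hypothetical five-taxon distance matrix, for illustration only.
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_first_join(toy))
```

A full reconstruction would repeat this step, replacing the joined pair with their parent node and updating the distances, until the tree is resolved.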
Abstract: Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
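The positive semi-definite property mentioned in the abstract above matters because any positive semi-definite similarity score k induces a (pseudo-)metric via d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). The sketch below illustrates only this general construction. The abstract does not say how the authors derive their distance, and the k-mer spectrum kernel used here is a simple stand-in, not the transducer-based score; the names `kmer_counts`, `spectrum_kernel` and `kernel_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Stand-in similarity: inner product of k-mer count vectors.

    An inner product is positive semi-definite by construction.
    This is NOT the transducer-based score described in the abstract.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def kernel_distance(a, b, kernel=spectrum_kernel):
    """Distance induced by a positive semi-definite similarity score."""
    squared = kernel(a, a) + kernel(b, b) - 2 * kernel(a, b)
    # Guard against tiny negative values from floating-point kernels.
    return sqrt(max(squared, 0.0))


print(kernel_distance("ACGT", "ACGA"))
```

Because the distance depends only on pairwise kernel evaluations, a distance matrix for a distance-based tree method can be filled without a multiple sequence alignment.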
Abstract: Background:
Hypnales comprise over 50% of all pleurocarpous mosses. They represent a young radiation, which complicates phylogenetic analyses. To resolve the hypnalean phylogeny, it is necessary to use a phylogenetic marker that provides highly variable features to resolve species on the one hand and conserved features enabling a backbone analysis on the other. Therefore, we simultaneously used highly variable internal transcribed spacer 2 (ITS2) sequences and conserved secondary structures, as deposited with the ITS2 Database.
Findings:
We built an accurate and in parts robustly resolved large-scale phylogeny for 1,634 currently available hypnalean ITS2 sequence-structure pairs.
Conclusions:
Profile Neighbor-Joining revealed a possible hypnalean backbone, indicating that most of the hypnalean taxa classified as different moss families are polyphyletic assemblages awaiting taxonomic changes.