
Maria Anisimova


maria.anisimova@inf.ethz.ch

Journal articles

2012
Bettina E Schirrmeister, Daniel A Dalquen, Maria Anisimova, Homayoun C Bagheri (2012)  Gene copy number variation and its significance in cyanobacterial phylogeny.   BMC Microbiol 12: 1. Aug  
Abstract: BACKGROUND: In eukaryotes, variation in gene copy numbers is often associated with deleterious effects, but may also have positive effects. For prokaryotes, studies on gene copy number variation are rare. Previous studies have suggested that high numbers of rRNA gene copies can be advantageous in environments with changing resource availability, but further association of gene copies and phenotypic traits are not documented. We used one of the morphologically most diverse prokaryotic phyla to test whether numbers of gene copies are associated with levels of cell differentiation. RESULTS: We implemented a search algorithm that identified 44 genes with highly conserved copies across 22 fully sequenced cyanobacterial taxa. For two very basal cyanobacterial species, Gloeobacter violaceus and a thermophilic Synechococcus species, distinct phylogenetic positions previously found were supported by identical protein coding gene copy numbers. Furthermore, we found that increased ribosomal gene copy numbers showed a strong correlation to cyanobacteria capable of terminal cell differentiation. Additionally, we detected extremely low variation of 16S rRNA sequence copies within the cyanobacteria. We compared our results for 16S rRNA to three other eubacterial phyla (Chloroflexi, Spirochaetes and Bacteroidetes). Based on Bayesian phylogenetic inference and the comparisons of genetic distances, we could confirm that cyanobacterial 16S rRNA paralogs and orthologs show significantly stronger conservation than found in other eubacterial phyla. CONCLUSIONS: A higher number of ribosomal operons could potentially provide an advantage to terminally differentiated cyanobacteria. Furthermore, we suggest that 16S rRNA gene copies in cyanobacteria are homogenized by both concerted evolution and purifying selection. In addition, the small ribosomal subunit in cyanobacteria appears to evolve at extraordinarily slow evolutionary rates, an observation that has been made previously for morphological characteristics of cyanobacteria.
Carolin Kosiol, Maria Anisimova (2012)  Selection on the protein-coding genome.   Methods Mol Biol 856: 113-140  
Abstract: Populations evolve as mutations arise in individual organisms and, through hereditary transmission, may become "fixed" (shared by all individuals) in the population. Most mutations are lethal or have negative fitness consequences for the organism. Others have essentially no effect on organismal fitness and can become fixed through the neutral stochastic process known as random drift. However, mutations may also produce a selective advantage that boosts their chances of reaching fixation. Regions of genes where new mutations are beneficial, rather than neutral or deleterious, tend to evolve more rapidly due to positive selection. Genes involved in immunity and defense are a well-known example; rapid evolution in these genes presumably occurs because new mutations help organisms to prevail in evolutionary "arms races" with pathogens. In recent years, genome-wide scans for selection have enlarged our understanding of the evolution of the protein-coding regions of the various species. In this chapter, we focus on the methods to detect selection in protein-coding genes. In particular, we discuss probabilistic models and how they have changed with the advent of new genome-wide data now available.
Elke Schaper, Andrey V Kajava, Alain Hauser, Maria Anisimova (2012)  Repeat or not repeat?--Statistical validation of tandem repeat prediction in genomic sequences.   Nucleic Acids Res Aug  
Abstract: Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite the longstanding interest, TR detection is still not resolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, reflecting characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power of detecting TRs depends on the degree of their divergence, and repeat characteristics such as the length of the minimal repeat unit and their number in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier, entailing a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with the state of the art detectors, our statistical classification scheme for inferred repeats allows to filter out false-positive predictions. Since different algorithms appear to specialize at predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.
Daniel A Dalquen, Maria Anisimova, Gaston H Gonnet, Christophe Dessimoz (2012)  ALF--a simulation framework for genome evolution.   Mol Biol Evol 29: 4. 1115-1123 Apr  
Abstract: In computational evolutionary biology, verification and benchmarking is a challenging task because the evolutionary history of studied biological entities is usually not known. Computer programs for simulating sequence evolution in silico have shown to be viable test beds for the verification of newly developed methods and to compare different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution such as character substitutions and insertions and deletions (indels) or on genome-level aspects such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface at http://www.cbrg.ethz.ch/alf.
2011
Philippe Remigi, Maria Anisimova, Alice Guidot, Stéphane Genin, Nemo Peeters (2011)  Functional diversification of the GALA type III effector family contributes to Ralstonia solanacearum adaptation on different plant hosts.   New Phytol Sep  
Abstract: • Type III effectors from phytopathogenic bacteria exhibit a high degree of functional redundancy, hampering the evaluation of their precise contribution to pathogenicity. This is illustrated by the GALA type III effectors from Ralstonia solanacearum, which have been shown to be collectively, but not individually, required for disease on Arabidopsis thaliana and tomato. We investigated evolution, redundancy and diversification of this family in order to understand the individual contribution of the GALA effectors to pathogenicity. • From sequences available, we reconstructed GALA phylogeny and performed selection studies. We then focused on the GALAs from the reference strain GMI1000 to examine their ability to suppress plant defense responses and contribution to pathogenicity on three different host plants: A. thaliana, tomato (Lycopersicum esculentum) and eggplant (Solanum melongena). • The GALA family is well conserved within R. solanacearum species. Patterns of selection detected on some GALA family members, together with experimental results, show that GALAs underwent functional diversification. • We conclude that functional divergence of the GALA family likely accounts for its remarkable conservation during R. solanacearum evolution and could contribute to R. solanacearum's adaptation on several host plants.
Evgeniy S Balakirev, Maria Anisimova, Francisco J Ayala (2011)  Complex interplay of evolutionary forces in the ladybird homeobox genes of Drosophila melanogaster.   PLoS One 6: 7. 07  
Abstract: Tandemly arranged paralogous genes lbe and lbl are members of the Drosophila NK homeobox family. We analyzed population samples of Drosophila melanogaster from Africa, Europe, North and South America, and single strains of D. sechellia, D. simulans, and D. yakuba within two linked regions encompassing partial sequences of lbe and lbl. The evolution of lbe and lbl is highly constrained due to their important regulatory functions. Despite this, a variety of forces have shaped the patterns of variation in lb genes: recombination, intragenic gene conversion and natural selection strongly influence background variation created by linkage disequilibrium and dimorphic haplotype structure. The two genes exhibited similar levels of nucleotide diversity and positive selection was detected in the noncoding regions of both genes. However, synonymous variability was significantly higher for lbe: no nonsynonymous changes were observed in this gene. We argue that balancing selection impacts some synonymous sites of the lbe gene. Stability of mRNA secondary structure was significantly different between the lbe (but not lbl) haplotype groups and may represent a driving force of balancing selection in epistatically interacting synonymous sites. Balancing selection on synonymous sites may be the first, or one of a few such observations, in Drosophila. In contrast, recurrent positive selection on lbl at the protein level influenced evolution at three codon sites. Transcription factor binding-site profiles were different for lbe and lbl, suggesting that their developmental functions are not redundant. Combined with our previous results on nucleotide variation in esterase and other homeobox genes, these results suggest that interplay of balancing and directional selection may be a general feature of molecular evolution in Drosophila and other eukaryote genomes.
Mingcong Wang, Maxim V Kapralov, Maria Anisimova (2011)  Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco.   BMC Evol Biol 11: 09  
Abstract: One of the key forces shaping proteins is coevolution of amino acid residues. Knowing which residues coevolve in a particular protein may facilitate our understanding of protein evolution, structure and function, and help to identify substitutions that may lead to desired changes in enzyme kinetics. Rubisco, the most abundant enzyme in biosphere, plays an essential role in the process of carbon fixation through photosynthesis, thus facilitating life on Earth. This makes Rubisco an important model system for studying the dynamics of protein fitness optimization on the evolutionary landscape. In this study we investigated the selective and coevolutionary forces acting on large subunit of land plants Rubisco using Markov models of codon substitution and clustering approaches applied to amino acid substitution histories.
Bettina E Schirrmeister, Maria Anisimova, Alexandre Antonelli, Homayoun C Bagheri (2011)  Evolution of cyanobacterial morphotypes: Taxa required for improved phylogenomic approaches.   Commun Integr Biol 4: 4. 424-427 Jul  
Abstract: Within prokaryotes cyanobacteria represent one of the oldest and morphologically most diverse phyla on Earth. The rise of oxygen levels in the atmosphere 2.32-2.45 billion years ago is assigned to the photosynthetic activity of ancestors from this phylum. Subsequently cyanobacteria were able to adapt to various habitats evolving a comprehensive set of different morphotypes. In a recent study we showed that this evolution is not a gradual transition from simple unicellular to more complex multicellular forms as often assumed. Instead complexity was lost several times and regained at least once. An understanding of the genetic basis of these transitions would be further strengthened by phylogenomic approaches. However, considering that new methods for phylogenomic analyses are emerging, it is unfortunate that genomes available today are comprised of an unbalanced sampling of taxa. We propose avenues to remedy this by identifying taxa that would improve the representation of phylogenetic diversity in this phylum.
Adam M Szalkowski, Maria Anisimova (2011)  Markov models of amino acid substitution to study proteins with intrinsically disordered regions.   PLoS One 6: 5. 05  
Abstract: Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve the binding flexibility through changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date most studies contrasting the differences between ordered and disordered proteins focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs.
Maria Anisimova, Manuel Gil, Jean-François Dufayard, Christophe Dessimoz, Olivier Gascuel (2011)  Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes.   Syst Biol 60: 5. 685-699 Oct  
Abstract: Phylogenetic inference and evaluating support for inferred relationships is at the core of many studies testing evolutionary hypotheses. Despite the popularity of nonparametric bootstrap frequencies and Bayesian posterior probabilities, the interpretation of these measures of tree branch support remains a source of discussion. Furthermore, both methods are computationally expensive and become prohibitive for large data sets. Recent fast approximate likelihood-based measures of branch supports (approximate likelihood ratio test [aLRT] and Shimodaira-Hasegawa [SH]-aLRT) provide a compelling alternative to these slower conventional methods, offering not only speed advantages but also excellent levels of accuracy and power. Here we propose an additional method: a Bayesian-like transformation of aLRT (aBayes). Considering both probabilistic and frequentist frameworks, we compare the performance of the three fast likelihood-based methods with the standard bootstrap (SBS), the Bayesian approach, and the recently introduced rapid bootstrap. Our simulations and real data analyses show that with moderate model violations, all tests are sufficiently accurate, but aLRT and aBayes offer the highest statistical power and are very fast. With severe model violations aLRT, aBayes and Bayesian posteriors can produce elevated false-positive rates. With data sets for which such violation can be detected, we recommend using SH-aLRT, the nonparametric version of aLRT based on a procedure similar to the Shimodaira-Hasegawa tree selection. In general, the SBS seems to be excessively conservative and is much slower than our approximate likelihood-based methods.
2010
Stéphane Guindon, Jean-François Dufayard, Vincent Lefort, Maria Anisimova, Wim Hordijk, Olivier Gascuel (2010)  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.   Syst Biol 59: 3. 307-321 May  
Abstract: PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
M Anisimova, G M Cannarozzi, D A Liberles (2010)  Finding the balance between the mathematical and biological optima in multiple sequence alignment.   Trends Evol Biol 2: 1. e7  
Abstract: Recent advances in evolutionary modelling and alignment methodology enable alignment of sequences with special features and incorporate structural and functional information. However, our reviewing experience and a recent study by Morrison suggest that these newer methods are under-utilized (especially in the communities of molecular systematics and experimental biology), and the resulting alignments are often curated manually. Most often, no clear biological reasoning is invoked during manual alignment; instead only aesthetic qualities are considered, as measured by eye. Such subjectivity is not consistent with core scientific principles. Although we recognize that methodological problems still exist, computerized alignment methods are currently more realistic and can model a variety of evolutionary mechanisms. We also suggest future directions for the further improvement of automatic alignment methods based upon disconnects of existing methods with underlying biological mechanisms.
S Dimitrieva, M Anisimova (2010)  PANDITplus: toward better integration of evolutionary view on molecular sequences with supplementary bioinformatics resources.   Trends Evol Biol 2: 1. e1  
Abstract: Recent comparative genomic and other large-scale bioinformatics studies increasingly have been using gene annotations, functional classifications, and complementary data from the emerging “-omics” disciplines. Indeed, such analyses have better chances to uncover hidden patterns in complex multidimensional and heterogeneous biological systems data. On the other hand, inferences from such studies are extremely sensitive to data samples and quality, and are more difficult to compare or replicate owing to differences in supplementary data sources at times not publicly available. As a contribution toward the unification and integration of good quality data from heterogeneous bioinformatics resources, we present here an integrated data bank PANDITplus. It is built as an extension of PANDIT, the database of PFAM alignments and phylogenetic trees for known protein domains and families spanning lineages from the three domains of life. PANDITplus is a relational database containing information on functional categories, metabolic pathways, protein–protein interactions, disease associations, gene expression, three-dimensional structure, as well as estimates from evolutionary analyses of selective pressures. User-friendly interface enables customized queries and fast data access. We recommend PANDITplus as a common bioinformatics platform for testing evolutionary hypotheses, which go beyond the mere inferences from molecular data by incorporating supplementary gene information. Equally, PANDITplus provides an excellent resource for the development, testing, and comparison of statistical models of substitution and probabilistic dependencies between a molecular sequence and its various attributes. The database may be accessed via http://www.panditplus.org.
2009
Maria Anisimova, Carolin Kosiol (2009)  Investigating protein-coding sequence evolution with probabilistic codon substitution models.   Mol Biol Evol 26: 2. 255-271 Feb  
Abstract: This review is motivated by the true explosion in the number of recent studies both developing and ameliorating probabilistic models of codon evolution. Traditionally parametric, the first codon models focused on estimating the effects of selective pressure on the protein via an explicit parameter in the maximum likelihood framework. Likelihood ratio tests of nested codon models armed the biologists with powerful tools, which provided unambiguous evidence for positive selection in real data. This, in turn, triggered a new wave of methodological developments. The new generation of models views the codon evolution process in a more sophisticated way, relaxing several mathematical assumptions. These models make a greater use of physicochemical amino acid properties, genetic code machinery, and the large amounts of data from the public domain. The overview of the most recent advances on modeling codon evolution is presented here, and a wide range of their applications to real data is discussed. On the downside, availability of a large variety of models, each accounting for various biological factors, increases the margin for misinterpretation; the biological meaning of certain parameters may vary among models, and model selection procedures also deserve greater attention. Solid understanding of the modeling assumptions and their applicability is essential for successful statistical data analysis.
2008
Andrey V Kajava, Maria Anisimova, Nemo Peeters (2008)  Origin and evolution of GALA-LRR, a new member of the CC-LRR subfamily: from plants to bacteria?   PLoS ONE 3: 2. 02  
Abstract: The phytopathogenic bacterium Ralstonia solanacearum encodes type III effectors, called GALA proteins, which contain F-box and LRR domains. The GALA LRRs do not perfectly fit any of the previously described LRR subfamilies. By applying protein sequence analysis and structural prediction, we clarify this ambiguous case of LRR classification and assign GALA-LRRs to CC-LRR subfamily. We demonstrate that side-by-side packing of LRRs in the 3D structures may control the limits of repeat variability within the LRR subfamilies during evolution. The LRR packing can be used as a criterion, complementing the repeat sequences, to classify newly identified LRR domains. Our phylogenetic analysis of F-box domains proposes the lateral gene transfer of bacterial GALA proteins from host plants. We also present an evolutionary scenario which can explain the transformation of the original plant LRRs into slightly different bacterial LRRs. The examination of the selective evolutionary pressure acting on GALA proteins suggests that the convex side of their horse-shoe shaped LRR domains is more prone to positive selection than the concave side, and we therefore hypothesize that the convex surface might be the site of protein binding relevant to the adaptor function of the F-box GALA proteins. This conclusion provides a strong background for further functional studies aimed at determining the role of these type III effectors in the virulence of R. solanacearum.
2007
M Anisimova, D A Liberles (2007)  The quest for natural selection in the age of comparative genomics.   Heredity 99: 6. 567-579 Dec  
Abstract: Continued genome sequencing has fueled progress in statistical methods for understanding the action of natural selection at the molecular level. This article reviews various statistical techniques (and their applicability) for detecting adaptation events and the functional divergence of proteins. As large-scale automated studies become more frequent, they provide a useful resource for generating biological null hypotheses for further experimental and statistical testing. Furthermore, they shed light on typical patterns of lineage-specific evolution of organisms, on the functional and structural evolution of protein families and on the interplay between the two. More complex models are being developed to better reflect the underlying biological and chemical processes and to complement simpler statistical models. Linking molecular processes to their statistical signatures in genomes can be demanding, and the proper application of statistical models is discussed.
Maria Anisimova, Joseph Bielawski, Katherine Dunn, Ziheng Yang (2007)  Phylogenomic analysis of natural selection pressure in Streptococcus genomes.   BMC Evol Biol 7: 08  
Abstract: BACKGROUND: In comparative analyses of bacterial pathogens, it has been common practice to discriminate between two types of genes: (i) those shared by pathogens and their non-pathogenic relatives (core genes), and (ii) those found exclusively in pathogens (pathogen-specific accessory genes). Rather than attempting to a priori delineate genes into sets more or less relevant to pathogenicity, we took a broad approach to the analysis of Streptococcus species by investigating the strength of natural selection in all clusters of homologous genes. The genus Streptococcus is comprised of a wide variety of both pathogenic and commensal lineages, and we relate our findings to the pre-existing knowledge of Streptococcus virulence factors. RESULTS: Our analysis of 1730 gene clusters revealed 136 cases of positive Darwinian selection, which we suggest is most likely to result from an antagonistic interaction between the host and pathogen at the molecular level. A two-step validation procedure suggests that positive selection was robustly identified in our genomic survey. We found no evidence to support the notion that pathogen specific accessory genes are more likely to be subject to positive selection than core genes. Indeed, we even uncovered a few cases of essential gene evolution by positive selection. Among the gene clusters subject to positive selection, a large fraction (29%) can be connected to virulence. The most striking finding was that a considerable fraction of the positively selected genes are also known to have tissue specific patterns of expression during invasive disease. As current expression data is far from comprehensive, we suggest that this fraction was underestimated. CONCLUSION: Our findings suggest that pathogen specific genes, although a popular focus of research, do not provide a complete picture of the evolutionary dynamics of virulence. The results of this study, and others, support the notion that the products of both core and accessory genes participate in complex networks that comprise the molecular basis of virulence. Future work should seek to understand the evolutionary dynamics of both core and accessory genes as a function of the networks in which they participate.
Maria Anisimova, Ziheng Yang (2007)  Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites.   Mol Biol Evol 24: 5. 1219-1228 May  
Abstract: Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.
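The contrast between FWER-controlling and FDR-controlling corrections described in this abstract can be illustrated with a minimal sketch. It implements only the Bonferroni correction (named in the abstract) and the Benjamini-Hochberg step-up procedure, which is assumed here as a representative FDR method; the abstract does not say which FDR procedures were evaluated. The function names and the per-branch p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test when p <= alpha / m; controls the familywise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure controlling the false discovery rate.

    Find the largest rank k with p_(k) <= k * alpha / m and reject
    the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from testing five branches on one phylogeny.
branch_pvalues = [0.001, 0.012, 0.025, 0.045, 0.6]
print(bonferroni(branch_pvalues))
print(benjamini_hochberg(branch_pvalues))
```

On these made-up values the FDR procedure rejects more branches than Bonferroni, which matches the abstract's observation that FDR control gives slightly more power at the cost of a higher FWER.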
2006
Maria Anisimova, Olivier Gascuel (2006)  Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative.   Syst Biol 55: 4. 539-552 Aug  
Abstract: We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the ½χ₀² + ½χ₁² distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as 2(l₁ − l₂), i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value l₂ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
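The test described in this abstract can be sketched numerically. The sketch below is not the PhyML implementation. It assumes the null distribution is the maximum of three independent draws from an equal-weight mixture of a point mass at zero (chi-square with 0 degrees of freedom) and a chi-square with 1 degree of freedom; the equal weights are an assumption of this sketch. The function names and log-likelihood values are hypothetical.

```python
import math


def mixture_cdf(x):
    """CDF of the assumed 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if x < 0:
        return 0.0
    # For one degree of freedom, P(chi2_1 <= x) = erf(sqrt(x / 2)).
    chi2_1_cdf = math.erf(math.sqrt(x / 2.0))
    return 0.5 + 0.5 * chi2_1_cdf


def alrt_pvalue(lnl_best, lnl_second):
    """Return the aLRT statistic 2(l1 - l2) and its approximate p-value.

    lnl_best:   maximum log-likelihood of the best tree (l1)
    lnl_second: log-likelihood of the second-best arrangement
                around the branch of interest (l2)
    """
    stat = max(2.0 * (lnl_best - lnl_second), 0.0)
    if stat == 0.0:
        return stat, 1.0  # no support for the branch at all
    # P(max of three independent draws <= stat) is the single-draw CDF cubed.
    return stat, 1.0 - mixture_cdf(stat) ** 3


stat, p = alrt_pvalue(-1000.0, -1005.0)  # hypothetical log-likelihoods
print(stat, p)
```

A small p-value indicates that the branch is supported over a zero-length branch under these assumptions.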
Evgeniy S Balakirev, Maria Anisimova, Francisco J Ayala (2006)  Positive and negative selection in the beta-esterase gene cluster of the Drosophila melanogaster subgroup.   J Mol Evol 62: 4. 496-510 Apr  
Abstract: We examine the pattern of molecular evolution of the beta-esterase gene cluster, including the Est-6 and psiEst-6 genes, in eight species of the Drosophila melanogaster subgroup. Using maximum likelihood estimates of nonsynonymous/synonymous rate ratios, we show that the majority of Est-6 sites evolves under strong (48% of sites) or moderate (50% of sites) negative selection and a minority of sites (1.5%) is under significant positive selection. Est-6 sites likely to be under positive selection are associated with increased intraspecific variability. One positively selected site is responsible for the EST-6 F/S allozyme polymorphism; the same site is responsible for the EST-6 functional divergence between species of the melanogaster subgroup. For psiEst-6 83.7% sites evolve under negative selection, 16% sites evolve neutrally, and 0.3% sites are under positive selection. The positively selected sites of psiEst-6 are located at the beginning and at the end of the gene, where there is reduced divergence between D. melanogaster and D. simulans; these regions of psiEst-6 could be involved in regulation or some other function. Branch-site-specific analysis shows that the evolution of the melanogaster subgroup underwent episodic positive selection. Collating the present data with previous results for the beta-esterase genes, we propose that positive and negative selection are involved in a complex relationship that may be typical of the divergence of duplicate genes as one or both duplicates evolve a new function.
2004
Maria Anisimova, Ziheng Yang (2004)  Molecular evolution of the hepatitis delta virus antigen gene: recombination or positive selection?   J Mol Evol 59: 6. 815-826 Dec  
Abstract: We present the statistical analysis of diversifying selective pressures on the hepatitis D antigen gene (HDAg). Thirty-three distinct HDAg sequences from subtypes I, II, and III were tested for positive selection using maximum likelihood methods based on models of codon substitution that allow variable selective pressures across sites. Such methods have been shown to be sufficiently accurate and successful in detecting positive selection in a variety of viral and nonviral protein-coding genes. About 11% of codon sites in HDAg were estimated to be under diversifying selection. Remarkably, most of the residues predicted to evolve under positive selection were located in the immunogenic domain and the N-terminus region with reported antigenic activity. These sites are potential targets of the host's immune response. Identification of residues mutating to escape immune recognition may help to distinguish the most virulent strains and aid vaccine design. Possible interplay between positive selection and recombination on the gene is discussed but no significant evidence for recombination was found.
2003
Maria Anisimova, Rasmus Nielsen, Ziheng Yang (2003)  Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.   Genetics 164: 3. 1229-1236 Jul  
Abstract: Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination as evidence for positive selection. The test that compares the more realistic models M7 (beta) against M8 (beta and omega) is more robust to recombination, where the null model M7 allows the positive selection pressure to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 allows an additional discrete class with omega = d(N)/d(S) that could be estimated to be >1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.
2002
Maria Anisimova, Joseph P Bielawski, Ziheng Yang (2002)  Accuracy and power of bayes prediction of amino acid sites under positive selection.   Mol Biol Evol 19: 6. 950-958 Jun  
Abstract: Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
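The posterior calculation and the two performance measures defined in this abstract can be sketched as follows. The sketch assumes that the class proportions and the per-site likelihoods under each omega class have already been estimated elsewhere; all names and numbers are hypothetical and for illustration only.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Bayes theorem: P(class k | site) is proportional to p_k * f(site | omega_k)."""
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def accuracy_and_power(predicted_sites, true_sites):
    """Accuracy: share of predicted sites that are truly under selection.
    Power: share of truly selected sites that were predicted."""
    predicted, truth = set(predicted_sites), set(true_sites)
    hits = len(predicted & truth)
    accuracy = hits / len(predicted) if predicted else float("nan")
    power = hits / len(truth) if truth else float("nan")
    return accuracy, power

# Hypothetical example: three site classes, the last one with omega > 1.
posteriors = site_class_posteriors([0.6, 0.3, 0.1], [1e-5, 2e-5, 8e-5])
print(posteriors)

# Hypothetical simulation outcome: sites 3, 7, 9 predicted; sites 3, 9, 12, 15 truly selected.
print(accuracy_and_power([3, 7, 9], [3, 9, 12, 15]))
```

In a simulation study of the kind the abstract describes, a site would typically be counted as predicted when its posterior probability for the omega > 1 class exceeds a chosen cutoff; the cutoff itself is not given in the abstract.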
2001
M Anisimova, J P Bielawski, Z Yang (2001)  Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution.   Mol Biol Evol 18: 8. 1585-1592 Aug  
Abstract: The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio (omega = dN/dS), with omega < 1, omega = 1, and omega > 1 indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The omega ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the omega ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with omega > 1), and another that does not, with the $\chi^2$ distribution used for significance testing. We found that use of the $\chi^2$ distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection also were found to affect the power of the LRT. The exact distribution assumed for the omega ratio over sites was found not to affect the effectiveness of the LRT.
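The nested-model comparison described in this abstract can be illustrated with a short sketch of a likelihood ratio test against a chi-square distribution. The abstract does not give the degrees of freedom; the default of 2 below is an assumption, and the function names and log-likelihood values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or df = 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")

def lrt(lnl_null, lnl_alt, df=2):
    """Return (statistic, p-value), where the statistic is 2 * (lnL_alt - lnL_null).

    lnl_null: maximized log-likelihood of the model without omega > 1 sites.
    lnl_alt:  maximized log-likelihood of the model that allows omega > 1 sites.
    df:       assumed difference in the number of free parameters.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods for illustration only.
stat, p = lrt(-2000.0, -1995.0)
print(f"LRT statistic = {stat:.1f}, p = {p:.5f}")
```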

Book chapters

2012