Abstract: The His274→Tyr (H274Y) oseltamivir (Tamiflu) resistance mutation causes a substantial decrease in the total levels of surface-expressed neuraminidase protein and activity in early isolates of human seasonal H1N1 influenza and in the swine-origin pandemic H1N1. In seasonal H1N1, H274Y became widespread only after the occurrence of secondary mutations that counteracted this decrease. H274Y is currently rare in pandemic H1N1, and it remains unclear whether secondary mutations exist that might similarly counteract the decreased neuraminidase surface expression associated with this resistance mutation in pandemic H1N1. Here we investigate the possibility of predicting such secondary mutations. We first test the ability of several computational approaches to retrospectively identify the secondary mutations that enhanced levels of surface-expressed neuraminidase protein and activity in seasonal H1N1 shortly before the emergence of oseltamivir resistance. We then use the most successful computational approach to predict a set of candidate secondary mutations to the pandemic H1N1 neuraminidase. We experimentally screen these mutations and find that several of them do indeed partially counteract the decrease in neuraminidase surface expression caused by H274Y. Two of the secondary mutations together restore surface-expressed neuraminidase activity to wild-type levels and also eliminate the very slight decrease in viral growth in tissue culture caused by H274Y. Our work therefore demonstrates a combined computational-experimental approach for identifying mutations that enhance neuraminidase surface expression, and it describes several specific mutations that may be relevant to the spread of oseltamivir resistance in pandemic H1N1.
Abstract: Resistance to oseltamivir, the most widely used influenza antiviral drug, spread to fixation in seasonal influenza A(H1N1) between 2006 and 2009. This sudden rise in resistance seemed puzzling given the low overall level of oseltamivir usage and the lack of a correlation between local rates of resistance and oseltamivir usage. We used a stochastic simulation model and deterministic approximations to examine how such events can occur. In particular, we determined how the rate of fixation of the resistant strain depends both on its fitness in untreated hosts and on the frequency of antiviral treatment. We found that, at the levels of antiviral usage in the population, the resistant strain will eventually spread to fixation if it is not attenuated in transmissibility relative to the drug-sensitive strain, but not at the speed observed in seasonal H1N1. The extreme speed with which resistance spread in seasonal H1N1 suggests that the resistant strain had a transmission advantage in untreated hosts. This advantage could have arisen from genetic hitchhiking or from the mutations responsible for resistance and compensation. Importantly, our model also shows that resistant virus will fail to spread if it is even slightly less transmissible than its sensitive counterpart. This finding is relevant because resistant pandemic influenza (H1N1) 2009 may currently suffer from a small but nonetheless experimentally perceptible reduction in transmissibility.
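A much simpler model than the study's stochastic simulation can illustrate how treatment frequency and transmission cost interact. This sketch is a toy deterministic selection model written only for illustration. The function `resistant_frequency` and its parameters (`treated_frac`, `efficacy`, `cost`) are hypothetical and are not taken from the study.

```python
def resistant_frequency(p0, generations, treated_frac, efficacy, cost):
    """Toy deterministic model of a resistant strain's frequency over transmission generations.

    Hypothetical parameters (illustrative, not from the study):
      treated_frac: fraction of infected hosts that receive the drug
      efficacy: fractional reduction in transmission of the sensitive strain in treated hosts
      cost: fractional reduction in transmission of the resistant strain in untreated hosts
    """
    # Mean transmission fitness of each strain, averaged over treated and untreated hosts.
    w_res = (1 - treated_frac) * (1 - cost) + treated_frac * 1.0
    w_sens = (1 - treated_frac) * 1.0 + treated_frac * (1 - efficacy)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection update: each strain's share grows in proportion to its fitness.
        p = p * w_res / (p * w_res + (1 - p) * w_sens)
        trajectory.append(p)
    return trajectory
```

In this toy model the resistant strain increases only if `cost` is below `treated_frac * efficacy / (1 - treated_frac)`. With 1% of infections treated and 50% efficacy, that break-even cost is about 0.5%, so a 1% transmission cost is already enough to make the resistant strain decline.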
Abstract: Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. However, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.
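The core consensus step can be sketched in a few lines. The sketch finds the most common residue in each alignment column and flags the positions where the protein of interest differs. It assumes an already aligned set of sequences that uses `-` as the gap character. `consensus_mutations` is a hypothetical helper, and practical pipelines may add frequency cutoffs or other filters that are omitted here.

```python
from collections import Counter


def consensus_mutations(alignment, parent):
    """Propose mutations of `parent` toward the column-wise consensus of `alignment`.

    alignment: equal-length aligned sequences (gaps written as '-').
    parent: the aligned sequence to be stabilized.
    Returns (1-based position, parent residue, consensus residue, consensus frequency).
    """
    length = len(parent)
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    proposals = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if not counts:
            continue  # column contains only gaps
        residue, n = counts.most_common(1)[0]
        freq = n / sum(counts.values())
        if parent[i] != "-" and residue != parent[i]:
            proposals.append((i + 1, parent[i], residue, freq))
    return proposals
```

For example, with the alignment `["ACDE", "ACDF", "GCDF"]` and parent `"ACKE"`, the sketch proposes K3D (consensus frequency 1.0) and E4F (frequency 2/3).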
Abstract: The His274→Tyr274 (H274Y) mutation confers oseltamivir resistance on N1 influenza neuraminidase but had long been thought to compromise viral fitness. However, beginning in 2007-2008, viruses containing H274Y rapidly became predominant among human seasonal H1N1 isolates. We show that H274Y decreases the amount of neuraminidase that reaches the cell surface and that this defect can be counteracted by secondary mutations that also restore viral fitness. Two such mutations occurred in seasonal H1N1 shortly before the widespread appearance of H274Y. The evolution of oseltamivir resistance was therefore enabled by "permissive" mutations that allowed the virus to tolerate subsequent occurrences of H274Y. An understanding of this process may provide a basis for predicting the evolution of oseltamivir resistance in other influenza strains.
Abstract: One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
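Textbook population-genetics formulas can illustrate the link between ΔΔG and fixation probability. This sketch is not the Bayesian inference method described above. It rests on three assumptions:

- folding follows a two-state model;
- fitness is proportional to the fraction of folded protein;
- fixation probability follows Kimura's haploid diffusion approximation.

The helper names, the 310 K temperature, and the default population size are illustrative choices.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(dg_fold, temp_k=310.0):
    """Two-state fraction folded; dg_fold < 0 means the folded state is favored."""
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL * temp_k)))


def _log_expm1(x):
    """Numerically stable log(exp(x) - 1) for x > 0."""
    if x < 30.0:
        return math.log(math.expm1(x))
    return x + math.log1p(-math.exp(-x))


def fixation_probability(s, n):
    """Haploid approximation (1 - e^{-2s}) / (1 - e^{-2ns}); equals 1/n when s = 0."""
    if abs(s) < 1e-12:
        return 1.0 / n
    if s > 0:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)
    a = -s  # deleterious case: rewrite with positive exponents, evaluate in log space
    return math.exp(_log_expm1(2.0 * a) - _log_expm1(2.0 * n * a))


def stability_fixation(dg_wt, ddg, n=1000, temp_k=310.0):
    """Fixation probability of a mutation with stability effect ddg (> 0 destabilizes),
    assuming fitness is proportional to the fraction of folded protein."""
    f_wt = fraction_folded(dg_wt, temp_k)
    f_mut = fraction_folded(dg_wt + ddg, temp_k)
    return fixation_probability(f_mut / f_wt - 1.0, n)
```

Under these assumptions, the same destabilizing ΔΔG is nearly neutral in a very stable protein but strongly purged in a marginally stable one. This illustrates why fixation depends both on a mutation's ΔΔG and on the stability of the protein in which it arises.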
Abstract: Directed evolution is a widely used engineering strategy for improving the stabilities or biochemical functions of proteins by repeated rounds of mutation and selection. These experiments offer empirical lessons about how proteins evolve in the face of clearly defined laboratory selection pressures. Directed evolution has revealed that single amino acid mutations can enhance properties such as catalytic activity or stability and that adaptation can often occur through pathways consisting of sequential beneficial mutations. When there are no single mutations that improve a particular protein property, experiments always find a wealth of mutations that are neutral with respect to the laboratory-defined measure of fitness. These neutral mutations can open new adaptive pathways by at least two different mechanisms. Functionally neutral mutations can enhance a protein's stability, thereby increasing its tolerance for subsequent functionally beneficial but destabilizing mutations. They can also lead to changes in "promiscuous" functions that are not currently under selective pressure but can subsequently become the starting points for the adaptive evolution of new functions. These lessons about the coupling between adaptive and neutral protein evolution in the laboratory offer insight into the evolution of proteins in nature.
Abstract: Naturally evolving proteins gradually accumulate mutations while continuing to fold to stable structures. This process of neutral evolution is an important mode of genetic change and forms the basis for the molecular clock. We present a mathematical theory that predicts the number of accumulated mutations, the index of dispersion, and the distribution of stabilities in an evolving protein population from knowledge of the stability effects (ΔΔG values) for single mutations. Our theory quantitatively describes how neutral evolution leads to marginally stable proteins and provides formulas for calculating how fluctuations in stability can overdisperse the molecular clock. It also shows that the structural influences on the rate of sequence evolution observed in earlier simulations can be calculated using just the single-mutation ΔΔG values. We consider both the case when the product of the population size and mutation rate is small and the case when this product is large, and show that in the latter case the proteins evolve excess mutational robustness that is manifested by extra stability and an increase in the rate of sequence evolution. All our theoretical predictions are confirmed by simulations with lattice proteins. Our work provides a mathematical foundation for understanding how protein biophysics shapes the process of evolution.
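A small Monte Carlo walk shows how neutral evolution drives proteins toward marginal stability. This is a simulation sketch, not the analytical theory. It rests on three assumptions:

- a protein remains functional only while its stability stays at or above a threshold;
- mutational ΔΔG values are drawn from a hypothetical pool;
- every stably folding mutant counts as a fixed neutral substitution, roughly the regime where population size times mutation rate is small.

`neutral_walk` and its parameters are illustrative.

```python
import random


def neutral_walk(start, threshold, ddg_pool, attempts, seed=0):
    """Stability trajectory of a protein under purely neutral evolution.

    start, threshold: stabilities in kcal/mol (larger = more stable).
    ddg_pool: candidate ddG values (positive = destabilizing), sampled with replacement.
    Returns (stabilities after each accepted substitution, number accepted).
    """
    rng = random.Random(seed)
    stability = start
    trajectory = [stability]
    for _ in range(attempts):
        candidate = stability - rng.choice(ddg_pool)
        if candidate >= threshold:
            # Still folds stably: treat the mutation as neutral and fixed.
            stability = candidate
            trajectory.append(stability)
        # Otherwise the mutant fails to fold and is purged by selection.
    return trajectory, len(trajectory) - 1
```

When most entries in the pool are destabilizing, the trajectory drifts downward and then hovers just above the threshold. This is the marginal stability that the theory describes.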
Abstract: Many of the mutations accumulated by naturally evolving proteins are neutral in the sense that they do not significantly alter a protein's ability to perform its primary biological function. However, new protein functions evolve when selection begins to favor other, "promiscuous" functions that are incidental to a protein's original biological role. If mutations that are neutral with respect to a protein's primary biological function cause substantial changes in promiscuous functions, these mutations could enable future functional evolution.
Abstract: An important question is whether evolution favors properties such as mutational robustness or evolvability that do not directly benefit any individual but can influence the course of future evolution. Functionally similar proteins can differ substantially in their robustness to mutations and capacity to evolve new functions, but it has remained unclear whether any of these differences might be due to evolutionary selection for these properties.
Abstract: Oligopeptide repeats appear in many proteins that undergo conformational conversions to form amyloid, including the mammalian prion protein PrP and the yeast prion protein Sup35. Whereas the repeats in PrP have been studied more exhaustively, interpretation of these studies is confounded by the fact that many details of the PrP prion conformational conversion are not well understood. On the other hand, there is now a relatively good understanding of the factors that guide the conformational conversion of the Sup35 prion protein. To provide a general model for studying the role of oligopeptide repeats in prion conformational conversion and amyloid formation, we have substituted various numbers of the PrP octarepeats for the endogenous Sup35 repeats. The resulting chimeric proteins can adopt the [PSI+] prion state in yeast, and the stability of the prion state depends on the number of repeats. In vitro, these chimeric proteins form amyloid fibers, with more repeats leading to shorter lag phases and faster assembly rates. Both pH and the presence of metal ions modulate assembly kinetics of the chimeric proteins, and the extent of modulation is highly sensitive to the number of PrP repeats. This work offers new insight into the properties of the PrP octarepeats in amyloid assembly and prion formation. It also reveals new features of the yeast prion protein, and provides a level of control over yeast prion assembly that will be useful for future structural studies and for creating amyloid-based biomaterials.
Abstract: Thermostable enzymes combine catalytic specificity with the toughness required to withstand industrial reaction conditions. Stabilized enzymes also provide robust starting points for evolutionary improvement of other protein properties. We recently created a library of at least 2,300 new active chimeras of the biotechnologically important cytochrome P450 enzymes. Here we show that a chimera's thermostability can be predicted from the additive contributions of its sequence fragments. Based on these predictions, we constructed a family of 44 novel thermostable P450s with half-lives of inactivation at 57 °C up to 108 times that of the most stable parent. Although they differ by as many as 99 amino acids from any known P450, the stable sequences are catalytically active. Among the novel functions they exhibit is the ability to produce drug metabolites. This chimeric P450 family provides a unique ensemble for biotechnological applications and for studying sequence-stability-function relationships.
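An additive model predicts a chimera's stability as a baseline plus one contribution per inherited sequence block, so scoring every possible chimera is cheap. This sketch assumes eight blocks from three parents, as in a design with seven crossover locations. The baseline score, the block contributions, and the function names are hypothetical and are not values from the study.

```python
from itertools import product

# Hypothetical numbers for illustration only: a baseline stability score for the
# all-parent-"1" chimera and, for each of 8 blocks, the change in that score when
# the block is inherited from parent "1", "2" or "3".
BASELINE = 50.0
BLOCK_EFFECTS = [
    {"1": 0.0, "2": 1.5, "3": -0.5},
    {"1": 0.0, "2": -2.0, "3": 0.8},
    {"1": 0.0, "2": 0.4, "3": -1.0},
    {"1": 0.0, "2": -0.3, "3": 2.1},
    {"1": 0.0, "2": 0.9, "3": 0.2},
    {"1": 0.0, "2": -1.2, "3": -0.7},
    {"1": 0.0, "2": 1.1, "3": 3.0},
    {"1": 0.0, "2": -0.4, "3": 0.6},
]


def predict_stability(chimera):
    """Additive model: baseline plus the contribution of each inherited block."""
    if len(chimera) != len(BLOCK_EFFECTS):
        raise ValueError("chimera must name one parent per block")
    return BASELINE + sum(BLOCK_EFFECTS[i][p] for i, p in enumerate(chimera))


def most_stable(k=5):
    """Score all 3**8 chimeras and return the k with the highest predicted stability."""
    chimeras = ("".join(c) for c in product("123", repeat=len(BLOCK_EFFECTS)))
    scored = [(c, predict_stability(c)) for c in chimeras]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because the model has no interaction terms, the top-ranked chimera simply takes the best-scoring parent at every block. With the numbers above, that is `"23232133"`.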
Abstract: The biophysical properties that enable proteins to so readily evolve to perform diverse biochemical tasks are largely unknown. Here, we show that a protein's capacity to evolve is enhanced by the mutational robustness conferred by extra stability. We use simulations with model lattice proteins to demonstrate how extra stability increases evolvability by allowing a protein to accept a wider range of beneficial mutations while still folding to its native structure. We confirm this view experimentally by mutating marginally stable and thermostable variants of cytochrome P450 BM3. Mutants of the stabilized parent were more likely to exhibit new or improved functions. Only the stabilized P450 parent could tolerate the highly destabilizing mutations needed to confer novel activities such as hydroxylating the anti-inflammatory drug naproxen. Our work establishes a crucial link between protein stability and evolution. We show that we can exploit this link to discover protein functions, and we suggest how natural evolution might do the same.
Abstract: We investigate how a protein's structure influences the rate at which its sequence evolves. Our basic hypothesis is that proteins with highly designable structures (structures that are encoded by many sequences) will evolve more rapidly. Recent theoretical advances argue that structures with a higher density of inter-residue contacts are more designable, and we show that high contact density is correlated with an increased rate of sequence evolution in yeast. In addition, we investigate the correlations between the rate of sequence evolution and several other structural descriptors, carefully controlling for the strong effect of expression level on evolutionary rate. Overall, we find that the structural descriptors that we consider appear to explain roughly 10% of the variation in rates of protein evolution in yeast. We also show that despite the well-known trend for buried residues to be more conserved, proteins with a higher fraction of buried residues nonetheless tend to evolve their sequences more rapidly. We suggest that this effect is due to the increased designability of structures with more buried residues. Our results provide evidence that protein structure plays an important role in shaping the rate of sequence evolution and support recent theoretical advances linking structural designability to contact density.
Abstract: Creating artificial protein families affords new opportunities to explore the determinants of structure and biological function free from many of the constraints of natural selection. We have created an artificial family comprising 3,000 P450 heme proteins that correctly fold and incorporate a heme cofactor by recombining three cytochromes P450 at seven crossover locations chosen to minimize structural disruption. Members of this protein family differ from any known sequence by an average of 72 and as many as 109 amino acids. Most (>73%) of the properly folded chimeric P450 heme proteins are catalytically active peroxygenases; some are more thermostable than the parent proteins. A multiple sequence alignment of 955 chimeras, including both folded and unfolded ones, is a valuable resource for sequence-structure-function studies. Logistic regression analysis of the multiple sequence alignment identifies key structural contributions to cytochrome P450 heme incorporation and peroxygenase activity and suggests possible structural differences between parents CYP102A1 and CYP102A2.
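Logistic regression models the probability that a chimera folds (or is active) as a logistic function of its sequence features. This sketch makes two hypothetical choices for illustration:

- each chimera is a string naming the parent of each block, and every (block, parent) choice becomes a 0/1 feature;
- weights are fitted by plain batch gradient descent with a small L2 penalty.

The encoding, the learning rate, the toy data, and the function names are all hypothetical.

```python
import math


def one_hot(chimera, parents="123"):
    """Encode each (block, parent) choice as a 0/1 feature."""
    return [1.0 if c == p else 0.0 for c in chimera for p in parents]


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=1e-3):
    """Batch gradient descent for L2-regularized logistic regression; returns (weights, bias)."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / m
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a chimera with features x folds."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))


# Hypothetical toy data: 3-block chimeras that fail to fold whenever block 2 comes from parent "3".
TOY_CHIMERAS = [a + b + c for a in "123" for b in "123" for c in "123"]
TOY_LABELS = [0 if ch[1] == "3" else 1 for ch in TOY_CHIMERAS]
```

In such a fit, strongly negative weights flag block-parent choices associated with failure to fold, and strongly positive weights flag choices associated with folding.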
Abstract: We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.
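A deliberately simplified threshold model can illustrate how extra stability buys robustness to the first few substitutions. This sketch is not the published theory. It makes two hypothetical assumptions:

- a protein folds as long as its cumulative destabilization does not exceed its extra stability;
- each substitution's ΔΔG is an independent normal variable with a hypothetical mean and standard deviation.

A Gaussian sum has a faster-than-exponential tail, so this toy does not reproduce the exponential decline predicted for large numbers of substitutions. It only captures the effect of extra stability at small m.

```python
from statistics import NormalDist


def prob_folded(m, extra_stability, ddg_mean=1.0, ddg_sd=1.7):
    """P(protein still folds after m random substitutions) in a toy Gaussian threshold model.

    extra_stability: stability (kcal/mol) beyond the minimum needed to fold.
    ddg_mean, ddg_sd: hypothetical mean and s.d. of a single substitution's ddG
    (positive = destabilizing); substitutions are assumed independent.
    """
    if m == 0:
        return 1.0 if extra_stability >= 0 else 0.0
    total_ddg = NormalDist(mu=m * ddg_mean, sigma=ddg_sd * m ** 0.5)
    return total_ddg.cdf(extra_stability)
```

With a positive mean ΔΔG, the probability falls monotonically with m. At any fixed m it is higher for a protein with more extra stability.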
Abstract: Directed evolution is a common technique to engineer enzymes for a diverse set of applications. Structural information and an understanding of how proteins respond to mutation and recombination are being used to develop improved directed evolution strategies by increasing the probability that mutant sequences have the desired properties. Strategies that target mutagenesis to particular regions of a protein or use recombination to introduce large sequence changes can complement full-gene random mutagenesis and pave the way to achieving ever more ambitious enzyme engineering goals.
Abstract: Much recent work has explored molecular and population-genetic constraints on the rate of protein sequence evolution. The best predictor of evolutionary rate is expression level, for reasons that have remained unexplained. Here, we hypothesize that selection to reduce the burden of protein misfolding will favor protein sequences with increased robustness to translational missense errors. Pressure for translational robustness increases with expression level and constrains sequence evolution. Using several sequenced yeast genomes, global expression and protein abundance data, and sets of paralogs traceable to an ancient whole-genome duplication in yeast, we rule out several confounding effects and show that expression level explains roughly half the variation in Saccharomyces cerevisiae protein evolutionary rates. We examine causes for expression's dominant role and find that genome-wide tests favor the translational robustness explanation over existing hypotheses that invoke constraints on function or translational efficiency. Our results suggest that proteins evolve at rates largely unrelated to their functions and can explain why highly expressed proteins evolve slowly across the tree of life.
Abstract: We have recently proposed a thermodynamic model that predicts the tolerance of proteins to random amino acid substitutions. Here we test this model against extensive simulations with compact lattice proteins, and find that the overall performance of the model is very good. We also derive an approximate analytic expression for the fraction of mutant proteins that fold stably to the native structure, Pf(m), as a function of the number of amino acid substitutions m, and present several methods to estimate the asymptotic behavior of Pf(m) for large m. We test the accuracy of all approximations against our simulation results, and find good overall agreement between the approximations and the simulation measurements.
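Simulation estimates of Pf(m), of the kind used to check analytic approximations, can be sketched as a Monte Carlo average. This is an illustration, not the lattice-protein simulations described above. It assumes that each substitution's ΔΔG is drawn with replacement from a supplied list. A mutant counts as folded while its cumulative ΔΔG stays at or below its extra stability. `simulate_pf` and `decay_rates` are hypothetical helpers. `decay_rates` reports the local rate -ln(Pf(m+1)/Pf(m)), which is constant when Pf(m) declines exponentially.

```python
import math
import random


def simulate_pf(ddg_values, extra_stability, max_m, trials=20000, seed=1):
    """Monte Carlo estimate of Pf(m) for m = 0..max_m.

    Each trial is one mutational trajectory of max_m substitutions; the m-fold
    mutant counts as folded if its cumulative ddG is <= extra_stability.
    """
    rng = random.Random(seed)
    folded = [0] * (max_m + 1)
    for _ in range(trials):
        total = 0.0
        for m in range(max_m + 1):
            if m > 0:
                total += rng.choice(ddg_values)
            if total <= extra_stability:
                folded[m] += 1
    return [count / trials for count in folded]


def decay_rates(pf):
    """Local decline -ln(Pf(m+1)/Pf(m)) wherever both values are positive."""
    return [math.log(a / b) for a, b in zip(pf, pf[1:]) if a > 0 and b > 0]
```

Reusing one trajectory per trial for all m keeps the estimates for neighboring m correlated. This reduces noise in the ratios used by `decay_rates`.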
Abstract: Functional proteins must fold with some minimal stability to a structure that can perform a biochemical task. Here we use a simple model to investigate the relationship between the stability requirement and the capacity of a protein to evolve the function of binding to a ligand. Although our model contains no built-in tradeoff between stability and function, proteins evolved function more efficiently when the stability requirement was relaxed. Proteins with both high stability and high function evolved more efficiently when the stability requirement was gradually increased than when there was constant selection for high stability. These results show that in our model, the evolution of function is enhanced by allowing proteins to explore sequences corresponding to marginally stable structures, and that it is easier to improve stability while maintaining high function than to improve function while maintaining high stability. Our model also demonstrates that even in the absence of a fundamental biophysical tradeoff between stability and function, the speed with which function can evolve is limited by the stability requirement imposed on the protein.
Abstract: Several studies have suggested that proteins that interact with more partners evolve more slowly. The strength and validity of this association have been called into question. Here we investigate how biases in high-throughput protein-protein interaction studies could lead to a spurious correlation.