
Minglei Wang


wml@illinois.edu

Journal articles

2010
Minglei Wang, Ying-Ying Jiang, Kyung Mo Kim, Ge Qu, Hong-Fang Ji, Jay E Mittenthal, Hong-Yu Zhang, Gustavo Caetano-Anollés (2010)  A Universal Molecular Clock of Protein Folds and its Power in Tracing the Early History of Aerobic Metabolism and Planet Oxygenation.   Mol Biol Evol Aug  
Abstract: The standard molecular clock describes a constant rate of molecular evolution and provides a powerful framework for evolutionary timescales. Here we describe the existence and implications of a molecular clock of folds, a universal recurrence in the discovery of new structures in the world of proteins. Using a phylogenomic structural census in hundreds of proteomes we build phylogenies and timelines of domains at fold and fold superfamily levels of structural complexity. These timelines correlate approximately linearly with geological timescales and were here used to date two crucial events in life history, planet oxygenation and organism diversification. We first dissected the structures and functions of enzymes in simulated metabolic networks. The placement of anaerobic and aerobic enzymes in the timeline revealed that aerobic metabolism emerged approximately 2.9 billion years (Ga) ago and expanded during a period of approximately 400 million years, reaching what is known as the Great Oxidation Event. During this period, enzymes recruited old and new folds for oxygen-mediated enzymatic activities. Remarkably, the first fold lost by a superkingdom disappeared in Archaea 2.6 Ga ago, within the span of oxygen rise, suggesting oxygen also triggered diversification of life. The implications of a molecular clock of folds are many and important for the neutral theory of molecular evolution and for understanding the growth and diversity of the protein world. The clock also extends the standard concept that was specific to molecules and their timescales and turns it into a universal timescale-generating tool.
2009
Gustavo Caetano-Anollés, Minglei Wang, Derek Caetano-Anollés, Jay E Mittenthal (2009)  The origin, evolution and structure of the protein world.   Biochem J 417: 3. 621-637 Feb  
Abstract: Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
Minglei Wang, Gustavo Caetano-Anollés (2009)  The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world.   Structure 17: 1. 66-78 Jan  
Abstract: Protein domains are compact evolutionary units of structure and function that usually combine in proteins to produce complex domain arrangements. In order to study their evolution, we reconstructed genome-based phylogenetic trees of architectures from a census of domain structure and organization conducted at protein fold and fold-superfamily levels in hundreds of fully sequenced genomes. These trees defined timelines of architectural discovery and revealed remarkable evolutionary patterns, including the explosive appearance of domain combinations during the rise of organismal lineages, the dominance of domain fusion processes throughout evolution, and the late appearance of a new class of multifunctional modules in Eukarya by fission of domain combinations. Our study provides a detailed account of the history and diversification of a molecular interactome and shows how the interplay of domain fusions and fissions defines an evolutionary mechanics of domain organization that is fundamentally responsible for the complexity of the protein world.
2008
Gustavo Caetano-Anollés, Minglei Wang (2008)  The Protein World - evolution of protein architecture   The Biochemist 30: 1. 4-8 Feb  
Abstract: Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. A census of protein structure in proteomes and novel bioinformatics methods uncovered patterns and processes linked to the evolution of both proteins and proteomes that are described here. Timelines of discovery of protein architectures revealed episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. Some of these tendencies were driven by recruitment of structures and functions. Our observations have important implications for origins of modern biochemistry, modules in the protein world, and diversification of life.
Gustavo Caetano-Anolles, Feng-Jie Sun, Minglei Wang, Liudmila S Yafremava, Ajith Harish, Hee Shin Kim, Vegeir Knudsen, Derek Caetano-Anolles, Jay E Mittenthal (2008)  Origins and evolution of modern biochemistry: insights from genomes and molecular structure.   Front Biosci 13: 5212-5240 May  
Abstract: The survey of components in living systems at different levels of organization enables an evolutionary exploration of patterns and processes in macromolecules, networks, and genomic repertoires. Here we discuss how phylogenetic strategies that generate intrinsically rooted phylogenies impact the evolutionary study of RNA and protein components of the macromolecular machinery that is responsible for biological function. We used these methods to generate timelines of discovery of components in systems, such as substructures in RNA molecules, architectures in proteomes, domains in multi-domain proteins, enzymes in metabolic networks, and protein architectures in proteomes. These timelines unfolded remarkable patterns of origin and evolution of molecules, repertoires and networks, showing episodes of both functional specialization (e.g., rise of domains with specialized functions) and molecular simplification (e.g., reductive tendencies in molecules and proteomes). These observations have important evolutionary implications for origins of translation, the genetic code, modules in the protein world, and diversification of life, and suggest early evolution of modern biochemistry was driven by recruitment of both RNA and protein catalysts in an ancient community of complex organisms.
2007
Minglei Wang, Liudmila S Yafremava, Derek Caetano-Anollés, Jay E Mittenthal, Gustavo Caetano-Anollés (2007)  Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world.   Genome Res 17: 11. 1572-1585 Nov  
Abstract: The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.
2006
Minglei Wang, Simina Maria Boca, Rakhee Alelkar, Jay E Mittenthal, Gustavo Caetano-Anollés (2006)  A phylogenomic reconstruction of the protein world based on a genomic census of protein fold architecture   Complexity 12: 27-40  
Abstract: The protein world has a hierarchical and redundant organization that can be specified in terms of evolutionary units of molecular structure, the protein domains. The Structural Classification of Proteins (SCOP) has unified domains into a comparatively small set of folding architectures, the protein fold families and superfamilies, and these have been further grouped into protein folds. In this study, we reconstruct the evolution of the protein world using information embedded in a structural genomic census of fold architectures defined by a phylogenomic analysis of 185 completely sequenced genomes using advanced hidden Markov models and 776 folds described in SCOP release 1.67. Our study confirms the existence of defined evolutionary patterns of architectural diversification and explores how phylogenomic trees generated from folds relate to those reconstructed from fold superfamilies. Evolutionary patterns help us propose a general conceptual model that describes the growth of architectures in the protein world.
Jiangning Song, Minglei Wang, Kevin Burrage (2006)  Exploring synonymous codon usage preferences of disulfide-bonded and non-disulfide bonded cysteines in the E. coli genome.   J Theor Biol 241: 2. 390-401 Jul  
Abstract: High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms.
Minglei Wang, Gustavo Caetano-Anollés (2006)  Global phylogeny determined by the combination of protein domains in proteomes.   Mol Biol Evol 23: 12. 2444-2454 Dec  
Abstract: The majority of proteins consist of multiple domains that are either repeated or combined in defined order. In this study, we survey the combination of protein domains defined at fold and fold superfamily levels in 185 genomes belonging to organisms that have been fully sequenced and introduce a method that reconstructs rooted phylogenomic trees from the content and arrangement of domains in proteins at a genomic level. We find that the majority of domain combinations were unique to Archaea, Bacteria, or Eukarya, suggesting most combinations originated after life had diversified. Domain repeat and domain repeat within multidomain proteins increased notably in eukaryotes, mainly at the expense of single-domain and domain-pair proteins. This increase was mostly confined to Metazoa. We also find an unbalanced sharing of domain combinations which suggests that Eukarya is more closely related to Bacteria than to Archaea, an observation that challenges the widely assumed eukaryote-archaebacterial sisterhood relationship. The occurrence and abundance of the molecular repertoire (interactome) of domain combinations was used to generate phylogenomic trees. These global interactome-based phylogenies described organismal histories satisfactorily, revealing the tripartite nature of life, and supporting controversial evolutionary patterns, such as the Coelomata hypothesis, the grouping of plants and animals, and the Gram-positive origin of bacteria. Results suggest strongly that the process of domain combination is not random but curved by evolution, rejecting the null hypothesis of domain modules combining in the absence of natural selection or an optimality criterion.
2005
Ming-Lei Wang, Hui Yao, Wen-Bo Xu (2005)  Prediction by support vector machines and analysis by Z-score of poly-L-proline type II conformation based on local sequence.   Comput Biol Chem 29: 2. 95-100 Apr  
Abstract: In recent years, the poly-L-proline type II (PPII) conformation has gained more and more importance. This structure plays vital roles in many biological processes. But few studies have been made to predict PPII secondary structures computationally. The support vector machine (SVM) represents a new approach to supervised pattern classification and has been successfully applied to a wide range of pattern recognition problems. In this paper, we present an SVM prediction method of PPII conformation based on local sequence. The overall accuracy for both the independent testing set and estimate of jackknife testing reached approximately 70%. The Matthews correlation coefficient (MCC) could reach 0.4. By comparing the results of training and testing datasets with different sequence identities, we suggest that the performance of this method correlates with the sequence identity of the dataset. The parameter of the SVM kernel function was an important factor in the performance of this method. The propensities of residues located at different positions were also analyzed. By computing Z-scores, we found that P and G were the two most important residues to PPII structure conformation.
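For readers unfamiliar with the MCC quoted in the abstract above, the following is a minimal sketch of how it is computed from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not data from the paper. The function name is likewise an assumption of this sketch.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, balanced counts (NOT from the paper): 70% of each class correct.
mcc_example = matthews_corrcoef(tp=70, tn=70, fp=30, fn=30)
print(mcc_example)
```

On a balanced two-class set like this one, an accuracy of 70% corresponds to an MCC of 0.4, which shows how the two figures reported in the abstract can coexist.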
2004
Ming-Lei Wang, Wei-Jiang Li, Wen-Bo Xu (2004)  Support vector machines for prediction of peptidyl prolyl cis/trans isomerization   Journal of Peptide Research 63: 23-28  
Abstract: A new method for peptidyl prolyl cis/trans isomerization prediction based on the theory of support vector machines (SVM) was introduced. The SVM represents a new approach to supervised pattern classification and has been successfully applied to a wide range of pattern recognition problems. In this study, six training datasets consisting of local sequences of different lengths were used. Polynomial kernel functions with different values of the parameter d were chosen. The test for the independent testing dataset and the jackknife test were both carried out. When the local sequence length was 20 residues and the parameter d = 8, the SVM method achieved the best performance, with the correct rate for the cis and trans forms reaching 70.4 and 69.7% for the independent testing dataset, and 76.7 and 76.6% for the jackknife test, respectively. The Matthews correlation coefficient for the jackknife test could reach about 0.5. The results obtained through this study indicated that the SVM method would become a powerful tool for predicting peptidyl prolyl cis/trans isomerization.
Liang-Wei Liu, Ming-Lei Wang, Wei-Lan Shao, Wei-Jiang Li (2004)  A novel model to calculate dipeptides responsible for optimum temperature in F/10 xylanase   Process Biochemistry 40: 1389-1394  
Abstract: A bioinformatics method was used to analyze the characteristic dipeptides responsible for optimum temperature of xylanase in the F/10 family. It was found that the positive dipeptides are: TD, GH, and WY; and the negative dipeptides are: GA, IA, FH, LH, SR, and NH. The calculated temperature fitted the optimum temperature of the xylanase very well and the maximal and minimal optimum temperatures were calculated as 150.46 and −29.06 °C. The thermostable mechanism was discussed and the result is useful for xylanase engineering for high temperature activity, for it also provided position information for engineering.
Jiang-Ning Song, Ming-Lei Wang, Wei-Jiang Li, Wen-Bo Xu (2004)  Prediction of the disulfide-bonding state of cysteines in proteins based on dipeptide composition.   Biochem Biophys Res Commun 318: 1. 142-147 May  
Abstract: In this paper, a novel approach has been introduced to predict the disulfide-bonding state of cysteines in proteins by means of a linear discriminator based on their dipeptide composition. The prediction is performed with a newly enlarged dataset with 8114 cysteine-containing segments extracted from 1856 non-homologous proteins of well-resolved three-dimensional structures. The oxidation of cysteines exhibits obvious cooperativity: almost all cysteines in disulfide-bond-containing proteins are in the oxidized form. This cooperativity can be well described by protein's dipeptide composition, based on which the prediction accuracy of the oxidation form of cysteines scores as high as 89.1% and 85.2%, when measured on cysteine and protein basis using the rigorous jack-knife procedure, respectively. The result demonstrates the applicability of this new relatively simple method and provides superior prediction performance compared with existing methods for the prediction of the oxidation states of cysteines in proteins.
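The dipeptide composition mentioned in the abstract above can be illustrated with a short sketch. It shows only one common way to build such a feature vector (normalized counts of the 400 ordered amino-acid pairs); the exact encoding and the linear discriminator used in the paper are not given in the abstract, so the function name, the normalization, and the toy sequence are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, used only to show the shape of the output.
example_vector = dipeptide_composition("ACCA")
```

A vector like this can then be passed to any linear classifier; which discriminator and which training procedure to use is left open here.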
Ming-Lei Wang, Jiang-Ning Song, Wen-Bo Xu, Wei-Jiang Li (2004)  A novel method of analyzing proline synonymous codons in E. coli.   FEBS Lett 576: 3. 336-338 Oct  
Abstract: Proline is a special imino acid in protein and the isomerization of the prolyl peptide bond has notable biological significance and influences the final structure of protein greatly, so the correlation between proline synonymous codon usage and local amino acid, the correlation between proline synonymous codon usage and the isomerization of the prolyl peptide bond were both investigated in the Escherichia coli genome by using a novel method based on information theory. The results show that in peptide chain, the residue at the first position C-terminal influences the usage of proline synonymous codon greatly and proline synonymous codons contain some factors influencing the isomerization of the prolyl peptide bond.
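The abstract above does not say which information-theoretic measure was used. As a rough illustration of the general idea only, the sketch below computes the mutual information between a synonymous codon and the neighbouring residue. The choice of mutual information, the function name, and the toy data are all assumptions of this sketch, not the authors' method.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information, in bits, between the two items of each (x, y) pair."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to avoid extra divisions.
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


# Hypothetical (proline codon, next residue) observations, not real E. coli data.
dependent = [("CCG", "A"), ("CCG", "A"), ("CCA", "G"), ("CCA", "G")]
independent = [("CCG", "A"), ("CCG", "G"), ("CCA", "A"), ("CCA", "G")]
print(mutual_information(dependent), mutual_information(independent))
```

When the codon fully determines the neighbouring residue, as in the first toy set, the measure reaches 1 bit; when the two vary independently, it is 0.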