Chris P Ponting

MRC Functional Genomics Unit
Department of Physiology Anatomy & Genetics
Le Gros Clark Building
University of Oxford
South Parks Road
OXFORD
OX1 3QX
chris.ponting@dpag.ox.ac.uk
My group studies the evolution of genes and genomes using comparative genomics methods. The group contributed to the publicly funded Human Genome Project, described in Nature (2001) and again in 2004 (PubMed), and performed many of the protein comparisons for the mouse (PubMed), rat (PubMed), chicken (PubMed), dog (PubMed) and marsupial (PubMed) Genome Projects, also published in Nature. Our work has the important benefit of reducing the number of animals used in experiments.

Our work occurs at the intersection between comparative genomics, evolutionary analyses and molecular structure-function predictions. We take new sequence data from mammalian and model organism genomes and use these to illuminate how evolution has shaped our genomes and genes, and how we are different from one another. We also analyse parasite and pathogen genomes in order to address issues of communicable diseases. Our primary aim is to understand the contribution each DNA base in the human genome makes to the functionality of our species.

Our research is multi-faceted and multi-disciplinary. We seek to understand the roles of proteins, non-coding genes and other functional elements in the evolution, development and disease-susceptibility of humans. We study how our DNA is conserved with respect to other species, and how it differs among members of our own species. Finally, our computational findings are used to help inform experiments that elucidate function.

The group is also interested in the prediction of structure, function and evolution of genes of interest to the biomedical community in general, and to the groups of the MRC Functional Genomics Unit in particular. We contributed, for example, to the understanding of the function and evolution of genes implicated in asthma, familial Alzheimer's disease, breast cancer, Aicardi-Goutières syndrome (PubMed), obesity (PubMed) and muscular dystrophies. The group is a major contributor to the Oxford Parkinson's Disease Centre (http://opdc.medsci.ox.ac.uk/), which was recently awarded £5m over 5 years by Parkinson's UK to determine the earliest pathological pathways in Parkinson's disease. With Caleb Webber, the group is also a major contributor to the FP7-funded GENCODYS (Genetic and Epigenetic Networks in Cognitive Dysfunction) project (http://www.gencodys.eu/homepage.php).

Journal articles

2010
Kiwoong Nam, Carina Mugal, Benoit Nabholz, Holger Schielzeth, Jochen B W Wolf, Niclas Backström, Axel Künstner, Christopher N Balakrishnan, Andreas Heger, Chris P Ponting, David F Clayton, Hans Ellegren (2010)  Molecular evolution of genes in avian genomes.   Genome Biol 11: 6. Jun
Abstract: BACKGROUND: Obtaining a draft genome sequence of the zebra finch (Taeniopygia guttata), the second bird genome to be sequenced, provides the necessary resource for whole-genome comparative analysis of gene sequence evolution in a non-mammalian vertebrate lineage. To analyze basic molecular evolutionary processes during avian evolution, and to contrast these with the situation in mammals, we aligned the protein-coding sequences of 8,384 1:1 orthologs of chicken, zebra finch, a lizard and three mammalian species. RESULTS: We found clear differences in the substitution rate at fourfold degenerate sites, being lowest in the ancestral bird lineage, intermediate in the chicken lineage and highest in the zebra finch lineage, possibly reflecting differences in generation time. We identified positively selected and/or rapidly evolving genes in avian lineages and found an over-representation of several functional classes, including anion transporter activity, calcium ion binding, cell adhesion and microtubule cytoskeleton. CONCLUSIONS: Focusing specifically on genes of neurological interest and genes differentially expressed in the unique vocal control nuclei of the songbird brain, we find a number of positively selected genes, including synaptic receptors. We found no evidence that selection for beneficial alleles is more efficient in regions of high recombination; in fact, there was a weak yet significant negative correlation between omega and recombination rate, which is in the direction predicted by the Hill-Robertson effect if slightly deleterious mutations contribute to protein evolution. These findings set the stage for studies of functional genetics of avian genes.
Rebecca A Chodroff, Leo Goodstadt, Tamara M Sirey, Peter L Oliver, Kay E Davies, Eric D Green, Zoltán Molnár, Chris P Ponting (2010)  Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes.   Genome Biol 11: 7. Jul
Abstract: BACKGROUND: Long considered to be the building block of life, it is now apparent that protein is only one of many functional products generated by the eukaryotic genome. Indeed, more of the human genome is transcribed into noncoding sequence than into protein-coding sequence. Nevertheless, whilst we have developed a deep understanding of the relationships between evolutionary constraint and function for protein-coding sequence, little is known about these relationships for non-coding transcribed sequence. This dearth of information is partially attributable to a lack of established non-protein-coding RNA (ncRNA) orthologs among birds and mammals within sequence and expression databases. RESULTS: Here, we performed a multi-disciplinary study of four highly conserved and brain-expressed transcripts selected from a list of mouse long intergenic noncoding RNA (lncRNA) loci that generally show pronounced evolutionary constraint within their putative promoter regions and across exon-intron boundaries. We identify some of the first lncRNA orthologs present in birds (chicken), marsupial (opossum), and eutherian mammals (mouse), and investigate whether they exhibit conservation of brain expression. In contrast to conventional protein-coding genes, the sequences, transcriptional start sites, exon structures, and lengths for these non-coding genes are all highly variable. CONCLUSIONS: The biological relevance of lncRNAs would be highly questionable if they were limited to closely related phyla. Instead, their preservation across diverse amniotes, their apparent conservation in exon structure, and similarities in their pattern of brain expression during embryonic and early postnatal stages together indicate that these are functional RNA molecules, of which some have roles in vertebrate brain development.
Stephen Meader, Chris P Ponting, Gerton Lunter (2010)  Massive turnover of functional sequence in human and other mammalian genomes.   Genome Res 20: 10. 1335-1343 Oct  
Abstract: Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both questions by applying, across the mammalian phylogeny, an evolutionary model that estimates the amount of functional DNA that is shared between two species' genomes. Our main findings are, first, that as the divergence between mammalian species increases, the predicted amount of pairwise shared functional sequence drops off dramatically. We show by simulations that this is not an artifact of the method, but rather indicates that functional (and mostly noncoding) sequence is turning over at a very high rate. We estimate that between 200 and 300 Mb (∼6.5%-10%) of the human genome is under functional constraint, which includes five to eight times as many constrained noncoding bases as bases that code for protein. In contrast, in D. melanogaster we estimate only 56-66 Mb to be constrained, implying a ratio of noncoding to coding constrained bases of about 2. This suggests that, rather than genome size or protein-coding gene complement, it is the number of functional bases that might best mirror our naïve preconceptions of organismal complexity.
Dalila Pinto, Alistair T Pagnamenta, Lambertus Klei, Richard Anney, Daniele Merico, Regina Regan, Judith Conroy, Tiago R Magalhaes, Catarina Correia, Brett S Abrahams, Joana Almeida, Elena Bacchelli, Gary D Bader, Anthony J Bailey, Gillian Baird, Agatino Battaglia, Tom Berney, Nadia Bolshakova, Sven Bölte, Patrick F Bolton, Thomas Bourgeron, Sean Brennan, Jessica Brian, Susan E Bryson, Andrew R Carson, Guillermo Casallo, Jillian Casey, Brian H Y Chung, Lynne Cochrane, Christina Corsello, Emily L Crawford, Andrew Crossett, Cheryl Cytrynbaum, Geraldine Dawson, Maretha de Jonge, Richard Delorme, Irene Drmic, Eftichia Duketis, Frederico Duque, Annette Estes, Penny Farrar, Bridget A Fernandez, Susan E Folstein, Eric Fombonne, Christine M Freitag, John Gilbert, Christopher Gillberg, Joseph T Glessner, Jeremy Goldberg, Andrew Green, Jonathan Green, Stephen J Guter, Hakon Hakonarson, Elizabeth A Heron, Matthew Hill, Richard Holt, Jennifer L Howe, Gillian Hughes, Vanessa Hus, Roberta Igliozzi, Cecilia Kim, Sabine M Klauck, Alexander Kolevzon, Olena Korvatska, Vlad Kustanovich, Clara M Lajonchere, Janine A Lamb, Magdalena Laskawiec, Marion Leboyer, Ann Le Couteur, Bennett L Leventhal, Anath C Lionel, Xiao-Qing Liu, Catherine Lord, Linda Lotspeich, Sabata C Lund, Elena Maestrini, William Mahoney, Carine Mantoulan, Christian R Marshall, Helen McConachie, Christopher J McDougle, Jane McGrath, William M McMahon, Alison Merikangas, Ohsuke Migita, Nancy J Minshew, Ghazala K Mirza, Jeff Munson, Stanley F Nelson, Carolyn Noakes, Abdul Noor, Gudrun Nygren, Guiomar Oliveira, Katerina Papanikolaou, Jeremy R Parr, Barbara Parrini, Tara Paton, Andrew Pickles, Marion Pilorge, Joseph Piven, Chris P Ponting, David J Posey, Annemarie Poustka, Fritz Poustka, Aparna Prasad, Jiannis Ragoussis, Katy Renshaw, Jessica Rickaby, Wendy Roberts, Kathryn Roeder, Bernadette Roge, Michael L Rutter, Laura J Bierut, John P Rice, Jeff Salt, Katherine Sansom, Daisuke Sato, Ricardo Segurado, Ana F Sequeira, Lili Senman, Naisha Shah, Val C Sheffield, Latha Soorya, Inês Sousa, Olaf Stein, Nuala Sykes, Vera Stoppioni, Christina Strawbridge, Raffaella Tancredi, Katherine Tansey, Bhooma Thiruvahindrapduram, Ann P Thompson, Susanne Thomson, Ana Tryfon, John Tsiantis, Herman Van Engeland, John B Vincent, Fred Volkmar, Simon Wallace, Kai Wang, Zhouzhi Wang, Thomas H Wassink, Caleb Webber, Rosanna Weksberg, Kirsty Wing, Kerstin Wittemeyer, Shawn Wood, Jing Wu, Brian L Yaspan, Danielle Zurawiecki, Lonnie Zwaigenbaum, Joseph D Buxbaum, Rita M Cantor, Edwin H Cook, Hilary Coon, Michael L Cuccaro, Bernie Devlin, Sean Ennis, Louise Gallagher, Daniel H Geschwind, Michael Gill, Jonathan L Haines, Joachim Hallmayer, Judith Miller, Anthony P Monaco, John I Nurnberger, Andrew D Paterson, Margaret A Pericak-Vance, Gerard D Schellenberg, Peter Szatmari, Astrid M Vicente, Veronica J Vieland, Ellen M Wijsman, Stephen W Scherer, James S Sutcliffe, Catalina Betancur (2010)  Functional impact of global rare copy number variation in autism spectrum disorders.   Nature 466: 7304. 368-372 Jul
Abstract: The autism spectrum disorders (ASDs) are a group of conditions characterized by impairments in reciprocal social interaction and communication, and the presence of restricted and repetitive behaviours. Individuals with an ASD vary greatly in cognitive development, which can range from above average to intellectual disability. Although ASDs are known to be highly heritable (approximately 90%), the underlying genetic determinants are still largely unknown. Here we analysed the genome-wide characteristics of rare (<1% frequency) copy number variation in ASD using dense genotyping arrays. When comparing 996 ASD individuals of European ancestry to 1,287 matched controls, cases were found to carry a higher global burden of rare, genic copy number variants (CNVs) (1.19 fold, P = 0.012), especially so for loci previously implicated in either ASD and/or intellectual disability (1.69 fold, P = 3.4 × 10^-4). Among the CNVs there were numerous de novo and inherited events, sometimes in combination in a given family, implicating many novel ASD genes such as SHANK2, SYNGAP1, DLGAP2 and the X-linked DDX53-PTCHD1 locus. We also discovered an enrichment of CNVs disrupting functional gene sets involved in cellular proliferation, projection and motility, and GTPase/Ras signalling. Our results reveal many new genetic and functional targets in ASD that may lead to final connected pathways.
Jayne Y Hehir-Kwa, Nienke Wieskamp, Caleb Webber, Rolph Pfundt, Han G Brunner, Christian Gilissen, Bert B A de Vries, Chris P Ponting, Joris A Veltman (2010)  Accurate distinction of pathogenic from benign CNVs in mental retardation.   PLoS Comput Biol 6: 4. Apr  
Abstract: Copy number variants (CNVs) have recently been recognized as a common form of genomic variation in humans. Hundreds of CNVs can be detected in any individual genome using genomic microarrays or whole genome sequencing technology, but their phenotypic consequences are still poorly understood. Rare CNVs have been reported as a frequent cause of neurological disorders such as mental retardation (MR), schizophrenia and autism, prompting widespread implementation of CNV screening in diagnostics. In previous studies we have shown that, in contrast to benign CNVs, MR-associated CNVs are significantly enriched in genes whose mouse orthologues, when disrupted, result in a nervous system phenotype. In this study we developed and validated a novel computational method for differentiating between benign and MR-associated CNVs using structural and functional genomic features to annotate each CNV. In total 13 genomic features were included in the final version of a Naïve Bayesian Tree classifier, with LINE density and mouse knock-out phenotypes contributing most to the classifier's accuracy. After demonstrating that our method (called GECCO) perfectly classifies CNVs causing known MR-associated syndromes, we show that it achieves high accuracy (94%) and negative predictive value (99%) on a blinded test set of more than 1,200 CNVs from a large cohort of individuals with MR. These results indicate that this classification method will be of value for objectively prioritizing CNVs in clinical research and diagnostics.
Elizabeth H Bayne, Sharon A White, Alexander Kagansky, Dominika A Bijos, Luis Sanchez-Pulido, Kwang-Lae Hoe, Dong-Uk Kim, Han-Oh Park, Chris P Ponting, Juri Rappsilber, Robin C Allshire (2010)  Stc1: a critical link between RNAi and chromatin modification required for heterochromatin integrity.   Cell 140: 5. 666-677 Mar  
Abstract: In fission yeast, RNAi directs heterochromatin formation at centromeres, telomeres, and the mating type locus. Noncoding RNAs transcribed from repeat elements generate siRNAs that are incorporated into the Argonaute-containing RITS complex and direct it to nascent homologous transcripts. This leads to recruitment of the CLRC complex, including the histone methyltransferase Clr4, promoting H3K9 methylation and heterochromatin formation. A key question is what mediates the recruitment of Clr4/CLRC to transcript-bound RITS. We have identified a LIM domain protein, Stc1, that is required for centromeric heterochromatin integrity. Our analyses show that Stc1 is specifically required to establish H3K9 methylation via RNAi, and interacts both with the RNAi effector Ago1, and with the chromatin-modifying CLRC complex. Moreover, tethering Stc1 to a euchromatic locus is sufficient to induce silencing and heterochromatin formation independently of RNAi. We conclude that Stc1 associates with RITS on centromeric transcripts and recruits CLRC, thereby coupling RNAi to chromatin modification.
Stephen Meader, LaDeana W Hillier, Devin Locke, Chris P Ponting, Gerton Lunter (2010)  Genome assembly quality: assessment and improvement using the neutral indel model.   Genome Res 20: 5. 675-684 May  
Abstract: We describe a statistical and comparative-genomic approach for quantifying error rates of genome sequence assemblies. The method exploits not substitutions but the pattern of insertions and deletions (indels) in genome-scale alignments for closely related species. Using two- or three-way alignments, the approach estimates the amount of aligned sequence containing clusters of nucleotides that were wrongly inserted or deleted during sequencing or assembly. Thus, the method is well-suited to assessing fine-scale sequence quality within single assemblies, between different assemblies of a single set of reads, and between genome assemblies for different species. When applying this approach to four primate genome assemblies, we found that average gap error rates per base varied considerably, by up to sixfold. As expected, bacterial artificial chromosome (BAC) sequences contained lower, but still substantial, predicted numbers of errors, arguing for caution in regarding BACs as the epitome of genome fidelity. We then mapped short reads, at approximately 10-fold statistical coverage, from a Bornean orangutan onto the Sumatran orangutan genome assembly originally constructed from capillary reads. This resulted in a reduced gap error rate and a separation of error-prone from high-fidelity sequence. Over 5000 predicted indel errors in protein-coding sequence were corrected in a hybrid assembly. Our approach contributes a new fine-scale quality metric for assemblies that should facilitate development of improved genome sequencing and assembly strategies.
Ben L Murton, Wee Loong Chin, Chris P Ponting, Laura S Itzhaki (2010)  Characterising the binding specificities of the subunits associated with the KMT2/Set1 histone lysine methyltransferase.   J Mol Biol 398: 4. 481-488 May  
Abstract: KMT2/Set1 is the catalytic subunit of the complex of proteins associated with Set1 (COMPASS) that is responsible for the methylation of lysine 4 of histone H3 (H3K4) in Saccharomyces cerevisiae. Whereas monomethylated H3K4 (H3K4me1) is found throughout the genome, di- (H3K4me2) and tri- (H3K4me3) methylated H3K4 are enriched at specific loci, which correlates with the promoter and 5'-ends of actively transcribed genes in the case of H3K4me3. The COMPASS subunits contain a number of domains that are conserved in homologous complexes in higher eukaryotes and are reported to interact with modified histones. However, the exact organization of these subunits and their role within the complex have not been elucidated. In this study we showed that: (1) subunits Swd1 and Swd3 form a stable heterodimer that dissociates upon binding to a modified H3K4me2 tail peptide, suggesting a regulatory role in COMPASS; (2) the affinity of the subunit Spp1 for modified histone H3 substrates is much higher than that of Swd1 and Swd3; (3) Spp1 has a preference for H3K4me2/3 methylation state; and (4) Spp1 contains a high-affinity DNA-binding domain in the previously uncharacterised C-terminal region. These data allow us to suggest a mechanism for the regulation of COMPASS activity at an actively transcribed gene.
Wesley C Warren, David F Clayton, Hans Ellegren, Arthur P Arnold, Ladeana W Hillier, Axel Künstner, Steve Searle, Simon White, Albert J Vilella, Susan Fairley, Andreas Heger, Lesheng Kong, Chris P Ponting, Erich D Jarvis, Claudio V Mello, Pat Minx, Peter Lovell, Tarciso A F Velho, Margaret Ferris, Christopher N Balakrishnan, Saurabh Sinha, Charles Blatti, Sarah E London, Yun Li, Ya-Chi Lin, Julia George, Jonathan Sweedler, Bruce Southey, Preethi Gunaratne, Michael Watson, Kiwoong Nam, Niclas Backström, Linnea Smeds, Benoit Nabholz, Yuichiro Itoh, Osceola Whitney, Andreas R Pfenning, Jason Howard, Martin Völker, Bejamin M Skinner, Darren K Griffin, Liang Ye, William M McLaren, Paul Flicek, Victor Quesada, Gloria Velasco, Carlos Lopez-Otin, Xose S Puente, Tsviya Olender, Doron Lancet, Arian F A Smit, Robert Hubley, Miriam K Konkel, Jerilyn A Walker, Mark A Batzer, Wanjun Gu, David D Pollock, Lin Chen, Ze Cheng, Evan E Eichler, Jessica Stapley, Jon Slate, Robert Ekblom, Tim Birkhead, Terry Burke, David Burt, Constance Scharff, Iris Adam, Hugues Richard, Marc Sultan, Alexey Soldatov, Hans Lehrach, Scott V Edwards, Shiaw-Pyng Yang, Xiaoching Li, Tina Graves, Lucinda Fulton, Joanne Nelson, Asif Chinwalla, Shunfeng Hou, Elaine R Mardis, Richard K Wilson (2010)  The genome of a songbird.   Nature 464: 7289. 757-762 Apr  
Abstract: The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken, the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
Sreeram V Ramagopalan, Andreas Heger, Antonio J Berlanga, Narelle J Maugeri, Matthew R Lincoln, Amy Burrell, Lahiru Handunnetthi, Adam E Handel, Giulio Disanto, Sarah-Michelle Orton, Corey T Watson, Julia M Morahan, Gavin Giovannoni, Chris P Ponting, George C Ebers, Julian C Knight (2010)  A ChIP-seq defined genome-wide map of vitamin D receptor binding: associations with disease and evolution.   Genome Res 20: 10. 1352-1360 Oct  
Abstract: Initially thought to play a restricted role in calcium homeostasis, the pleiotropic actions of vitamin D in biology and their clinical significance are only now becoming apparent. However, the mode of action of vitamin D, through its cognate nuclear vitamin D receptor (VDR), and its contribution to diverse disorders, remain poorly understood. We determined VDR binding throughout the human genome using chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq). After calcitriol stimulation, we identified 2776 genomic positions occupied by the VDR and 229 genes with significant changes in expression in response to vitamin D. VDR binding sites were significantly enriched near autoimmune and cancer associated genes identified from genome-wide association (GWA) studies. Notable genes with VDR binding included IRF8, associated with MS, and PTPN2 associated with Crohn's disease and T1D. Furthermore, a number of single nucleotide polymorphism associations from GWA were located directly within VDR binding intervals, for example, rs13385731 associated with SLE and rs947474 associated with T1D. We also observed significant enrichment of VDR intervals within regions of positive selection among individuals of Asian and European descent. ChIP-seq determination of transcription factor binding, in combination with GWA data, provides a powerful approach to further understanding the molecular bases of complex diseases.
Lesheng Kong, Peter V Lovell, Andreas Heger, Claudio V Mello, Chris P Ponting (2010)  Accelerated evolution of PAK3- and PIM1-like kinase gene families in the zebra finch, Taeniopygia guttata.   Mol Biol Evol 27: 8. 1923-1934 Aug  
Abstract: Genes encoding protein kinases tend to evolve slowly over evolutionary time, and only rarely do they appear as recent duplications in sequenced vertebrate genomes. Consequently, it was a surprise to find two families of kinase genes that have greatly and recently expanded in the zebra finch (Taeniopygia guttata) lineage. In contrast to other amniotic genomes (including chicken) that harbor only single copies of p21-activated serine/threonine kinase 3 (PAK3) and proviral integration site 1 (PIM1) genes, the zebra finch genome appeared at first to additionally contain 67 PAK3-like (PAK3L) and 51 PIM1-like (PIM1L) protein kinase genes. An exhaustive analysis of these gene models, however, revealed most to be incomplete, owing to the absence of terminal exons. After reprediction, 31 PAK3L genes and 10 PIM1L genes remain, and all but three are predicted, from the retention of functional sites and open reading frames, to be enzymatically active. PAK3L, but not PIM1L, gene sequences show evidence of recurrent episodes of positive selection, concentrated within structures spatially adjacent to N- and C-terminal protein regions that have been discarded from zebra finch PAK3L genes. At least seven zebra finch PAK3L genes were observed to be expressed in testis, whereas two sequences were found transcribed in the brain, one broadly including the song nuclei and the other in the ventricular zone and in cells resembling Bergmann's glia in the cerebellar Purkinje cell layer. Two PIM1L sequences were also observed to be expressed with broad distributions in the zebra finch brain, one in both the ventricular zone and the cerebellum and apparently associated with glial cells and the other showing neuronal cell expression and marked enrichment in midbrain/thalamic nuclei. These expression patterns do not correlate with zebra finch-specific features such as vocal learning. Nevertheless, our results show how ancient and conserved intracellular signaling molecules can be co-opted, following duplication, thereby resulting in lineage-specific functions, presumably affecting the zebra finch testis and brain.
Chris P Ponting, T Grant Belgard (2010)  Transcribed dark matter: meaning or myth?   Hum Mol Genet 19: R2. R162-R168 Oct  
Abstract: Genomic tiling arrays, cDNA sequencing and, more recently, RNA-Seq have provided initial insights into the extent and depth of transcribed sequence across human and other genomes. These methods have led to greatly improved annotations of protein-coding genes, but have also identified transcription outside of annotated exons. One resultant issue that has aroused dispute is the balance of transcription of known exons against transcription outside of known exons. While non-genic 'dark matter' transcription was found by tiling arrays to be pervasive, it was seen to contribute only a small percentage of the polyadenylated transcriptome in some RNA-Seq experiments. This apparent contradiction has been compounded by a lack of clarity about what exactly constitutes a protein-coding gene. It remains unclear, for example, whether or not all transcripts that overlap on either strand within a genomic locus should be assigned to a single gene locus, including those that fail to share promoters, exons and splice junctions. The inability of tiling arrays and RNA-Seq to count transcripts, rather than exons or exon pairs, adds to these difficulties. While there is agreement that thousands of apparently non-coding loci are present outside of protein-coding genes in the human genome, there is vigorous debate of what constitutes evidence for their functionality. These issues will only be resolved upon the demonstration, or otherwise, that organismal or cellular phenotypes frequently result when non-coding RNA loci are disrupted.
Eris Duro, Cecilia Lundin, Katrine Ask, Luis Sanchez-Pulido, Thomas J Macartney, Rachel Toth, Chris P Ponting, Anja Groth, Thomas Helleday, John Rouse (2010)  Identification of the MMS22L-TONSL Complex that Promotes Homologous Recombination.   Mol Cell Nov  
Abstract: Budding yeast Mms22 is required for homologous recombination (HR)-mediated repair of stalled or broken DNA replication forks. Here we identify a human Mms22-like protein (MMS22L) and an MMS22L-interacting protein, NFκBIL2/TONSL. Depletion of MMS22L or TONSL from human cells causes a high level of double-strand breaks (DSBs) during DNA replication. Both proteins accumulate at stressed replication forks, and depletion of MMS22L or TONSL from cells causes hypersensitivity to agents that cause S phase-associated DSBs, such as topoisomerase (TOP) inhibitors. In this light, MMS22L and TONSL are required for the HR-mediated repair of replication fork-associated DSBs. In cells depleted of either protein, DSBs induced by the TOP1 inhibitor camptothecin are resected normally, but the loading of the RAD51 recombinase is defective. Therefore, MMS22L and TONSL are required for the maintenance of genome stability when unscheduled DSBs occur in the vicinity of DNA replication forks.
Shinya Ohta, Jimi-Carlo Bukowski-Wills, Luis Sanchez-Pulido, Flavia de de Alves, Laura Wood, Zhuo A Chen, Melpi Platani, Lutz Fischer, Damien F Hudson, Chris P Ponting, Tatsuo Fukagawa, William C Earnshaw, Juri Rappsilber (2010)  The protein composition of mitotic chromosomes determined using multiclassifier combinatorial proteomics.   Cell 142: 5. 810-821 Sep  
Abstract: Despite many decades of study, mitotic chromosome structure and composition remain poorly characterized. Here, we have integrated quantitative proteomics with bioinformatic analysis to generate a series of independent classifiers that describe the approximately 4,000 proteins identified in isolated mitotic chromosomes. Integrating these classifiers by machine learning uncovers functional relationships between protein complexes in the context of intact chromosomes and reveals which of the approximately 560 uncharacterized proteins identified here merits further study. Indeed, of 34 GFP-tagged predicted chromosomal proteins, 30 were chromosomal, including 13 with centromere-association. Of 16 GFP-tagged predicted nonchromosomal proteins, 14 were confirmed to be nonchromosomal. An unbiased analysis of the whole chromosome proteome from genetic knockouts of kinetochore protein Ska3/Rama1 revealed that the APC/C and RanBP2/RanGAP1 complexes depend on the Ska complex for stable association with chromosomes. Our integrated analysis predicts that up to 97 new centromere-associated proteins remain to be discovered in our data set.
2009
Jasmina Ponjavic, Peter L Oliver, Gerton Lunter, Chris P Ponting (2009)  Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain.   PLoS Genet 5: 8. Aug  
Abstract: Besides protein-coding mRNAs, eukaryotic transcriptomes include many long non-protein-coding RNAs (ncRNAs) of unknown function that are transcribed away from protein-coding loci. Here, we have identified 659 intergenic long ncRNAs whose genomic sequences individually exhibit evolutionary constraint, a hallmark of functionality. Of this set, those expressed in the brain are more frequently conserved and are significantly enriched with predicted RNA secondary structures. Furthermore, brain-expressed long ncRNAs are preferentially located adjacent to protein-coding genes that are (1) also expressed in the brain and (2) involved in transcriptional regulation or in nervous system development. This led us to the hypothesis that spatiotemporal co-expression of ncRNAs and nearby protein-coding genes represents a general phenomenon, a prediction that was confirmed subsequently by in situ hybridisation in developing and adult mouse brain. We provide the full set of constrained long ncRNAs as an important experimental resource and present, for the first time, substantive and predictive criteria for prioritising long ncRNA and mRNA transcript pairs when investigating their biological functions and contributions to development and disease.
Notes:
Ana C Marques, Chris P Ponting (2009)  Catalogues of mammalian long noncoding RNAs: modest conservation and incompleteness.   Genome Biol 10: 11. 11  
Abstract: BACKGROUND: Despite increasing interest in the noncoding fraction of transcriptomes, the number, species-conservation and functions, if any, of many non-protein-coding transcripts remain to be discovered. Two extensive long intergenic noncoding RNA (ncRNA) transcript catalogues are now available for mouse: over 3,000 macroRNAs identified by cDNA sequencing, and 1,600 long intergenic noncoding RNA (lincRNA) intervals that are predicted from chromatin-state maps. Previously we showed that macroRNAs tend to be more highly conserved than putatively neutral sequence, although only 5% of bases are predicted as constrained. By contrast, over a thousand lincRNAs were reported as being highly conserved. This apparent difference may account for the surprisingly small fraction (11%) of transcripts that are represented in both catalogues. Here we sought to resolve the reported discrepancy between the evolutionary rates for these two sets. RESULTS: Our analyses reveal lincRNA and macroRNA exon sequences to be subject to the same relatively low degree of sequence constraint. Nonetheless, our observations are consistent with the functionality of a fraction of ncRNA in these sets, with up to a quarter of ncRNA exons having evolved significantly slower than neighboring neutral sequence. The more tissue-specific macroRNAs are enriched in predicted RNA secondary structures and thus may often act in trans, whereas the more highly and broadly expressed lincRNAs appear more likely to act in the cis-regulation of adjacent transcription factor genes. CONCLUSIONS: Taken together, our results indicate that each of the two ncRNA catalogues unevenly and lightly samples the true, much larger, ncRNA repertoire of the mouse.
Notes:
Ivan M Muñoz, Karolina Hain, Anne-Cécile Déclais, Mary Gardiner, Geraldine W Toh, Luis Sanchez-Pulido, Johannes M Heuckmann, Rachel Toth, Thomas Macartney, Berina Eppink, Roland Kanaar, Chris P Ponting, David M J Lilley, John Rouse (2009)  Coordination of structure-specific nucleases by human SLX4/BTBD12 is required for DNA repair.   Mol Cell 35: 1. 116-127 Jul  
Abstract: Budding yeast Slx4 interacts with the structure-specific endonuclease Slx1 to ensure completion of ribosomal DNA replication. Slx4 also interacts with the Rad1-Rad10 endonuclease to control cleavage of 3' flaps during repair of double-strand breaks (DSBs). Here we describe the identification of human SLX4, a scaffold for DNA repair nucleases XPF-ERCC1, MUS81-EME1, and SLX1. SLX4 immunoprecipitates show SLX1-dependent nuclease activity toward Holliday junctions and MUS81-dependent activity toward other branched DNA structures. Furthermore, SLX4 enhances the nuclease activity of SLX1, MUS81, and XPF. Consistent with a role in processing recombination intermediates, cells depleted of SLX4 are hypersensitive to genotoxins that cause DSBs and show defects in the resolution of interstrand crosslink-induced DSBs. Depletion of SLX4 causes a decrease in DSB-induced homologous recombination. These data show that SLX4 is a regulator of structure-specific nucleases and that SLX4 and SLX1 are important regulators of genome stability in human cells.
Notes:
Guillaume M Hautbergue, Ming-Lung Hung, Matthew J Walsh, Ambrosius P L Snijders, Chung-Te Chang, Rachel Jones, Chris P Ponting, Mark J Dickman, Stuart A Wilson (2009)  UIF, a new mRNA export adaptor that works together with REF/ALY, requires FACT for recruitment to mRNA.   Curr Biol 19: 22. 1918-1924 Dec
Abstract: Messenger RNA (mRNA) export adaptors play an important role in the transport of mRNA from the nucleus to the cytoplasm. They couple early mRNA processing events such as 5' capping and 3' end formation with loading of the TAP/NXF1 export receptor onto mRNA. The canonical adaptor REF/ALY/Yra1 is recruited to mRNA via UAP56 and subsequently delivers the mRNA to NXF1 [1]. Knockdown of UAP56 [2, 3] and NXF1 [4-7] in higher eukaryotes efficiently blocks mRNA export, whereas knockdown of REF only causes a modest reduction, suggesting the existence of additional adaptors [8-10]. Here we identify a new UAP56-interacting factor, UIF, which functions as an export adaptor, binding NXF1 and delivering mRNA to the nuclear pore. REF and UIF are simultaneously found on the same mRNA molecules, and both proteins are required for efficient export of mRNA. We show that the histone chaperone FACT specifically binds UIF, but not REF, via the SSRP1 subunit, and this interaction is required for recruitment of UIF to mRNA. Together the results indicate that REF and UIF represent key human adaptors for the export of cellular mRNAs via the UAP56-NXF1 pathway.
Notes:
Caleb Webber, Jayne Y Hehir-Kwa, Duc-Quang Nguyen, Bert B A de Vries, Joris A Veltman, Chris P Ponting (2009)  Forging links between human mental retardation-associated CNVs and mouse gene knockout models.   PLoS Genet 5: 6. Jun  
Abstract: Rare copy number variants (CNVs) are frequently associated with common neurological disorders such as mental retardation (MR; learning disability), autism, and schizophrenia. CNV screening in clinical practice is limited because pathological CNVs cannot be distinguished routinely from benign CNVs, and because genes underlying patients' phenotypes remain largely unknown. Here, we present a novel, statistically robust approach that forges links between 148 MR-associated CNVs and phenotypes from approximately 5,000 mouse gene knockout experiments. These CNVs were found to be significantly enriched in two classes of genes, those whose mouse orthologues, when disrupted, result in either abnormal axon or dopaminergic neuron morphologies. Additional enrichments highlighted correspondences between relevant mouse phenotypes and secondary presentations such as brain abnormality, cleft palate, and seizures. The strength of these phenotype enrichments (>100% increases) greatly exceeded molecular annotations (<30% increases) and allowed the identification of 78 genes that may contribute to MR and associated phenotypes. This study is the first to demonstrate how the power of mouse knockout data can be systematically exploited to better understand genetically heterogeneous neurological disorders.
Notes:
Chris P Ponting, Peter L Oliver, Wolf Reik (2009)  Evolution and functions of long noncoding RNAs.   Cell 136: 4. 629-641 Feb  
Abstract: RNA is not only a messenger operating between DNA and protein. Transcription of essentially the entire eukaryotic genome generates a myriad of non-protein-coding RNA species that show complex overlapping patterns of expression and regulation. Although long noncoding RNAs (lncRNAs) are among the least well-understood of these transcript species, they cannot all be dismissed as merely transcriptional "noise." Here, we review the evolution of lncRNAs and their roles in transcriptional regulation, epigenetic gene regulation, and disease.
Notes:
Andreas Heger, Chris P Ponting, Ian Holmes (2009)  Accurate estimation of gene evolutionary rates using XRATE, with an application to transmembrane proteins.   Mol Biol Evol 26: 8. 1715-1721 Aug  
Abstract: XRATE implements algorithms for comparative annotation, ancestral reconstruction, evolutionary rate estimation, and simulation. Its modeling repertoire includes phylogenetic stochastic context-free grammars and phylo-hidden Markov models. Following earlier tests of XRATE as a machine-learning tool suitable for alignment annotation, we now report the first tests of XRATE as a precise quantitative instrument for estimating evolutionary rates. We implement a codon model similar to that of Goldman and Yang (1994) (A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11: 725-736) and show that XRATE's parameter estimates are consistent with those of PAML. To demonstrate its utility, we apply the model to measure the difference in selective strength (omega) between intracellular and secreted regions of type I transmembrane proteins. In 215 of 303 instances, a complex model with individual omega for each region provides a better fit to the data than the simpler single omega value model. Secreted portions of type I transmembrane proteins show an elevation in omega similar to that seen for secreted protein genes. Less stringent purifying selection is thus a general property of the extracellular milieu, rather than being specific to only soluble and secreted proteins.
Notes:
Deanna M Church, Leo Goodstadt, LaDeana W Hillier, Michael C Zody, Steve Goldstein, Xinwei She, Carol J Bult, Richa Agarwala, Joshua L Cherry, Michael DiCuccio, Wratko Hlavina, Yuri Kapustin, Peter Meric, Donna Maglott, Zoë Birtle, Ana C Marques, Tina Graves, Shiguo Zhou, Brian Teague, Konstantinos Potamousis, Christopher Churas, Michael Place, Jill Herschleb, Ron Runnheim, Daniel Forrest, James Amos-Landgraf, David C Schwartz, Ze Cheng, Kerstin Lindblad-Toh, Evan E Eichler, Chris P Ponting (2009)  Lineage-specific biology revealed by a finished genome assembly of the mouse.   PLoS Biol 7: 5. May
Abstract: The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.
Notes:
Peter L Oliver, Leo Goodstadt, Joshua J Bayes, Zoë Birtle, Kevin C Roach, Nitin Phadnis, Scott A Beatson, Gerton Lunter, Harmit S Malik, Chris P Ponting (2009)  Accelerated evolution of the Prdm9 speciation gene across diverse metazoan taxa.   PLoS Genet 5: 12. Dec  
Abstract: The onset of prezygotic and postzygotic barriers to gene flow between populations is a hallmark of speciation. One of the earliest postzygotic isolating barriers to arise between incipient species is the sterility of the heterogametic sex in interspecies hybrids. Four genes that underlie hybrid sterility have been identified in animals: Odysseus, JYalpha, and Overdrive in Drosophila and Prdm9 (Meisetz) in mice. Mouse Prdm9 encodes a protein with a KRAB motif, a histone methyltransferase domain and several zinc fingers. The difference of a single zinc finger distinguishes Prdm9 alleles that cause hybrid sterility from those that do not. We find that concerted evolution and positive selection have rapidly altered the number and sequence of Prdm9 zinc fingers across 13 rodent genomes. The patterns of positive selection in Prdm9 zinc fingers imply that rapid evolution has acted on the interface between the Prdm9 protein and the DNA sequences to which it binds. Similar patterns are apparent for Prdm9 zinc fingers for diverse metazoans, including primates. Indeed, allelic variation at the DNA-binding positions of human PRDM9 zinc fingers shows significant association with decreased risk of infertility. Prdm9 thus plays a role in determining male sterility both between species (mouse) and within species (human). The recurrent episodes of positive selection acting on Prdm9 suggest that the DNA sequences to which it binds must also be evolving rapidly. Our findings do not identify the nature of the underlying DNA sequences, but argue against the proposed role of Prdm9 as an essential transcription factor in mouse meiosis. We propose a hypothetical model in which incompatibilities between Prdm9-binding specificity and satellite DNAs provide the molecular basis for Prdm9-mediated hybrid sterility. We suggest that Prdm9 should be investigated as a candidate gene in other instances of hybrid sterility in metazoans.
Notes:
Chris P Ponting, Leo Goodstadt (2009)  Separating derived from ancestral features of mouse and human genomes.   Biochem Soc Trans 37: Pt 4. 734-739 Aug  
Abstract: To take full advantage of the mouse as a model organism, it is essential to distinguish lineage-specific biology from what is shared between human and mouse. Investigations into shared genetic elements common to both have been well served by the draft human and mouse genome sequences. More recently, the virtually complete euchromatic sequences of the two reference genomes have been finished. These reveal a high (approximately 5%) level of sequence duplications that had previously been recalcitrant to sequencing and assembly. Within these duplications lie large numbers of rodent- or primate-specific genes. In the present paper, we review the sequence properties of the two genomes, dwelling most on the duplications, deletions and insertions that separate each of them from their most recent common ancestor, approximately 90 million years ago. We consider the differences in gene numbers and repertoires between the two species, and speculate on their contributions to lineage-specific biology. Losses of ancient single-copy genes are rare, as are gains of new functional genes through retrotransposition. Instead, most changes to the gene repertoire have occurred in large multicopy families. It has been proposed that numbers of such 'environmental genes' rise and fall, and their sequences change, as adaptive responses to infection and other environmental pressures, including conspecific competition. Nevertheless, many such genes may be under little or no selection.
Notes:
2008
Duc-Quang Nguyen, Caleb Webber, Jayne Hehir-Kwa, Rolph Pfundt, Joris Veltman, Chris P Ponting (2008)  Reduced purifying selection prevails over positive selection in human copy number variant evolution.   Genome Res 18: 11. 1711-1723 Nov  
Abstract: Copy number variation is a dominant contributor to genomic variation and may frequently underlie an individual's variable susceptibilities to disease. Here we question our previous proposition that copy number variants (CNVs) are often retained in the human population because of their adaptive benefit. We show that genic biases of CNVs are best explained, not by positive selection, but by reduced efficiency of selection in eliminating deleterious changes from the human population. Of four CNV data sets examined, three exhibit significant increases in protein evolutionary rates. These increases appear to be attributable to the frequent coincidence of CNVs with segmental duplications (SDs) that recombine infrequently. Furthermore, human orthologs of mouse genes, which, when disrupted, result in pre- or postnatal lethality, are unusually depleted in CNVs. Together, these findings support a model of reduced purifying selection (Hill-Robertson interference) within copy number variable regions that are enriched in nonessential genes, allowing both the fixation of slightly deleterious substitutions and increased drift of CNV alleles. Additionally, all four CNV sets exhibited increased rates of interspecies chromosomal rearrangement and nucleotide substitution and an increased gene density. We observe that sequences with high G+C contents are most prone to copy number variation. In particular, frequently duplicated human SD sequence, or CNVs that are large and/or observed frequently, tend to be elevated in G+C content. In contrast, SD sequences that appear fixed in the human population lie more frequently within low G+C sequence. These findings provide an overarching view of how CNVs arise and segregate in the human population.
Notes:
Chris P Ponting (2008)  The functional repertoires of metazoan genomes.   Nat Rev Genet 9: 9. 689-698 Sep  
Abstract: Metazoan genomes are being sequenced at an increasingly rapid rate. For each new genome, the number of protein-coding genes it encodes and the amount of functional DNA it contains are known only inaccurately. Nevertheless, there have been considerable recent advances in identifying protein-coding and non-coding sequences that have remained constrained in diverse species. However, these approaches struggle to pinpoint genomic sequences that are functional in some species but that are absent or not functional in others. Yet it is here, encoded in lineage-specific and functional sequence, that we expect physiological differences between species to be most concentrated.
Notes:
Andreas Heger, Chris P Ponting (2008)  OPTIC: orthologous and paralogous transcripts in clades.   Nucleic Acids Res 36: Database issue. D267-D270 Jan  
Abstract: The genome sequences of a large number of metazoan species are now known. As multiple closely related genomes are sequenced, comparative studies that previously focussed on only pairs of genomes can now be extended over whole clades. The orthologous and paralogous transcripts in clades (OPTIC) database currently provides sets of gene predictions and orthology assignments for three clades: (i) amniotes, including human, dog, mouse, opossum, platypus and chicken (17 443 orthologous groups); (ii) a Drosophila clade of 12 species (12 889 orthologous groups) and (iii) a nematode clade of four species (13 626 orthologous groups). Gene predictions, multiple alignments and phylogenetic trees are freely available to browse and download from http://genserv.anat.ox.ac.uk/clades. Further genomes and clades will be added in the future.
Notes:
Wesley C Warren, LaDeana W Hillier, Jennifer A Marshall Graves, Ewan Birney, Chris P Ponting, Frank Grützner, Katherine Belov, Webb Miller, Laura Clarke, Asif T Chinwalla, Shiaw-Pyng Yang, Andreas Heger, Devin P Locke, Pat Miethke, Paul D Waters, Frédéric Veyrunes, Lucinda Fulton, Bob Fulton, Tina Graves, John Wallis, Xose S Puente, Carlos López-Otín, Gonzalo R Ordóñez, Evan E Eichler, Lin Chen, Ze Cheng, Janine E Deakin, Amber Alsop, Katherine Thompson, Patrick Kirby, Anthony T Papenfuss, Matthew J Wakefield, Tsviya Olender, Doron Lancet, Gavin A Huttley, Arian F A Smit, Andrew Pask, Peter Temple-Smith, Mark A Batzer, Jerilyn A Walker, Miriam K Konkel, Robert S Harris, Camilla M Whittington, Emily S W Wong, Neil J Gemmell, Emmanuel Buschiazzo, Iris M Vargas Jentzsch, Angelika Merkel, Juergen Schmitz, Anja Zemann, Gennady Churakov, Jan Ole Kriegs, Juergen Brosius, Elizabeth P Murchison, Ravi Sachidanandam, Carly Smith, Gregory J Hannon, Enkhjargal Tsend-Ayush, Daniel McMillan, Rosalind Attenborough, Willem Rens, Malcolm Ferguson-Smith, Christophe M Lefèvre, Julie A Sharp, Kevin R Nicholas, David A Ray, Michael Kube, Richard Reinhardt, Thomas H Pringle, James Taylor, Russell C Jones, Brett Nixon, Jean-Louis Dacheux, Hitoshi Niwa, Yoko Sekita, Xiaoqiu Huang, Alexander Stark, Pouya Kheradpour, Manolis Kellis, Paul Flicek, Yuan Chen, Caleb Webber, Ross Hardison, Joanne Nelson, Kym Hallsworth-Pepin, Kim Delehaunty, Chris Markovic, Pat Minx, Yucheng Feng, Colin Kremitzki, Makedonka Mitreva, Jarret Glasscock, Todd Wylie, Patricia Wohldmann, Prathapan Thiru, Michael N Nhan, Craig S Pohl, Scott M Smith, Shunfeng Hou, Mikhail Nefedov, Pieter J de Jong, Marilyn B Renfree, Elaine R Mardis, Richard K Wilson (2008)  Genome analysis of the platypus reveals unique signatures of evolution.   Nature 453: 7192. 175-183 May  
Abstract: We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Notes:
Camilla M Whittington, Anthony T Papenfuss, Paramjit Bansal, Allan M Torres, Emily S W Wong, Janine E Deakin, Tina Graves, Amber Alsop, Kyriena Schatzkamer, Colin Kremitzki, Chris P Ponting, Peter Temple-Smith, Wesley C Warren, Philip W Kuchel, Katherine Belov (2008)  Defensins and the convergent evolution of platypus and reptile venom genes.   Genome Res 18: 6. 986-994 Jun  
Abstract: When the platypus (Ornithorhynchus anatinus) was first discovered, it was thought to be a taxidermist's hoax, as it has a blend of mammalian and reptilian features. It is a most remarkable mammal, not only because it lays eggs but also because it is venomous. Rather than delivering venom through a bite, as do snakes and shrews, male platypuses have venomous spurs on each hind leg. The platypus genome sequence provides a unique opportunity to unravel the evolutionary history of many of these interesting features. While searching the platypus genome for the sequences of antimicrobial defensin genes, we identified three Ornithorhynchus venom defensin-like peptide (OvDLP) genes, which produce the major components of platypus venom. We show that gene duplication and subsequent functional diversification of beta-defensins gave rise to these platypus OvDLPs. The OvDLP genes are located adjacent to the beta-defensins and share similar gene organization and peptide structures. Intriguingly, some species of snakes and lizards also produce venoms containing similar molecules called crotamines and crotamine-like peptides. This led us to trace the evolutionary origins of other components of platypus and reptile venom. Here we show that several venom components have evolved separately in the platypus and reptiles. Convergent evolution has repeatedly selected genes coding for proteins containing specific structural motifs as templates for venom molecules.
Notes:
Mai M Abd El-Aziz, Isabel Barragan, Ciara A O'Driscoll, Leo Goodstadt, Elena Prigmore, Salud Borrego, Marcela Mena, Juan I Pieras, Mohamed F El-Ashry, Leen Abu Safieh, Amna Shah, Michael E Cheetham, Nigel P Carter, Christina Chakarova, Chris P Ponting, Shomi S Bhattacharya, Guillermo Antinolo (2008)  EYS, encoding an ortholog of Drosophila spacemaker, is mutated in autosomal recessive retinitis pigmentosa.   Nat Genet 40: 11. 1285-1287 Nov  
Abstract: Using a positional cloning approach supported by comparative genomics, we have identified a previously unreported gene, EYS, at the RP25 locus on chromosome 6q12 commonly mutated in autosomal recessive retinitis pigmentosa. Spanning over 2 Mb, this is the largest eye-specific gene identified so far. EYS is independently disrupted in four other mammalian lineages, including that of rodents, but is well conserved from Drosophila to man and is likely to have a role in the modeling of retinal architecture.
Notes:
Preeti Bakrania, Maria Efthymiou, Johannes C Klein, Alison Salt, David J Bunyan, Alex Wyatt, Chris P Ponting, Angela Martin, Steven Williams, Victoria Lindley, Joanne Gilmore, Marie Restori, Anthony G Robson, Magella M Neveu, Graham E Holder, J Richard O Collin, David O Robinson, Peter Farndon, Heidi Johansen-Berg, Dianne Gerrelli, Nicola K Ragge (2008)  Mutations in BMP4 cause eye, brain, and digit developmental anomalies: overlap between the BMP4 and hedgehog signaling pathways.   Am J Hum Genet 82: 2. 304-319 Feb  
Abstract: Developmental ocular malformations, including anophthalmia-microphthalmia (AM), are heterogeneous disorders with frequent sporadic or non-Mendelian inheritance. Recurrent interstitial deletions of 14q22-q23 have been associated with AM, sometimes with poly/syndactyly and hypopituitarism. We identify two further cases of AM (one with associated pituitary anomalies) with a 14q22-q23 deletion. Using a positional candidate gene approach, we analyzed the BMP4 (Bone Morphogenetic Protein-4) gene and identified a frameshift mutation (c.226del2, p.S76fs104X) that segregated with AM, retinal dystrophy, myopia, brain anomalies, and polydactyly in a family and a nonconservative missense mutation (c.278A-->G, p.E93G) in a highly conserved base in another family. MR imaging and tractography in the c.226del2 proband revealed a primary brain developmental disorder affecting thalamostriatal and callosal pathways, also present in the affected grandmother. Using in situ hybridization in human embryos, we demonstrate expression of BMP4 in optic vesicle, developing retina and lens, pituitary region, and digits strongly supporting BMP4 as a causative gene for AM, pituitary, and poly/syndactyly. Because BMP4 interacts with HH signaling genes in animals, we evaluated gene expression in human embryos and demonstrate cotemporal and cospatial expression of BMP4 and HH signaling genes. We also identified four cases, some of whom had retinal dystrophy, with "low-penetrant" mutations in both BMP4 and HH signaling genes: SHH (Sonic Hedgehog) or PTCH1 (Patched). We propose that BMP4 is a major gene for AM and/or retinal dystrophy and brain anomalies and may be a candidate gene for myopia and poly/syndactyly. Our finding of low-penetrant variants in BMP4 and HH signaling partners is suggestive of an interaction between the two pathways in humans.
Notes:
Steven A Ramm, Peter L Oliver, Chris P Ponting, Paula Stockley, Richard D Emes (2008)  Sexual selection and the adaptive evolution of mammalian ejaculate proteins.   Mol Biol Evol 25: 1. 207-219 Jan  
Abstract: An elevated rate of substitution characterizes the molecular evolution of reproductive proteins from a wide range of taxa. Although the selective pressures explaining this rapid evolution are yet to be resolved, recent evidence implicates sexual selection as a potentially important explanatory factor. To investigate this hypothesis, we sought evidence of a high rate of adaptive gene evolution linked to postcopulatory sexual selection in muroid rodents, a model vertebrate group displaying a broad range of mating systems. Specifically, we sequenced 7 genes from diverse rodents that are expressed in the testes, prostate, or seminal vesicles, products of which have the potential to act in sperm competition. We inferred positive Darwinian selection in these genes by estimation of the ratio of nonsynonymous (d(N), amino acid changing) to synonymous (d(S), amino acid retaining) substitution rates (omega = d(N)/d(S)). Next, we tested whether variation in this ratio among lineages could be attributed to interspecific variation in mating systems, as inferred from the variation in these rodents' relative testis sizes (RTS). Four of the 7 genes examined (Prm1, Sva, Acrv1, and Svs2, but not Svp2, Msmb, or Spink3) exhibit unambiguous evidence of positive selection. One of these, the seminal vesicle-derived protein Svs2, also shows some evidence for a concentration of positive selection in those lineages in which sperm competition is common. However, this was not a general trend among all the rodent genes we examined. Using the same methods, we then reanalyzed previously published data on 2 primate genes, SEMG1 and SEMG2. Although SEMG2 also shows evidence of positive selection concentrated in lineages subject to high levels of sperm competition, no such trend was found for SEMG1. 
Overall, despite a high rate of positive selection being a feature of many ejaculate proteins, these results indicate that the action of sexual selection potentially responsible for elevated evolutionary rates may be difficult to detect on a gene-by-gene basis. Although the extreme diversity of reproductive phenotypes exhibited in nature attests to the power of sexual selection, the extent to which this force predominates in driving the rapid molecular evolution of reproductive genes therefore remains to be determined.
Notes:
Christina M Laukaitis, Andreas Heger, Tyler D Blakley, Pavel Munclinger, Chris P Ponting, Robert C Karn (2008)  Rapid bursts of androgen-binding protein (Abp) gene duplication occurred independently in diverse mammals.   BMC Evol Biol 8: 02  
Abstract: BACKGROUND: The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) alpha, beta and gamma subunits. Further investigation of 14 alpha-like (Abpa) and 13 beta- or gamma-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals. RESULTS: Here, we interrogate the latest 'finished' mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa. CONCLUSION: We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes' participation in chemosensation and/or sexual identification.
Notes:
2007
Jasmina Ponjavic, Chris P Ponting (2007)  The long and the short of RNA maps.   Bioessays 29: 11. 1077-1080 Nov  
Abstract: The landscapes of mammalian genomes are characterized by complex patterns of intersecting and overlapping sense and antisense transcription, giving rise to large numbers of coding and non-protein-coding RNAs (ncRNAs). A recent report by Kapranov and colleagues (1) describes three potentially novel classes of RNAs located at the very edges of protein-coding genes. The presence of RNAs from one of these classes appears to be correlated with the expression levels of their associated genes. These results suggest that a proportion of these RNAs might have roles in the cis-regulation of neighbouring protein-coding genes' expression.
Notes:
Leo Goodstadt, Andreas Heger, Caleb Webber, Chris P Ponting (2007)  An analysis of the gene complement of a marsupial, Monodelphis domestica: evolution of lineage-specific genes and giant chromosomes.   Genome Res 17: 7. 969-981 Jul  
Abstract: The newly sequenced genome of Monodelphis domestica not only provides the out-group necessary to better understand our own eutherian lineage, but it enables insights into the innovative biology of metatherians. Here, we compare Monodelphis with Homo sequences from alignments of single nucleotides, genes, and whole chromosomes. Using PhyOP, we have established orthologs in Homo for 82% (15,250) of Monodelphis gene predictions. Those with single orthologs in each species exhibited a high median synonymous substitution rate (d(S) = 1.02), thereby explaining the relative paucity of aligned regions outside of coding sequences. Orthology assignments were used to construct a synteny map that illustrates the considerable fragmentation of Monodelphis and Homo karyotypes since their therian last common ancestor. Fifteen percent of Monodelphis genes are predicted, from their low divergence at synonymous sites, to have been duplicated in the metatherian lineage. The majority of Monodelphis-specific genes possess predicted roles in chemosensation, reproduction, adaptation to specific diets, and immunity. Using alignments of Monodelphis genes to sequences from either Homo or Trichosurus vulpecula (an Australian marsupial), we show that metatherian X chromosomes have elevated silent substitution rates and high G+C contents in comparison with both metatherian autosomes and eutherian chromosomes. Each of these elevations is also a feature of subtelomeric chromosomal regions. We attribute these observations to high rates of female-specific recombination near the chromosomal ends and within the X chromosome, which act to sustain or increase G+C levels by biased gene conversion. In particular, we propose that the higher G+C content of the Monodelphis X chromosome is a direct consequence of its small size relative to the giant autosomes.
Notes:
Jasmina Ponjavic, Chris P Ponting, Gerton Lunter (2007)  Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs.   Genome Res 17: 5. 556-565 May  
Abstract: Long transcripts that do not encode protein have only rarely been the subject of experimental scrutiny. Presumably, this is owing to the current lack of evidence of their functionality, thereby leaving an impression that, instead, they represent "transcriptional noise." Here, we describe an analysis of 3122 long and full-length, noncoding RNAs ("macroRNAs") from the mouse, and compare their sequences and their promoters with orthologous sequence from human and from rat. We considered three independent signatures of purifying selection related to substitutions, sequence insertions and deletions, and splicing. We find that the evolution of the set of noncoding RNAs is not consistent with neutralist explanations. Rather, our results indicate that purifying selection has acted on the macroRNAs' promoters, primary sequence, and consensus splice site motifs. Promoters have experienced the greatest elimination of nucleotide substitutions, insertions, and deletions. The proportion of conserved sequence (4.1%-5.5%) in these macroRNAs is comparable to the density of exons within protein-coding transcripts (5.2%). These macroRNAs, taken together, thus possess the imprint of purifying selection, thereby indicating their functionality. Our findings should now provide an incentive for the experimental investigation of these macroRNAs' functions.
Notes:
Christina F Chakarova, Myrto G Papaioannou, Hemant Khanna, Irma Lopez, Naushin Waseem, Amna Shah, Torsten Theis, James Friedman, Cecilia Maubaret, Kinga Bujakowska, Brotati Veraitch, Mai M Abd El-Aziz, De Quincy Prescott, Sunil K Parapuram, Wendy A Bickmore, Peter M G Munro, Andreas Gal, Christian P Hamel, Valeria Marigo, Chris P Ponting, Bernd Wissinger, Eberhart Zrenner, Karl Matter, Anand Swaroop, Robert K Koenekoop, Shomi S Bhattacharya (2007)  Mutations in TOPORS cause autosomal dominant retinitis pigmentosa with perivascular retinal pigment epithelium atrophy.   Am J Hum Genet 81: 5. 1098-1103 Nov  
Abstract: We report mutations in the gene for topoisomerase I-binding RS protein (TOPORS) in patients with autosomal dominant retinitis pigmentosa (adRP) linked to chromosome 9p21.1 (locus RP31). A positional-cloning approach, together with the use of bioinformatics, identified TOPORS (comprising three exons and encoding a protein of 1,045 aa) as the gene responsible for adRP. Mutations that include an insertion and a deletion have been identified in two adRP-affected families--one French Canadian and one German family, respectively. Interestingly, a distinct phenotype is noted at the earlier stages of the disease, with an unusual perivascular cuff of retinal pigment epithelium atrophy, which was found surrounding the superior and inferior arcades in the retina. TOPORS is a RING domain-containing E3 ubiquitin ligase and localizes in the nucleus in speckled loci that are associated with promyelocytic leukemia bodies. The ubiquitous nature of TOPORS expression and a lack of mutant protein in patients are highly suggestive of haploinsufficiency, rather than a dominant negative effect, as the molecular mechanism of the disease and make rescue of the clinical phenotype amenable to somatic gene therapy.
Notes:
Tarjei S Mikkelsen, Matthew J Wakefield, Bronwen Aken, Chris T Amemiya, Jean L Chang, Shannon Duke, Manuel Garber, Andrew J Gentles, Leo Goodstadt, Andreas Heger, Jerzy Jurka, Michael Kamal, Evan Mauceli, Stephen M J Searle, Ted Sharpe, Michelle L Baker, Mark A Batzer, Panayiotis V Benos, Katherine Belov, Michele Clamp, April Cook, James Cuff, Radhika Das, Lance Davidow, Janine E Deakin, Melissa J Fazzari, Jacob L Glass, Manfred Grabherr, John M Greally, Wanjun Gu, Timothy A Hore, Gavin A Huttley, Michael Kleber, Randy L Jirtle, Edda Koina, Jeannie T Lee, Shaun Mahony, Marco A Marra, Robert D Miller, Robert D Nicholls, Mayumi Oda, Anthony T Papenfuss, Zuly E Parra, David D Pollock, David A Ray, Jacqueline E Schein, Terence P Speed, Katherine Thompson, John L VandeBerg, Claire M Wade, Jerilyn A Walker, Paul D Waters, Caleb Webber, Jennifer R Weidman, Xiaohui Xie, Michael C Zody, Jennifer A Marshall Graves, Chris P Ponting, Matthew Breen, Paul B Samollow, Eric S Lander, Kerstin Lindblad-Toh (2007)  Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.   Nature 447: 7141. 167-177 May  
Abstract: We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
Notes:
Andrew G Clark, Michael B Eisen, Douglas R Smith, Casey M Bergman, Brian Oliver, Therese A Markow, Thomas C Kaufman, Manolis Kellis, William Gelbart, Venky N Iyer, Daniel A Pollard, Timothy B Sackton, Amanda M Larracuente, Nadia D Singh, Jose P Abad, Dawn N Abt, Boris Adryan, Montserrat Aguade, Hiroshi Akashi, Wyatt W Anderson, Charles F Aquadro, David H Ardell, Roman Arguello, Carlo G Artieri, Daniel A Barbash, Daniel Barker, Paolo Barsanti, Phil Batterham, Serafim Batzoglou, Dave Begun, Arjun Bhutkar, Enrico Blanco, Stephanie A Bosak, Robert K Bradley, Adrianne D Brand, Michael R Brent, Angela N Brooks, Randall H Brown, Roger K Butlin, Corrado Caggese, Brian R Calvi, A Bernardo de Carvalho, Anat Caspi, Sergio Castrezana, Susan E Celniker, Jean L Chang, Charles Chapple, Sourav Chatterji, Asif Chinwalla, Alberto Civetta, Sandra W Clifton, Josep M Comeron, James C Costello, Jerry A Coyne, Jennifer Daub, Robert G David, Arthur L Delcher, Kim Delehaunty, Chuong B Do, Heather Ebling, Kevin Edwards, Thomas Eickbush, Jay D Evans, Alan Filipski, Sven Findeiss, Eva Freyhult, Lucinda Fulton, Robert Fulton, Ana C L Garcia, Anastasia Gardiner, David A Garfield, Barry E Garvin, Greg Gibson, Don Gilbert, Sante Gnerre, Jennifer Godfrey, Robert Good, Valer Gotea, Brenton Gravely, Anthony J Greenberg, Sam Griffiths-Jones, Samuel Gross, Roderic Guigo, Erik A Gustafson, Wilfried Haerty, Matthew W Hahn, Daniel L Halligan, Aaron L Halpern, Gillian M Halter, Mira V Han, Andreas Heger, LaDeana Hillier, Angie S Hinrichs, Ian Holmes, Roger A Hoskins, Melissa J Hubisz, Dan Hultmark, Melanie A Huntley, David B Jaffe, Santosh Jagadeeshan, William R Jeck, Justin Johnson, Corbin D Jones, William C Jordan, Gary H Karpen, Eiko Kataoka, Peter D Keightley, Pouya Kheradpour, Ewen F Kirkness, Leonardo B Koerich, Karsten Kristiansen, Dave Kudrna, Rob J Kulathinal, Sudhir Kumar, Roberta Kwok, Eric Lander, Charles H Langley, Richard Lapoint, Brian P Lazzaro, So-Jeong Lee, Lisa Levesque, Ruiqiang Li, Chiao-Feng Lin, Michael F Lin, Kerstin Lindblad-Toh, Ana Llopart, Manyuan Long, Lloyd Low, Elena Lozovsky, Jian Lu, Meizhong Luo, Carlos A Machado, Wojciech Makalowski, Mar Marzo, Muneo Matsuda, Luciano Matzkin, Bryant McAllister, Carolyn S McBride, Brendan McKernan, Kevin McKernan, Maria Mendez-Lago, Patrick Minx, Michael U Mollenhauer, Kristi Montooth, Stephen M Mount, Xu Mu, Eugene Myers, Barbara Negre, Stuart Newfeld, Rasmus Nielsen, Mohamed A F Noor, Patrick O'Grady, Lior Pachter, Montserrat Papaceit, Matthew J Parisi, Michael Parisi, Leopold Parts, Jakob S Pedersen, Graziano Pesole, Adam M Phillippy, Chris P Ponting, Mihai Pop, Damiano Porcelli, Jeffrey R Powell, Sonja Prohaska, Kim Pruitt, Marta Puig, Hadi Quesneville, Kristipati Ravi Ram, David Rand, Matthew D Rasmussen, Laura K Reed, Robert Reenan, Amy Reily, Karin A Remington, Tania T Rieger, Michael G Ritchie, Charles Robin, Yu-Hui Rogers, Claudia Rohde, Julio Rozas, Marc J Rubenfield, Alfredo Ruiz, Susan Russo, Steven L Salzberg, Alejandro Sanchez-Gracia, David J Saranga, Hajime Sato, Stephen W Schaeffer, Michael C Schatz, Todd Schlenke, Russell Schwartz, Carmen Segarra, Rama S Singh, Laura Sirot, Marina Sirota, Nicholas B Sisneros, Chris D Smith, Temple F Smith, John Spieth, Deborah E Stage, Alexander Stark, Wolfgang Stephan, Robert L Strausberg, Sebastian Strempel, David Sturgill, Granger Sutton, Granger G Sutton, Wei Tao, Sarah Teichmann, Yoshiko N Tobari, Yoshihiko Tomimura, Jason M Tsolas, Vera L S Valente, Eli Venter, J Craig Venter, Saverio Vicario, Filipe G Vieira, Albert J Vilella, Alfredo Villasante, Brian Walenz, Jun Wang, Marvin Wasserman, Thomas Watts, Derek Wilson, Richard K Wilson, Rod A Wing, Mariana F Wolfner, Alex Wong, Gane Ka-Shu Wong, Chung-I Wu, Gabriel Wu, Daisuke Yamamoto, Hsiao-Pei Yang, Shiaw-Pyng Yang, James A Yorke, Kiyohito Yoshida, Evgeny Zdobnov, Peili Zhang, Yu Zhang, Aleksey V Zimin, Jennifer Baldwin, Amr Abdouelleil, Jamal Abdulkadir, Adal Abebe, Brikti Abera, Justin Abreu, St Christophe Acer, Lynne Aftuck, Allen Alexander, Peter An, Erica Anderson, Scott Anderson, Harindra Arachi, Marc Azer, Pasang Bachantsang, Andrew Barry, Tashi Bayul, Aaron Berlin, Daniel Bessette, Toby Bloom, Jason Blye, Leonid Boguslavskiy, Claude Bonnet, Boris Boukhgalter, Imane Bourzgui, Adam Brown, Patrick Cahill, Sheridon Channer, Yama Cheshatsang, Lisa Chuda, Mieke Citroen, Alville Collymore, Patrick Cooke, Maura Costello, Katie D'Aco, Riza Daza, Georgius De Haan, Stuart DeGray, Christina DeMaso, Norbu Dhargay, Kimberly Dooley, Erin Dooley, Missole Doricent, Passang Dorje, Kunsang Dorjee, Alan Dupes, Richard Elong, Jill Falk, Abderrahim Farina, Susan Faro, Diallo Ferguson, Sheila Fisher, Chelsea D Foley, Alicia Franke, Dennis Friedrich, Loryn Gadbois, Gary Gearin, Christina R Gearin, Georgia Giannoukos, Tina Goode, Joseph Graham, Edward Grandbois, Sharleen Grewal, Kunsang Gyaltsen, Nabil Hafez, Birhane Hagos, Jennifer Hall, Charlotte Henson, Andrew Hollinger, Tracey Honan, Monika D Huard, Leanne Hughes, Brian Hurhula, M Erii Husby, Asha Kamat, Ben Kanga, Seva Kashin, Dmitry Khazanovich, Peter Kisner, Krista Lance, Marcia Lara, William Lee, Niall Lennon, Frances Letendre, Rosie LeVine, Alex Lipovsky, Xiaohong Liu, Jinlei Liu, Shangtao Liu, Tashi Lokyitsang, Yeshi Lokyitsang, Rakela Lubonja, Annie Lui, Pen MacDonald, Vasilia Magnisalis, Kebede Maru, Charles Matthews, William McCusker, Susan McDonough, Teena Mehta, James Meldrim, Louis Meneus, Oana Mihai, Atanas Mihalev, Tanya Mihova, Rachel Mittelman, Valentine Mlenga, Anna Montmayeur, Leonidas Mulrain, Adam Navidi, Jerome Naylor, Tamrat Negash, Thu Nguyen, Nga Nguyen, Robert Nicol, Choe Norbu, Nyima Norbu, Nathaniel Novod, Barry O'Neill, Sahal Osman, Eva Markiewicz, Otero L Oyono, Christopher Patti, Pema Phunkhang, Fritz Pierre, Margaret Priest, Sujaa Raghuraman, Filip Rege, Rebecca Reyes, Cecil Rise, Peter Rogov, Keenan Ross, Elizabeth Ryan, Sampath Settipalli, Terry Shea, Ngawang Sherpa, Lu Shi, Diana Shih, Todd Sparrow, Jessica Spaulding, John Stalker, Nicole Stange-Thomann, Sharon Stavropoulos, Catherine Stone, Christopher Strader, Senait Tesfaye, Talene Thomson, Yama Thoulutsang, Dawa Thoulutsang, Kerri Topham, Ira Topping, Tsamla Tsamla, Helen Vassiliev, Andy Vo, Tsering Wangchuk, Tsering Wangdi, Michael Weiand, Jane Wilkinson, Adam Wilson, Shailendra Yadav, Geneva Young, Qing Yu, Lisa Zembek, Danni Zhong, Andrew Zimmer, Zac Zwirko, Pablo Alvarez, Will Brockman, Jonathan Butler, CheeWhye Chin, Manfred Grabherr, Michael Kleber, Evan Mauceli, Iain MacCallum (2007)  Evolution of genes and genomes on the Drosophila phylogeny.   Nature 450: 7167. 203-218 Nov  
Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Notes:
Andreas Heger, Chris P Ponting (2007)  Variable strength of translational selection among 12 Drosophila species.   Genetics 177: 3. 1337-1348 Nov  
Abstract: Codon usage bias in Drosophila melanogaster genes has been attributed to negative selection of those codons whose cellular tRNA abundance restricts rates of mRNA translation. Previous studies, which involved limited numbers of genes, can now be compared against analyses of the entire gene complements of 12 Drosophila species whose genome sequences have become available. Using large numbers (6138) of orthologs represented in all 12 species, we establish that the codon preferences of more closely related species are better correlated. Differences between codon usage biases are attributed, in part, to changes in mutational biases. These biases are apparent from the strong correlation (r = 0.92, P < 0.001) among these genomes' intronic G + C contents and exonic G + C contents at degenerate third codon positions. To perform a cross-species comparison of selection on codon usage, while accounting for changes in mutational biases, we calibrated each genome in turn using the codon usage bias indices of highly expressed ribosomal protein genes. The strength of translational selection was predicted to have varied between species largely according to their phylogeny, with the D. melanogaster group species exhibiting the strongest degree of selection.
Notes:
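The abstract above relates intronic G + C content to exonic G + C content at third codon positions across genomes. The following is a minimal sketch of that kind of comparison, not the authors' pipeline. The function names (`gc_fraction`, `gc3`, `pearson_r`) are hypothetical, and for simplicity `gc3` counts every third codon position rather than only the degenerate ones used in the paper.

```python
import math


def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds):
    """G + C fraction at third codon positions of an in-frame coding sequence.

    Assumption: all third positions are counted, not just degenerate ones.
    """
    return gc_fraction(cds[2::3])


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Given one intronic G + C value and one third-position G + C value per genome, `pearson_r` would be applied to the two lists of per-genome values.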
Andreas Heger, Chris P Ponting (2007)  Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes.   Genome Res 17: 12. 1837-1849 Dec  
Abstract: The newly sequenced genome sequences of 11 Drosophila species provide the first opportunity to investigate variations in evolutionary rates across a clade of closely related species. Protein-coding genes were predicted using established Drosophila melanogaster genes as templates, with recovery rates ranging from 81%-97% depending on species divergence and on genome assembly quality. Orthology and paralogy assignments were shown to be self-consistent among the different Drosophila species and to be consistent with regions of conserved gene order (synteny blocks). Next, we investigated the rates of diversification among these species' gene repertoires with respect to amino acid substitutions and to gene duplications. Constraints on amino acid sequences appear to have been most pronounced on D. ananassae and least pronounced on D. simulans and D. erecta terminal lineages. Codons predicted to have been subject to positive selection were found to be significantly over-represented among genes with roles in immune response and RNA metabolism, with the latter category including each subunit of the Dicer-2/r2d2 heterodimer. The vast majority of gene duplications (96.5%) and synteny rearrangements were found to occur, as expected, within single Müller elements. We show that the rate of ancient gene duplications was relatively uniform. However, gene duplications in terminal lineages are strongly skewed toward very recent events, consistent with either a rapid-birth and rapid-death model or the presence of large proportions of copy number variable genes in these Drosophila populations. Duplications were significantly more frequent among trypsin-like proteases and DM8 putative lipid-binding domain proteins.
Notes:
Thomas Gerken, Christophe A Girard, Yi-Chun Loraine Tung, Celia J Webby, Vladimir Saudek, Kirsty S Hewitson, Giles S H Yeo, Michael A McDonough, Sharon Cunliffe, Luke A McNeill, Juris Galvanovskis, Patrik Rorsman, Peter Robins, Xavier Prieur, Anthony P Coll, Marcella Ma, Zorica Jovanovic, I Sadaf Farooqi, Barbara Sedgwick, Inês Barroso, Tomas Lindahl, Chris P Ponting, Frances M Ashcroft, Stephen O'Rahilly, Christopher J Schofield (2007)  The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase.   Science 318: 5855. 1469-1472 Nov  
Abstract: Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate-dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance, and that Fto mRNA levels in the arcuate nucleus are regulated by feeding and fasting. Studies can now be directed toward determining the physiologically relevant FTO substrate and how nucleic acid methylation status is linked to increased fat mass.
Notes:
2006
Tom D Bunney, Richard Harris, Natalia Lamuño Gandarillas, Michelle B Josephs, S Mark Roe, S Caroline Sorli, Hugh F Paterson, Fernando Rodrigues-Lima, Diego Esposito, Chris P Ponting, Peter Gierschik, Laurence H Pearl, Paul C Driscoll, Matilda Katan (2006)  Structural and mechanistic insights into ras association domains of phospholipase C epsilon.   Mol Cell 21: 4. 495-507 Feb  
Abstract: Ras proteins signal to a number of distinct pathways by interacting with diverse effectors. Studies of ras/effector interactions have focused on three classes, Raf kinases, ral guanylnucleotide-exchange factors, and phosphatidylinositol-3-kinases. Here we describe ras interactions with another effector, the recently identified phospholipase C epsilon (PLCepsilon). We solved structures of PLCepsilon RA domains (RA1 and RA2) by NMR and the structure of the RA2/ras complex by X-ray crystallography. Although the similarity between ubiquitin-like folds of RA1 and RA2 proves that they are homologs, only RA2 can bind ras. Some of the features of the RA2/ras interface are unique to PLCepsilon, while the ability to make contacts with both switch I and II regions of ras is shared only with phosphatidylinositol-3-kinase. Studies of PLCepsilon regulation suggest that, in a cellular context, the RA2 domain, in a mode specific to PLCepsilon, has a role in membrane targeting with further regulatory impact on PLC activity.
Notes:
Duc-Quang Nguyen, Caleb Webber, Chris P Ponting (2006)  Bias of selection on human copy-number variants.   PLoS Genet 2: 2. Feb  
Abstract: Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.
Notes:
Gerton Lunter, Chris P Ponting, Jotun Hein (2006)  Genome-wide identification of human functional DNA using a neutral indel model.   PLoS Comput Biol 2: 1. Jan  
Abstract: It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human-mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
Notes:
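The abstract above describes detecting unusually long ungapped alignment regions that a neutral indel model cannot account for. The following is a simplified sketch of that idea, not the published model. It assumes indels fall independently at a constant per-base probability `p`, so that ungapped segment lengths follow a geometric distribution. The function names and the single constant rate are illustrative assumptions.

```python
def estimate_indel_rate(lengths):
    """Maximum-likelihood estimate of the per-base indel probability p.

    Assumes segment lengths L follow P(L = k) = (1 - p)**k * p for k = 0, 1, 2, ...
    """
    n = len(lengths)
    if n == 0:
        raise ValueError("no segments supplied")
    return n / (n + sum(lengths))


def geometric_tail(p, k):
    """P(L >= k) under the geometric model above."""
    return (1.0 - p) ** k


def excess_long_segments(lengths, p, threshold):
    """Observed minus neutrally expected count of segments with length >= threshold."""
    observed = sum(1 for x in lengths if x >= threshold)
    expected = len(lengths) * geometric_tail(p, threshold)
    return observed - expected
```

In this sketch, `p` would be estimated from sequence presumed neutral (the abstract mentions ancestral repeats), and a positive excess at large thresholds would point to segments that may be under purifying selection with respect to indels.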
Frank S Cordes, Peter Kraiczy, Pietro Roversi, Markus M Simon, Volker Brade, Oliver Jahraus, Russell Wallis, Leo Goodstadt, Chris P Ponting, Christine Skerka, Peter F Zipfel, Reinhard Wallich, Susan M Lea (2006)  Structure-function mapping of BbCRASP-1, the key complement factor H and FHL-1 binding protein of Borrelia burgdorferi.   Int J Med Microbiol 296 Suppl 40: 177-184 May  
Abstract: Borrelia burgdorferi, a spirochaete transmitted to human hosts during feeding of infected Ixodes ticks, is the causative agent of Lyme disease, the most frequent vector-borne disease in Eurasia and North America. Sporadically Lyme disease develops into a chronic, multisystemic disorder. Serum-resistant B. burgdorferi strains bind complement factor H (FH) and FH-like protein 1 (FHL-1) on the spirochaete surface. This binding is dependent on the expression of proteins termed complement-regulator acquiring surface proteins (CRASPs). The atomic structure of BbCRASP-1, the key FHL-1/FH-binding protein of B. burgdorferi, has recently been determined. Our analysis indicates that its protein topology apparently evolved to provide a high affinity interaction site for FH/FHL-1 and leads to an atomic-level hypothesis for the functioning of BbCRASP-1. This work demonstrates that pathogens interact with complement regulators in ways that are distinct from the mechanisms used by the host and are thus obvious targets for drug design.
Notes:
Chris P Ponting (2006)  A novel domain suggests a ciliary function for ASPM, a brain size determining gene.   Bioinformatics 22: 9. 1031-1035 May  
Abstract: The N-terminal domain of abnormal spindle-like microcephaly-associated protein (ASPM) is identified as a member of a novel family of ASH (ASPM, SPD-2, Hydin) domains. These domains are present in proteins associated with cilia, flagella, the centrosome and the Golgi complex, and in Hydin and OCRL whose deficiencies are associated with hydrocephalus and Lowe oculocerebrorenal syndrome, respectively. Genes encoding ASH domains thus represent good candidates for primary ciliary dyskinesias. ASPM has been proposed to function in neurogenesis and to be a major determinant of cerebral cortical size in humans. Support for this hypothesis stems from associations between mutations in ASPM and primary microcephaly, and from the rapid evolution of ASPM during recent hominid evolution. The identification of the ASH domain family instead indicates possible roles for ASPM in sperm flagella or in the cilia of ependymal cells. ASPM's rapid evolution may thus reflect selective pressures on ciliary function, rather than pressures on mitosis during neurogenesis.
Notes:
Zoë Birtle, Chris P Ponting (2006)  Meisetz and the birth of the KRAB motif.   Bioinformatics 22: 23. 2841-2845 Dec  
Abstract: The largest family of transcription factors in mammals is of Cys(2)His(2) zinc finger-proteins, each with an NH(2)-terminal KRAB motif. Extensive expansions of this family have occurred in separate mammalian lineages, with approximately 400 such genes known in the human genome. Despite their widespread occurrence, the evolutionary provenance of the KRAB motif is unclear since previously it has not been found outside of the tetrapod vertebrates. Here, we show that homologues of the histone methyltransferase Meisetz are present within the sea urchin (Strongylocentrotus purpuratus) genome. Sea urchin and mammalian Meisetz sequences each contain an N-terminal KRAB motif, which thereby establishes an early origin of the KRAB motif prior to the divergence of echinoderm and chordate lineages. Finally, we present evidence that KRAB motifs derive from a novel family of KRI (KRAB Interior) motifs that were present in the last common ancestor of animals, plants and fungi.
Notes:
Leo Goodstadt, Chris P Ponting (2006)  Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human.   PLoS Comput Biol 2: 9. Sep  
Abstract: Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or "in-paralogues," are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Notes:
Chris P Ponting, Gerton Lunter (2006)  Signatures of adaptive evolution within human non-coding sequence.   Hum Mol Genet 15 Spec No 2: R170-R175 Oct  
Abstract: The human genome is often portrayed as consisting of three sequence types, each distinguished by their mode of evolution. Purifying selection is estimated to act on 2.5-5.0% of the genome, whereas virtually all remaining sequence is considered to have evolved neutrally and to be devoid of functionality. The third mode of evolution, positive selection of advantageous changes, is considered rare. Such instances have been inferred only for a handful of sites, and these lie almost exclusively within protein-coding genes. Nevertheless, the majority of positively selected sequence is expected to lie within the wealth of functional 'dark matter' present outside of the coding sequence. Here, we review the evolutionary evidence for the majority of human-conserved DNA lying outside of the protein-coding sequence. We argue that within this non-coding fraction lies at least 1 Mb of functional sequence that has accumulated many beneficial nucleotide replacements. Illuminating the functions of this adaptive dark matter will lead to a better understanding of the sequence changes that have shaped the innovative biology of our species.
Notes:
Yanick J Crow, Andrea Leitch, Bruce E Hayward, Anna Garner, Rekha Parmar, Elen Griffith, Manir Ali, Colin Semple, Jean Aicardi, Riyana Babul-Hirji, Clarisse Baumann, Peter Baxter, Enrico Bertini, Kate E Chandler, David Chitayat, Daniel Cau, Catherine Déry, Elisa Fazzi, Cyril Goizet, Mary D King, Joerg Klepper, Didier Lacombe, Giovanni Lanzi, Hermione Lyall, María Luisa Martínez-Frías, Michèle Mathieu, Carole McKeown, Anne Monier, Yvette Oade, Oliver W Quarrell, Christopher D Rittey, R Curtis Rogers, Amparo Sanchis, John B P Stephenson, Uta Tacke, Marianne Till, John L Tolmie, Pam Tomlin, Thomas Voit, Bernhard Weschke, C Geoffrey Woods, Pierre Lebon, David T Bonthron, Chris P Ponting, Andrew P Jackson (2006)  Mutations in genes encoding ribonuclease H2 subunits cause Aicardi-Goutières syndrome and mimic congenital viral brain infection.   Nat Genet 38: 8. 910-916 Aug  
Abstract: Aicardi-Goutières syndrome (AGS) is an autosomal recessive neurological disorder, the clinical and immunological features of which parallel those of congenital viral infection. Here we define the composition of the human ribonuclease H2 enzyme complex and show that AGS can result from mutations in the genes encoding any one of its three subunits. Our findings demonstrate a role for ribonuclease H in human neurological disease and suggest an unanticipated relationship between ribonuclease H2 and the antiviral immune response that warrants further investigation.
Notes:
2005
Nicola J Mulder, Rolf Apweiler, Teresa K Attwood, Amos Bairoch, Alex Bateman, David Binns, Paul Bradley, Peer Bork, Phillip Bucher, Lorenzo Cerutti, Richard Copley, Emmanuel Courcelle, Ujjwal Das, Richard Durbin, Wolfgang Fleischmann, Julian Gough, Daniel Haft, Nicola Harte, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, David Lonsdale, Rodrigo Lopez, Ivica Letunic, Martin Madera, John Maslen, Jennifer McDowall, Alex Mitchell, Anastasia N Nikolskaya, Sandra Orchard, Marco Pagni, Chris P Ponting, Emmanuel Quevillon, Jeremy Selengut, Christian J A Sigrist, Ville Silventoinen, David J Studholme, Robert Vaughan, Cathy H Wu (2005)  InterPro, progress and status in 2005.   Nucleic Acids Res 33: Database issue. D201-D205 Jan  
Abstract: InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Notes:
Chris Ponting, Andrew P Jackson (2005)  Evolution of primary microcephaly genes and the enlargement of primate brains.   Curr Opin Genet Dev 15: 3. 241-248 Jun  
Abstract: Brain size, in relation to body size, has varied markedly during the evolution of mammals. In particular, a large cerebral cortex is a feature that distinguishes humans from our fellow primates. Such anatomical changes must have a basis in genetic alterations, but the molecular processes involved have yet to be defined. However, recent advances from the cloning of two human disease genes promise to make inroads in this important area. Microcephalin (MCPH1) and Abnormal spindle-like microcephaly associated (ASPM) are genes mutated in primary microcephaly, a human neurodevelopmental disorder. In this 'atavistic' condition, brain size is reduced in volume to a size comparable with that of early hominids. Hence, it has been proposed that these genes evolved adaptively with increasing primate brain size. Subsequent studies have lent weight to this hypothesis by showing that both genes have undergone positive selection during great ape evolution. Further functional characterisation of their proteins will contribute to an understanding of the molecular and evolutionary processes that have determined human brain size.
Notes:
Christina M Laukaitis, Stephen R Dlouhy, Richard D Emes, Chris P Ponting, Robert C Karn (2005)  Diverse spatial, temporal, and sexual expression of recently duplicated androgen-binding protein genes in Mus musculus.   BMC Evol Biol 5: 07  
Abstract: BACKGROUND: The genes for salivary androgen-binding protein (ABP) subunits have been evolving rapidly in ancestors of the house mouse Mus musculus, as evidenced both by recent and extensive gene duplication and by high ratios of nonsynonymous to synonymous nucleotide substitution rates. This makes ABP an appropriate model system with which to investigate how recent adaptive evolution of paralogous genes results in functional innovation (neofunctionalization). RESULTS: It was our goal to find evidence for the expression of as many of the Abp paralogues in the mouse genome as possible. We observed expression of six Abpa paralogues and five Abpbg paralogues in ten glands and other organs located predominantly in the head and neck (olfactory lobe of the brain, three salivary glands, lacrimal gland, Harderian gland, vomeronasal organ, and major olfactory epithelium). These Abp paralogues differed dramatically in their specific expression in these different glands and in their sexual dimorphism of expression. We also studied the appearance of expression in both late-stage embryos and postnatal animals prior to puberty and found significantly different timing of the onset of expression among the various paralogues. CONCLUSION: The multiple changes in the spatial expression profile of these genes resulting in various combinations of expression in glands and other organs in the head and face of the mouse strongly suggest that neofunctionalization of these genes, driven by adaptive evolution, has occurred following duplication. The extensive diversification in expression of this family of proteins provides two lines of evidence for a pheromonal role for ABP: 1) different patterns of Abpa/Abpbg expression in different glands; and 2) sexual dimorphism in the expression of the paralogues in a subset of those glands. 
These expression patterns differ dramatically among various glands that are located almost exclusively in the head and neck, where the sensory organs are located. Since mice are nocturnal, it is expected that they will make extensive use of olfactory as opposed to visual cues. The glands expressing Abp paralogues produce secretions (lacrimal and salivary) or detect odors (MOE and VNO) and thus it appears highly likely that ABP proteins play a role in olfactory communication.
Notes:
Zoë Birtle, Leo Goodstadt, Chris Ponting (2005)  Duplication and positive selection among hominin-specific PRAME genes.   BMC Genomics 6: 09  
Abstract: BACKGROUND: The physiological and phenotypic differences between human and chimpanzee are largely specified by our genomic differences. We have been particularly interested in recent duplications in the human genome as examples of relatively large-scale changes to our genome. We performed an in-depth evolutionary analysis of a region of chromosome 1, which is copy number polymorphic among humans, and that contains at least 32 PRAME (Preferentially expressed antigen of melanoma) genes and pseudogenes. PRAME-like genes are expressed in the testis and in a large number of tumours, and are thought to possess roles in spermatogenesis and oogenesis. RESULTS: Using nucleotide substitution rate estimates for exons and introns, we show that two large segmental duplications, of six and seven human PRAME genes respectively, occurred in the last 3 million years. These duplicated genes are thus hominin-specific, having arisen in our genome since the divergence from chimpanzee. This cluster of PRAME genes appears to have arisen initially from a translocation approximately 95-85 million years ago. We identified multiple sites within human or mouse PRAME sequences which exhibit strong evidence of positive selection. These form a pronounced cluster on one face of the predicted PRAME protein structure. CONCLUSION: We predict that PRAME genes evolved adaptively due to strong competition between rapidly-dividing cells during spermatogenesis and oogenesis. We suggest that as PRAME gene copy number is polymorphic among individuals, positive selection of PRAME alleles may still prevail within the human population.
Notes:
Caleb Webber, Chris P Ponting (2005)  Hotspots of mutation and breakage in dog and human chromosomes.   Genome Res 15: 12. 1787-1797 Dec  
Abstract: Sequencing of the dog genome allows an investigation of the location-dependent evolutionary processes that occurred since the common ancestor of primates and carnivores, approximately 95 million years ago. We investigated variations in G+C nucleotide fraction and synonymous nucleotide substitution rates (Ks) across dog and human genomes. Our results show that dog genes located either in subtelomeric and pericentromeric regions, or in short synteny blocks, possess significantly elevated G+C fraction and Ks values. Human subtelomeric, but not pericentromeric, genes also exhibit these elevations. We then examined 1.048 Gb of human sequence that is likely not to have been located near a primate telomere at any time since the common ancestor of dog and human. We observed that regions of highest G+C or Ks ("hotspots"; median sizes of 0.5 or 1.3 Mb, respectively) within this sequence were preferentially segregated to dog subtelomeres and pericentromeres during the rearrangements that eventually gave rise to the extant canine karyotype. Our data cannot be accounted for solely on the basis of gradually elevating G+C fractions in subtelomeric regions as a consequence of biased gene conversion. Rather, we propose that high G+C sequences are found preferentially within dog subtelomeres as a direct consequence of chromosomal fission occurring more frequently within regions elevated in G+C.
Notes:
Kerstin Lindblad-Toh, Claire M Wade, Tarjei S Mikkelsen, Elinor K Karlsson, David B Jaffe, Michael Kamal, Michele Clamp, Jean L Chang, Edward J Kulbokas, Michael C Zody, Evan Mauceli, Xiaohui Xie, Matthew Breen, Robert K Wayne, Elaine A Ostrander, Chris P Ponting, Francis Galibert, Douglas R Smith, Pieter J DeJong, Ewen Kirkness, Pablo Alvarez, Tara Biagi, William Brockman, Jonathan Butler, Chee-Wye Chin, April Cook, James Cuff, Mark J Daly, David DeCaprio, Sante Gnerre, Manfred Grabherr, Manolis Kellis, Michael Kleber, Carolyne Bardeleben, Leo Goodstadt, Andreas Heger, Christophe Hitte, Lisa Kim, Klaus-Peter Koepfli, Heidi G Parker, John P Pollinger, Stephen M J Searle, Nathan B Sutter, Rachael Thomas, Caleb Webber, Jennifer Baldwin, Adal Abebe, Amr Abouelleil, Lynne Aftuck, Mostafa Ait-Zahra, Tyler Aldredge, Nicole Allen, Peter An, Scott Anderson, Claudel Antoine, Harindra Arachchi, Ali Aslam, Laura Ayotte, Pasang Bachantsang, Andrew Barry, Tashi Bayul, Mostafa Benamara, Aaron Berlin, Daniel Bessette, Berta Blitshteyn, Toby Bloom, Jason Blye, Leonid Boguslavskiy, Claude Bonnet, Boris Boukhgalter, Adam Brown, Patrick Cahill, Nadia Calixte, Jody Camarata, Yama Cheshatsang, Jeffrey Chu, Mieke Citroen, Alville Collymore, Patrick Cooke, Tenzin Dawoe, Riza Daza, Karin Decktor, Stuart DeGray, Norbu Dhargay, Kimberly Dooley, Kathleen Dooley, Passang Dorje, Kunsang Dorjee, Lester Dorris, Noah Duffey, Alan Dupes, Osebhajajeme Egbiremolen, Richard Elong, Jill Falk, Abderrahim Farina, Susan Faro, Diallo Ferguson, Patricia Ferreira, Sheila Fisher, Mike FitzGerald, Karen Foley, Chelsea Foley, Alicia Franke, Dennis Friedrich, Diane Gage, Manuel Garber, Gary Gearin, Georgia Giannoukos, Tina Goode, Audra Goyette, Joseph Graham, Edward Grandbois, Kunsang Gyaltsen, Nabil Hafez, Daniel Hagopian, Birhane Hagos, Jennifer Hall, Claire Healy, Ryan Hegarty, Tracey Honan, Andrea Horn, Nathan Houde, Leanne Hughes, Leigh Hunnicutt, M Husby, Benjamin Jester, Charlien Jones, Asha Kamat, Ben Kanga, Cristyn Kells, Dmitry Khazanovich, Alix Chinh Kieu, Peter Kisner, Mayank Kumar, Krista Lance, Thomas Landers, Marcia Lara, William Lee, Jean-Pierre Leger, Niall Lennon, Lisa Leuper, Sarah LeVine, Jinlei Liu, Xiaohong Liu, Yeshi Lokyitsang, Tashi Lokyitsang, Annie Lui, Jan Macdonald, John Major, Richard Marabella, Kebede Maru, Charles Matthews, Susan McDonough, Teena Mehta, James Meldrim, Alexandre Melnikov, Louis Meneus, Atanas Mihalev, Tanya Mihova, Karen Miller, Rachel Mittelman, Valentine Mlenga, Leonidas Mulrain, Glen Munson, Adam Navidi, Jerome Naylor, Tuyen Nguyen, Nga Nguyen, Cindy Nguyen, Thu Nguyen, Robert Nicol, Nyima Norbu, Choe Norbu, Nathaniel Novod, Tenchoe Nyima, Peter Olandt, Barry O'Neill, Keith O'Neill, Sahal Osman, Lucien Oyono, Christopher Patti, Danielle Perrin, Pema Phunkhang, Fritz Pierre, Margaret Priest, Anthony Rachupka, Sujaa Raghuraman, Rayale Rameau, Verneda Ray, Christina Raymond, Filip Rege, Cecil Rise, Julie Rogers, Peter Rogov, Julie Sahalie, Sampath Settipalli, Theodore Sharpe, Terrance Shea, Mechele Sheehan, Ngawang Sherpa, Jianying Shi, Diana Shih, Jessie Sloan, Cherylyn Smith, Todd Sparrow, John Stalker, Nicole Stange-Thomann, Sharon Stavropoulos, Catherine Stone, Sabrina Stone, Sean Sykes, Pierre Tchuinga, Pema Tenzing, Senait Tesfaye, Dawa Thoulutsang, Yama Thoulutsang, Kerri Topham, Ira Topping, Tsamla Tsamla, Helen Vassiliev, Vijay Venkataraman, Andy Vo, Tsering Wangchuk, Tsering Wangdi, Michael Weiand, Jane Wilkinson, Adam Wilson, Shailendra Yadav, Shuli Yang, Xiaoping Yang, Geneva Young, Qing Yu, Joanne Zainoun, Lisa Zembek, Andrew Zimmer, Eric S Lander (2005)  Genome sequence, comparative analysis and haplotype structure of the domestic dog.   Nature 438: 7069. 803-819 Dec  
Abstract: Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Notes:
Eitan E Winter, Chris P Ponting (2005)  Mammalian BEX, WEX and GASP genes: coding and non-coding chimaerism sustained by gene conversion events.   BMC Evol Biol 5: 10  
Abstract: BACKGROUND: The identification of sequence innovations in the genomes of mammals facilitates understanding of human gene function, as well as sheds light on the molecular mechanisms which underlie these changes. Although gene duplication plays a major role in genome evolution, studies regarding concerted evolution events among gene family members have been limited in scope and restricted to protein-coding regions, where high sequence similarity is easily detectable. RESULTS: We describe a mammalian-specific expansion of more than 20 rapidly-evolving genes on human chromosome Xq22.1. Many of these are highly divergent in their protein-coding regions yet contain a conserved sequence motif in their 5' UTRs which appears to have been maintained by multiple events of concerted evolution. These events have led to the generation of chimaeric genes, each with a 5' UTR and a protein-coding region that possess independent evolutionary histories. We suggest that concerted evolution has occurred via gene conversion independently in different mammalian lineages, and these events have resulted in elevated G+C levels in the encompassing genomic regions. These concerted evolution events occurred within and between genes from three separate protein families ('brain-expressed X-linked' [BEX], WWbp5-like X-linked [WEX] and G-protein-coupled receptor-associated sorting protein [GASP]), which often are expressed in mammalian brains and associated with receptor mediated signalling and apoptosis. CONCLUSION: Despite high protein-coding divergence among mammalian-specific genes, we identified a DNA motif common to these genes' 5' UTR exons. The motif has undergone concerted evolution events independently of its neighbouring protein-coding regions, leading to formation of evolutionary chimaeric genes. These findings have implications for the identification of non protein-coding regulatory elements and their lineage-specific evolution in mammals.
Notes:
K Coward, C P Ponting, H-Y Chang, O Hibbitt, P Savolainen, K T Jones, J Parrington (2005)  Phospholipase Czeta, the trigger of egg activation in mammals, is present in a non-mammalian species.   Reproduction 130: 2. 157-163 Aug  
Abstract: The activation of the egg to begin development into an embryo is triggered by a sperm-induced increase in intracellular egg Ca2+. There has been much controversy about how the sperm induces this fundamental developmental event, but recent studies suggest that, in mammals, egg activation is triggered by a testis-specific phospholipase C: PLCzeta. Since the discovery of PLCzeta, it has been unclear whether its role in triggering egg activation is common to all vertebrates, or is confined to mammals. Here, we demonstrate for the first time that PLCzeta is present in a non-mammalian vertebrate. Using genomic and cDNA databases, we have identified the cDNA encoding a PLCzeta orthologue in the domestic chicken that, like the mammalian isoforms, is a testis-specific gene. The chicken PLCzeta cDNA is 2152 bp in size and encodes an open reading frame of 639 amino acids. When injected into mouse oocytes, chicken PLCzeta cRNA triggers Ca2+ oscillations, indicating that it has functional properties similar to those of mammalian PLCzeta. Our findings suggest that PLCzeta may have a universal role in triggering egg activation in vertebrates.
Notes:
2004
Richard A Gibbs, George M Weinstock, Michael L Metzker, Donna M Muzny, Erica J Sodergren, Steven Scherer, Graham Scott, David Steffen, Kim C Worley, Paula E Burch, Geoffrey Okwuonu, Sandra Hines, Lora Lewis, Christine DeRamo, Oliver Delgado, Shannon Dugan-Rocha, George Miner, Margaret Morgan, Alicia Hawes, Rachel Gill, Celera, Robert A Holt, Mark D Adams, Peter G Amanatides, Holly Baden-Tillson, Mary Barnstead, Soo Chin, Cheryl A Evans, Steve Ferriera, Carl Fosler, Anna Glodek, Zhiping Gu, Don Jennings, Cheryl L Kraft, Trixie Nguyen, Cynthia M Pfannkoch, Cynthia Sitter, Granger G Sutton, J Craig Venter, Trevor Woodage, Douglas Smith, Hong-Mei Lee, Erik Gustafson, Patrick Cahill, Arnold Kana, Lynn Doucette-Stamm, Keith Weinstock, Kim Fechtel, Robert B Weiss, Diane M Dunn, Eric D Green, Robert W Blakesley, Gerard G Bouffard, Pieter J De Jong, Kazutoyo Osoegawa, Baoli Zhu, Marco Marra, Jacqueline Schein, Ian Bosdet, Chris Fjell, Steven Jones, Martin Krzywinski, Carrie Mathewson, Asim Siddiqui, Natasja Wye, John McPherson, Shaying Zhao, Claire M Fraser, Jyoti Shetty, Sofiya Shatsman, Keita Geer, Yixin Chen, Sofyia Abramzon, William C Nierman, Paul H Havlak, Rui Chen, K James Durbin, Amy Egan, Yanru Ren, Xing-Zhi Song, Bingshan Li, Yue Liu, Xiang Qin, Simon Cawley, A J Cooney, Lisa M D'Souza, Kirt Martin, Jia Qian Wu, Manuel L Gonzalez-Garay, Andrew R Jackson, Kenneth J Kalafus, Michael P McLeod, Aleksandar Milosavljevic, Davinder Virk, Andrei Volkov, David A Wheeler, Zhengdong Zhang, Jeffrey A Bailey, Evan E Eichler, Eray Tuzun, Ewan Birney, Emmanuel Mongin, Abel Ureta-Vidal, Cara Woodwark, Evgeny Zdobnov, Peer Bork, Mikita Suyama, David Torrents, Marina Alexandersson, Barbara J Trask, Janet M Young, Hui Huang, Huajun Wang, Heming Xing, Sue Daniels, Darryl Gietzen, Jeanette Schmidt, Kristian Stevens, Ursula Vitt, Jim Wingrove, Francisco Camara, M Mar Albà, Josep F Abril, Roderic Guigo, Arian Smit, Inna Dubchak, Edward M Rubin, Olivier Couronne, Alexander Poliakov, 
Norbert Hübner, Detlev Ganten, Claudia Goesele, Oliver Hummel, Thomas Kreitler, Young-Ae Lee, Jan Monti, Herbert Schulz, Heike Zimdahl, Heinz Himmelbauer, Hans Lehrach, Howard J Jacob, Susan Bromberg, Jo Gullings-Handley, Michael I Jensen-Seaman, Anne E Kwitek, Jozef Lazar, Dean Pasko, Peter J Tonellato, Simon Twigger, Chris P Ponting, Jose M Duarte, Stephen Rice, Leo Goodstadt, Scott A Beatson, Richard D Emes, Eitan E Winter, Caleb Webber, Petra Brandt, Gerald Nyakatura, Margaret Adetobi, Francesca Chiaromonte, Laura Elnitski, Pallavi Eswara, Ross C Hardison, Minmei Hou, Diana Kolbe, Kateryna Makova, Webb Miller, Anton Nekrutenko, Cathy Riemer, Scott Schwartz, James Taylor, Shan Yang, Yi Zhang, Klaus Lindpaintner, T Dan Andrews, Mario Caccamo, Michele Clamp, Laura Clarke, Valerie Curwen, Richard Durbin, Eduardo Eyras, Stephen M Searle, Gregory M Cooper, Serafim Batzoglou, Michael Brudno, Arend Sidow, Eric A Stone, Bret A Payseur, Guillaume Bourque, Carlos López-Otín, Xose S Puente, Kushal Chakrabarti, Sourav Chatterji, Colin Dewey, Lior Pachter, Nicolas Bray, Von Bing Yap, Anat Caspi, Glenn Tesler, Pavel A Pevzner, David Haussler, Krishna M Roskin, Robert Baertsch, Hiram Clawson, Terrence S Furey, Angie S Hinrichs, Donna Karolchik, William J Kent, Kate R Rosenbloom, Heather Trumbower, Matt Weirauch, David N Cooper, Peter D Stenson, Bin Ma, Michael Brent, Manimozhiyan Arumugam, David Shteynberg, Richard R Copley, Martin S Taylor, Harold Riethman, Uma Mudunuri, Jane Peterson, Mark Guyer, Adam Felsenfeld, Susan Old, Stephen Mockrin, Francis Collins (2004)  Genome sequence of the Brown Norway rat yields insights into mammalian evolution.   Nature 428: 6982. 493-521 Apr  
Abstract: The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
Notes:
Ivica Letunic, Richard R Copley, Steffen Schmidt, Francesca D Ciccarelli, Tobias Doerks, Jörg Schultz, Chris P Ponting, Peer Bork (2004)  SMART 4.0: towards genomic data integration.   Nucleic Acids Res 32: Database issue. D142-D144 Jan  
Abstract: SMART (Simple Modular Architecture Research Tool) is a web tool (http://smart.embl.de/) for the identification and annotation of protein domains, and provides a platform for the comparative study of complex domain architectures in genes and proteins. The January 2004 release of SMART contains 685 protein domains. New developments in SMART are centred on the integration of data from completed metazoan genomes. SMART now uses predicted proteins from complete genomes in its source sequence databases, and integrates these with predictions of orthology. New visualization tools have been developed to allow analysis of gene intron-exon structure within the context of protein domain structure, and to align these displays to provide schematic comparisons of orthologous genes, or multiple transcripts from the same gene. Other improvements include the ability to query SMART by Gene Ontology terms, improved structure database searching and batch retrieval of multiple entries.
Notes:
Martin S Taylor, Chris P Ponting, Richard R Copley (2004)  Occurrence and consequences of coding sequence insertions and deletions in Mammalian genomes.   Genome Res 14: 4. 555-566 Apr  
Abstract: Nucleotide insertion and deletion (indel) events, together with substitutions, represent the major mutational processes of gene evolution. Through the alignment of 8148 orthologous genes from human, mouse, and rat, we have identified 1743 indel events within rodent protein-coding sequences. Using human as an out-group, we reconstructed the mutational event underlying each of these indels. Overall, we found an excess of deletions over insertions, particularly for the rat lineage (70% excess). Sequence slippage accounts for at least 52% of insertions and 38% of deletions. We have also evaluated the selective tolerance of identifiable protein structures to indels. Transmembrane domains are the least, and low complexity regions, the most tolerant. Mapping of indels onto known protein structures demonstrated that structural cores are markedly less tolerant to indels than are loop regions. There is a specific enrichment of CpG dinucleotides in close proximity to insertion events, and both insertions and deletions are more common in higher G+C content sequences.
Notes:
Eitan E Winter, Leo Goodstadt, Chris P Ponting (2004)  Elevated rates of protein secretion, evolution, and disease among tissue-specific genes.   Genome Res 14: 1. 54-61 Jan  
Abstract: Variation in gene expression has been held responsible for the functional and morphological specialization of tissues. The tissue specificity of genes is known to correlate positively with gene evolution rates. We show here, using large data sets, that when a gene is expressed highly in a small number of tissues, its protein is more likely to be secreted and more likely to be mutated in genetic diseases with Mendelian inheritance. We find that secreted proteins are evolving at faster rates than nonsecreted proteins, and that their evolutionary rates are highly correlated with tissue specificity. However, the impact of secretion on evolutionary rates is countered by tissue-specific constraints that have been held constant over the past 75 million years. We find that disease genes are underrepresented among intracellular and slowly evolving housekeeping genes. These findings illuminate major selective pressures that have shaped the gene repertoires expressed in different mammalian tissues.
Notes:
Richard D Emes, Matthew C Riley, Christina M Laukaitis, Leo Goodstadt, Robert C Karn, Chris P Ponting (2004)  Comparative evolutionary genomics of androgen-binding protein genes.   Genome Res 14: 8. 1516-1529 Aug  
Abstract: Allelic variation within the mouse androgen-binding protein (ABP) alpha subunit gene (Abpa) has been suggested to promote assortative mating and thus prezygotic isolation. This is consistent with the elevated evolutionary rates observed for the Abpa gene, and the Abpb and Abpg genes whose products (ABPbeta and ABPgamma) form heterodimers with ABPalpha. We have investigated the mouse sequence that contains the three Abpa/b/g genes, and orthologous regions in rat, human, and chimpanzee genomes. Our studies reveal extensive "remodeling" of this region: Duplication rates of Abpa-like and Abpbg-like genes in mouse are >2 orders of magnitude higher than the average rate for all mouse genes; synonymous nucleotide substitution rates are twofold higher; and the Abpabg genomic region has expanded nearly threefold since divergence of the rodents. During this time, one in six amino acid sites in ABPbetagamma-like proteins appear to have been subject to positive selection; these may constitute a site of interaction with receptors or ligands. Greater adaptive variation among Abpbg-like sequences than among Abpa-like sequences suggests that assortative mating preferences are more influenced by variation in Abpbg-like genes. We propose a role for ABPalpha/beta/gamma proteins as pheromones, or in modulating odorant detection. This would account for the extraordinary adaptive evolution of these genes, and surrounding genomic regions, in murid rodents.
Notes:
Richard D Emes, Scott A Beatson, Chris P Ponting, Leo Goodstadt (2004)  Evolution and comparative genomics of odorant- and pheromone-associated genes in rodents.   Genome Res 14: 4. 591-602 Apr  
Abstract: Chemical cues influence a range of behavioral responses in rodents. The involvement of protein odorants and odorant receptors in mediating reproductive behavior, foraging, and predator avoidance suggests that their genes may have been subject to adaptive evolution. We have estimated the consequences of selection on rodent pheromones, their receptors, and olfactory receptors. These families were chosen on the basis of multiple gene duplications since the common ancestor of rat and mouse. For each family, codons were identified that are likely to have been subject to adaptive evolution. The majority of such sites are situated on the solvent-accessible surfaces of putative pheromones and the lumenal portions of their likely receptors. We predict that these contribute to physicochemical and functional diversity within pheromone-receptor interaction sites.
Notes:
Gane Ka-Shu Wong, Bin Liu, Jun Wang, Yong Zhang, Xu Yang, Zengjin Zhang, Qingshun Meng, Jun Zhou, Dawei Li, Jingjing Zhang, Peixiang Ni, Songgang Li, Longhua Ran, Heng Li, Jianguo Zhang, Ruiqiang Li, Shengting Li, Hongkun Zheng, Wei Lin, Guangyuan Li, Xiaoling Wang, Wenming Zhao, Jun Li, Chen Ye, Mingtao Dai, Jue Ruan, Yan Zhou, Yuanzhe Li, Ximiao He, Yunze Zhang, Jing Wang, Xiangang Huang, Wei Tong, Jie Chen, Jia Ye, Chen Chen, Ning Wei, Guoqing Li, Le Dong, Fengdi Lan, Yongqiao Sun, Zhenpeng Zhang, Zheng Yang, Yingpu Yu, Yanqing Huang, Dandan He, Yan Xi, Dong Wei, Qiuhui Qi, Wenjie Li, Jianping Shi, Miaoheng Wang, Fei Xie, Jianjun Wang, Xiaowei Zhang, Pei Wang, Yiqiang Zhao, Ning Li, Ning Yang, Wei Dong, Songnian Hu, Changqing Zeng, Weimou Zheng, Bailin Hao, Ladeana W Hillier, Shiaw-Pyng Yang, Wesley C Warren, Richard K Wilson, Mikael Brandström, Hans Ellegren, Richard P M A Crooijmans, Jan J van der Poel, Henk Bovenhuis, Martien A M Groenen, Ivan Ovcharenko, Laurie Gordon, Lisa Stubbs, Susan Lucas, Tijana Glavina, Andrea Aerts, Pete Kaiser, Lisa Rothwell, John R Young, Sally Rogers, Brian A Walker, Andy van Hateren, Jim Kaufman, Nat Bumstead, Susan J Lamont, Huaijun Zhou, Paul M Hocking, David Morrice, Dirk-Jan de Koning, Andy Law, Neil Bartley, David W Burt, Henry Hunt, Hans H Cheng, Ulrika Gunnarsson, Per Wahlberg, Leif Andersson, Ellen Kindlund, Martti T Tammi, Björn Andersson, Caleb Webber, Chris P Ponting, Ian M Overton, Paul E Boardman, Haizhou Tang, Simon J Hubbard, Stuart A Wilson, Jun Yu, Jian Wang, Huanming Yang (2004)  A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms.   Nature 432: 7018. 717-722 Dec  
Abstract: We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the variant sites are true SNPs, and at least 70% are common SNPs that segregate in many domestic breeds. Mean nucleotide diversity is about five SNPs per kilobase for almost every possible comparison between red jungle fowl and domestic lines, between two different domestic lines, and within domestic lines--in contrast to the notion that domestic animals are highly inbred relative to their wild ancestors. In fact, most of the SNPs originated before domestication, and there is little evidence of selective sweeps for adaptive alleles on length scales greater than 100 kilobases.
Notes:
Leo Goodstadt, Chris P Ponting (2004)  Vitamin K epoxide reductase: homology, active site and catalytic mechanism.   Trends Biochem Sci 29: 6. 289-292 Jun  
Abstract: Vitamin K epoxide reductase (VKOR) recycles reduced vitamin K, which is used subsequently as a co-factor in the gamma-carboxylation of glutamic acid residues in blood coagulation enzymes. VKORC1, a subunit of the VKOR complex, has recently been shown to possess this activity. Here, we show that VKORC1 is a member of a large family of predicted enzymes that are present in vertebrates, Drosophila, plants, bacteria and archaea. Four cysteine residues and one residue, which is either serine or threonine, are identified as likely active-site residues. In some plant and bacterial homologues the VKORC1 homologous domain is fused with domains of the thioredoxin family of oxidoreductases. These might reduce disulfide bonds of VKORC1-like enzymes as a prerequisite for their catalytic activities.
Notes:
Scott Beatson, Chris P Ponting (2004)  GIFT domains: linking eukaryotic intraflagellar transport and glycosylation to bacterial gliding.   Trends Biochem Sci 29: 8. 396-399 Aug  
Abstract: We describe GIFT [for GldG, intraflagellar transport (IFT)] domains in the flavobacterial gliding protein GldG and eukaryotic IFT-52. In eukaryotes, domain homologues are also found in the eukaryotic oligosaccharyltransferase complex and in subtilisin kexin isozyme-1 (SKI-1 or S1P). A distant evolutionary relationship to periplasmic-binding proteins hints that GIFT domains might possess oligosaccharide-binding functions.
Notes:
Hui Huang, Eitan E Winter, Huajun Wang, Keith G Weinstock, Heming Xing, Leo Goodstadt, Peter D Stenson, David N Cooper, Douglas Smith, M Mar Albà, Chris P Ponting, Kim Fechtel (2004)  Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes.   Genome Biol 5: 7. 06  
Abstract: BACKGROUND: Model organisms have contributed substantially to our understanding of the etiology of human disease as well as having assisted with the development of new treatment modalities. The availability of the human, mouse and, most recently, the rat genome sequences now permit the comprehensive investigation of the rodent orthologs of genes associated with human disease. Here, we investigate whether human disease genes differ significantly from their rodent orthologs with respect to their overall levels of conservation and their rates of evolutionary change. RESULTS: Human disease genes are unevenly distributed among human chromosomes and are highly represented (99.5%) among human-rodent ortholog sets. Differences are revealed in evolutionary conservation and selection between different categories of human disease genes. Although selection appears not to have greatly discriminated between disease and non-disease genes, synonymous substitution rates are significantly higher for disease genes. In neurological and malformation syndrome disease systems, associated genes have evolved slowly whereas genes of the immune, hematological and pulmonary disease systems have changed more rapidly. Amino-acid substitutions associated with human inherited disease occur at sites that are more highly conserved than the average; nevertheless, 15 substituting amino acids associated with human disease were identified as wild-type amino acids in the rat. Rodent orthologs of human trinucleotide repeat-expansion disease genes were found to contain substantially fewer of such repeats. Six human genes that share the same characteristics as triplet repeat-expansion disease-associated genes were identified; although four of these genes are expressed in the brain, none is currently known to be associated with disease. CONCLUSIONS: Most human disease genes have been retained in rodent genomes. Synonymous nucleotide substitutions occur at a higher rate in disease genes, a finding that may reflect increased mutation rates in the chromosomal regions in which disease genes are found. Rodent orthologs associated with neurological function exhibit the greatest evolutionary conservation; this suggests that rodent models of human neurological disease are likely to most faithfully represent human disease processes. However, with regard to neurological triplet repeat expansion-associated human disease genes, the contraction, relative to human, of rodent trinucleotide repeats suggests that rodent loci may not achieve a 'critical repeat threshold' necessary to undergo spontaneous pathological repeat expansions. The identification of six genes in this study that have multiple characteristics associated with repeat expansion-disease genes raises the possibility that not all human loci capable of facilitating neurological disease by repeat expansion have as yet been identified.
Notes:
2003
Nicola J Mulder, Rolf Apweiler, Teresa K Attwood, Amos Bairoch, Daniel Barrell, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard R Copley, Emmanuel Courcelle, Ujjwal Das, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Sam Griffiths-Jones, Daniel Haft, Nicola Harte, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, David Lonsdale, Ville Silventoinen, Sandra E Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Jeremy D Selengut, Florence Servant, Christian J A Sigrist, Robert Vaughan, Evgueni M Zdobnov (2003)  The InterPro Database, 2003 brings increased coverage and new features.   Nucleic Acids Res 31: 1. 315-318 Jan  
Abstract: InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Notes:
Roderic Guigo, Emmanouil T Dermitzakis, Pankaj Agarwal, Chris P Ponting, Genis Parra, Alexandre Reymond, Josep F Abril, Evan Keibler, Robert Lyle, Catherine Ucla, Stylianos E Antonarakis, Michael R Brent (2003)  Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.   Proc Natl Acad Sci U S A 100: 3. 1140-1145 Feb  
Abstract: A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.
Notes:
Lee Smith, Nick Van Hateren, John Willan, Rosario Romero, Gonzalo Blanco, Pam Siggers, James Walsh, Ruby Banerjee, Paul Denny, Chris Ponting, Andy Greenfield (2003)  Candidate testis-determining gene, Maestro (Mro), encodes a novel HEAT repeat protein.   Dev Dyn 227: 4. 600-607 Aug  
Abstract: Mammalian sex determination depends on the presence or absence of SRY transcripts in the embryonic gonad. Expression of SRY initiates a pathway of gene expression resulting in testis development. Here, we describe a novel gene potentially functioning in this pathway using a cDNA microarray screen for genes exhibiting sexually dimorphic expression during murine gonad development. Maestro (Mro) transcripts are first detected in the developing male gonad before overt testis differentiation. By 12.5 days postcoitus (dpc), Mro transcription is restricted to the developing testis cords and its expression is not germ cell-dependent. No expression is observed in female gonads between 10.5 and 14.5 dpc. Maestro encodes a protein containing HEAT-like repeats that localizes to the nucleolus in cell transfection assays. Maestro maps to a region of mouse chromosome 18 containing a genetic modifier of XX sex reversal. We discuss the possible function of Maestro in light of these data.
Notes:
Richard R Copley, Leo Goodstadt, Chris Ponting (2003)  Eukaryotic domain evolution inferred from genome comparisons.   Curr Opin Genet Dev 13: 6. 623-628 Dec  
Abstract: Comparative analyses of eukaryotic genomes are providing insights into the mode and tempo of domain family evolution. Gene duplication, the source of family expansion, far exceeds the rate of emergence of domains from non-coding sequence, and the rate of recruitment of domains into novel architectures. Domain families that appear to be restricted to certain lineages are likely to be the result of gene duplication, coupled with rapid sequence diversification. If such families are evidence of past adaptation, then their functions must relate to the underlying mechanism of selection: competition among organisms.
Notes:
Sebastian Maurer-Stroh, Nicholas J Dickens, Luke Hughes-Davies, Tony Kouzarides, Frank Eisenhaber, Chris P Ponting (2003)  The Tudor domain 'Royal Family': Tudor, plant Agenet, Chromo, PWWP and MBT domains.   Trends Biochem Sci 28: 2. 69-74 Feb  
Abstract: We have identified a family of 'Agenet' domains that are plant-specific homologs of Tudor domains. This finding has been extended, using a combination of sequence- and structure-dependent approaches, to show that the three beta-stranded core regions of Tudor, PWWP, chromatin-binding (Chromo) and MBT domains are homologous because they originate from a common ancestor. In addition, we have revealed pairs of tandem repeats in the fragile X mental retardation protein (FMRP) family that are also members of this Tudor domain 'Royal Family'.
Notes:
Youming Zhang, Nicholas I Leaves, Gavin G Anderson, Chris P Ponting, John Broxholme, Richard Holt, Pauline Edser, Sumit Bhattacharyya, Andy Dunham, Ian M Adcock, Louise Pulleyn, Peter J Barnes, John I Harper, Gonçalo Abecasis, Lon Cardon, Melanie White, John Burton, Lucy Matthews, Richard Mott, Mark Ross, Roger Cox, Miriam F Moffatt, William O C M Cookson (2003)  Positional cloning of a quantitative trait locus on chromosome 13q14 that influences immunoglobulin E levels and asthma.   Nat Genet 34: 2. 181-186 Jun  
Abstract: Atopic or immunoglobulin E (IgE)-mediated diseases include the common disorders of asthma, atopic dermatitis and allergic rhinitis. Chromosome 13q14 shows consistent linkage to atopy and the total serum IgE concentration. We previously identified association between total serum IgE levels and a novel 13q14 microsatellite (USAT24G1; ref. 7) and have now localized the underlying quantitative-trait locus (QTL) in a comprehensive single-nucleotide polymorphism (SNP) map. We found replicated association to IgE levels that was attributed to several alleles in a single gene, PHF11. We also found association with these variants to severe clinical asthma. The gene product (PHF11) contains two PHD zinc fingers and probably regulates transcription. Distinctive splice variants were expressed in immune tissues and cells.
Notes:
Nicholas J Dickens, Chris P Ponting (2003)  THoR: a tool for domain discovery and curation of multiple alignments.   Genome Biol 4: 8. 07  
Abstract: We describe a tool, THoR, that automatically creates and curates multiple sequence alignments representing protein domains. This exploits both PSI-BLAST and HMMER algorithms and provides an accurate and comprehensive alignment for any domain family. The entire process is designed for use via a web-browser, with simple links and cross-references to relevant information, to assist the assessment of biological significance. THoR has been benchmarked for accuracy using the SMART and pufferfish genome databases.
Notes:
Richard D Emes, Leo Goodstadt, Eitan E Winter, Chris P Ponting (2003)  Comparison of the genomes of human and mouse lays the foundation of genome zoology.   Hum Mol Genet 12: 7. 701-709 Apr  
Abstract: The extensive similarities between the genomes of human and model organisms are the foundation of much of modern biology, with model organism experimentation permitting valuable insights into biological function and the aetiology of human disease. In contrast, differences among genomes have received less attention. Yet these can be expected to govern the physiological and morphological distinctions apparent among species, especially if such differences are the result of evolutionary adaptation. A recent comparison of the draft sequences of mouse and human genomes has shed light on the selective forces that have predominated in their recent evolutionary histories. In particular, mouse-specific clusters of homologues associated with roles in reproduction, immunity and host defence appear to be under diversifying positive selective pressure, as indicated by high ratios of non-synonymous to synonymous substitution rates. These clusters are also frequently punctuated by homologous pseudogenes. They thus have experienced numerous gene death, as well as gene birth, events. These regions appear, therefore, to have borne the brunt of adaptive evolution that underlies physiological and behavioural innovation in mice. We predict that the availability of numerous animal genomes will give rise to a new field of genome zoology in which differences in animal physiology and ethology are illuminated by the study of genomic sequence variations.
Notes:
Maxine Allen, Andrea Heinzmann, Emiko Noguchi, Gonçalo Abecasis, John Broxholme, Chris P Ponting, Sumit Bhattacharyya, Jon Tinsley, Youming Zhang, Richard Holt, E Yvonne Jones, Nick Lench, Alisoun Carey, Helene Jones, Nicholas J Dickens, Claire Dimon, Rosie Nicholls, Crystal Baker, Luzheng Xue, Elizabeth Townsend, Michael Kabesch, Stephan K Weiland, David Carr, Erika von Mutius, Ian M Adcock, Peter J Barnes, G Mark Lathrop, Mark Edwards, Miriam F Moffatt, William O C M Cookson (2003)  Positional cloning of a novel gene influencing asthma from chromosome 2q14.   Nat Genet 35: 3. 258-263 Nov  
Abstract: Asthma is a common disease in children and young adults. Four separate reports have linked asthma and related phenotypes to an ill-defined interval between 2q14 and 2q32 (refs. 1-4), and two mouse genome screens have linked bronchial hyper-responsiveness to the region homologous to 2q14 (refs. 5,6). We found and replicated association between asthma and the D2S308 microsatellite, 800 kb distal to the IL1 cluster on 2q14. We sequenced the surrounding region and constructed a comprehensive, high-density, single-nucleotide polymorphism (SNP) linkage disequilibrium (LD) map. SNP association was limited to the initial exons of a solitary gene of 3.6 kb (DPP10), which extends over 1 Mb of genomic DNA. DPP10 encodes a homolog of dipeptidyl peptidases (DPPs) that cleave terminal dipeptides from cytokines and chemokines, and it presents a potential new target for asthma therapy.
Notes:
J D Vargas, B Herpers, A T McKie, S Gledhill, J McDonnell, M van den Heuvel, K E Davies, C P Ponting (2003)  Stromal cell-derived receptor 2 and cytochrome b561 are functional ferric reductases.   Biochim Biophys Acta 1651: 1-2. 116-123 Sep  
Abstract: Iron has a variety of functions in cellular organisms ranging from electron transport and DNA synthesis to adenosine triphosphate (ATP) and neurotransmitter synthesis. Failure to regulate the homeostasis of iron can lead to cognition and demyelination disorders when iron levels are deficient, and to neurodegenerative disorders when iron is in excess. In this study we show that three members of the b561 family of predicted ferric reductases, namely mouse cytochrome b561 and mouse and fly stromal cell-derived receptor 2 (SDR2), have ferric reductase activity. Given that a fourth member, duodenal cytochrome b (Dcytb), has previously been shown to be a ferric reductase, it is likely that all remaining members of this family also exhibit this activity. Furthermore, we show that the rat sdr2 message is predominantly expressed in the liver and kidney, with low expression in the duodenum. In hypotransferrinaemic (hpx) mice, sdr2 expression in the liver and kidney is reduced, suggesting that it may be regulated by iron. Moreover, we demonstrate the presence of mouse sdr2 in the choroid plexus and in the ependymal cells lining the four ventricles, through in situ hybridization analysis.
Notes:
Rasmus Hartmann-Petersen, Colin A M Semple, Chris P Ponting, Klavs B Hendil, Colin Gordon (2003)  UBA domain containing proteins in fission yeast.   Int J Biochem Cell Biol 35: 5. 629-636 May  
Abstract: The ubiquitin-proteasome pathway for intracellular proteolysis is involved in a series of cellular and molecular functions, including the degradation of bulk proteins, cell cycle control, DNA repair, antigen presentation, vesicle transport and the regulation of signal transduction pathways and transcription. Considering this variety of cell biological processes, it is puzzling that until recently only very few proteins were known to possess the ability to interact specifically with ubiquitin chains. However, several ubiquitin binding proteins have now been identified and the binding domains have been characterised on both the functional and structural levels. One example of a widespread ubiquitin binding module is the ubiquitin associated (UBA) domain. Here, we discuss the approximately 15 UBA domain containing proteins encoded in the relatively small genome of the fission yeast Schizosaccharomyces pombe. The proteins display remarkable differences in their domain organisation, indicating that these potential ubiquitin binding proteins are involved in various cell activities.
Notes:
Luke Hughes-Davies, David Huntsman, Margarida Ruas, Francois Fuks, Jacqueline Bye, Suet-Feung Chin, Jonathon Milner, Lindsay A Brown, Forrest Hsu, Blake Gilks, Torsten Nielsen, Michael Schulzer, Stephen Chia, Joseph Ragaz, Anthony Cahn, Lori Linger, Hilal Ozdag, Elena Cattaneo, E S Jordanova, Edward Schuuring, David S Yu, Ashok Venkitaraman, Bruce Ponder, Aidan Doherty, Samuel Aparicio, David Bentley, Charles Theillet, Chris P Ponting, Carlos Caldas, Tony Kouzarides (2003)  EMSY links the BRCA2 pathway to sporadic breast and ovarian cancer.   Cell 115: 5. 523-535 Nov  
Abstract: The BRCA2 gene is mutated in familial breast and ovarian cancer, and its product is implicated in DNA repair and transcriptional regulation. Here we identify a protein, EMSY, which binds BRCA2 within a region (exon 3) deleted in cancer. EMSY is capable of silencing the activation potential of BRCA2 exon 3, associates with chromatin regulators HP1beta and BS69, and localizes to sites of repair following DNA damage. EMSY maps to chromosome 11q13.5, a region known to be involved in breast and ovarian cancer. We show that the EMSY gene is amplified almost exclusively in sporadic breast cancer (13%) and higher-grade ovarian cancer (17%). In addition, EMSY amplification is associated with worse survival, particularly in node-negative breast cancer, suggesting that it may be of prognostic value. The remarkable clinical overlap between sporadic EMSY amplification and familial BRCA2 deletion implicates a BRCA2 pathway in sporadic breast and ovarian cancer.
Notes:
2002
Ivica Letunic, Leo Goodstadt, Nicholas J Dickens, Tobias Doerks, Joerg Schultz, Richard Mott, Francesca Ciccarelli, Richard R Copley, Chris P Ponting, Peer Bork (2002)  Recent improvements to the SMART domain-based sequence annotation resource.   Nucleic Acids Res 30: 1. 242-244 Jan  
Abstract: SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.
Notes:
Chris P Ponting, Mike Hutton, Andrew Nyborg, Matthew Baker, Karen Jansen, Todd E Golde (2002)  Identification of a novel family of presenilin homologues.   Hum Mol Genet 11: 9. 1037-1044 May  
Abstract: Presenilin 1 and presenilin 2 are polytopic membrane proteins, whose genes are mutated in some individuals with Alzheimer's disease. Presenilins have been shown to influence limited proteolysis of amyloid beta protein precursor (APP), Notch and ErbB4, and have been proposed to be gamma-secretases that perform the terminal cleavage of APP. In this model, two conserved and apparently intramembranous aspartic acids participate in catalysis. Highly sequence-similar presenilin homologues are known in plants, invertebrates and vertebrates. In this work, we have used a combination of different sequence database search methods to identify a new family of proteins homologous to presenilins. Members of this family, which we term presenilin homologues (PSH), have significant sequence similarities to presenilins and also possess two conserved aspartic acid residues within adjacent predicted transmembrane segments. The PSH family is found throughout the eukaryotes, in fungi as well as plants and animals, and in archaea. Five PSHs are detectable in the human genome, of which three possess "protease-associated" domains that are consistent with the proposed protease function of PSs. Based on these findings, we propose that PSs and PSHs represent different sub-branches of a larger family of polytopic membrane-associated aspartyl proteases.
Notes:
Tobias Doerks, Richard R Copley, Jörg Schultz, Chris P Ponting, Peer Bork (2002)  Systematic identification of novel protein domain families associated with nuclear functions.   Genome Res 12: 1. 47-56 Jan  
Abstract: A systematic computational analysis of protein sequences containing known nuclear domains led to the identification of 28 novel domain families. This represents a 26% increase in the starting set of 107 known nuclear domain families used for the analysis. Most of the novel domains are present in all major eukaryotic lineages, but 3 are species specific. For about 500 of the 1200 proteins that contain these new domains, nuclear localization could be inferred, and for 700, additional features could be predicted. For example, we identified a new domain, likely to have a role downstream of the unfolded protein response; a nematode-specific signalling domain; and a widespread domain, likely to be a noncatalytic homolog of ubiquitin-conjugating enzymes.
Notes:
Nicola J Mulder, Rolf Apweiler, Terri K Attwood, Amos Bairoch, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard Copley, Emmanuel Courcelle, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Jerome Gouzy, Sam Griffith-Jones, Daniel Haft, Henning Hermjakob, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, Sandra Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Florence Servant, Christian J A Sigrist (2002)  InterPro: an integrated documentation resource for protein families, domains and functional sites.   Brief Bioinform 3: 3. 225-235 Sep  
Abstract: The exponential increase in the submission of nucleotide sequences to the nucleotide sequence database by genome sequencing centres has resulted in a need for rapid, automatic methods for classification of the resulting protein sequences. There are several signature and sequence cluster-based methods for protein classification, each resource having distinct areas of optimum application owing to the differences in the underlying analysis methods. In recognition of this, InterPro was developed as an integrated documentation resource for protein families, domains and functional sites, to rationalise the complementary efforts of the individual protein signature database projects. The member databases - PRINTS, PROSITE, Pfam, ProDom, SMART and TIGRFAMs - form the InterPro core. Related signatures from each member database are unified into single InterPro entries. Each InterPro entry includes a unique accession number, functional descriptions and literature references, and links are made back to the relevant member database(s). Release 4.0 of InterPro (November 2001) contains 4,691 entries, representing 3,532 families, 1,068 domains, 74 repeats and 15 sites of post-translational modification (PTMs) encoded by different regular expressions, profiles, fingerprints and hidden Markov models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (2,141,621 InterPro hits from 586,124 SWISS-PROT and TrEMBL protein sequences). The database is freely accessible for text- and sequence-based searches.
Notes:
Chris P Ponting, Robert R Russell (2002)  The natural history of protein domains.   Annu Rev Biophys Biomol Struct 31: 45-71 10  
Abstract: Genome sequencing and structural genomics projects are providing new insights into the evolutionary history of protein domains. As methods for sequence and structure comparison improve, more distantly related domains are shown to be homologous. Thus there is a need for domain families to be classified within a hierarchy similar to Linnaeus' Systema Naturae, the classification of species. With such a hierarchy in mind, we discuss the evolution of domains, their combination into proteins, and evidence as to the likely origin of protein domains. We also discuss when and how analysis of domains can be used to understand details of protein function. Unconventional features of domain evolution such as intragenomic competition, domain insertion, horizontal gene transfer, and convergent evolution are seen as analogs of organismal evolutionary events. These parallels illustrate how the concept of domains can be applied to provide insights into evolutionary biology.
Notes:
Robert H Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, Josep F Abril, Pankaj Agarwal, Richa Agarwala, Rachel Ainscough, Marina Alexandersson, Peter An, Stylianos E Antonarakis, John Attwood, Robert Baertsch, Jonathon Bailey, Karen Barlow, Stephan Beck, Eric Berry, Bruce Birren, Toby Bloom, Peer Bork, Marc Botcherby, Nicolas Bray, Michael R Brent, Daniel G Brown, Stephen D Brown, Carol Bult, John Burton, Jonathan Butler, Robert D Campbell, Piero Carninci, Simon Cawley, Francesca Chiaromonte, Asif T Chinwalla, Deanna M Church, Michele Clamp, Christopher Clee, Francis S Collins, Lisa L Cook, Richard R Copley, Alan Coulson, Olivier Couronne, James Cuff, Val Curwen, Tim Cutts, Mark Daly, Robert David, Joy Davies, Kimberly D Delehaunty, Justin Deri, Emmanouil T Dermitzakis, Colin Dewey, Nicholas J Dickens, Mark Diekhans, Sheila Dodge, Inna Dubchak, Diane M Dunn, Sean R Eddy, Laura Elnitski, Richard D Emes, Pallavi Eswara, Eduardo Eyras, Adam Felsenfeld, Ginger A Fewell, Paul Flicek, Karen Foley, Wayne N Frankel, Lucinda A Fulton, Robert S Fulton, Terrence S Furey, Diane Gage, Richard A Gibbs, Gustavo Glusman, Sante Gnerre, Nick Goldman, Leo Goodstadt, Darren Grafham, Tina A Graves, Eric D Green, Simon Gregory, Roderic Guigó, Mark Guyer, Ross C Hardison, David Haussler, Yoshihide Hayashizaki, LaDeana W Hillier, Angela Hinrichs, Wratko Hlavina, Timothy Holzer, Fan Hsu, Axin Hua, Tim Hubbard, Adrienne Hunt, Ian Jackson, David B Jaffe, L Steven Johnson, Matthew Jones, Thomas A Jones, Ann Joy, Michael Kamal, Elinor K Karlsson, Donna Karolchik, Arkadiusz Kasprzyk, Jun Kawai, Evan Keibler, Cristyn Kells, W James Kent, Andrew Kirby, Diana L Kolbe, Ian Korf, Raju S Kucherlapati, Edward J Kulbokas, David Kulp, Tom Landers, J P Leger, Steven Leonard, Ivica Letunic, Rosie Levine, Jia Li, Ming Li, Christine Lloyd, Susan Lucas, Bin Ma, Donna R Maglott, Elaine R Mardis, Lucy Matthews, Evan Mauceli, John H Mayer, Megan McCarthy, W Richard McCombie, Stuart McLaren, Kirsten McLay, John D McPherson, Jim Meldrim, Beverley Meredith, Jill P Mesirov, Webb Miller, Tracie L Miner, Emmanuel Mongin, Kate T Montgomery, Michael Morgan, Richard Mott, James C Mullikin, Donna M Muzny, William E Nash, Joanne O Nelson, Michael N Nhan, Robert Nicol, Zemin Ning, Chad Nusbaum, Michael J O'Connor, Yasushi Okazaki, Karen Oliver, Emma Overton-Larty, Lior Pachter, Genís Parra, Kymberlie H Pepin, Jane Peterson, Pavel Pevzner, Robert Plumb, Craig S Pohl, Alex Poliakov, Tracy C Ponce, Chris P Ponting, Simon Potter, Michael Quail, Alexandre Reymond, Bruce A Roe, Krishna M Roskin, Edward M Rubin, Alistair G Rust, Ralph Santos, Victor Sapojnikov, Brian Schultz, Jörg Schultz, Matthias S Schwartz, Scott Schwartz, Carol Scott, Steven Seaman, Steve Searle, Ted Sharpe, Andrew Sheridan, Ratna Shownkeen, Sarah Sims, Jonathan B Singer, Guy Slater, Arian Smit, Douglas R Smith, Brian Spencer, Arne Stabenau, Nicole Stange-Thomann, Charles Sugnet, Mikita Suyama, Glenn Tesler, Johanna Thompson, David Torrents, Evanne Trevaskis, John Tromp, Catherine Ucla, Abel Ureta-Vidal, Jade P Vinson, Andrew C Von Niederhausern, Claire M Wade, Melanie Wall, Ryan J Weber, Robert B Weiss, Michael C Wendl, Anthony P West, Kris Wetterstrand, Raymond Wheeler, Simon Whelan, Jamey Wierzbowski, David Willey, Sophie Williams, Richard K Wilson, Eitan Winter, Kim C Worley, Dudley Wyman, Shan Yang, Shiaw-Pyng Yang, Evgeny M Zdobnov, Michael C Zody, Eric S Lander (2002)  Initial sequencing and comparative analysis of the mouse genome.   Nature 420: 6915. 520-562 Dec  
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Notes:
J D Vargas, E Culetto, C P Ponting, I Miguel-Aliaga, K E Davies, D B Sattelle (2002)  Cloning and developmental expression analysis of ltd-1, the Caenorhabditis elegans homologue of the mouse kyphoscoliosis (ky) gene.   Mech Dev 117: 1-2. 289-292 Sep  
Abstract: We have characterized the developmental expression pattern of the Caenorhabditis elegans homologue of the mouse ky gene. The Ky protein has a putative key function in muscle development and has homologues in invertebrates, fungi and a cyanobacterium. The C. elegans Ky homologue gene has been named ltd-1 for LIM and transglutaminase domains gene. The LTD-1::GFP construct is expressed in developing hypodermal cells from the twofold stage embryo through adulthood. These data define the ltd-1 gene as a novel marker for C. elegans epithelial cell development.
Notes:
Chris P Ponting (2002)  Novel domains and orthologues of eukaryotic transcription elongation factors.   Nucleic Acids Res 30: 17. 3643-3652 Sep  
Abstract: The passage of RNA polymerase II across eukaryotic genes is impeded by the nucleosome, an octamer of histones H2A, H2B, H3 and H4 dimers. More than a dozen factors in the yeast Saccharomyces cerevisiae are known to facilitate transcription elongation through chromatin. In order to better understand the evolution and function of these factors, their sequences have been compared with known protein, EST and DNA sequences. Elongator subcomplex components Elp4p and Elp6p are shown to be homologues of ATPases, yet with substitutions of amino acids critical for ATP hydrolysis, and novel orthologues of Elp5p are detectable in human, and other animal, sequences. The yeast CP complex is shown to contain a likely inactive homologue of M24 family metalloproteases in Spt16p/Cdc68p and a 2-fold repeat in Pob3p, the orthologue of mammalian SSRP1. Archaeal DNA-directed RNA polymerase subunit E" is shown to be the orthologue of eukaryotic Spt4p, and Spt5p and prokaryotic NusG are shown to contain a novel 'NGN' domain. Spt6p is found to contain a domain homologous to the YqgF family of RNases, although this domain may also lack catalytic activity. These findings imply that much of the transcription elongation machinery of eukaryotes has been acquired subsequent to their divergence from prokaryotes.
Notes:
Eitan Winter, Chris P Ponting (2002)  TRAM, LAG1 and CLN8: members of a novel family of lipid-sensing domains?   Trends Biochem Sci 27: 8. 381-383 Aug  
Abstract: A family of membrane-associated proteins related to yeast Lag1p and mammalian TRAM has been identified. The family includes the protein product of CLN8, a gene mutated in progressive epilepsy with mental retardation. Mouse CLN8 is also mutated in the mnd/mnd mouse, a model for neuronal ceroid lipofuscinoses. The identification of these homologues has potential implications for our understanding of ceramide synthesis, lipid regulation and protein translocation in the endoplasmic reticulum.
Notes:
Richard Mott, Jörg Schultz, Peer Bork, Chris P Ponting (2002)  Predicting protein cellular localization using a domain projection method.   Genome Res 12: 8. 1168-1174 Aug  
Abstract: We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
Notes:
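The domain co-occurrence idea in the abstract above can be illustrated with a small sketch. It is not the authors' projection method: it only builds a graph in which two domains are linked when they occur in the same protein, then counts the "degrees of separation" between domains by breadth-first search. The protein names and domain architectures below are hypothetical toy data, and the function names are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(proteins):
    """Link two domains whenever they occur together in at least one protein."""
    graph = {}
    for domains in proteins.values():
        unique = set(domains)
        for domain in unique:
            graph.setdefault(domain, set())
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy data: protein name -> list of domains it contains.
toy_proteins = {
    "protA": ["SH2", "SH3"],
    "protB": ["SH3", "PH"],
    "protC": ["EGF", "LamG"],
}
graph = build_cooccurrence_graph(toy_proteins)
```

In this toy set SH2 and PH never share a protein, but both co-occur with SH3, so SH3 plays the role of a "bridging" domain and the two are two steps apart. EGF and LamG form a separate component with no path to SH2.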
2001
E S Lander, L M Linton, B Birren, C Nusbaum, M C Zody, J Baldwin, K Devon, K Dewar, M Doyle, W FitzHugh, R Funke, D Gage, K Harris, A Heaford, J Howland, L Kann, J Lehoczky, R LeVine, P McEwan, K McKernan, J Meldrim, J P Mesirov, C Miranda, W Morris, J Naylor, C Raymond, M Rosetti, R Santos, A Sheridan, C Sougnez, N Stange-Thomann, N Stojanovic, A Subramanian, D Wyman, J Rogers, J Sulston, R Ainscough, S Beck, D Bentley, J Burton, C Clee, N Carter, A Coulson, R Deadman, P Deloukas, A Dunham, I Dunham, R Durbin, L French, D Grafham, S Gregory, T Hubbard, S Humphray, A Hunt, M Jones, C Lloyd, A McMurray, L Matthews, S Mercer, S Milne, J C Mullikin, A Mungall, R Plumb, M Ross, R Shownkeen, S Sims, R H Waterston, R K Wilson, L W Hillier, J D McPherson, M A Marra, E R Mardis, L A Fulton, A T Chinwalla, K H Pepin, W R Gish, S L Chissoe, M C Wendl, K D Delehaunty, T L Miner, A Delehaunty, J B Kramer, L L Cook, R S Fulton, D L Johnson, P J Minx, S W Clifton, T Hawkins, E Branscomb, P Predki, P Richardson, S Wenning, T Slezak, N Doggett, J F Cheng, A Olsen, S Lucas, C Elkin, E Uberbacher, M Frazier, R A Gibbs, D M Muzny, S E Scherer, J B Bouck, E J Sodergren, K C Worley, C M Rives, J H Gorrell, M L Metzker, S L Naylor, R S Kucherlapati, D L Nelson, G M Weinstock, Y Sakaki, A Fujiyama, M Hattori, T Yada, A Toyoda, T Itoh, C Kawagoe, H Watanabe, Y Totoki, T Taylor, J Weissenbach, R Heilig, W Saurin, F Artiguenave, P Brottier, T Bruls, E Pelletier, C Robert, P Wincker, D R Smith, L Doucette-Stamm, M Rubenfield, K Weinstock, H M Lee, J Dubois, A Rosenthal, M Platzer, G Nyakatura, S Taudien, A Rump, H Yang, J Yu, J Wang, G Huang, J Gu, L Hood, L Rowen, A Madan, S Qin, R W Davis, N A Federspiel, A P Abola, M J Proctor, R M Myers, J Schmutz, M Dickson, J Grimwood, D R Cox, M V Olson, R Kaul, N Shimizu, K Kawasaki, S Minoshima, G A Evans, M Athanasiou, R Schultz, B A Roe, F Chen, H Pan, J Ramser, H Lehrach, R Reinhardt, W R McCombie, M de la Bastide, N Dedhia, H Blöcker, K 
Hornischer, G Nordsiek, R Agarwala, L Aravind, J A Bailey, A Bateman, S Batzoglou, E Birney, P Bork, D G Brown, C B Burge, L Cerutti, H C Chen, D Church, M Clamp, R R Copley, T Doerks, S R Eddy, E E Eichler, T S Furey, J Galagan, J G Gilbert, C Harmon, Y Hayashizaki, D Haussler, H Hermjakob, K Hokamp, W Jang, L S Johnson, T A Jones, S Kasif, A Kaspryzk, S Kennedy, W J Kent, P Kitts, E V Koonin, I Korf, D Kulp, D Lancet, T M Lowe, A McLysaght, T Mikkelsen, J V Moran, N Mulder, V J Pollara, C P Ponting, G Schuler, J Schultz, G Slater, A F Smit, E Stupka, J Szustakowski, D Thierry-Mieg, J Thierry-Mieg, L Wagner, J Wallis, R Wheeler, A Williams, Y I Wolf, K H Wolfe, S P Yang, R F Yeh, F Collins, M S Guyer, J Peterson, A Felsenfeld, K A Wetterstrand, A Patrinos, M J Morgan, P de Jong, J J Catanese, K Osoegawa, H Shizuya, S Choi, Y J Chen, J Szustakowki (2001)  Initial sequencing and analysis of the human genome.   Nature 409: 6822. 860-921 Feb  
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Notes:
R R Copley, R B Russell, C P Ponting (2001)  Sialidase-like Asp-boxes: sequence-similar structures within different protein folds.   Protein Sci 10: 2. 285-292 Feb  
Abstract: Sequence similarity is the most common measure currently used to infer homology between proteins. Typically, homologous protein domains show sequence similarity over their entire lengths. Here we identify Asp box motifs, initially found as repeats in sialidases and neuraminidases, in new structural and sequence contexts. These motifs represent significantly similar sequences, localized to beta hairpins within proteins that are otherwise different in sequence and three-dimensional structure. By performing a combined sequence- and structure-based analysis we detect Asp boxes in more than nine protein families, including bacterial ribonucleases, sulfite oxidases, reelin, netrins, some lipoprotein receptors, and a variety of glycosyl hydrolases. Although the function common to each of these proteins, if any, remains unclear, we discuss possible functions of Asp boxes on the basis of previously determined experimental results and discuss different evolutionary scenarios for the origin of Asp-box containing proteins.
Notes:
P M Clissold, C P Ponting (2001)  JmjC: cupin metalloenzyme-like domains in jumonji, hairless and phospholipase A2beta.   Trends Biochem Sci 26: 1. 7-9 Jan  
Abstract: On the basis of significant sequence similarity, we have identified JmjC domains in more than 100 eukaryotic and bacterial sequences. These include human hairless, mutated in individuals with alopecia universalis, retinoblastoma-binding protein 2 and several putative chromatin-associated proteins. JmjC domains are predicted to be metalloenzymes that adopt the cupin fold, and are candidates for enzymes that regulate chromatin remodelling.
Notes:
S E Newey, E V Howman, C P Ponting, M A Benson, R Nawrotzki, N Y Loh, K E Davies, D J Blake (2001)  Syncoilin, a novel member of the intermediate filament superfamily that interacts with alpha-dystrobrevin in skeletal muscle.   J Biol Chem 276: 9. 6645-6655 Mar  
Abstract: Dystrophin coordinates the assembly of a complex of structural and signaling proteins that are required for normal muscle function. A key component of the dystrophin protein complex is alpha-dystrobrevin, a dystrophin-associated protein whose absence results in neuromuscular junction defects and muscular dystrophy. To gain further insights into the role of alpha-dystrobrevin in skeletal muscle, we used the yeast two-hybrid system to identify a novel alpha-dystrobrevin-binding partner called syncoilin. Syncoilin is a new member of the intermediate filament superfamily and is highly expressed in skeletal and cardiac muscle. In normal skeletal muscle, syncoilin is concentrated at the neuromuscular junction, where it colocalizes and coimmunoprecipitates with alpha-dystrobrevin-1. Expression studies in mammalian cells demonstrate that, while alpha-dystrobrevin and syncoilin associate directly, overexpression of syncoilin does not result in the self-assembly of intermediate filaments. Finally, unlike many components of the dystrophin protein complex, we show that syncoilin expression is up-regulated in dystrophin-deficient muscle. These data suggest that alpha-dystrobrevin provides a link between the dystrophin protein complex and the intermediate filament network at the neuromuscular junction, which may be important for the maintenance and maturation of the synapse.
Notes:
C P Ponting (2001)  Plagiarized bacterial genes in the human book of life.   Trends Genet 17: 5. 235-237 May  
Abstract: The initial analysis of the human genome draft sequence reveals that our 'book of life' is multi-authored. A small but significant proportion of our genes owe their heritage not to antecedent eukaryotes but instead to bacteria. The publicly funded Human Genome Project study indicates that about 0.5% of all human genes were copied into the genome from bacterial sources. Detailed sequence analyses point to these 'horizontal gene transfer' events having occurred relatively recently. So how did the human 'book of life' evolve to be a chimaera, part animal and part bacterium? And what was the probable evolutionary impact of such gene plagiarism?
Notes:
G Blanco, G R Coulton, A Biggin, C Grainge, J Moss, M Barrett, A Berquin, G Maréchal, M Skynner, P van Mier, A Nikitopoulou, M Kraus, C P Ponting, R M Mason, S D Brown (2001)  The kyphoscoliosis (ky) mouse is deficient in hypertrophic responses and is caused by a mutation in a novel muscle-specific protein.   Hum Mol Genet 10: 1. 9-16 Jan  
Abstract: The ky mouse mutant exhibits a primary degenerative myopathy preceding chronic thoraco-lumbar kyphoscoliosis. The histopathology of the ky mutant suggests that Ky protein activity is crucial for normal muscle growth and function as well as the maturation and stabilization of the neuromuscular junction. Muscle hypertrophy in response to increasing demand is deficient in the ky mutant, whereas adaptive fibre type shifts take place. The ky locus has previously been localized to a small region of mouse chromosome 9 and we have now identified the gene and the mutation underlying the kyphoscoliotic mouse. The ky transcript encodes a novel protein that is detected only in skeletal muscle and heart. The identification of the ky gene will allow detailed analysis of the impact of primary myopathy on idiopathic scoliosis in mice and man.
Notes:
L Goodstadt, C P Ponting (2001)  Sequence variation and disease in the wake of the draft human genome.   Hum Mol Genet 10: 20. 2209-2214 Oct  
Abstract: The sequencing phase of the human genome project will soon be over. In its wake, repertoires of sequence polymorphisms among the human population are being sampled and a battery of functional genomics projects, from gene and protein expression studies to whole proteome interaction experiments, are generating vast quantities of data. Now that the data, or the means to generate data, are available it is the application of this information in enhancing our understanding of biology that represents the next formidable challenge. Two prominent issues should be considered. First, existing data must be analysed using the best methods available. The prediction of enzymatic activity for bestrophin, whose gene is mutated in Best macular dystrophy, is described in this review. This is an example of the experimentally testable hypotheses that can result from such detailed and exhaustive analyses. Secondly, the torrents of data from high-throughput studies will need to be made more accessible to all using web-based resources that integrate and digest complementary data types. The internet sites that showcase the human genome sequence are blazing a new trail. Ultimately, the success of genome sequencing and functional genomics will be measured not by the quantity and accuracy of raw data generated, but how rapidly they can be harnessed to span the divide between genotype and phenotype.
Notes:
C P Ponting, R Mott, P Bork, R R Copley (2001)  Novel protein domains and repeats in Drosophila melanogaster: insights into structure, function, and evolution.   Genome Res 11: 12. 1996-2008 Dec  
Abstract: Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
Notes:
A N Lupas, C P Ponting, R B Russell (2001)  On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world?   J Struct Biol 134: 2-3. 191-203 May/Jun  
Abstract: This paper presents and discusses evidence suggesting how the diversity of domain folds in existence today might have evolved from peptide ancestors. We apply a structure similarity detection method to detect instances where localized regions of different protein folds contain highly similar sequences and structures. Results of performing an all-on-all comparison of known structures are described and compared with other recently published findings. The numerous instances of local sequence and structure similarities within different protein folds, together with evidence from proteins containing sequence and structure repeats, argue in favor of the evolution of modern single polypeptide domains from ancient short peptide ancestors (antecedent domain segments (ADSs)). In this model, ancient protein structures were formed by self-assembling aggregates of short polypeptides. Subsequently, and perhaps concomitantly with the evolution of higher fidelity DNA replication and repair systems, single polypeptide domains arose from the fusion of ADS genes. Thus modern protein domains may have a polyphyletic origin.
Notes:
M Brockington, D J Blake, P Prandini, S C Brown, S Torelli, M A Benson, C P Ponting, B Estournet, N B Romero, E Mercuri, T Voit, C A Sewry, P Guicheney, F Muntoni (2001)  Mutations in the fukutin-related protein gene (FKRP) cause a form of congenital muscular dystrophy with secondary laminin alpha2 deficiency and abnormal glycosylation of alpha-dystroglycan.   Am J Hum Genet 69: 6. 1198-1209 Dec  
Abstract: The congenital muscular dystrophies (CMD) are a heterogeneous group of autosomal recessive disorders presenting in infancy with muscle weakness, contractures, and dystrophic changes on skeletal-muscle biopsy. Structural brain defects, with or without mental retardation, are additional features of several CMD syndromes. Approximately 40% of patients with CMD have a primary deficiency (MDC1A) of the laminin alpha2 chain of merosin (laminin-2) due to mutations in the LAMA2 gene. In addition, a secondary deficiency of laminin alpha2 is apparent in some CMD syndromes, including MDC1B, which is mapped to chromosome 1q42, and both muscle-eye-brain disease (MEB) and Fukuyama CMD (FCMD), two forms with severe brain involvement. The FCMD gene encodes a protein of unknown function, fukutin, though sequence analysis predicts it to be a phosphoryl-ligand transferase. Here we identify the gene for a new member of the fukutin protein family (fukutin related protein [FKRP]), mapping to human chromosome 19q13.3. We report the genomic organization of the FKRP gene and its pattern of tissue expression. Mutations in the FKRP gene have been identified in seven families with CMD characterized by disease onset in the first weeks of life and a severe phenotype with inability to walk, muscle hypertrophy, marked elevation of serum creatine kinase, and normal brain structure and function. Affected individuals had a secondary deficiency of laminin alpha2 expression. In addition, they had both a marked decrease in immunostaining of muscle alpha-dystroglycan and a reduction in its molecular weight on western blot analysis. We suggest these abnormalities of alpha-dystroglycan are caused by its defective glycosylation and are integral to the pathology seen in MDC1C.
Notes:
L Goodstadt, C P Ponting (2001)  CHROMA: consensus-based colouring of multiple alignments for publication.   Bioinformatics 17: 9. 845-846 Sep  
Abstract: CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
Notes:
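Annotating an alignment "by consensus", as described in the abstract above, can be sketched in its simplest form as follows. This is not CHROMA's implementation: the sketch scores only exact residue identity per column, whereas the real package may use residue groupings and its own thresholds. The function name, the default threshold and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def column_consensus(alignment, threshold=0.8):
    """Return one consensus character per alignment column.

    A column yields its most common residue when that residue is present in
    at least `threshold` of all sequences (gaps count against it); otherwise
    the column yields '.'.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        # Count residues only; '-' marks a gap and can never be the consensus.
        counts = Counter(residue for residue in column if residue != "-")
        if not counts:
            consensus.append(".")
            continue
        residue, count = counts.most_common(1)[0]
        if count / len(column) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(".")
    return "".join(consensus)


# Hypothetical toy alignment of four short sequences.
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MKVL-"]
strict = column_consensus(toy_alignment, threshold=0.8)
relaxed = column_consensus(toy_alignment, threshold=0.75)
```

A colouring tool would then map each consensus character to a text style; lowering the threshold marks more columns as conserved.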
M A Andrade, C Perez-Iratxeta, C P Ponting (2001)  Protein repeats: structures, functions, and evolution.   J Struct Biol 134: 2-3. 117-131 May/Jun  
Abstract: Internal repetition within proteins has been a successful stratagem on multiple separate occasions throughout evolution. Such protein repeats possess regular secondary structures and form multirepeat assemblies in three dimensions of diverse sizes and functions. In general, however, internal repetition affords a protein enhanced evolutionary prospects due to an enlargement of its available binding surface area. Constraints on sequence conservation appear to be relatively lax, due to binding functions ensuing from multiple, rather than single, repeats. Considerable sequence divergence as well as the short lengths of sequence repeats mean that repeat detection can be a particularly arduous task. We also consider the conundrum of how multiple repeats, which show strong structural and functional interdependencies, ever evolved from a single repeat ancestor. In this review, we illustrate each of these points by referring to six prolific repeat types (repeats in beta-propellers and beta-trefoils and tetratricopeptide, ankyrin, armadillo/HEAT, and leucine-rich repeats) and in other less-prolific but nonetheless interesting repeats.
Notes:
R D Emes, C P Ponting (2001)  A new sequence motif linking lissencephaly, Treacher Collins and oral-facial-digital type 1 syndromes, microtubule dynamics and cell migration.   Hum Mol Genet 10: 24. 2813-2820 Nov  
Abstract: A previously unidentified sequence motif has been identified in the products of genes mutated in Miller-Dieker lissencephaly, Treacher Collins, oral-facial-digital type 1 and contiguous syndrome ocular albinism with late onset sensorineural deafness syndromes. An additional homologous motif was detected in a gene product fused to the fibroblast growth factor receptor type 1 in patients with an atypical stem cell myeloproliferative disorder. In total, over 100 eukaryotic intracellular proteins are shown to possess a LIS1 homology (LisH) motif, including several katanin p60 subunits, muskelin, tonneau, LEUNIG, Nopp140, aimless and numerous WD repeat-containing beta-propeller proteins. It is suggested that LisH motifs contribute to the regulation of microtubule dynamics, either by mediating dimerization, or else by binding cytoplasmic dynein heavy chain or microtubules directly. The predicted secondary structure of LisH motifs, and their occurrence in homologues of Gbeta beta-propeller subunits, suggests that they are analogues of Ggamma subunits, and might associate with the periphery of beta-propeller domains. The finding of LisH motifs in both treacle and Nopp140 reinforces previous observations of functional similarities between these nucleolar proteins. Uncharacterized LisH motif-containing proteins represent candidates for other diseases associated with aberrant microtubule dynamics and defects of cell migration, nucleokinesis or chromosome segregation.
Notes:
C P Ponting (2001)  Issues in predicting protein function from sequence.   Brief Bioinform 2: 1. 19-29 Mar  
Abstract: Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances made in estimating the statistical significance of sequence similarities found from database searches. The extent to which homologues possess similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care to be taken. This review focuses on the various approaches that have been developed to predict function from the scale of the atom to that of the organism. Similarities in homologues' functions differ considerably at each of these different scales and also vary for different domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.
Notes:
C P Ponting (2001)  Domain homologues of dopamine beta-hydroxylase and ferric reductase: roles for iron metabolism in neurodegenerative disorders?   Hum Mol Genet 10: 17. 1853-1858 Aug  
Abstract: One of the defining characteristics of neurodegenerative diseases, including Parkinson's, Alzheimer's and Huntington's diseases, is abnormal accumulations of iron, specifically in affected areas. Following injection of iron in rat brains, a relatively selective lesion of dopamine neurons, similar to parkinsonism, occurs. These observations indicate that Fe(II)-mediated generation of free radical species, by the Fenton reaction, might contribute to the pathoetiology of these diseases. Iron is known to possess multiple roles in the biosynthesis of catecholamines in dopaminergic neurons. These include, as Fe(II), facilitating the production of dopamine from phenylalanine by tyrosine hydroxylase, and as heme, assisting the recycling of ascorbate by cytochrome b-561 required for the generation of norepinephrine from dopamine by dopamine beta-hydroxylase. In this study, it is demonstrated that a human and mouse gene product, stromal cell-derived receptor 2, is a homologue of cytochrome b-561 and duodenal cytochrome b, and is thus predicted to be active as a ferric reductase. Moreover, this protein also contains a domain homologous to the N-terminal regulatory region of dopamine beta-hydroxylase. These findings from sequence analysis lead to a prediction that stromal cell-derived receptor 2 is a catecholamine-regulated ferric reductase active in the brain. Dysfunction of cytochrome b-561 or stromal cell-derived receptor 2, therefore, might predispose individuals to abnormal accumulation of Fe(III) and/or generation of cytotoxic free radicals as a consequence of a rapid cycling between Fe(III) and Fe(II). The hypothesis that aberrant ferric reductase activities are involved in the progression of neurodegenerative diseases should open up new avenues of research, and possibly therapy, for these devastating diseases.
Notes:
C P Ponting, N J Dickens (2001)  Genome cartography through domain annotation.   Genome Biol 2: 7.  
Abstract: The evolutionary history of eukaryotic proteins involves rapid sequence divergence, addition and deletion of domains, and fusion and fission of genes. Although the protein repertoires of distantly related species differ greatly, their domain repertoires do not. To account for the great diversity of domain contexts and an unexpected paucity of ortholog conservation, we must categorize the coding regions of completely sequenced genomes into domain families, as well as protein families.
Notes:
2000
M N Hodgkin, M R Masson, D Powner, K M Saqib, C P Ponting, M J Wakelam (2000)  Phospholipase D regulation and localisation is dependent upon a phosphatidylinositol 4,5-biphosphate-specific PH domain.   Curr Biol 10: 1. 43-46 Jan  
Abstract: The signalling pathway leading, for example, to actin cytoskeletal reorganisation, secretion or superoxide generation involves phospholipase D (PLD)-catalysed hydrolysis of phosphatidylcholine to generate phosphatidic acid, which appears to mediate the messenger functions of this pathway. Two PLD genes (PLD1 and PLD2) with similar domain structures have been cloned and progress has been made in identifying the protein regulators of PLD1 activation, for example Arf and Rho family members. The activities of both PLD isoforms are dependent on phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2) and our sequence analysis suggested the presence of a pleckstrin homology (PH) domain in PLD1, although its absence has also been claimed. Investigation of the inositide dependence showed that a bis-phosphorylated lipid with a vicinal pair of phosphates was required for PLD1 activity. Furthermore, PLD1 bound specifically and with high affinity to lipid surfaces containing PI(4,5)P2 independently of the substrate phosphatidylcholine, suggesting a key role for the PH domain in PLD function. Importantly, a glutathione-S-transferase (GST) fusion protein comprising GST and the PH domain of PLD1 (GST-PLD1-PH) also bound specifically to supported lipid monolayers containing PI(4,5)P2. Point mutations within the PLD1 PH domain inhibited enzyme activity, whereas deletion of the domain both inhibited enzyme activity and disrupted normal PLD1 localisation. Thus, the functional PH domain regulates PLD by mediating its interaction with polyphosphoinositide-containing membranes; this might also induce a conformational change, thereby regulating catalytic activity.
Notes:
J Schultz, R R Copley, T Doerks, C P Ponting, P Bork (2000)  SMART: a web-based tool for the study of genetically mobile domains.   Nucleic Acids Res 28: 1. 231-234 Jan  
Abstract: SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures (http://SMART.embl-heidelberg.de ). More than 400 domain families found in signalling, extra-cellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
Notes:
M A Andrade, C P Ponting, T J Gibson, P Bork (2000)  Homology-based method for identification of protein repeats using statistical significance estimates.   J Mol Biol 298: 3. 521-537 May  
Abstract: Short protein repeats, frequently with a length between 20 and 40 residues, represent a significant fraction of known proteins. Many repeats appear to possess high amino acid substitution rates and thus recognition of repeat homologues is highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current methods. We have devised an iterative algorithm based on optimal and sub-optimal score distributions from profile analysis that estimates the significance of all repeats that are detected in a single sequence. This procedure allows the identification of homologues at alignment scores lower than the highest optimal alignment score for non-homologous sequences. The method has been used to investigate the occurrence of eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, respectively. For these examples, the method is both more sensitive and more selective than conventional homology search procedures. The method allowed the detection in the SwissProt database of more than 2000 previously unrecognised repeats belonging to the 11 families. In addition, the method was used to merge several repeat families that previously were supposed to be distinct, indicating common phylogenetic origins for these families.
Notes:
S E Newey, M A Benson, C P Ponting, K E Davies, D J Blake (2000)  Alternative splicing of dystrobrevin regulates the stoichiometry of syntrophin binding to the dystrophin protein complex.   Curr Biol 10: 20. 1295-1298 Oct  
Abstract: Dystrophin coordinates the assembly of a complex of structural and signalling proteins that is required for normal muscle function. A key component of the dystrophin-associated protein complex (DPC) is alpha-dystrobrevin, a dystrophin-related and -associated protein whose absence results in muscular dystrophy and neuromuscular junction defects [1,2]. The current model of the DPC predicts that dystrophin and dystrobrevin each bind a single syntrophin molecule [3]. The syntrophins are PDZ-domain-containing proteins that facilitate the recruitment of signalling proteins such as nNOS (neuronal nitric oxide synthase) to the DPC [4]. Here we show, using yeast two-hybrid analysis and biochemical binding studies, that alpha-dystrobrevin in fact contains two independent syntrophin-binding sites in tandem. The previously undescribed binding site is situated within an alternatively spliced exon of alpha-dystrobrevin, termed the variable region-3 (vr3) sequence, which is specifically expressed in skeletal and cardiac muscle [5,6]. Analysis of the syntrophin-binding region of dystrobrevin reveals a tandem pair of predicted alpha helices with significant sequence similarity. These alpha helices, each termed a syntrophin-binding motif, are also highly conserved in dystrophin and utrophin. Together these data show that there are four potential syntrophin-binding sites per dystrophin complex in skeletal muscle: two on dystrobrevin and two on dystrophin or utrophin. Furthermore, alternative splicing of dystrobrevin provides a mechanism for regulating the stoichiometry of syntrophin association with the DPC. This is likely to have important consequences for the recruitment of specific signalling molecules to the DPC and ultimately for its function.
Notes:
C P Ponting, R B Russell (2000)  Identification of distant homologues of fibroblast growth factors suggests a common ancestor for all beta-trefoil proteins.   J Mol Biol 302: 5. 1041-1047 Oct  
Abstract: Determination of the structures of fibroblast growth factors and interleukin-1s has previously revealed that they both adopt a beta-trefoil fold, similar to those found in Kunitz soybean trypsin inhibitors, ricin-like toxins, plant agglutinins and hisactophilin. These families possess distinct functions and occur in different subcellular localisations, and they appear to lack significant similarities in their sequences, ligands and modes of ligand binding. We have analysed the significance of sequence identities observed after structure alignment and provide statistical evidence that these beta-trefoil proteins are all homologues, having arisen from a common ancestor. In addition, we have explored the sequence space of all beta-trefoil proteins and have determined that the actin-binding proteins fascins, and other proteins of unknown function, are beta-trefoil family homologues. Unlike other beta-trefoil proteins, the triplicated repeats in each of the four beta-trefoil domains of fascins are significantly similar in sequence. This hints at how the beta-trefoil fold arose from the duplication of an ancestral gene encoding a homotrimeric single-repeat protein. The combined analysis of structure and sequence databases for detecting significant similarities is suggested as a highly sensitive approach to determining the common ancestry of extremely divergent homologues.
Notes:
C P Ponting (2000)  Proteins of the endoplasmic-reticulum-associated degradation pathway: domain detection and function prediction.   Biochem J 351 Pt 2: 527-535 Oct  
Abstract: Sequence database searches, using iterative-profile and Hidden-Markov-model approaches, were used to detect hitherto-undetected homologues of proteins that regulate the endoplasmic reticulum (ER)-associated degradation pathway. The translocon-associated subunit Sec63p (Sec=secretory) was shown to contain a domain of unknown function found twice in several Brr2p-like RNA helicases (Brr2=bad response to refrigeration 2). Additionally, Cue1p (Cue=coupling of ubiquitin conjugation to ER degradation), a yeast protein that recruits the ubiquitin-conjugating (UBC) enzyme Ubc7p to an ER-associated complex, was found to be one of a large family of putative scaffolding-domain-containing proteins that include the autocrine motility factor receptor and fungal Vps9p (Vps=vacuolar protein sorting). Two other yeast translocon-associated molecules, Sec72p and Hrd3p (Hrd=3-hydroxy-3-methylglutaryl-CoA reductase degradation), were shown to contain multiple tetratricopeptide-repeat-like sequences. From this observation it is suggested that Sec72p associates with a heat-shock protein, Hsp70, in a manner analogous to that known for Hop (Hsp70/Hsp90 organizing protein). Finally, the luminal portion of Ire1p (Ire=high inositol-requiring), thought to convey the sensing function of this transmembrane kinase and endoribonuclease, was shown to contain repeats similar to those in beta-propeller proteins. This finding hints at the mechanism by which Ire1p may sense extended unfolded proteins at the expense of compact folded molecules.
Notes:
J Schultz, T Doerks, C P Ponting, R R Copley, P Bork (2000)  More than 1,000 putative new human signalling proteins revealed by EST data mining.   Nat Genet 25: 2. 201-204 Jun  
Abstract: Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
Notes:
S Rea, F Eisenhaber, D O'Carroll, B D Strahl, Z W Sun, M Schmid, S Opravil, K Mechtler, C P Ponting, C D Allis, T Jenuwein (2000)  Regulation of chromatin structure by site-specific histone H3 methyltransferases.   Nature 406: 6796. 593-599 Aug  
Abstract: The organization of chromatin into higher-order structures influences chromosome function and epigenetic gene regulation. Higher-order chromatin has been proposed to be nucleated by the covalent modification of histone tails and the subsequent establishment of chromosomal subdomains by non-histone modifier factors. Here we show that human SUV39H1 and murine Suv39h1--mammalian homologues of Drosophila Su(var)3-9 and of Schizosaccharomyces pombe clr4--encode histone H3-specific methyltransferases that selectively methylate lysine 9 of the amino terminus of histone H3 in vitro. We mapped the catalytic motif to the evolutionarily conserved SET domain, which requires adjacent cysteine-rich regions to confer histone methyltransferase activity. Methylation of lysine 9 interferes with phosphorylation of serine 10, but is also influenced by pre-existing modifications in the amino terminus of H3. In vivo, deregulated SUV39H1 or disrupted Suv39h activity modulate H3 serine 10 phosphorylation in native chromatin and induce aberrant mitotic divisions. Our data reveal a functional interdependence of site-specific H3 tail modifications and suggest a dynamic mechanism for the regulation of higher-order chromatin.
Notes:
1999
C P Ponting, J Schultz, F Milpetz, P Bork (1999)  SMART: identification and annotation of domains from signalling and extracellular protein sequences.   Nucleic Acids Res 27: 1. 229-232 Jan  
Abstract: SMART is a simple modular architecture research tool and database that provides domain identification and annotation on the WWW (http://coot.embl-heidelberg.de/SMART). The tool compares query sequences with its databases of domain sequences and multiple alignments whilst concurrently identifying compositionally biased regions such as signal peptide, transmembrane and coiled coil segments. Annotated and unannotated regions of the sequence can be used as queries in searches of sequence databases. The SMART alignment collection represents more than 250 signalling and extracellular domains. Each alignment is curated to assign appropriate domain boundaries and to ensure its quality. In addition, each domain is annotated extensively with respect to cellular localisation, species distribution, functional class, tertiary structure and functionally important residues.
Notes:
L Aravind, C P Ponting (1999)  The cytoplasmic helical linker domain of receptor histidine kinase and methyl-accepting proteins is common to many prokaryotic signalling proteins.   FEMS Microbiol Lett 176: 1. 111-116 Jul  
Abstract: Mutations in the cytoplasmic linker regions of receptor histidine kinase and chemoreceptor proteins have been shown previously to significantly impair receptor functions. Here we demonstrate significant sequence similarities between these regions in numerous histidine kinases, methyl-accepting proteins, adenylyl cyclases and other prokaryotic signalling proteins. It is suggested that these 'HAMP domains' possess roles of regulating the phosphorylation or methylation of homodimeric receptors by transmitting the conformational changes in periplasmic ligand-binding domains to cytoplasmic signalling kinase and methyl-acceptor domains.
Notes:
C P Ponting (1999)  Raf-like Ras/Rap-binding domains in RGS12- and still-life-like signalling proteins.   J Mol Med 77: 10. 695-698 Oct  
Abstract: Ras proteins play critical roles in regulating cell growth and differentiation, and mutated Ras genes are expressed in a variety of human cancers. Consequently, much interest has centered on the binding partners of Ras, including the Ras-binding domain (RBD) of Raf kinase. Here evidence is presented that domains homologous to the Raf RBD are present in tandem in RGS12, RGS14 and LOCO, and singly in molecules similar to mouse Tiam-1. In addition, RGS12, RGS14 and LOCO are shown to contain single "LGN motifs" that are guanine nucleotide exchange factors specific for the alpha-subunit of G proteins. These findings indicate "cross-talk" interactions between signalling pathways involving Ras and Rap and pathways involving Rho, Rac and G alpha GTPases.
Notes:
R R Copley, J Schultz, C P Ponting, P Bork (1999)  Protein families in multicellular organisms.   Curr Opin Struct Biol 9: 3. 408-415 Jun  
Abstract: The complete sequence of the nematode worm Caenorhabditis elegans contains the genetic machinery that is required to undertake the core biological processes of single cells. However, the genome also encodes proteins that are associated with multicellularity, as well as others that are lineage-specific expansions of phylogenetically widespread families and yet more that are absent in non-nematodes. Ongoing analysis is beginning to illuminate the similarities and differences among human proteins and proteins that are encoded by the genomes of the multicellular worm and the unicellular yeast, and will be essential in determining the reliability of transferring experimental data among phylogenetically distant species.
Notes:
C P Ponting, M J Pallen (1999)  beta-propeller repeats and a PDZ domain in the tricorn protease: predicted self-compartmentalisation and C-terminal polypeptide-binding strategies of substrate selection.   FEMS Microbiol Lett 179: 2. 447-451 Oct  
Abstract: Prokaryotic proteases demonstrate a variety of substrate-selection strategies that prevent uncontrolled protein degradation. Proteasomes and ClpXP-like proteases form oligomeric structures that exclude large substrates from central solvated chambers containing their active sites. Monomeric prolyl oligopeptidases have been shown to contain beta-propeller structures that similarly reduce access to their catalytic residues. By contrast, Tsp-like enzymes contain PDZ domains that are thought to specifically target C-terminal polypeptides. We have investigated the sequence of Thermoplasma acidophilum tricorn protease using recently-developed database search methods. The tricorn protease is known to associate into a 20 hexamer capsid enclosing an extremely large cavity that is 37 nm in diameter. It is unknown, however, how this enzyme selects its small oligopeptide substrates. Our results demonstrate the presence in tricorn protease of a PDZ domain and two predicted six-bladed beta-propeller domains. We suggest that the PDZ domain is involved in targeting non-polar C-terminal peptides, similar to those generated by the T. acidophilum proteasome, whereas the beta-propeller domains serve to exclude large substrates from the tricorn protease active site in a similar manner to that previously indicated for prolyl oligopeptidase.
Notes:
C P Ponting, L Aravind, J Schultz, P Bork, E V Koonin (1999)  Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer.   J Mol Biol 289: 4. 729-745 Jun  
Abstract: Phyletic distributions of eukaryotic signalling domains were studied using recently developed sensitive methods for protein sequence analysis, with an emphasis on the detection and accurate enumeration of homologues in bacteria and archaea. A major difference was found between the distributions of enzyme families that are typically found in all three divisions of cellular life and non-enzymatic domain families that are usually eukaryote-specific. Previously undetected bacterial homologues were identified for plant pathogenesis-related proteins, Pad1, von Willebrand factor type A, src homology 3 and YWTD repeat-containing domains. Comparisons of the domain distributions in eukaryotes and prokaryotes enabled distinctions to be made between the domains originating prior to the last common ancestor of all known life forms and those apparently originating as consequences of horizontal gene transfer events. A number of transfers of signalling domains from eukaryotes to bacteria were confidently identified, in contrast to only a single case of apparent transfer from eukaryotes to archaea.
Notes:
A A Schäffer, Y I Wolf, C P Ponting, E V Koonin, L Aravind, S F Altschul (1999)  IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices.   Bioinformatics 15: 12. 1000-1011 Dec  
Abstract: MOTIVATION: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). RESULTS: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search a database of PSSMs for protein folds, and one for protein domains involved in signal transduction. IMPALA's sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees the output of the optimal local alignment by using the rigorous Smith-Waterman algorithm. Also, it is considerably faster when run with a large database of PSSMs than is BLAST or PSI-BLAST when run against the complete non-redundant protein database.
Notes:
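The search direction described in the IMPALA abstract above (one query sequence compared against a collection of PSSMs, with each comparison scored by Smith-Waterman local alignment) can be illustrated with a minimal sketch. This is not the IMPALA implementation: the function names, the linear gap penalty, the default score for residues missing from a column, and the toy matrices are all assumptions made for illustration, and no statistical significance estimate is attempted.

```python
from typing import Dict, List, Tuple

# Assumed representation: a PSSM is a list of columns, and each column
# maps a residue letter to an integer score for that position.
PSSM = List[Dict[str, int]]


def local_align_score(query: str, pssm: PSSM, gap: int = 4, default: int = -2) -> int:
    """Best Smith-Waterman local alignment score of a sequence against a PSSM.

    Uses a linear gap penalty (an assumption; real tools typically use
    affine penalties). Residues absent from a column score `default`.
    """
    n_cols = len(pssm)
    prev = [0] * (n_cols + 1)
    best = 0
    for residue in query:
        curr = [0] * (n_cols + 1)
        for j in range(1, n_cols + 1):
            match = prev[j - 1] + pssm[j - 1].get(residue, default)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, match, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Dict[str, PSSM]) -> List[Tuple[str, int]]:
    """Score one query against every PSSM and rank the hits, best first."""
    hits = [(name, local_align_score(query, pssm)) for name, pssm in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy, hypothetical database of two very short PSSMs.
toy_db = {
    "toy_acd": [{"A": 5}, {"C": 5}, {"D": 5}],
    "toy_ww": [{"W": 5}, {"W": 5}],
}
ranked = search("GGACDGG", toy_db)
```

Because the dynamic programme fills the full matrix, the reported score is the optimal local alignment score under the chosen scoring scheme, which is the property the abstract contrasts with heuristic BLAST-style searches.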
P Mohaghegh, N R Rodrigues, N Owen, C P Ponting, T T Le, A H Burghes, K E Davies (1999)  Analysis of mutations in the tudor domain of the survival motor neuron protein SMN.   Eur J Hum Genet 7: 5. 519-525 Jul  
Abstract: Autosomal recessive childhood onset spinal muscular atrophy (SMA) is a leading cause of infant mortality caused by mutations in the survival motor neuron (SMN) gene. The SMN protein is involved in RNA processing and is localised in structures called GEMs in the nucleus. Nothing is yet understood about why mutations in SMN gene result in the selective motor neuron loss observed in patients. The SMN protein domains conserved across several species may indicate functionally significant regions. Exon 3 of SMN contains homology to a tudor domain, where a Type I SMA patient has been reported to harbour a missense mutation. We have generated missense mutants in this region of SMN and have tested their ability to form GEMs when transfected into HeLa cells. Our results show such mutant SMN proteins still localise to GEMs. Furthermore, exon 7 deleted SMN protein appears to exert a dominant negative effect on localisation of endogenous SMN protein. However, exon 3 mutant protein and exon 5 deleted protein exert no such effect.
Notes:
1998
E Schröder, A C Willis, C P Ponting (1998)  Porcine natural-killer-enhancing factor-B: oligomerisation and identification as a calpain substrate in vitro.   Biochim Biophys Acta 1383: 2. 279-291 Apr  
Abstract: Natural-killer-enhancing factor-B (NKEF-B) (monomeric mass = 21.82 kDa) was purified from the cytosol of porcine red blood cells and its identity was established by microsequencing. NKEF-B oligomerisation was investigated by gel filtration and small-angle X-ray scattering (SAXS). Native NKEF-B readily forms disulphide-linked dimers, but when fully reduced, the protein forms discrete oligomers containing 16 +/- 1 monomers. A total of 40% of the purified enzyme was deduced to be cysteinylated, which is consistent with the modification of one or both of two putative active site cysteine residues. In vitro, NKEF-B was found to be a specific substrate of mu- and m-calpains, the calcium-dependent cysteine proteases. The cleavage events were followed by SDS-PAGE and the cleavage sites pinpointed by N-terminally sequencing the resulting digestion fragments. This in vitro cleavage data provides support to the hypothesis that calpromotin (NKEF-B), an erythron peroxiredoxin involved in the regulation of calcium-dependent potassium transport across the plasma membrane, is cleaved by calpain in vivo.
Notes:
J Schultz, F Milpetz, P Bork, C P Ponting (1998)  SMART, a simple modular architecture research tool: identification of signaling domains.   Proc Natl Acad Sci U S A 95: 11. 5857-5864 May  
Abstract: Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 greater than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.
Notes:
A Kharrat, S Millevoi, E Baraldi, C P Ponting, P Bork, A Pastore (1998)  Conformational stability studies of the pleckstrin DEP domain: definition of the domain boundaries.   Biochim Biophys Acta 1385: 1. 157-164 Jun  
Abstract: Pleckstrin is the major substrate of protein kinase C in platelets. It contains at its N- and C-termini two pleckstrin homology (PH) domains which have been proposed to mediate protein-protein and protein-lipid interactions. A new module, called DEP, has recently been identified by sequence analysis in the central region of pleckstrin. In order to study this module, several recombinant polypeptides corresponding to the DEP module and N- and C-termini extended forms have been expressed. Using circular dichroism (CD) and nuclear magnetic resonance (NMR) techniques, the domain boundaries have been determined that yield a soluble and folded pleckstrin DEP domain. This comprises 93 amino acids with an alpha/beta fold in agreement with secondary structure predictions. Stability studies indicate that the regions surrounding the DEP domain do not contribute to its stability suggesting that the phosphorylation sites at S113, T114 and S117 are in an unstructured region. Identification of the regions of pleckstrin that are folded shall facilitate determination of its structure and function.
Notes:
L Aravind, C P Ponting (1998)  Homologues of 26S proteasome subunits are regulators of transcription and translation.   Protein Sci 7: 5. 1250-1254 May  
Abstract: Single copies of an alpha-helical-rich motif are demonstrated to be present within subunits of the large multiprotein 26S proteasome and eukaryotic initiation factor-3 (eIF3) complexes, and within proteins involved in transcriptional regulation. In addition, p40 and p47 subunits of eIF3 are shown to be homologues of the proteasome subunit Mov34, and transcriptional regulators JAB1/pad1. Finally, the proteasome subunit S5a and the p44 subunit of the basal transcription factor IIH (TFIIH) are identified as homologues. The presence of homologous, and sometimes identical, proteins in contrasting functional contexts suggests that the large multisubunit complexes of the 26S proteasome, eIF3 and TFIIH perform overlapping cellular roles.
Notes:
E Schröder, C P Ponting (1998)  Evidence that peroxiredoxins are novel members of the thioredoxin fold superfamily.   Protein Sci 7: 11. 2465-2468 Nov  
Abstract: Peroxiredoxins catalyze reduction of hydrogen peroxide or alkyl peroxide, to water or the corresponding alcohol. Detailed analysis of their sequences indicates that these enzymes possess a thioredoxin (Trx)-like fold and consequently are homologues of both thioredoxin and glutathione peroxidase (GPx). Sequence- and structure-based multiple sequence alignments indicate that the peroxiredoxin active site cysteine and GPx active site selenocysteine are structurally equivalent. Homologous peroxiredoxin and GPx enzymes are predicted to catalyze equivalent reactions via similar reaction intermediates.
Notes:
F Conejero-Lara, J Parrado, A I Azuaga, C M Dobson, C P Ponting (1998)  Analysis of the interactions between streptokinase domains and human plasminogen.   Protein Sci 7: 10. 2190-2199 Oct  
Abstract: The contrasting roles of streptokinase (SK) domains in binding human Glu1-plasminogen (Plg) have been studied using a set of proteolytic fragments, each of which encompasses one or more of SK's three structural domains (A, B, C). Direct binding experiments have been performed using gel filtration chromatography and surface plasmon resonance. The latter technique has allowed estimation of association and dissociation rate constants for interactions between Plg and intact SK or SK fragments. Each of the SK fragments that contains domain B (fragments A2-B-C, A2-B, B-C, and B) binds Plg with similar affinity, at a level approximately 100- to 1,000-fold lower than intact SK. Experiments using 10 mM 6-aminohexanoic acid or 50 mM benzamidine demonstrate that either of these two lysine analogues abolishes interaction of domain B with Plg. Isolated domain C does not show detectable binding to Plg. Moreover, the additional presence of domain C within other SK fragments (B-C and A2-B-C) does not alter significantly their affinities for Plg. In addition, Plg-binding by a noncovalent complex of two SK fragments that contains domains A and B is similar to that of domain B. By contrast, species containing domain B and both domains A and C (intact SK and the two-chain complex A1 x A2-B-C) show a significantly higher affinity for Plg, which could not be completely inhibited by saturating amounts of 6-AHA. These results show that SK domain B interacts with Plg in a lysine-dependent manner and that although domains A and C do not appear independently to possess affinity for Plg, they function cooperatively to establish the additional interactions with Plg to form an efficient native-like Plg activator complex.
Notes:
K Talbot, I Miguel-Aliaga, P Mohaghegh, C P Ponting, K E Davies (1998)  Characterization of a gene encoding survival motor neuron (SMN)-related protein, a constituent of the spliceosome complex.   Hum Mol Genet 7: 13. 2149-2156 Dec  
Abstract: Mutations in the gene encoding the Survival Motor Neuron (SMN) protein are responsible for autosomal recessive proximal spinal muscular atrophy (SMA). SMN orthologues have been identified in the nematode worm Caenorhabditis elegans and the yeast Schizosaccharomyces pombe but, to date, no human paralogues have been described. Here we describe identification and characterization of an SMN-related protein (SMNrp) gene that encodes a novel protein of 239 amino acids, which has recently been identified as a constituent of the spliceosome complex and designated SPF30. Significant similarity to the SMN protein is apparent only within a central region of SMNrp that represents a tudor domain. The SMNrp/SPF30 gene has been mapped to chromosome 10q23. It is differentially expressed, with abundant levels in skeletal muscle. An exclusively nuclear localization for SMNrp in cultured cells and muscle sections was revealed using GFP fusion constructs and thereafter confirmed with a polyclonal antibody raised against SMNrp. Overexpression of SMNrp as a fusion protein in HeLa cells in culture induced dose-dependent apoptosis with positive TUNEL staining. In addition to a possible role for this protein as a pro-apoptotic factor, SMN and its related protein share significant similarities in sequence and cellular function.
Notes:
R B Russell, C P Ponting (1998)  Protein fold irregularities that hinder sequence analysis.   Curr Opin Struct Biol 8: 3. 364-371 Jun  
Abstract: The detection of homologous protein sequences frequently provides useful predictions of function and structure. Methods for homology searching have continued to improve, such that very distant evolutionary relationships can now be detected. Little attention has been paid, however, to the problems of detecting homology when domains are inserted or permuted. Here we review recent occurrences of these phenomena and discuss methods that permit their detection.
Notes:
C S Cockell, J M Marshall, K M Dawson, S A Cederholm-Williams, C P Ponting (1998)  Evidence that the conformation of unliganded human plasminogen is maintained via an intramolecular interaction between the lysine-binding site of kringle 5 and the N-terminal peptide.   Biochem J 333 (Pt 1): 99-105 Jul  
Abstract: Human Glu-plasminogen adopts at least three conformations that provide a means for regulating the specificity of its activation in vivo. It has been proposed previously that the closed (alpha) conformation of human Glu-plasminogen is maintained through physical interaction of the kringle 5 domain and a lysine residue within the N-terminal peptide (NTP). To examine this hypothesis, site-directed mutagenesis was used to generate variant proteins containing substitutions either for aspartic acid residues within the anionic centre of the kringle 5 domain or for conserved lysine residues within the NTP. Size-exclusion HPLC and rates of plasminogen activation by urokinase-type plasminogen activator were used to determine the conformational states of these variants. Variants with substitutions within the kringle 5 lysine-binding site demonstrated extended conformations, as did variants with alanine substitutions for Lys50 and Lys62. In contrast, molecules in which NTP residues Lys20 or Lys33 were replaced were shown to adopt closed conformations. We conclude that the lysine-binding site of kringle 5 is involved in maintaining the closed conformation of human Glu-plasminogen via an interaction with the NTP, probably through Lys50 and/or Lys62. These conclusions advance the current model for the initial stages of fibrinolysis during which fibrin is thought to compete with the NTP for the kringle 5 lysine-binding site.
Notes:
1997
C P Ponting (1997)  Evidence for PDZ domains in bacteria, yeast, and plants.   Protein Sci 6: 2. 464-468 Feb  
Abstract: Several dozen signaling proteins are now known to contain 80-100 residue repeats, called PDZ (or DHR or GLGF) domains, several of which interact with the C-terminal tetrapeptide motifs X-Ser/Thr-X-Val-COO- of ion channels and/or receptors. PDZ domains have previously been noted only in mammals, flies, and worms, suggesting that the primordial PDZ domain arose relatively late in eukaryotic evolution. Here, techniques of sequence analysis (including local alignment, profile, and motif database searches) indicate that PDZ domain homologues are present in yeast, plants, and bacteria. It is suggested that two PDZ domains occur in bacterial high-temperature requirement A (htrA) and one in tail-specific protease (tsp) homologues, and that a yeast htrA homologue contains four PDZ domains. Sequence comparisons suggest that the spread of PDZ domains in these diverse organisms may have occurred via horizontal gene transfer. The known affinity of Escherichia coli tsp for C-terminal polypeptides is proposed to be mediated by its PDZ-like domain, in a similar manner to the binding of C-terminal polypeptides by animal PDZ domains.
Notes:
C P Ponting (1997)  P100, a transcriptional coactivator, is a human homologue of staphylococcal nuclease.   Protein Sci 6: 2. 459-463 Feb  
Abstract: Staphylococcus aureus nuclease (SNase) homologues, previously thought to be restricted to bacteria and archaea, are demonstrated by sequence analysis to be present also in eukaryotes. The human cellular coactivator p100 is shown to contain four repeats, each of which is a SNase homologue. Surprisingly, these repeats are unlikely to possess SNase-like activities as each lacks equivalent SNase catalytic residues, yet they may mediate p100's single-stranded DNA-binding function. Products of Corydalis sempervirens and Saccharomyces cerevisiae open reading frames are predicted to adopt the same fold and possess similar functions as SNase. Five additional hypothetical proteins of bacterial origin are also predicted to be active SNase-like nucleases, including one that appears to be C-terminally truncated in a manner analogous to an engineered active SNase variant. Conservation of Asp-19 and Asp-83 among these homologues suggests a re-evaluation of the roles of these residues in Ca(2+)-binding and/or catalysis.
Notes:
J Schultz, C P Ponting, K Hofmann, P Bork (1997)  SAM as a protein interaction domain involved in developmental regulation.   Protein Sci 6: 1. 249-253 Jan  
Abstract: More than 60 previously undetected SAM domain-containing proteins have been identified using profile searching methods. Among these are over 40 EPH-related receptor tyrosine kinases (RPTK), Drosophila bicaudal-C, a p53 from Loligo forbesi, and diacylglycerol-kinase isoform delta. This extended dataset suggests that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation of numerous developmental processes among diverse eukaryotes. A conserved tyrosine in the SAM sequences of the EPH related RPTKs is likely to mediate cell-cell initiated signal transduction via the binding of SH2 containing proteins to phosphotyrosine.
Notes:
C P Ponting, Y D Cai, P Bork (1997)  The breast cancer gene product TSG101: a regulator of ubiquitination?   J Mol Med 75: 7. 467-469 Jul  
Abstract: Sequence analysis is a powerful tool to obtain structural and functional information about genes and their products. Here we show that TSG101, a gene subjected to somatic mutations in breast cancer, contains an amino terminal domain that is a homologue of ubiquitin conjugating enzymes (UBCs) and not, as previously proposed, DNA-binding domains. As the UBC active site residue is replaced in the TSG101 sequence in a similar manner to several other members of the UBC family, we propose a role for TSG101 in regulating the ubiquitination of short-lived gene products.
Notes:
K Talbot, C P Ponting, A M Theodosiou, N R Rodrigues, R Surtees, R Mountford, K E Davies (1997)  Missense mutation clustering in the survival motor neuron gene: a role for a conserved tyrosine and glycine rich region of the protein in RNA metabolism?   Hum Mol Genet 6: 3. 497-500 Mar  
Abstract: The Survival Motor Neuron (SMN) gene shows deletions in the majority of patients with Spinal Muscular Atrophy (SMA), a disease of motor neuron degeneration. To date only two missense mutations have been reported in SMN in patients with SMA. The fact that no SMN-homologues have been forthcoming from database searching has resulted in a lack of hypotheses concerning the structural and functional consequences of these mutations. Recently SMN has been shown to interact with heterogeneous nuclear ribonucleoproteins (hnRNPs) suggesting a role in mRNA metabolism. We describe a novel missense mutation and the subsequent identification of a triplicated tyrosine-glycine (Y-G) peptide sequence at the C-terminus of SMN which encompasses each of the three predicted amino acid sequence substitutions. We have identified apparent orthologues of SMN in Caenorhabditis elegans and Schizosaccharomyces pombe. These sequences retain the highly conserved Y-G motif and provide additional support for a role of SMN in mRNA metabolism.
Notes:
P Bork, J Schultz, C P Ponting (1997)  Cytoplasmic signalling domains: the next generation.   Trends Biochem Sci 22: 8. 296-298 Aug  
Abstract: Since the late 1980s, when Src-homology SH2 and SH3 domains were identified, the repertoire of non-catalytic signalling domains has increased to number over 30. As it is expected that further regulatory domains shall be found, unravelling the complex network of their interactions remains an on-going challenge.
Notes:
L Mølgaard, C P Ponting, U Christensen (1997)  Glycosylation at Asn-289 facilitates the ligand-induced conformational changes of human Glu-plasminogen.   FEBS Lett 405: 3. 363-368 Apr  
Abstract: Glu-plasminogen exists in two major glycoforms (I and II). Glycoform I contains carbohydrate chains linked to Asn-289 and Thr-346, whereas glycoform II is glycosylated only at Thr-346. Disparities in carbohydrate content lead to differences in the important functional properties of the zymogen, e.g. the kinetics of activation. The kinetics of the large ligand-induced conformational changes of each of the Glu-plasminogen glycoforms have been studied using stopped-flow fluorescence. The results are in accordance with a conformational change governed by positive co-operative binding at two weak lysine-binding sites. Additional glycosylation at Asn-289 in Glu-plasminogen I results in a two-fold increase in the overall dissociation constant of a ligand, trans-4-aminomethyl-cyclohexane carboxylic acid. This effect stems directly from the reaction step during which the conformational changes occur. This implies a higher population of Glu-plasminogen I in the open conformation even in the absence of ligands, and thus accounts for a higher rate of activation of Glu-plasminogen I, in comparison with Glu-plasminogen II.
Notes:
C P Ponting, C Phillips, K E Davies, D J Blake (1997)  PDZ domains: targeting signalling molecules to sub-membranous sites.   Bioessays 19: 6. 469-479 Jun  
Abstract: PDZ (also called DHR or GLGF) domains are found in diverse membrane-associated proteins including members of the MAGUK family of guanylate kinase homologues, several protein phosphatases and kinases, neuronal nitric oxide synthase, and several dystrophin-associated proteins, collectively known as syntrophins. Many PDZ domain-containing proteins appear to be localised to highly specialised submembranous sites, suggesting their participation in cellular junction formation, receptor or channel clustering, and intracellular signalling events. PDZ domains of several MAGUKs interact with the C-terminal polypeptides of a subset of NMDA receptor subunits and/or with Shaker-type K+ channels. Other PDZ domains have been shown to bind similar ligands of other transmembrane receptors. Recently, the crystal structures of PDZ domains, with and without ligand, have been determined. These demonstrate the mode of ligand-binding and the structural bases for sequence conservation among diverse PDZ domains.
Notes:
1996
C P Ponting, P J Parker (1996)  Extending the C2 domain family: C2s in PKCs delta, epsilon, eta, theta, phospholipases, GAPs, and perforin.   Protein Sci 5: 1. 162-166 Jan  
Abstract: Various membrane lipid metabolites, generated by phospholipases C and D (PLCs, PLDs), are known to regulate the activities of protein kinases C (PKCs) and GTPase-activating proteins (GAPs) in a range of cellular processes. Conventional Ca(2+)-dependent PKCs (alpha, beta I, beta II, and gamma), PLCs and various GAPs are all known to contain copies of a phospholipid-binding domain, termed C2 or CalB. Here we recognize that C2 domains are also present in "new" Ca(2+)-independent PKCs (delta, epsilon, eta, and theta), other kinases, a eukaryotic PLD, the breakpoint cluster region (BCR) gene product, and two further GAPs. Twenty-two previously unrecognized C2 domain sequences are presented, which include a single copy in the mammalian pore-forming proteins, perforin.
Notes:
J Parrado, F Conejero-Lara, R A Smith, J M Marshall, C P Ponting, C M Dobson (1996)  The domain organization of streptokinase: nuclear magnetic resonance, circular dichroism, and functional characterization of proteolytic fragments.   Protein Sci 5: 4. 693-704 Apr  
Abstract: Streptococcus equisimilis streptokinase (SK) is a bacterial protein of unknown tertiary structure and domain organization that is used extensively to treat acute myocardial infarction following coronary thrombosis. Six fragments of SK were generated by limited proteolysis with chymotrypsin and purified. NMR and CD experiments have shown that the secondary and tertiary structure present in the native molecule is preserved within all fragments, except the N-terminal fragment SK7. NMR spectra demonstrate the presence in SK of three structurally autonomous domains and a less structured C-terminal "tail." Cleavage within the N-terminal domain generates an N-terminal fragment, SK7, which remains noncovalently associated with the remainder of the molecule; in isolation, SK7 adopts an unfolded conformation. The abilities of these fragments to induce active site formation within human plasminogen upon formation of their heterodimeric complex were assayed. The lowest mass SK fragment exhibiting Plg-dependent activator activity was shown to be SK27 (mass 27,000, residues 147-380), which contains both central and C-terminal domains, although this activity was reduced approximately 6,000-fold relative to that of full-length SK. The activity of a 36,000 mass fragment, SK36 (residues 64-380), which differs from SK27 in possessing a portion of the N-terminal domain, was reduced to 0.1-1.0% of that of SK. Other fragments (masses 7,000, 11,000, 16,000, 17,000, 25,000, and 26,000), representing either single domains or single domains extended by portions of other domains, were inactive. However, SK7 (residues 1-63), at a 100-fold molar excess concentration, greatly potentiated the activities of SK27 and SK36, by up to 50- and > 130-fold, respectively. These findings demonstrate that all of SK's three domains are essential for native-like SK activity. The central and C-terminal domains mediate plasminogen-binding and active site-generating functions, whereas the N-terminal domain mediates an activity-potentiating function.
Notes:
F Conejero-Lara, J Parrado, A I Azuaga, R A Smith, C P Ponting, C M Dobson (1996)  Thermal stability of the three domains of streptokinase studied by circular dichroism and nuclear magnetic resonance.   Protein Sci 5: 12. 2583-2591 Dec  
Abstract: Streptococcus equisimilis streptokinase (SK) is a single-chain protein of 414 residues that is used extensively in the clinical treatment of acute myocardial infarction due to its ability to activate human plasminogen (Plg). The mechanism by which this occurs is poorly understood due to the lack of structural details concerning both molecules and their complex. We reported recently (Parrado J et al., 1996, Protein Sci 5:693-704) that SK is composed of three structural domains (A, B, and C) with a C-terminal tail that is relatively unstructured. Here, we report thermal unfolding experiments, monitored by CD and NMR, using samples of intact SK, five isolated SK fragments, and two two-chain noncovalent complexes between complementary fragments of the protein. These experiments have allowed the unfolding processes of specific domains of the protein to be monitored and their relative stabilities and interdomain interactions to be characterized. Results demonstrate that SK can exist in a number of partially unfolded states, in which individual domains of the protein behave as single cooperative units. Domain B unfolds cooperatively in the first thermal transition at approximately 46 degrees C and its stability is largely independent of the presence of the other domains. The high-temperature transition in intact SK (at approximately 63 degrees C) corresponds to the unfolding of both domains A and C. Thermal stability of domain C is significantly increased by its isolation from the rest of the chain. By contrast, cleavage of the Phe 63-Ala 64 peptide bond within domain A causes thermal destabilization of this domain. The two resulting domain portions (A1 and A2) adopt unstructured conformations when separated. A1 binds with high affinity to all fragments that contain the A2 portion, with a concomitant restoration of the native-like fold of domain A. This result demonstrates that the mechanism whereby A1 stimulates the plasminogen activator activities of complementary SK fragments is the reconstitution of the native-like structure of domain A.
Notes:
D T Haynie, C P Ponting (1996)  The N-terminal domains of tensin and auxilin are phosphatase homologues.   Protein Sci 5: 12. 2643-2646 Dec  
Abstract: Tensin, an actin filament capping protein, and auxilin, a component of receptor-mediated endocytosis, are known to have 350 residue regions of significant sequence similarity near their N-termini (Schröder et al., 1995, Eur J Biochem 228:297-304). Here we demonstrate that these regions are homologous, not only to each other, but also to the catalytic domain of a putative protein tyrosine phosphatase (PTP) from Saccharomyces cerevisiae and to other PTPs. We propose that the PTP-like portion of the homology region of tensin and auxilin represents a distinct domain. A detailed sequence comparison indicates that the PTP-like domain in tensin is unlikely to exhibit phosphatase activity, whereas in auxilin it may possess a different phosphatase specificity from tyrosine phosphatases. It is probable that the PTP-like domains in tensin and auxilin mediate binding interactions with phosphorylated polypeptides; they may therefore represent members of a distinct class of phosphopeptide recognition domain.
Notes:
C P Ponting (1996)  Novel domains in NADPH oxidase subunits, sorting nexins, and PtdIns 3-kinases: binding partners of SH3 domains?   Protein Sci 5: 11. 2353-2357 Nov  
Abstract: Two SH3 domain-containing cytosolic components of the NADPH oxidase, p47phox and p40phox, are shown by analyses of their sequences to contain single copies of a novel class of domain, the PX (phox) domain. Homologous domains are demonstrated to be present in the Cpk class of phosphatidylinositol 3-kinase, S. cerevisiae Bem1p, and S. pombe Scd2, and a large family of human sorting nexin 1 (SNX1) homologues. The majority of these domains contains a polyproline motif, typical of SH3 domain-binding proteins. Two further findings are reported. A third NADPH oxidase subunit, p67phox, is shown to contain four tetratricopeptide repeats (TPRs) within its N-terminal Rac1-GTP-binding region, and a 28 residue motif in p40phox is demonstrated to be present in protein kinase C isoforms iota/lambda and zeta, and in three ZZ domain-containing proteins.
Notes:
J Parrado, P R Escuredo, F Conejero-Lara, M Kotik, C P Ponting, J A Asenjo, C M Dobson (1996)  Molecular characterisation of a thermoactive beta-1,3-glucanase from Oerskovia xanthineolytica.   Biochim Biophys Acta 1296: 2. 145-151 Sep  
Abstract: Molecular characterisation of a lytic thermoactive beta-1,3-glucanase from Oerskovia xanthineolytica LL-G109 has been performed. A molecular mass of 27 195.6 +/- 1.3 Da and an isoelectric point of 4.85 were determined by electrospray mass spectrometry and from its titration curve, respectively. Its thermoactivity profile shows it to be a heat-stable enzyme with a temperature optimum of 65 degrees C. The secondary structure content of the protein was estimated by circular dichroism to be approx. 25% alpha-helix, 7% random coil, and 68% beta-sheet and beta-turn structure. Nuclear magnetic resonance spectra confirm the high content of beta-structure. Furthermore, the presence of a compact hydrophobic core is indicated by the presence of slowly exchanging amide hydrogens and the enzyme's relatively high resistance to proteolysis. The N-terminal sequences of the intact protein and of a tryptic peptide each exhibit significant similarity to family 16 of glycosyl hydrolases whose overall fold is known to contain almost exclusively beta-sheets and surface loops. Moreover, the sequenced tryptic peptide appears to encompass residues of the Oerskovia xanthineolytica glucanase active site, since it contains a portion of the family 16 active-site motif E-[L/I/V]-D-[L/I/V]-E.
Notes:
A J Doherty, L C Serpell, C P Ponting (1996)  The helix-hairpin-helix DNA-binding motif: a structural basis for non-sequence-specific recognition of DNA.   Nucleic Acids Res 24: 13. 2488-2497 Jul  
Abstract: One, two or four copies of the 'helix-hairpin-helix' (HhH) DNA-binding motif are predicted to occur in 14 homologous families of proteins. The predicted DNA-binding function of this motif is shown to be consistent with the crystallographic structure of rat polymerase beta, complexed with DNA template-primer [Pelletier, H., Sawaya, M.R., Kumar, A., Wilson, S.H. and Kraut, J. (1994) Science 264, 1891-1903] and with biochemical data. Five crystal structures of predicted HhH motifs are currently known: two from rat pol beta and one each in endonuclease III, AlkA and the 5' nuclease domain of Taq pol I. These motifs are more structurally similar to each other than to any other structure in current databases, including helix-turn-helix motifs. The clustering of the five HhH structures separately from other bi-helical structures in searches indicates that all members of the 14 families of proteins described herein possess similar HhH structures. By analogy with the rat pol beta structure, it is suggested that each of these HhH motifs binds DNA in a non-sequence-specific manner, via the formation of hydrogen bonds between protein backbone nitrogens and DNA phosphate groups. This type of interaction contrasts with the sequence-specific interactions of other motifs, including helix-turn-helix structures. Additional evidence is provided that alphaherpesvirus virion host shutoff proteins are members of the polymerase I 5'-nuclease and FEN1-like endonuclease gene family, and that a novel HhH-containing DNA-binding domain occurs in the kinesin-like molecule nod, and in other proteins such as cnjB, emb-5 and SPT6.
Notes:
C P Ponting, I D Kerr (1996)  A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases: identification of duplicated repeats and potential active site residues.   Protein Sci 5: 5. 914-922 May  
Abstract: Phosphatidylcholine-specific phospholipase D (PLD) enzymes catalyze hydrolysis of phospholipid phosphodiester bonds, and also transphosphatidylation of phospholipids to acceptor alcohols. Bacterial and plant PLD enzymes have not been shown previously to be homologues or to be homologous to any other protein. Here we show, using sequence analysis methods, that bacterial and plant PLDs show significant sequence similarities both to each other, and to two other classes of phospholipid-specific enzymes, bacterial cardiolipin synthases, and eukaryotic and bacterial phosphatidylserine synthases, indicating that these enzymes form an homologous family. This family is suggested also to include two Poxviridae proteins of unknown function (p37K and protein K4), a bacterial endonuclease (nuc), an Escherichia coli putative protein (o338) containing an N-terminal domain showing similarities with helicase motifs V and VI, and a Synechocystis sp. putative protein with a C-terminal domain likely to possess a DNA-binding function. Surprisingly, four regions of sequence similarity that occur once in nuc and o338, appear twice in all other homologues, indicating that the latter molecules are bi-lobed, having evolved from an ancestor or ancestors that underwent a gene duplication and fusion event. It is suggested that, for each of these enzymes, conserved histidine, lysine, aspartic acid, and/or asparagine residues may be involved in a two-step ping pong mechanism involving an enzyme-substrate intermediate.
Notes:
1995
G Spraggon, C Phillips, U K Nowak, C P Ponting, D Saunders, C M Dobson, D I Stuart, E Y Jones (1995)  The crystal structure of the catalytic domain of human urokinase-type plasminogen activator.   Structure 3: 7. 681-691 Jul  
Abstract: BACKGROUND: Urokinase-type plasminogen activator (u-PA) promotes fibrinolysis by catalyzing the conversion of plasminogen to the active protease plasmin via the cleavage of a peptide bond. When localized to the external cell surface it contributes to tissue remodelling and cellular migration; inhibition of its activity impedes the spread of cancer. u-PA has three domains: an N-terminal receptor-binding growth factor domain, a central kringle domain and a C-terminal catalytic protease domain. The biological roles of the fibrinolytic enzymes render them therapeutic targets; however, until now no structure of the protease domain has been available. Solution of the structure of the u-PA serine protease was undertaken to provide such data. RESULTS: The crystal structure of the catalytic domain of recombinant, non-glycosylated human u-PA, complexed with the inhibitor Glu-Gly-Arg chloromethyl ketone (EGRcmk), has been determined at a nominal resolution of 2.5 A and refined to a crystallographic R-factor of 22.4% on all data (20.4% on data > 3 sigma). The enzyme has the expected topology of a trypsin-like serine protease. CONCLUSIONS: The enzyme has an S1 specificity pocket similar to that of trypsin, a restricted, less accessible, hydrophobic S2 pocket and a solvent-accessible S3 pocket which is capable of accommodating a wide range of residues. The EGRcmk inhibitor binds covalently at the active site to form a tetrahedral hemiketal structure. Although the overall structure is similar to that of homologous serine proteases, at six positions insertions of extra residues in loop regions create unique surface areas. One of these loop regions is highly mobile despite being anchored by the disulphide bridge which is characteristic of a small subset of serine proteases, namely tissue-type plasminogen activator, Factor XII and Complement Factor I.
Notes:
C P Ponting (1995)  SAM: a novel motif in yeast sterile and Drosophila polyhomeotic proteins.   Protein Sci 4: 9. 1928-1930 Sep  
Abstract: Single copies of an approximately 65-70 residue domain are shown to be present in the sequences of 14 eukaryotic proteins, including yeast byr2, STE11, ste4, and STE50, which are essential participants in sexual differentiation. This domain, named SAM (sterile alpha motif), appears to participate in other developmental processes because it is also present in Drosophila polyhomeotic gene product and related homologues, which are thought to regulate determination of segmental specification in early embryogenesis. Its appearance in byr2 and STE11, which are MEK kinases, and in proteins containing pleckstrin homology, src homology 3, and discs-large homologous region domains, suggests possible participation in signal transduction pathways.
Notes:
1994
C P Ponting (1994)  Acid sphingomyelinase possesses a domain homologous to its activator proteins: saposins B and D.   Protein Sci 3: 2. 359-361 Feb  
Abstract: An N-terminal region of the acid sphingomyelinase sequence (residues 89-165) is shown to be homologous to saposin-type sequences. By analogy with the known functions of saposins, this sphingomyelinase saposin-type domain may possess lipid-binding and/or sphingomyelinase-activator properties. This finding may prove to be important in the understanding of Niemann-Pick disease, which results from sphingomyelinase deficiency.
Notes:
K M Dawson, J M Marshall, R H Raper, R J Gilbert, C P Ponting (1994)  Substitution of arginine 719 for glutamic acid in human plasminogen substantially reduces its affinity for streptokinase.   Biochemistry 33: 40. 12042-12047 Oct  
Abstract: In isolation human plasminogen possesses no enzymatic activity, yet upon formation of an equimolar complex with the bacterial protein streptokinase, it acquires a plasminogen activator function. The region(s) of plasminogen and of streptokinase which mediate complex formation has (have) not been previously published. Here it is reported that a single-residue substitution (Arg719-->Glu) in the serine protease domain of full-length Glu-plasminogen substantially reduces its affinity for streptokinase. The plasminogen variant displays no other significant differences from the wild-type molecule with respect to activation by two-chain urokinase-type plasminogen activator, recognition by monoclonal antibodies, or ability to undergo conformational change. It is concluded that Arg719 in human plasminogen is an important determinant of the streptokinase binding site, although further sites are likely to contribute both to the affinity of plasminogen for streptokinase and to mechanisms by which the active site is formed within the complex.
Notes:
J M Marshall, A J Brown, C P Ponting (1994)  Conformational studies of human plasminogen and plasminogen fragments: evidence for a novel third conformation of plasminogen.   Biochemistry 33: 12. 3599-3606 Mar  
Abstract: The conformations of Glu-plasminogen and defined proteolytic fragments, in the presence and absence of 6-aminohexanoic acid (6-AHA), trans-4-(aminomethyl)cyclohexanecarboxylic acid (t-AMCHA), and benzamidine, were studied using three methods: size-exclusion high-performance liquid chromatography (SE-HPLC), small-angle X-ray scattering (SAXS), and dynamic laser light scattering (DLLS). The well-documented conformational change of Glu-plasminogen with 6-AHA or t-AMCHA was measured as a decrease in molecular elution time by SE-HPLC (8.93 +/- 0.01 to 8.32 +/- 0.01 min) and increases in radius of gyration (30.7 +/- 0.1 to 49.8 +/- 0.3 A) and Stokes radius (40.6 +/- 0.3 to 48.5 +/- 0.3 A) by SAXS and DLLS, respectively. The addition of benzamidine to Glu-plasminogen resulted in a conformation (radius of gyration 41.0 +/- 0.4 A and Stokes radius 46.6 +/- 0.3 A) distinct from that in the presence of 6-AHA. 6-AHA, but not benzamidine, induced significant conformational changes in Lys-plasminogen and kringles 1 + 2 + 3 + 4 + 5. We conclude that Glu-plasminogen adopts three distinct conformations involving two intramolecular interactions: one mediated by regions of the NH2-terminal peptide and kringle 5, competed for by 6-AHA or benzamidine, and the other possibly between kringles 3 and 4, competed for by 6-AHA but not benzamidine.
Notes:
1993
G Opdenakker, P M Rudd, C P Ponting, R A Dwek (1993)  Concepts and principles of glycobiology.   FASEB J 7: 14. 1330-1337 Nov  
Abstract: In biological systems oligosaccharides are normally conjugated to proteins or lipids. The heterogeneity and branching of oligosaccharides allow glycoconjugates to display a further level of structural and functional diversity compared with linear proteins and nucleic acids or with lipids. This review summarizes some general principles that are emerging from the new field of glycobiology which, by addressing the molecular interactions of glycoconjugates in biological systems, spans the classical physicochemical, biological, and biochemical sciences. We discuss the genesis of glycoforms, the functional roles for glycosylation, and some general aspects of structure/function relationships with reference to N-glycosylated animal glycoproteins including the enzymes ribonuclease and tissue plasminogen activator, IgG, the family of C-type lectins, and receptor ligands.
Notes:
G Spraggon, D Stuart, C Ponting, C Finnis, D Sleep, Y Jones (1993)  Crystallization and X-ray diffraction study of recombinant platelet-derived endothelial cell growth factor.   J Mol Biol 234: 3. 879-880 Dec  
Abstract: Crystals of recombinant platelet-derived endothelial cell growth factor (PD-ECGF) were obtained by the hanging drop vapour diffusion technique. The crystals belong to the space group P2(1)2(1)2(1) with unit cell dimensions a = 63.7 A, b = 70.4 A, c = 219.6 A, alpha = beta = gamma = 90 degrees, and probably contain a single dimer in the asymmetric unit. Diffraction to a minimum Bragg spacing of 3.5 A has been obtained using a synchrotron X-ray source.
Notes:
1992
C P Ponting, J M Marshall, S A Cederholm-Williams (1992)  Plasminogen: a structural review.   Blood Coagul Fibrinolysis 3: 5. 605-614 Oct  
Abstract: Plasminogen is the zymogen form of plasmin, a broad specificity serine protease whose activity contributes to a variety of normal and pathological conditions, including intravascular thrombolysis and extracellular proteolysis. Plasminogen contains seven structural units or 'domains', each of which confers specific properties on the molecule. The kringle domains possess fibrin-binding functions and, together with the N-terminal peptide, regulate the ability of plasminogen to adopt at least three dissimilar conformations. These conformational forms influence the rate of formation, following activation by plasminogen activators, of the plasmin active site within its C-terminal serine protease domain. Structural and functional analogies are postulated between these plasminogen structures and the conformations of other proteins related by sequence homology.
Notes:
C P Ponting, S K Holland, S A Cederholm-Williams, J M Marshall, A J Brown, G Spraggon, C C Blake (1992)  The compact domain conformation of human Glu-plasminogen in solution.   Biochim Biophys Acta 1159: 2. 155-161 Sep  
Abstract: A complete understanding of the accelerating mechanisms of plasminogen activation and fibrinolysis necessarily requires structural information on the conformational forms of plasminogen. Given the absence of high-resolution structural data on plasminogen the use of lower resolution approaches has been adopted. Two such approaches have previously indicated a compact conformation of Glu-plasminogen (Tranqui, L., Prandini, M., and Chapel, A. (1979) Biol. Cellulaire, 34, 39-42; Bányai, L. and Patthy, L. (1985) Biochim. Biophys. Acta, 832, 224-227) whereas a third has suggested a fairly extended conformation (Mangel, W., Lin, B. and Ramakrishnan, V. (1990) Science, 248, 69-73). Native Glu-plasminogen has been investigated using small-angle X-ray scattering (SAXS) experiments. It is concluded that this molecule in solution is compact (radius of gyration, RG 3.05 +/- 0.02 nm and maximum intramolecular distance, Im 9.1 +/- 0.3 nm) and that the data are consistent with the right-handed spiral structure observed using electron microscopy by Tranqui et al. (1979). A spiral structure of native plasminogen would have important implications for the conformational response of plasminogen to fibrin and concomitant stimulation of plasminogen activation.
Notes:
1991
1989