Abstract: Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP locus was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity compared with GIP55S, which is derived from the ancestral allele. Haplotype structure analysis suggests that the derived allele at rs2291725 rose to dominance in East Asians ~8100 yr ago due to positive selection. The combined results suggested that rs2291725 represents a functional mutation and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both the enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.
Abstract: There has been growing evidence for extensive diversity of alternative splicing in human populations. Genetic variants within the 5' splice site can cause splicing differences among human individuals and constitute an important class of human disease mutations. In this study, we explored whether natural variations of splicing could reveal important signals of 5' splice site recognition. In seven lymphoblastoid cell lines of Asian, European and African ancestry, we identified 1174 single nucleotide polymorphisms (SNPs) within the consensus 5' splice site. We selected 129 SNPs predicted to significantly alter the splice site activity, and quantitatively examined their splicing impact in the seven individuals. Surprisingly, outside of the essential GT dinucleotide position, only ~14% of the tested SNPs altered splicing. Bioinformatic and minigene analyses identified signals that could modify the impact of 5' splice site polymorphisms, most notably a strong 3' splice site and the presence of intronic motifs downstream of the 5' splice site. Strikingly, we found that the poly-G run, a known intronic splicing enhancer, was the most significantly enriched motif downstream of exons unaffected by 5' splice site SNPs. In TRIM62, the upstream 3' splice site and downstream intronic poly-G runs functioned redundantly to protect an exon from its 5' splice site polymorphism. Collectively, our study reveals widespread context-dependent robustness to 5' splice site polymorphisms in human transcriptomes. Consequently, certain exons are more susceptible to 5' splice site mutations. Additionally, our work demonstrates that genetic diversity of alternative splicing can provide significant insights into the splicing code of mammalian cells.
Abstract: The Alu element has been a major source of new exons during primate evolution. Thousands of human genes contain spliced exons derived from Alu elements. However, identifying Alu exons that have acquired genuine biological functions remains a major challenge. We investigated the creation and establishment of Alu exons in human genes, using transcriptome profiles of human tissues generated by high-throughput RNA sequencing (RNA-Seq) combined with extensive RT-PCR analysis. More than 25% of Alu exons analyzed by RNA-Seq have estimated transcript inclusion levels of at least 50% in the human cerebellum, indicating widespread establishment of Alu exons in human genes. Genes encoding zinc finger transcription factors have significantly higher levels of Alu exonization. Importantly, Alu exons with high splicing activities are strongly enriched in the 5'-UTR, and two-thirds (10/15) of 5'-UTR Alu exons tested by luciferase reporter assays significantly alter mRNA translational efficiency. Mutational analysis reveals the specific molecular mechanisms by which newly created 5'-UTR Alu exons modulate translational efficiency, such as the creation or elongation of upstream ORFs that repress the translation of the primary ORFs. This study presents genomic evidence that a major functional consequence of Alu exonization is the lineage-specific evolution of translational regulation. Moreover, the preferential creation and establishment of Alu exons in zinc finger genes suggest that Alu exonization may have globally affected the evolution of primate and human transcriptomes by regulating the protein production of master transcriptional regulators in specific lineages.
Abstract: Although recent studies have shown that human genomes contain hundreds of loci that exhibit signatures of positive selection, variants that are associated with adaptation in energy-balance regulation remain elusive. We reasoned that the difficulty in identifying such variants could be due to heterogeneity in selection pressure and that an integrative approach that incorporated experiment-based evidence and population genetics-based statistical judgments would be needed to reveal important metabolic modifiers in humans.
Abstract: Genes that underlie human disease are important subjects of systems biology research. In the present study, we demonstrate that Mendelian and complex disease genes have distinct and consistent protein-protein interaction (PPI) properties. We show that five different network properties can be reduced to two independent metrics when applied to the human PPI network. These two metrics largely coincide with the degree (number of connections) and the clustering coefficient (the number of connections among the neighbors of a particular protein). We demonstrate that disease genes have simultaneously unusually high degree and unusually low clustering coefficient. Such genes can be described as brokers in that they connect many proteins that would not be connected otherwise. We show that these results are robust to the effect of gene age and inspection bias variation. Notably, genes identified in genome-wide association study (GWAS) have network patterns that are almost indistinguishable from the network patterns of nondisease genes and significantly different from the network patterns of complex disease genes identified through non-GWAS means. This suggests either that GWAS focused on a distinct set of diseases associated with an unusual set of genes or that mapping of GWAS-identified single nucleotide polymorphisms onto the causally affected neighboring genes is error prone.
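The two metrics named in the abstract above, degree and clustering coefficient, can be illustrated with a minimal Python sketch. It is not the authors' code. The function name and the toy network are hypothetical. The clustering coefficient is computed in its standard normalized form (links among a node's neighbors divided by the number of possible links), which is an assumption about the exact definition used in the study.

```python
from itertools import combinations


def degree_and_clustering(edges):
    """Return {node: (degree, local clustering coefficient)} for an undirected graph."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    result = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            # The coefficient is undefined for fewer than two neighbors; report 0.0.
            result[node] = (k, 0.0)
            continue
        # Count the pairs of neighbors that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        result[node] = (k, 2.0 * links / (k * (k - 1)))
    return result


# Hypothetical toy network: "hub" connects proteins that are mostly not
# connected to each other, i.e. the "broker" pattern described in the abstract.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
stats = degree_and_clustering(toy_edges)
```

In this toy network "hub" has a high degree (4) and a low clustering coefficient (1/6), whereas "a" has a low degree (2) and a clustering coefficient of 1.0.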
Abstract: Out-of-frame stop codons (OSCs) occur naturally in coding sequences of all organisms, providing a mechanism of early termination of translation in incorrect reading frame so that the metabolic cost associated with frameshift events can be reduced. Given such a functional significance, we expect statistically overrepresented OSCs in coding sequences as a result of a widespread selection. Accordingly, we examined available prokaryotic genomes to look for evidence of this selection.
Abstract: Genes in the same organism vary in the time since their evolutionary origin. Without horizontal gene transfer, young genes are necessarily restricted to a few closely related species, whereas old genes can be broadly distributed across the phylogeny. It has been shown that young genes evolve faster than old genes; however, the evolutionary forces responsible for this pattern remain obscure. Here, we classify human-chimp protein-coding genes into different age classes, according to the breadth of their phylogenetic distribution. We estimate the strength of purifying selection and the rate of adaptive selection for genes in different age classes. We find that older genes carry fewer and less frequent nonsynonymous single-nucleotide polymorphisms than younger genes, suggesting that older genes experience a stronger purifying selection at the protein-coding level. We infer the distribution of fitness effects of new deleterious mutations and find that older genes have proportionally more slightly deleterious mutations and fewer nearly neutral mutations than younger genes. To investigate the role of adaptive selection of genes in different age classes, we determine the selection coefficient (gamma = 2N(e)s) of genes using the MKPRF approach and estimate the ratio of the rate of adaptive nonsynonymous substitution to synonymous substitution (omega(A)) using the DoFE method. Although the proportion of positively selected genes (gamma > 0) is significantly higher in younger genes, we find no correlation between omega(A) and gene age. Collectively, these results provide strong evidence that younger genes are subject to weaker purifying selection and more tenuous evidence that they also undergo adaptive evolution more frequently.
Abstract: Despite the unique phenotypic properties and clinical importance of Penicillium marneffei, the polyketide synthase genes in its genome have never been characterized. Twenty-three putative polyketide synthase genes and two putative polyketide synthase nonribosomal peptide-synthase hybrid genes were identified in the P. marneffei genome, a diversity much higher than found in other pathogenic thermal dimorphic fungi, such as Histoplasma capsulatum (one polyketide synthase gene) and Coccidioides immitis (10 polyketide synthase genes). These genes were evenly distributed on the phylogenetic tree with polyketide synthase genes of Aspergillus and other fungi, indicating that the high diversity was not a result of lineage-specific gene expansion through recent gene duplication. The melanin-biosynthesis gene cluster had gene order and orientations identical to those in the Talaromyces stipitatus (a teleomorph of Penicillium emmonsii) genome. Phylogenetically, all six genes of the melanin-biosynthesis gene cluster in P. marneffei were also most closely related to those in T. stipitatus, with high bootstrap supports. The polyketide synthase gene of the melanin-biosynthesis gene cluster (alb1) in P. marneffei was knocked down, which was accompanied by loss of melanin pigment production and reduced ornamentation in conidia. The survival of mice challenged with the alb1 knockdown mutant was significantly better than those challenged with wild-type P. marneffei (P < 0.005). The sterilizing doses of hydrogen peroxide, leading to a 50% reduction in survival of conidia, were 11 min for wild-type P. marneffei and 6 min for the alb1 knockdown mutant of P. marneffei, implying that the melanin-biosynthesis gene cluster contributed to virulence through decreased susceptibility to killing by hydrogen peroxide.
Abstract: A number of studies have shown that recently created genes differ from the genes created in the deep evolutionary past in many aspects. Here, we determined the age of emergence and propensity for gene loss (PGL) of all human protein-coding genes and compared disease genes with non-disease genes in terms of their evolutionary rate, strength of purifying selection, mRNA expression, and genetic redundancy. Older non-disease genes that are less prone to loss have been evolving 1.5- to 3-fold slower between humans and chimps than young non-disease genes, whereas Mendelian disease genes have been evolving very slowly regardless of their ages and PGL. Complex disease genes showed an intermediate pattern. Disease genes also have higher mRNA expression heterogeneity across multiple tissues than non-disease genes regardless of age and PGL. Young and middle-aged disease genes have fewer similar paralogs than non-disease genes of the same age. We reasoned that genes were more likely to be involved in human disease if they were under a strong functional constraint, expressed heterogeneously across tissues, and lacked genetic redundancy. Young human genes that have been evolving under strong constraint between humans and chimps might also be enriched for genes that encode important primate or even human-specific functions.
Abstract: Laribacter hongkongensis is a newly discovered Gram-negative bacillus of the Neisseriaceae family associated with freshwater fish-borne gastroenteritis and traveler's diarrhea. The complete genome sequence of L. hongkongensis HLHK9, recovered from an immunocompetent patient with severe gastroenteritis, consists of a 3,169-kb chromosome with G+C content of 62.35%. Genome analysis reveals different mechanisms potentially important for its adaptation to diverse habitats of human and freshwater fish intestines and freshwater environments. The gene contents support its phenotypic properties and suggest that amino acids and fatty acids can be used as carbon sources. The extensive variety of transporters, including multidrug efflux and heavy metal transporters as well as genes involved in chemotaxis, may enable L. hongkongensis to survive in different environmental niches. Genes encoding urease, bile salts efflux pump, adhesin, catalase, superoxide dismutase, and other putative virulence factors-such as hemolysins, RTX toxins, patatin-like proteins, phospholipase A1, and collagenases-are present. Proteomes of L. hongkongensis HLHK9 cultured at 37 degrees C (human body temperature) and 20 degrees C (freshwater habitat temperature) showed differential gene expression, including two homologous copies of argB, argB-20, and argB-37, which encode two isoenzymes of N-acetyl-L-glutamate kinase (NAGK)-NAGK-20 and NAGK-37-in the arginine biosynthesis pathway. NAGK-20 showed higher expression at 20 degrees C, whereas NAGK-37 showed higher expression at 37 degrees C. NAGK-20 also had a lower optimal temperature for enzymatic activities and was inhibited by arginine probably as negative-feedback control. 
Similar duplicated copies of argB are also observed in bacteria from hot springs such as Thermus thermophilus, Deinococcus geothermalis, Deinococcus radiodurans, and Roseiflexus castenholzii, suggesting that similar mechanisms for temperature adaptation may be employed by other bacteria. Genome and proteome analysis of L. hongkongensis revealed novel mechanisms for adaptations to survival at different temperatures and habitats.
Abstract: Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald-Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites -- either recurrent selective sweeps or background selection -- on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.
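The reasoning of the test mentioned in the abstract above can be sketched as a short Python function that summarizes a 2x2 table of fixed differences and polymorphisms. This shows only the textbook point estimates (the neutrality index and alpha, the estimated proportion of adaptive nonsynonymous substitutions), not the linked-site analysis the abstract reports. The function name and the counts are hypothetical and are not data from the study.

```python
def mk_summary(dn, ds, pn, ps):
    """Summarize a McDonald-Kreitman 2x2 table.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive for these ratios")
    # Neutrality index: (Pn/Ps) / (Dn/Ds); values below 1 indicate an
    # excess of nonsynonymous divergence relative to polymorphism.
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha = 1 - (Ds * Pn) / (Dn * Ps), which equals 1 - neutrality index.
    alpha = 1.0 - neutrality_index
    return {"neutrality_index": neutrality_index, "alpha": alpha}


# Hypothetical counts chosen only to make the arithmetic easy to follow.
example = mk_summary(dn=20, ds=40, pn=10, ps=40)
```

With these counts Pn/Ps = 0.25 and Dn/Ds = 0.5, so the neutrality index is 0.5 and alpha is 0.5. Slightly deleterious polymorphisms are known to bias this simple estimator downward, so it is best read as a rough summary.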
Abstract: Assessing genetic diversity within populations is vital for understanding the nature of evolutionary processes at the molecular level. PGEToolbox is a Matlab-based open-sourced software package for data analysis in population genetics. The main features of this software are as follows: 1) capability for handling both DNA sequence polymorphisms and single nucleotide polymorphisms (SNPs), which include genotype and haplotype data; 2) exhaustive population genetic analyses and neutrality tests based on the coalescent theory; 3) extendibility and scalability for complex and large genome-wide datasets; 4) simple yet effective graphic user interfaces and sophisticated visualization of data and results. For academic uses, PGEToolbox is available free of charge at http://bioinformatics.org/pgetoolbox.
Abstract: Exonization of Alu elements is a major mechanism for birth of new exons in primate genomes. Prior analyses of expressed sequence tags show that almost all Alu-derived exons are alternatively spliced, and the vast majority of these exons have low transcript inclusion levels. In this work, we provide genomic and experimental evidence for diverse splicing patterns of exonized Alu elements in human tissues. Using Exon array data of 330 Alu-derived exons in 11 human tissues and detailed RT-PCR analyses of 38 exons, we show that some Alu-derived exons are constitutively spliced in a broad range of human tissues, and some display strong tissue-specific switch in their transcript inclusion levels. Most of such exons are derived from ancient Alu elements in the genome. In SEPN1, mutations of which are linked to a form of congenital muscular dystrophy, the muscle-specific inclusion of an Alu-derived exon may be important for regulating SEPN1 activity in muscle. Realtime qPCR analysis of this SEPN1 exon in macaque and chimpanzee tissues indicates human-specific increase in its transcript inclusion level and muscle specificity after the divergence of humans and chimpanzees. Our results imply that some Alu exonization events may have acquired adaptive benefits during the evolution of primate transcriptomes.
Abstract: MBEToolbox is an extensible MATLAB-based software package for analysis of DNA and protein sequences. MBEToolbox version 2.0 includes enhanced functions for phylogenetic analyses by the maximum likelihood method. For example, it is capable of estimating the synonymous and nonsynonymous substitution rates using a novel or several known codon substitution models. MBEToolbox 2.0 introduces new functions for estimating site-specific evolutionary rates by using a maximum likelihood method or an empirical Bayesian method. It also incorporates several different methods for recombination detection. Multi-platform versions of the software are freely available at http://www.bioinformatics.org/mbetoolbox/.
Abstract: The evolutionary origin of "orphan" genes, genes that lack sequence similarity to any known gene, remains a mystery. One suggestion has been that most orphan genes evolve rapidly so that similarity to other genes cannot be traced after a certain evolutionary distance. This can be tested by examining the divergence rates of genes with different degrees of lineage specificity. Here the lineage specificity (LS) of a gene describes the phylogenetic distribution of that gene's orthologues in related species. Highly lineage-specific genes will be distributed in fewer species in a phylogeny. In this study, we have used the complete genomes of seven ascomycotan fungi and two animals to define several levels of LS, such as Eukaryotes-core, Ascomycota-core, Euascomycetes-specific, Hemiascomycetes-specific, Aspergillus-specific, and Saccharomyces-specific. We compare the rates of gene evolution in groups of higher LS to those in groups with lower LS. Molecular evolutionary analyses indicate an increase in nonsynonymous nucleotide substitution rates in genes with higher LS. Several analyses suggest that LS is correlated with the evolutionary rate of the gene. This correlation is stronger than those of a number of other factors that have been proposed as predictors of a gene's evolutionary rate, including the expression level of genes, gene essentiality or dispensability, and the number of protein-protein interactions. The accelerated evolutionary rates of genes with higher LS may reflect the influence of selection and adaptive divergence during the emergence of orphan genes. These analyses suggest that accelerated rates of gene evolution may be responsible for the emergence of apparently orphan genes.
Abstract: All meiotic genes (except HOP1) and genes encoding putative pheromone processing enzymes, pheromone receptors and pheromone response pathways proteins in Aspergillus fumigatus and Aspergillus nidulans and a putative MAT-1 alpha box mating-type gene were present in the Penicillium marneffei genome. A putative MAT-2 high-mobility group mating-type gene was amplified from a MAT-1 alpha box mating-type gene-negative P. marneffei strain. Among 37 P. marneffei patient strains, MAT-1 alpha box and MAT-2 high-mobility group mating-type genes were present in 23 and 14 isolates, respectively. We speculate that P. marneffei can potentially be a heterothallic fungus that does not switch mating type.
Abstract: Despite extensive laboratory investigations in patients with respiratory tract infections, no microbiological cause can be identified in a significant proportion of patients. In the past 3 years, several novel respiratory viruses, including human metapneumovirus, severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV), and human coronavirus NL63, were discovered. Here we report the discovery of another novel coronavirus, coronavirus HKU1 (CoV-HKU1), from a 71-year-old man with pneumonia who had just returned from Shenzhen, China. Quantitative reverse transcription-PCR showed that the amount of CoV-HKU1 RNA was 8.5 to 9.6 x 10(6) copies per ml in his nasopharyngeal aspirates (NPAs) during the first week of the illness and dropped progressively to undetectable levels in subsequent weeks. He developed increasing serum levels of specific antibodies against the recombinant nucleocapsid protein of CoV-HKU1, with immunoglobulin M (IgM) titers of 1:20, 1:40, and 1:80 and IgG titers of <1:1,000, 1:2,000, and 1:8,000 in the first, second and fourth weeks of the illness, respectively. Isolation of the virus by using various cell lines, mixed neuron-glia culture, and intracerebral inoculation of suckling mice was unsuccessful. The complete genome sequence of CoV-HKU1 is a 29,926-nucleotide, polyadenylated RNA, with G+C content of 32%, the lowest among all known coronaviruses with available genome sequence. Phylogenetic analysis reveals that CoV-HKU1 is a new group 2 coronavirus. Screening of 400 NPAs, negative for SARS-CoV, from patients with respiratory illness during the SARS period identified the presence of CoV-HKU1 RNA in an additional specimen, with a viral load of 1.13 x 10(6) copies per ml, from a 35-year-old woman with pneumonia. Our data support the existence of a novel group 2 coronavirus associated with pneumonia in humans.
Abstract: BACKGROUND: MATLAB is a high-performance language for technical computing, integrating computation, visualization, and programming in an easy-to-use environment. It has been widely used in many areas, such as mathematics and computation, algorithm development, data acquisition, modeling, simulation, and scientific and engineering graphics. However, few functions are freely available in MATLAB to perform the sequence data analyses specifically required for molecular biology and evolution. RESULTS: We have developed a MATLAB toolbox, called MBEToolbox, aimed at filling this gap by offering efficient implementations of the most needed functions in molecular biology and evolution. It can be used to manipulate aligned sequences, calculate evolutionary distances, estimate synonymous and nonsynonymous substitution rates, and infer phylogenetic trees. Moreover, it provides an extensible, functional framework for users with more specialized requirements to explore and analyze aligned nucleotide or protein sequences from an evolutionary perspective. The full functions in the toolbox are accessible through the command-line for seasoned MATLAB users. A graphical user interface, that may be especially useful for non-specialist end users, is also provided. CONCLUSION: MBEToolbox is a useful tool that can aid in the exploration, interpretation and visualization of data in molecular biology and evolution. The software is publicly available at http://web.hku.hk/~jamescai/mbetoolbox/ and http://bioinformatics.org/project/?group_id=454
Abstract: We report the complete sequence of the mitochondrial genome of Penicillium marneffei, the first complete mitochondrial DNA sequence of a thermal dimorphic fungus. This 35 kb mitochondrial genome contains the genes encoding ATP synthase subunits 6, 8, and 9 (atp6, atp8, and atp9), cytochrome oxidase subunits I, II, and III (cox1, cox2, and cox3), apocytochrome b (cob), reduced nicotinamide adenine dinucleotide ubiquinone oxidoreductase subunits (nad1, nad2, nad3, nad4, nad4L, nad5, and nad6), ribosomal protein of the small ribosomal subunit (rps), 28 tRNAs, and small and large ribosomal RNAs. Analysis of gene contents, gene orders, and gene sequences revealed that the mitochondrial genome of P. marneffei is more closely related to those of molds than yeasts.
Abstract: Penicillium marneffei is a dimorphic fungus that intracellularly infects the reticuloendothelial system of humans and bamboo rats. Endemic in Southeast Asia, it infects 10% of AIDS patients in this region. The absence of a sexual stage and the highly infectious nature of the mould-phase conidia have impaired studies on thermal dimorphic switching and host-microbe interactions. Genomic analysis, therefore, could provide crucial information. Pulsed-field gel electrophoresis of genomic DNA of P. marneffei revealed three or more chromosomes (5.0, 4.0, and 2.2 Mb). Telomeric fingerprinting revealed 6-12 bands, suggesting that there were chromosomes of similar sizes. The genome size of P. marneffei was hence about 17.8-26.2 Mb. G+C content of the genome is 48.8 mol%. Random exploration of the genome of P. marneffei yielded 2303 random sequence tags (RSTs), corresponding to 9% of the genome, with 11.7, 6.3, and 17.4% of the RSTs having sequence similarity to yeast-specific sequences, non-yeast fungus sequences, and both (common sequences), respectively. Analysis of the RSTs revealed genes for information transfer (ribosomal protein genes, tRNA synthetase subunits, translation initiation, and elongation factors), metabolism, and compartmentalization, including several multi-drug-resistance protein genes and homologues of fluconazole-resistance gene. Furthermore, the presence of genes encoding pheromone homologues and ankyrin repeat-containing proteins of other fungi and algae strongly suggests the presence of a sexual stage that presumably exists in the environment.
Abstract: Penicillium marneffei is a thermally dimorphic fungus that alternates between a filamentous and a yeast growth form in response to changes in its environmental temperature. It has become an emerging fungal pathogen endemic in Southeast Asia. Defining the genomics of P. marneffei will provide a better understanding of the fungus.
The draft sequence of the P. marneffei genome was assembled from 6.6x coverage of the genome through whole genome shotgun sequencing. The 31 Mb genome obtained from the assembly contains 10,060 protein-coding genes. The complete mitochondrial genome is 35 kb long and its gene content and gene order are very similar to those of Aspergillus. The annotation system and P. marneffei genome database (PMGD) were developed to allow a preliminary annotation of the sequences and provide an intuitive graphic interface to give curators and users ready access to the annotation and the underlying evidence.
Analysis of the gene set of P. marneffei provided insights into the adaptations required by the fungus to cause disease. The genome encodes a diverse set of putative virulence genes such as proteinase, phospholipase, metacaspase and agglutinin, which may enable the fungus to adhere to, colonise and invade the host, adapt to the tissue environment, and avoid the host's humoral and cellular defences of the innate and adaptive immune responses. The gene cluster involved in biosynthesis of melanin, a known virulence factor in some other pathogenic fungi, was identified in the genome, indicating that P. marneffei may produce melanin or melanin-like immunosuppressive compounds that protect the fungus against immune effector cells. More interestingly, the P. marneffei genome contains more intragenic tandem repeats (IntraTRs) than other fungi. These IntraTRs encoding repeat domains/motifs may create quantitative variation in surface proteins, allowing the fungus to "disguise" itself to slip past the vigilant defences of the host immune system. The genome sequence of P. marneffei also revealed a number of genes associated with mating processes and sexual development, suggesting an unidentified sexual cycle in the fungus.
The extent and evolutionary patterns of duplicate genes in P. marneffei and other ascomycetes were compared. All ascomycetes show a certain degree of redundancy (though its extent can vary considerably), which may provide the foundation for the specialisation of fungal genes and form the basis for fungal diversification. An inverse relationship between the lineage specificity of a gene and its evolutionary rate was also discovered, implying that an accelerated evolutionary rate may be responsible for the emergence of lineage-specific genes.
The genome sequence of P. marneffei has provided our first glimpse into the genomic basis of the physiology of the dimorphic filamentous fungus.