Abstract: Candida glabrata is a common opportunistic human pathogen leading to significant mortality in immunosuppressed and immunodeficient individuals. We carried out proteomic analysis of C. glabrata using high-resolution Fourier transform mass spectrometry with MS resolution of 60,000 and MS/MS resolution of 7500. On the basis of 32,453 unique peptides identified from 118,815 peptide-spectrum matches, we validated 4421 of the 5283 predicted protein-coding genes (83%) in the C. glabrata genome. Further, searching the tandem mass spectra against a six-frame translated genome database of C. glabrata resulted in identification of 11 novel protein-coding genes and correction of gene boundaries for 14 predicted gene models. A subset of novel protein-coding genes and corrected gene models were validated at the transcript level by RT-PCR and sequencing. Our study illustrates how proteogenomic analysis enabled by high-resolution mass spectrometry can enrich genome annotation and should be an integral part of ongoing genome sequencing and annotation efforts.
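The six-frame translation mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1), and the function names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical. It is not the pipeline used in the study; a real search database would typically also split each frame at stop codons before the spectral search.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Translate both strands in all three reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse strand); '*' marks a stop codon.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translate("ATGGCCTAA").items():
        print(name, protein)
```

Peptides that match a translated frame but no annotated protein are the kind of evidence the abstract describes as supporting novel genes or corrected gene boundaries.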
Abstract: Visceral leishmaniasis, or kala azar, is the most severe form of leishmaniasis and is caused by the protozoan parasite Leishmania donovani. No L. donovani genome sequence has been published to date, although the genome sequences of three related Leishmania species are available. We therefore took a proteogenomic approach to identify proteins from two life stages of L. donovani. From our analysis of the promastigote (insect) and amastigote (human) stages of L. donovani, we identified a total of 22,322 unique peptides from a homology-based search against proteins from three Leishmania species. These peptides were assigned to 3711 proteins in L. infantum, 3287 proteins in L. major, and 2433 proteins in L. braziliensis. Of the 3711 L. donovani proteins that were identified, 1387 were detected in both life stages of the parasite, while 901 and 1423 proteins were identified only in the promastigote and amastigote stages, respectively. In addition, we identified 13 N-terminally extended proteins and one C-terminally extended protein by searching the proteomic data against the six-frame translated genomes of the three related Leishmania species. Here, we report results from proteomic profiling of L. donovani, an organism with an unsequenced genome.
Abstract: Mangifera indica (mango) is an important fruit crop in tropical countries, with India being the leading producer in the world. Substantial research efforts are being devoted to producing fruit with desirable characteristics, including those that pertain to taste, hardiness and resistance to pests. Characterization of the genome and proteome of mango would help in the improvement of cultivars. As the mango genome has not yet been sequenced, we employed a mass spectrometry-based approach followed by database searches of mango-derived ESTs and proteins, along with proteins from six other closely related plant species, to characterize its proteome. In addition, de novo sequencing followed by homology-based protein identification was carried out. The LC-MS/MS analysis of the mango leaf proteome was performed using an accurate mass quadrupole time-of-flight mass spectrometer. This integrative approach enabled the identification of 1001 peptides that matched to 538 proteins. To our knowledge, this study is the first high-throughput analysis of the mango leaf proteome and could pave the way for further genomic, transcriptomic and proteomic studies.
Abstract: The study of the human urinary proteome has the potential to offer significant insights into normal physiology as well as disease pathology. The information obtained from such studies could be applied to the diagnosis of various diseases. The high sensitivity, resolution, and mass accuracy of the latest generation of mass spectrometers provide an opportunity to accurately catalog the proteins present in human urine, including those present at low levels. To this end, we carried out a comprehensive analysis of the human urinary proteome from healthy individuals using high-resolution Fourier transform mass spectrometry. Importantly, we used the Orbitrap for detecting ions in both MS (resolution 60,000) and MS/MS (resolution 15,000) modes. To increase the depth of our analysis, we characterized both unfractionated and lectin-enriched proteins in our experiments. In all, we identified 1,823 proteins with less than 1% false discovery rate, of which 671 proteins have not previously been reported as constituents of human urine. This data set should serve as a comprehensive reference list for future studies aimed at identification and characterization of urinary biomarkers for various diseases.
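The abstract above does not state how the 1% false discovery rate was estimated. One common approach is a target-decoy search, and the following minimal Python sketch assumes that approach. The function name `score_threshold_at_fdr`, the `(score, is_decoy)` input format, and the convention that higher scores are better are all assumptions made for illustration, not details from the study.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score at which the estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    The FDR at each rank is estimated as (decoys so far) / (targets so far).
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Record the deepest point in the ranked list that still meets the limit.
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(score_threshold_at_fdr(example, max_fdr=0.1))
```

Matches scoring at or above the returned threshold would be accepted; this sketch ignores tied scores and protein-level inference, both of which a production workflow would need to handle.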
Abstract: The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task with the majority of initial genome annotation dependent on the use of gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable generation of accurate and comprehensive protein sequence of tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate proteogenomics to become a powerful approach that will be routinely employed by 'Genome and Proteome Centers' of the future.
Abstract: Esophageal squamous cell carcinoma (ESCC) is among the top ten most frequent malignancies worldwide. In this study, our objective was to identify potential biomarkers for ESCC through a quantitative proteomic approach based on isobaric tags for relative and absolute quantitation (iTRAQ). We compared the protein expression profiles of ESCC tumor tissues with the corresponding adjacent normal tissue from ten patients. LC-MS/MS analysis of strong cation exchange chromatography fractions was carried out on an Accurate Mass QTOF mass spectrometer, which led to the identification of 687 proteins. In all, 257 proteins were identified as differentially expressed in ESCC compared to normal tissue. We found several previously known protein biomarkers to be upregulated in ESCC, including thrombospondin 1 (THBS1), periostin 1 (POSTN) and heat shock 70 kDa protein 9 (HSPA9), confirming the validity of our approach. In addition, several novel proteins that had not been reported previously were identified in our screen. These novel biomarker candidates included prosaposin (PSAP), plectin 1 (PLEC1) and protein disulfide isomerase A 4 (PDIA4), which were further validated as overexpressed by immunohistochemical labeling using tissue microarrays. The success of our study shows that this mass spectrometric strategy can be applied to cancers in general to develop a panel of candidate biomarkers, which can then be validated by other techniques.
Abstract: Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
Abstract: Oligodendrocytes (OLs) are glial cells of the central nervous system that produce myelin. Cultured OLs provide immense therapeutic opportunities for treating a variety of neurological conditions. Human embryonic stem cells (ESCs) are one of the most promising sources for such therapies and also provide a model to study human OL development. For these purposes, an investigation of proteome-level changes is critical for understanding the process of OL differentiation. In this report, an iTRAQ-based quantitative proteomic approach was used to study multiple steps during OL differentiation, including neural progenitor cells, glial progenitor cells and oligodendrocyte progenitor cells (OPCs), compared to undifferentiated ESCs. Using a 1% false discovery rate cutoff, ∼3145 proteins were quantitated and several demonstrated progressive stage-specific expression. Proteins such as transferrin, neural cell adhesion molecule 1, apolipoprotein E and wingless-related MMTV integration site 5A showed increased expression from the neural progenitor cell to the OPC stage. Several proteins with demonstrated or suspected roles in OL maturation were also found upregulated in OPCs, including fatty acid-binding protein 4, THBS1, bone morphogenetic protein 1, CRYAB, transferrin, tenascin C, COL3A1, TGFBI and EPB41L3. Thus, by providing the first extensive proteomic profiling of human ESC differentiation into OPCs, this study identifies many novel proteins that are potentially involved in OL development.
Abstract: The identification of secreted proteins that are differentially expressed between non-neoplastic and esophageal squamous cell carcinoma (ESCC) cells can provide potential biomarkers of ESCC. We used a SILAC-based quantitative proteomic approach to compare the secretome of ESCC cells with that of non-neoplastic esophageal squamous epithelial cells. Proteins were resolved by SDS-PAGE, and tandem mass spectrometry analysis (LC-MS/MS) of in-gel trypsin-digested peptides was carried out on a high-accuracy qTOF mass spectrometer. In total, we identified 441 proteins in the combined secretomes, including 120 proteins with >2-fold upregulation in the ESCC secretome vs. that of non-neoplastic esophageal squamous epithelial cells. In this study, several potential protein biomarkers previously known to be increased in ESCC, including matrix metalloproteinase 1, transferrin receptor, and transforming growth factor beta-induced 68 kDa, were identified as overexpressed in the ESCC-derived secretome. In addition, we identified several novel proteins that have not been previously reported to be associated with ESCC. Among the novel candidate proteins identified, protein disulfide isomerase family A member 3 (PDIA3), GDP dissociation inhibitor 2 (GDI2), and lectin galactoside binding soluble 3 binding protein (LGALS3BP) were further validated by immunoblot analysis and immunohistochemical labeling using tissue microarrays. This tissue microarray analysis showed overexpression of PDIA3, GDI2, and LGALS3BP in 93%, 93% and 87% of 137 ESCC cases, respectively. Hence, we conclude that these potential biomarkers are excellent candidates for further evaluation to test their role and efficacy in the early detection of ESCC.
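The >2-fold upregulation criterion in the abstract above amounts to a simple filter on per-protein abundance ratios. The Python sketch below assumes each protein already has a single ESCC to non-neoplastic ratio; the function name `upregulated`, the placeholder protein names, and the example ratios are hypothetical and are not data from the study.

```python
def upregulated(ratios, min_fold=2.0):
    """Return the sorted names of proteins whose ratio exceeds min_fold.

    ratios: dict mapping protein name -> (ESCC / non-neoplastic) ratio.
    The comparison is strict, matching a '>2-fold' criterion.
    """
    return sorted(name for name, ratio in ratios.items() if ratio > min_fold)


if __name__ == "__main__":
    # Placeholder names and made-up ratios, for illustration only.
    example = {"protein_A": 3.1, "protein_B": 2.0, "protein_C": 0.4}
    print(upregulated(example))
```

In practice the per-protein ratio would be summarized from several peptide-level measurements before a cutoff like this is applied.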
Abstract: Sharing proteomic data with the biomedical community through a unified proteomic resource, especially in the context of individual proteins, is a challenging prospect. We have developed a community portal, designated as Human Proteinpedia (http://www.humanproteinpedia.org/), for sharing both unpublished and published human proteomic data through the use of a distributed annotation system designed specifically for this purpose. This system allows laboratories to contribute and maintain protein annotations, which are also mapped to the corresponding proteins through the Human Protein Reference Database (HPRD; http://www.hprd.org/). Thus, it is possible to visualize data pertaining to experimentally validated posttranslational modifications (PTMs), protein isoforms, protein-protein interactions (PPIs), tissue expression, expression in cell lines, subcellular localization and enzyme substrates in the context of individual proteins. With enthusiastic participation of the proteomics community, the past 15 months have witnessed data contributions from more than 75 labs around the world including 2710 distinct experiments, >1.9 million peptides, >4.8 million MS/MS spectra, 150,368 protein expression annotations, 17,410 PTMs, 34,624 PPIs and 2906 subcellular localization annotations. Human Proteinpedia should serve as an integrated platform to store, integrate and disseminate such proteomic data and is inching towards evolving into a unified human proteomics resource.