
Gajendra PS Raghava

Dr. G.P.S.Raghava, Scientist and Head of Bioinformatics Centre
Institute of Microbial Technology; Sec-39A; Chandigarh, INDIA
raghava@imtech.res.in
Dr Raghava is a scientist at the Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh, India. He received his M.Sc. from Meerut University, M.Tech from IIT Delhi and PhD from IMTECH Chandigarh. He worked as a postdoctoral fellow at Oxford University, UK (1996-98), as a bioinformatics specialist at UAMS, USA (2002-3 & 2006) and as a visiting professor at POSTECH, South Korea (2004). His group has produced more than 100 web servers, 100 research papers, 50 copyrights, 10 databases and mirror sites. He is responsible for setting up the bioinformatics infrastructure at IMTECH and at UAMS Little Rock, USA (http://bic.uams.edu/). He has received the following major awards and recognitions: i) Shanti Swarup Bhatnagar Award in Biological Science, 2008; ii) National Bioscience Award for Career Development for the year 2005-2006 (by the Department of Biotechnology, Govt. of India); iii) NASI-Reliance Industries Platinum Jubilee Award, 2009; iv) Thomson Reuters Research Excellence - India Research Front Awards, 2009; v) J. C. Bose National Fellowship 2010 by DST; vi) Fellow of the National Academy of Sciences (F.N.A.Sc); and vii) Fellow of the Indian Academy of Sciences (F.A.Sc.), Bangalore.

Journal articles

2012
Harinder Singh, Jagat Singh Chauhan, M Michael Gromiha, Gajendra P S Raghava (2012)  ccPDB: compilation and creation of data sets from Protein Data Bank.   Nucleic Acids Res 40: Database issue. D486-D489 Jan  
Abstract: ccPDB (http://crdd.osdd.net/raghava/ccpdb/) is a database of data sets compiled from the literature and Protein Data Bank (PDB). First, we collected and compiled data sets from the literature used for developing bioinformatics methods to annotate the structure and function of proteins. Second, data sets were derived from the latest release of PDB using standard protocols. Third, we developed a powerful module for creating a wide range of customized data sets from the current release of PDB. This is a flexible module that allows users to create data sets using a simple six step procedure. In addition, a number of web services have been integrated in ccPDB, which include submission of jobs on PDB-based servers, annotation of protein structures and generation of patterns. This database maintains >30 types of data sets such as secondary structure, tight-turns, nucleotide interacting residues, metals interacting residues, DNA/RNA binding residues and so on.
Aadil H Bhat, Homchoru Mondal, Jagat S Chauhan, Gajendra P S Raghava, Amrish Methi, Alka Rao (2012)  ProGlycProt: a repository of experimentally characterized prokaryotic glycoproteins.   Nucleic Acids Res 40: Database issue. D388-D393 Jan  
Abstract: ProGlycProt (http://www.proglycprot.org/) is an open access, manually curated, comprehensive repository of bacterial and archaeal glycoproteins with at least one experimentally validated glycosite (glycosylated residue). To facilitate maximum information at one point, the database is arranged under two sections: (i) ProCGP-the main data section consisting of 95 entries with experimentally characterized glycosites and (ii) ProUGP-a supplementary data section containing 245 entries with experimentally identified glycosylation but uncharacterized glycosites. Every entry in the database is fully cross-referenced and enriched with available published information about source organism, coding gene, protein, glycosites, glycosylation type, attached glycan, associated oligosaccharyl/glycosyl transferases (OSTs/GTs), supporting references, and applicable additional information. Interestingly, ProGlycProt contains as many as 174 entries for which information is unavailable or the characterized glycosites are unannotated in Swiss-Prot release 2011_07. The website supports a dedicated structure gallery of homology models and crystal structures of characterized glycoproteins in addition to two new tools developed in view of emerging information about prokaryotic sequons (conserved sequences of amino acids around glycosites) that are never or rarely seen in eukaryotic glycoproteins. ProGlycProt provides an extensive compilation of experimentally identified glycosites (334) and glycoproteins (340) of prokaryotes that could serve as an information resource for research and technology applications in glycobiology.
2011
Atul Tyagi, Firoz Ahmed, Nishant Thakur, Arun Sharma, Gajendra P S Raghava, Manoj Kumar (2011)  HIVsirDB: a database of HIV inhibiting siRNAs.   PLoS One 6: 10. 10  
Abstract: Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), a potent candidate for the future treatment of HIV, uses short interfering RNA (siRNA/shRNA) to silence HIV genes. In this study, attempts have been made to create a database, HIVsirDB, of siRNAs responsible for silencing HIV genes.
Deepak Singla, Meenakshi Anurag, Debasis Dash, Gajendra P S Raghava (2011)  A web server for predicting inhibitors against bacterial target GlmU protein.   BMC Pharmacol 11: 07  
Abstract: The emergence of drug resistant tuberculosis poses a serious concern globally and researchers are in rigorous search for new drugs to fight against these dreadful bacteria. Recently, the bacterial GlmU protein, involved in peptidoglycan, lipopolysaccharide and teichoic acid synthesis, has been identified as an important drug target. A unique C-terminal disordered tail, essential for survival, and the absence of the gene in the host make GlmU a suitable target for inhibitor design.
Bharat Panwar, G P S Raghava (2011)  Predicting sub-cellular localization of tRNA synthetases from their primary structures.   Amino Acids Mar  
Abstract: Following the endosymbiotic events, all genes of mitochondrial aminoacyl tRNA synthetase (AARS) were lost or transferred from the ancestral mitochondrial genome into the nucleus. The canonical pattern is that both cytosolic and mitochondrial AARSs coexist in the nuclear genome. In the present scenario, all mitochondrial AARSs are nucleus-encoded, synthesized on cytosolic ribosomes and post-translationally imported from the cytosol into the mitochondria in the eukaryotic cell. The site-based discrimination between similar types of enzymes is very challenging because they have almost the same physico-chemical properties. It is very important to predict the sub-cellular location of AARSs in order to understand mitochondrial protein synthesis. We have analyzed and optimized the distinguishable patterns between cytosolic and mitochondrial AARSs. Firstly, support vector machine (SVM)-based modules have been developed using amino acid and dipeptide compositions and achieved Matthews correlation coefficient (MCC) of 0.82 and 0.73, respectively. Secondly, we have developed SVM modules using position-specific scoring matrix and achieved the maximum MCC of 0.78. Thirdly, we developed SVM modules using N-terminal, intermediate residues, C-terminal and split amino acid composition (SAAC) and achieved MCC of 0.82, 0.70, 0.39 and 0.86, respectively. Finally, an SVM module was developed using the selected attributes of split amino acid composition (SA-SAAC) approach and achieved MCC of 0.92 with an accuracy of 96.00%. All modules were trained and tested on a non-redundant data set and evaluated using a fivefold cross-validation technique. On the independent data sets, the SA-SAAC based prediction model achieved MCC of 0.95 with an accuracy of 97.77%. The web-server 'MARSpred' based on the above study is available at http://www.imtech.res.in/raghava/marspred/ .
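The MCC values quoted in this abstract (and in several others on this page) summarize a binary confusion matrix in a single number between -1 and 1. A minimal sketch of the standard formula follows; the function name `mcc` and the counts in the usage lines are illustrative assumptions, not data from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives. Returns 0.0 when the denominator is zero,
    which is a common convention (an assumption here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
```

A perfect classifier gives 1, a classifier no better than chance gives 0, and a classifier that is always wrong gives -1.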
Firoz Ahmed, Gajendra P S Raghava (2011)  Designing of Highly Effective Complementary and Mismatch siRNAs for Silencing a Gene.   PLoS One 6: 8. 08  
Abstract: In the past, numerous methods have been developed for predicting the efficacy of short interfering RNA (siRNA). However, these methods have been developed for predicting the efficacy of fully complementary siRNA against a gene. To the best of the authors' knowledge, no method has been developed for predicting the efficacy of mismatch siRNA against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene. Support vector machine (SVM) based models have been developed for predicting the efficacy of siRNAs using composition, binary and hybrid patterns of siRNAs. We achieved a maximum correlation of 0.67 between predicted and actual efficacy of siRNAs using the hybrid model. All models were trained and tested on a dataset of 2182 siRNAs and performance was evaluated using five-fold cross validation. The performance of our method desiRm is comparable to other well-known methods. In this study, an attempt has been made for the first time to design mutant siRNAs (mismatch siRNAs). In this approach we mutated a given siRNA on all possible sites/positions with all possible nucleotides. The efficacy of each mutated siRNA is predicted using our method desiRm. It is well known from the literature that mismatches between siRNA and target affect the silencing efficacy. Thus we have incorporated the rules derived from experimental data on base mismatches to find the overall efficacy of mutated or mismatch siRNAs. Finally, we developed a webserver, desiRm (http://www.imtech.res.in/raghava/desirm/), for designing highly effective siRNA for silencing a gene. This tool will be helpful for designing siRNA to degrade the disease isoform of a heterozygous single nucleotide polymorphism gene without depleting the wild type protein.
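The mutation step described in this abstract (every position, every alternative nucleotide) can be sketched as a simple enumeration. The sketch below assumes one substitution at a time and an RNA alphabet of A, C, G and U; the abstract does not say whether desiRm also considers multiple simultaneous mismatches, and the function name is hypothetical.

```python
NUCLEOTIDES = "ACGU"  # assumption: RNA alphabet


def single_mismatch_variants(sirna: str):
    """Return (position, new_base, mutated_sequence) for every
    single-nucleotide substitution of the given siRNA.

    Positions are 0-based. A sequence of length n yields 3 * n variants.
    """
    sirna = sirna.upper()
    variants = []
    for pos, original in enumerate(sirna):
        for base in NUCLEOTIDES:
            if base != original:
                mutated = sirna[:pos] + base + sirna[pos + 1:]
                variants.append((pos, base, mutated))
    return variants


# A made-up 19-nucleotide sequence, for illustration only.
example = single_mismatch_variants("GUCAUCACACUGAAUACCA")
```

Each variant would then be scored by an efficacy predictor; that scoring step is not shown here because the abstract gives no details of the model interface.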
Subhash M Agarwal, Dhwani Raghav, Harinder Singh, G P S Raghava (2011)  CCDB: a curated database of genes involved in cervix cancer.   Nucleic Acids Res 39: Database issue. D975-D979 Jan  
Abstract: The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought, or are known, to be involved in the different stages of cervical carcinogenesis. Despite the large number of women presently affected by this malignancy, no database exists that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from published literature. Each record contains details related to the gene such as architecture (exon-intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of the gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNA altered in cervical cancer as well as a search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.
Manish Kumar, M Michael Gromiha, Gajendra P S Raghava (2011)  SVM based prediction of RNA-binding proteins using binding residues and evolutionary information.   J Mol Recognit 24: 2. 303-313 Mar/Apr  
Abstract: RNA-binding proteins (RBPs) play a crucial role in transcription and gene regulation. This paper describes a support vector machine (SVM) based method for discriminating and classifying RNA-binding and non-binding proteins using sequence features. With the threshold of 30% interacting residues, the RNA-binding amino acid prediction method PPRINT achieved a Matthews correlation coefficient (MCC) of 0.32. BLAST and PSI-BLAST identified RBPs with a coverage of 32.63 and 33.16%, respectively, at an e-value of 1e-4. The SVM models developed with amino acid, dipeptide and four-part amino acid compositions showed MCC of 0.60, 0.46, and 0.53, respectively. This is the first study in which evolutionary information in the form of a position specific scoring matrix (PSSM) profile has been successfully used for predicting RBPs. We achieved the maximum MCC of 0.62 using an SVM model based on PSSM called PSSM-400. Finally, we developed different hybrid approaches and achieved a maximum MCC of 0.66. We also developed a method for predicting three subclasses of RNA binding proteins (e.g., rRNA, tRNA, mRNA binding proteins). The performance of the method was also evaluated on an independent dataset of 69 RBPs and 100 non-RBPs (NBPs). An additional benchmarking was also performed using gene ontology (GO) based annotation. Based on the hybrid approach, a web-server RNApred has been developed for predicting RNA binding proteins from amino acid sequences (http://www.imtech.res.in/raghava/rnapred/).
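The abstract does not define PSSM-400. One common way to turn a variable-length PSSM into a fixed 400-value vector is to sum, for each of the 20 residue types in the sequence, the 20 profile scores of the rows where that residue occurs. The sketch below follows that construction under several assumptions that are not from the paper: scores are squashed with a logistic function, totals are divided by sequence length, and PSSM columns follow the order of `AMINO_ACIDS`.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the PSSM


def pssm_400(seq: str, pssm) -> list:
    """Collapse an L x 20 PSSM into a 400-value composition vector.

    seq  : protein sequence of length L
    pssm : L rows, each with 20 scores (one per amino acid column)
    Entry [r * 20 + c] accumulates the column-c scores of all rows
    whose residue is amino acid r.
    """
    if not seq or len(seq) != len(pssm):
        raise ValueError("sequence and PSSM must be non-empty and equal in length")
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq.upper(), pssm):
        r = AMINO_ACIDS.find(residue)
        if r < 0:
            continue  # skip non-standard residues such as X
        for c, score in enumerate(row):
            # Logistic scaling to the range 0..1 (an assumption).
            totals[r][c] += 1.0 / (1.0 + math.exp(-score))
    n = len(seq)
    return [value / n for row in totals for value in row]
```

The PSSM itself would normally come from a PSI-BLAST run; parsing that output is outside the scope of this sketch.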
Anshu Bhardwaj, Vinod Scaria, Gajendra Pal Singh Raghava, Andrew Michael Lynn, Nagasuma Chandra, Sulagna Banerjee, Muthukurussi V Raghunandanan, Vikas Pandey, Bhupesh Taneja, Jyoti Yadav, Debasis Dash, Jaijit Bhattacharya, Amit Misra, Anil Kumar, Srinivasan Ramachandran, Zakir Thomas, Open Source Drug Discovery Consortium, Samir K Brahmachari (2011)  Open source drug discovery- A new paradigm of collaborative research in tuberculosis drug development.   Tuberculosis (Edinb) 91: 5. 479-486 Sep  
Abstract: It is being realized that the traditional closed-door and market driven approaches for drug discovery may not be the best suited model for the diseases of the developing world such as tuberculosis and malaria, because most patients suffering from these diseases have poor paying capacity. To ensure that new drugs are created for patients suffering from these diseases, it is necessary to formulate an alternate paradigm of drug discovery process. The current model constrained by limitations for collaboration and for sharing of resources with confidentiality hampers the opportunities for bringing expertise from diverse fields. These limitations hinder the possibilities of lowering the cost of drug discovery. The Open Source Drug Discovery project initiated by Council of Scientific and Industrial Research, India has adopted an open source model to power wide participation across geographical borders. Open Source Drug Discovery emphasizes integrative science through collaboration, open-sharing, taking up multi-faceted approaches and accruing benefits from advances on different fronts of new drug discovery. Because the open source model is based on community participation, it has the potential to self-sustain continuous development by generating a storehouse of alternatives towards continued pursuit for new drug discovery. Since the inventions are community generated, the new chemical entities developed by Open Source Drug Discovery will be taken up for clinical trial in a non-exclusive manner by participation of multiple companies with majority funding from Open Source Drug Discovery. This will ensure availability of drugs through a lower cost community driven drug discovery process for diseases afflicting people with poor paying capacity. Hopefully what LINUX and the World Wide Web have done for information technology, Open Source Drug Discovery will do for drug discovery.
Sandhya Agarwal, Nitish Kumar Mishra, Harinder Singh, Gajendra P S Raghava (2011)  Identification of mannose interacting residues using local composition.   PLoS One 6: 9. 09  
Abstract: Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand mechanism of recognition of pathogens by MBPs.
2010
Hifzur Rahman Ansari, Darren R Flower, G P S Raghava (2010)  AntigenDB: an immunoinformatics database of pathogen antigens.   Nucleic Acids Res 38: Database issue. D847-D853 Jan  
Abstract: The continuing threat of infectious disease and future pandemics, coupled to the continuous increase of drug-resistant pathogens, makes the discovery of new and better vaccines imperative. For effective vaccine development, antigen discovery and validation is a prerequisite. The compilation of information concerning pathogens, virulence factors and antigenic epitopes has resulted in many useful databases. However, most such immunological databases focus almost exclusively on antigens where epitopes are known and ignore those for which epitope information was unavailable. We have compiled more than 500 antigens into the AntigenDB database, making use of the literature and other immunological resources. These antigens come from 44 important pathogenic species. In AntigenDB, a database entry contains information regarding the sequence, structure, origin, etc. of an antigen with additional information such as B and T-cell epitopes, MHC binding, function, gene-expression and post translational modifications, where available. AntigenDB also provides links to major internal and external databases. We shall update AntigenDB on a rolling basis, regularly adding antigens from other organisms and extra data analysis tools. AntigenDB is available freely at http://www.imtech.res.in/raghava/antigendb and its mirror site http://www.bic.uams.edu/raghava/antigendb.
Ruchi Verma, Grish C Varshney, G P S Raghava (2010)  Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile.   Amino Acids 39: 1. 101-110 Jun  
Abstract: The rate of human death due to malaria is increasing day-by-day. Thus the malaria causing parasite Plasmodium falciparum (PF) remains the cause of concern. With the wealth of data now available, it is imperative to understand protein localization in order to gain deeper insight into their functional roles. In this manuscript, an attempt has been made to develop prediction method for the localization of mitochondrial proteins. In this study, we describe a method for predicting mitochondrial proteins of malaria parasite using machine-learning technique. All models were trained and tested on 175 proteins (40 mitochondrial and 135 non-mitochondrial proteins) and evaluated using five-fold cross validation. We developed a Support Vector Machine (SVM) model for predicting mitochondrial proteins of P. falciparum, using amino acids and dipeptides composition and achieved maximum MCC 0.38 and 0.51, respectively. In this study, split amino acid composition (SAAC) is used where composition of N-termini, C-termini, and rest of protein is computed separately. The performance of SVM model improved significantly from MCC 0.38 to 0.73 when SAAC instead of simple amino acid composition was used as input. In addition, SVM model has been developed using composition of PSSM profile with MCC 0.75 and accuracy 91.38%. We achieved maximum MCC 0.81 with accuracy 92% using a hybrid model, which combines PSSM profile and SAAC. When evaluated on an independent dataset our method performs better than existing methods. A web server PFMpred has been developed for predicting mitochondrial proteins of malaria parasites ( http://www.imtech.res.in/raghava/pfmpred/).
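Split amino acid composition, as described in this abstract, computes the composition of the N-terminal region, the C-terminal region and the remainder of the protein separately, giving 60 features instead of 20. A minimal sketch follows; the segment lengths of 25 residues and the function names are assumptions, since the abstract does not state the cut points used in PFMpred.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list:
    """Percent composition of the 20 standard amino acids in a segment."""
    n = len(segment)
    if n == 0:
        return [0.0] * 20
    return [100.0 * segment.count(aa) / n for aa in AMINO_ACIDS]


def split_composition(seq: str, n_term: int = 25, c_term: int = 25) -> list:
    """Return 60 values: N-terminal, middle and C-terminal compositions.

    n_term and c_term are segment lengths in residues (assumed values).
    """
    seq = seq.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence is shorter than the two terminal segments")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return composition(head) + composition(middle) + composition(tail)
```

Keeping the termini separate lets a classifier pick up on terminal targeting signals that are diluted when composition is averaged over the whole sequence, which is consistent with the MCC improvement reported above.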
Aarti Garg, Rupinder Tewari, Gajendra P S Raghava (2010)  Virtual Screening of potential drug-like inhibitors against Lysine/DAP pathway of Mycobacterium tuberculosis.   BMC Bioinformatics 11 Suppl 1: 01  
Abstract: The explosive global spread of multidrug resistant Mycobacterium tuberculosis (Mtb) is a catastrophe that creates an urgent need to design or develop novel, potent antitubercular agents. The Lysine/DAP biosynthetic pathway is a promising target due to its specific role in cell wall and amino acid biosynthesis. Here, we report the identification of potential antitubercular candidates targeting the Mtb dihydrodipicolinate synthase (DHDPS) enzyme of the pathway using virtual screening protocols.
Nitish K Mishra, Sandhya Agarwal, Gajendra Ps Raghava (2010)  Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule.   BMC Pharmacol 10: 07  
Abstract: Different isoforms of Cytochrome P450 (CYP) metabolize different types of substrates (or drug molecules) and make them soluble during biotransformation. Therefore, the fate of any drug molecule depends on how it is treated or metabolized by the CYP isoforms. There is a need to develop models for predicting the substrate specificity of major isoforms of P450, in order to understand whether a given drug will be metabolized or not. This paper describes an in-silico method for predicting the metabolizing capability of major isoforms (e.g. CYP 3A4, 2D6, 1A2, 2C9 and 2C19).
Bharat Panwar, Gajendra P S Raghava (2010)  Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains.   BMC Genomics 11: 09  
Abstract: Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRSs, unique for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.
Mamoon Rashid, Sumathy Ramasamy, Gajendra P S Raghava (2010)  A simple approach for predicting protein-protein interactions.   Curr Protein Pept Sci 11: 7. 589-600 Nov  
Abstract: The availability of an increased number of fully sequenced genomes demands functional interpretation of the genomic information. Despite high throughput experimental techniques and in silico methods of predicting protein-protein interaction (PPI), the interactome of most organisms is far from complete. Thus, predicting the interactome of an organism is one of the major challenges in the post-genomic era. This manuscript describes Support Vector Machine (SVM) based models that have been developed for discriminating interacting and non-interacting pairs of proteins from their amino acid sequence. We have developed SVM models using various types of sequence compositions, e.g. amino acid, dipeptide, biochemical property, split amino acid and pseudo amino acid composition. We also developed SVM models using evolutionary information in the form of Position Specific Scoring Matrix (PSSM) composition. We achieved maximum Matthews correlation coefficient (MCC) of 1.00, 0.52 and 0.74 for Escherichia coli, Saccharomyces cerevisiae, and Helicobacter pylori, using the dipeptide based SVM model at default threshold. It was observed that the performance of a prediction model depends on the dataset used for training and testing. In the case of E. coli, MCC decreased from 1.0 to 0.67 when evaluated on a new dataset. In order to understand PPI in different cellular environments, we developed species-specific and general models. It was observed that species-specific models are more accurate than general models. We conclude that the primary amino acid sequence based descriptors could be used to differentiate interacting from non-interacting protein pairs. Some amino acids tend to be more favored in interacting pairs than in non-interacting ones. Finally, a web server has been developed for predicting protein-protein interactions.
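To make the dipeptide-based pair model concrete, the sketch below computes the 400-value dipeptide composition of each protein and joins the two vectors into one 800-value input. The concatenation step and the function names are assumptions; the abstract does not say how the two proteins of a pair were combined into a single feature vector.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> list:
    """Percent occurrence of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    total = len(seq) - 1
    if total <= 0:
        return [0.0] * 400
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def pair_features(protein_a: str, protein_b: str) -> list:
    """Concatenate both compositions into one 800-value vector (assumed scheme)."""
    return dipeptide_composition(protein_a) + dipeptide_composition(protein_b)
```

Note that plain concatenation is order dependent: `pair_features(a, b)` differs from `pair_features(b, a)`, so a training set built this way would typically include both orderings or use a symmetric combination instead.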
Anastas Pashov, Bejatolah Monzavi-Karbassi, Gajendra P S Raghava, Thomas Kieber-Emmons (2010)  Bridging innate and adaptive antitumor immunity targeting glycans.   J Biomed Biotechnol 2010: 06  
Abstract: Effective immunotherapy for cancer depends on cellular responses to tumor antigens. The role of major histocompatibility complex (MHC) in T-cell recognition and T-cell receptor repertoire selection has become a central tenet in immunology. Structurally, this does not contradict earlier findings that T-cells can differentiate between small hapten structures like simple glycans. Understanding T-cell recognition of antigens as defined genetically by MHC and combinatorially by T cell receptors led to the "altered self" hypothesis. This notion reflects a more fundamental principle underlying immune surveillance and integrating evolutionarily and mechanistically diverse elements of the immune system. Danger associated molecular patterns, including those generated by glycan remodeling, represent an instance of altered self. A prominent example is the modification of the tumor-associated antigen MUC1. Similar examples emphasize glycan reactivity patterns of antigen receptors as a phenomenon bridging innate and adaptive but also humoral and cellular immunity and providing templates for immunotherapies.
Jagat S Chauhan, Nitish K Mishra, Gajendra P S Raghava (2010)  Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information.   BMC Bioinformatics 11: 06  
Abstract: Guanosine triphosphate (GTP)-binding proteins play an important role in regulation of G-protein. Thus prediction of GTP interacting residues in a protein is one of the major challenges in the field of the computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc).
Nitish K Mishra, Gajendra P S Raghava (2010)  Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information.   BMC Bioinformatics 11 Suppl 1: 01  
Abstract: Flavin binding proteins (FBPs) play a critical role in several biological functions such as the electron transport system (ETS). These flavoproteins contain very tightly bound, sometimes covalently, flavin adenine dinucleotide (FAD) or flavin mononucleotide (FMN). The interaction between the flavin nucleotide and the amino acids of a flavoprotein is essential for its functionality. Thus, identification of FAD interacting residues in an FBP is an important step for understanding its function and mechanism.
Deepak Singla, Arun Sharma, Jasjit Kaur, Bharat Panwar, Gajendra P S Raghava (2010)  BIAdb: a curated database of benzylisoquinoline alkaloids.   BMC Pharmacol 10: 03  
Abstract: Benzylisoquinoline is the structural backbone of many alkaloids with a wide variety of structures including papaverine, noscapine, codeine, morphine, apomorphine, berberine, protopine and tubocurarine. Many benzylisoquinoline alkaloids have been reported to show therapeutic properties and to act as novel medicines. Thus it is important to collect and compile benzylisoquinoline alkaloids in order to explore their usage in medicine.
Aarti Garg, Rupinder Tewari, Gajendra P S Raghava (2010)  KiDoQ: using docking based energy scores to develop ligand based model for predicting antibacterials.   BMC Bioinformatics 11: 03  
Abstract: Identification of novel drug targets and their inhibitors is a major challenge in the field of drug design and development. The diaminopimelic acid (DAP) pathway is a unique lysine biosynthetic pathway that is present in bacteria but absent in mammals. This pathway is vital for bacteria due to its critical role in cell wall biosynthesis. One of the essential enzymes of this pathway is dihydrodipicolinate synthase (DHDPS), considered to be crucial for bacterial survival. In view of its importance, the development and prediction of potent inhibitors against DHDPS may be valuable for designing effective drugs against bacteria in general.
Hifzur R Ansari, Gajendra P S Raghava (2010)  Identification of NAD interacting residues in proteins.   BMC Bioinformatics 11: 03  
Abstract: Small molecular cofactors or ligands play a crucial role in the proper functioning of cells. Accurate annotation of their target proteins and binding sites is required for the complete understanding of reaction mechanisms. Nicotinamide adenine dinucleotide (NAD+ or NAD) is one of the most commonly used organic cofactors in living cells, which plays a critical role in cellular metabolism, storage and regulatory processes. In the past, several NAD binding proteins (NADBP) have been reported in the literature, which are responsible for a wide-range of activities in the cell. Attempts have been made to derive a rule for the binding of NAD+ to its target proteins. However, so far an efficient model could not be derived due to the time consuming process of structure determination, and limitations of similarity based approaches. Thus a sequence and non-similarity based method is needed to characterize the NAD binding sites to help in the annotation. In this study attempts have been made to predict NAD binding proteins and their interacting residues (NIRs) from amino acid sequence using bioinformatics tools.
Sneh Lata, Nitish K Mishra, Gajendra P S Raghava (2010)  AntiBP2: improved version of antibacterial peptide prediction.   BMC Bioinformatics 11 Suppl 1: 01  
Abstract: Antibacterial peptides are one of the effector molecules of the innate immune system. Over the last few decades, several antibacterial peptides have been successfully approved as drugs by the FDA, which has prompted an interest in these antibacterial peptides. In our recent study we analyzed 999 antibacterial peptides, which were collected from the Antibacterial Peptide Database (APD). We have also developed methods to predict and classify these antibacterial peptides using Support Vector Machine (SVM).
2009
Sneh Lata, Manoj Bhasin, Gajendra P S Raghava (2009)  MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes.   BMC Res Notes 2: 04  
Abstract: Many databases housing information about MHC binders and non-binders have been developed in the past to help the scientific community working in the field of immunology, immunoinformatics or vaccine design. As the information about these MHC binding and non-binding peptides continues to grow with time, there is a need to keep the databases updated. So, in order to provide the immunological fraternity with the most recent information, we need to maintain and update our database regularly. In this paper, we describe the updated version, 4.0, of the database MHCBN.
Mamoon Rashid, Deepak Singla, Arun Sharma, Manish Kumar, Gajendra P S Raghava (2009)  Hmrbase: a database of hormones and their receptors.   BMC Genomics 10: 07  
Abstract: Hormones are signaling molecules that play vital roles in various life processes, like growth and differentiation, physiology, and reproduction. These molecules are mostly secreted by endocrine glands, and transported to target organs through the bloodstream. Deficient, or excessive, levels of hormones are associated with several diseases such as cancer, osteoporosis, diabetes etc. Thus, it is important to collect and compile information about hormones and their receptors.
Rakesh Kaundal, Gajendra P S Raghava (2009)  RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information.   Proteomics 9: 9. 2324-2342 May  
Abstract: The attainment of a complete map-based sequence for rice (Oryza sativa) is clearly a major milestone for the research community. Identifying the localization of encoded proteins is the key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular prediction of encoded rice proteins. First, support vector machine (SVM)-based modules have been developed using traditional amino acid-, dipeptide- (i+1) and four parts-amino acid composition and achieved an overall accuracy of 81.43, 80.88 and 81.10%, respectively. Secondly, a similarity search-based module has been developed using position-specific iterated-basic local alignment search tool and achieved 68.35% accuracy. Another module developed using evolutionary information of a protein sequence extracted from position-specific scoring matrix achieved an accuracy of 87.10%. In this study, a large number of modules have been developed using various encoding schemes like higher-order dipeptide composition, N- and C-terminal, split amino acid composition and the hybrid information. In order to benchmark RSLpred, it was tested on an independent set of rice proteins where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool 'RSLpred' has been developed for subcellular prediction of query rice proteins, which is freely accessible at http://www.imtech.res.in/raghava/rslpred.
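The "(i+1)" dipeptide and "higher-order dipeptide" encodings named in this abstract are usually understood as counting residue pairs at positions i and i+k, with k = 1 for adjacent residues and k > 1 for higher orders. That reading is an assumption here, as is the function name; a minimal sketch with the gap as a parameter:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {
    "".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))
}


def gapped_dipeptide_composition(seq: str, gap: int = 1) -> list:
    """Percent occurrence of residue pairs (i, i + gap), 400 values.

    gap=1 gives the traditional dipeptide composition; gap=2, 3, ...
    give higher-order variants (assumed interpretation).
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    seq = seq.upper()
    total = max(len(seq) - gap, 0)
    if total == 0:
        return [0.0] * 400
    counts = [0] * 400
    for i in range(total):
        index = PAIR_INDEX.get(seq[i] + seq[i + gap])
        if index is not None:  # ignore pairs with non-standard residues
            counts[index] += 1
    return [100.0 * c / total for c in counts]
```

Vectors for several gap values can be concatenated to capture both local and slightly longer-range sequence order, at the cost of 400 extra features per gap.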
Notes:
Pankaj K Arora, Manish Kumar, Archana Chauhan, Gajendra P S Raghava, Rakesh K Jain (2009)  OxDBase: a database of oxygenases involved in biodegradation.   BMC Res Notes 2: 04  
Abstract: Oxygenases belong to the oxidoreductive group of enzymes (E.C. Class 1), which oxidize the substrates by transferring oxygen from molecular oxygen (O2) and utilize FAD/NADH/NADPH as the co-substrate. Oxygenases can further be grouped into two categories, i.e. monooxygenases and dioxygenases, on the basis of the number of oxygen atoms used for oxidation. They play a key role in the metabolism of organic compounds by increasing their reactivity or water solubility or bringing about cleavage of the aromatic ring.
Notes:
Firoz Ahmed, Hifzur Rahman Ansari, Gajendra P S Raghava (2009)  Prediction of guide strand of microRNAs from its sequence and secondary structure.   BMC Bioinformatics 10: 04  
Abstract: MicroRNAs (miRNAs) are produced by the sequential processing of a long hairpin RNA transcript by Drosha and Dicer, both RNase III enzymes, and form transitory small RNA duplexes. One strand of the duplex, which is incorporated into the RNA-induced silencing complex (RISC) and silences gene expression, is called the guide strand, or miRNA; the other strand of the duplex is degraded and is called the passenger strand, or miRNA*. Predicting the guide strand of a miRNA is important for a better understanding of the RNA interference pathways.
Notes:
Manish Kumar, Gajendra P S Raghava (2009)  Prediction of nuclear proteins using SVM and HMM models.   BMC Bioinformatics 10: 01  
Abstract: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed in the past for predicting nuclear proteins. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy.
Notes:
Firoz Ahmed, Manish Kumar, Gajendra P S Raghava (2009)  Prediction of polyadenylation signals in human DNA sequences using nucleotide frequencies.   In Silico Biol 9: 3. 135-148  
Abstract: The polyadenylation signal plays a key role in determining the site for addition of a polyadenylated tail to nascent mRNA and its mutation(s) are reported in many diseases. Thus, identifying poly(A) sites is important for understanding the regulation and stability of mRNA. In this study, Support Vector Machine (SVM) models have been developed for predicting poly(A) signals in a DNA sequence using 100 nucleotides each upstream and downstream of this signal. Here, we introduced a novel split nucleotide frequency technique, and the models thus developed achieved maximum Matthews correlation coefficients (MCC) of 0.58, 0.69, 0.70 and 0.69 using mononucleotide, dinucleotide, trinucleotide, and tetranucleotide frequencies, respectively. Finally, a hybrid model developed using a combination of dinucleotide, 2nd order dinucleotide and tetranucleotide frequencies achieved a maximum MCC of 0.72. Moreover, for independent datasets this model achieved a precision ranging from 75.8-95.7% with a sensitivity of 57%, which is better than any other known method.
Notes:
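Several abstracts in this list, including the poly(A) signal paper above, report performance as a Matthews correlation coefficient (MCC). The following is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from any of the papers.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here; the papers do not state how they handle this case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(round(matthews_corrcoef(tp=80, tn=90, fp=10, fn=20), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than plain accuracy when the positive and negative classes are unbalanced.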
Neelkamal Chaudhary, Lakshna Mahajan, Taruna Madan, Anil Kumar, Gajendra Pratap Singh Raghava, Seturam Bandacharya Katti, Wahajul Haq, Puranam Usha Sarma (2009)  Prophylactic and Therapeutic Potential of Asp f1 Epitopes in Naïve and Sensitized BALB/c Mice.   Immune Netw 9: 5. 179-191 Oct  
Abstract: The present study examines a hypothesis that short allergen-derived peptides may shift an Aspergillus fumigatus (Afu-) specific TH2 response towards a protective TH1. Five overlapping peptides (P1-P5) derived from Asp f1, a major allergen/antigen of Afu, were evaluated for prophylactic or therapeutic efficacy in BALB/c mice.
Notes:
Jagat S Chauhan, Nitish K Mishra, Gajendra P S Raghava (2009)  Identification of ATP binding residues of a protein from its primary sequence.   BMC Bioinformatics 10: 12  
Abstract: One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand that plays a critical role as a coenzyme in the functionality of many proteins. There is a need to develop a method for identifying ATP-interacting residues in ATP-binding proteins (ABPs), in order to understand the mechanism of protein-ligand interaction.
Notes:
Sneh Lata, G P S Raghava (2009)  Prediction and classification of chemokines and their receptors.   Protein Eng Des Sel 22: 7. 441-444 Jul  
Abstract: Chemokines are low molecular mass cytokine-like proteins that orchestrate myriads of immune functions like leukocyte trafficking, T cell differentiation, angiogenesis, hematopoiesis and mast cell degranulation. Chemokines also play a role as HIV-1 inhibitors and act as potent natural adjuvants in antitumor immunotherapy. Receptors for these molecules are all seven-pass transmembrane G-protein-coupled receptors that are intimately involved with chemokines in a wide array of physiological and pathological conditions. These receptors also have a major role as co-receptors for HIV-1 entry into target cells. Therefore, chemokine receptors have proven to be excellent targets for small molecules in the pharmaceutical industry. The immense importance of chemokines and their receptors motivated us to develop a support vector machine-based method, ChemoPred, to predict this important class of proteins and further classify them into subfamilies. ChemoPred is capable of predicting chemokines and chemokine receptors with an accuracy of 95.08% and 92.19%, respectively. The overall accuracy of classification of chemokines into three subfamilies was 96.00% and that of chemokine receptors into three families was 92.87%. The server ChemoPred is freely available at www.imtech.res.in/raghava/chemopred.
Notes:
2008
Sandro Vivona, Jennifer L Gardy, Srinivasan Ramachandran, Fiona S L Brinkman, G P S Raghava, Darren R Flower, Francesco Filippini (2008)  Computer-aided biotechnology: from immuno-informatics to reverse vaccinology.   Trends Biotechnol 26: 4. 190-200 Apr  
Abstract: Genome sequences from many organisms, including humans, have been completed, and high-throughput analyses have produced burgeoning volumes of 'omics' data. Bioinformatics is crucial for the management and analysis of such data and is increasingly used to accelerate progress in a wide variety of large-scale and object-specific functional analyses. Refined algorithms enable biotechnologists to follow 'computer-aided strategies' based on experiments driven by high-confidence predictions. In order to address compound problems, current efforts in immuno-informatics and reverse vaccinology are aimed at developing and tuning integrative approaches and user-friendly, automated bioinformatics environments. This will herald a move to 'computer-aided biotechnology': smart projects in which time-consuming and expensive large-scale experimental approaches are progressively replaced by prediction-driven investigations.
Notes:
Manish Kumar, M Michael Gromiha, G P S Raghava (2008)  Prediction of RNA binding sites in a protein using SVM and PSSM profile.   Proteins 71: 1. 189-194 Apr  
Abstract: RNA-binding proteins (RBPs) play key roles in post-transcriptional control of gene expression, which, along with transcriptional regulation, is a major way to regulate patterns of gene expression during development. Thus, the identification and prediction of RNA binding sites is an important step in comprehensive understanding of how RBPs control organism development. Combining evolutionary information and support vector machine (SVM), we have developed an improved method for predicting RNA binding sites or RNA interacting residues in a protein sequence. The prediction models developed in this study have been trained and tested on 86 RNA binding protein chains and evaluated using the fivefold cross validation technique. First, an SVM model was developed that achieved a maximum Matthews correlation coefficient (MCC) of 0.31. The performance of this SVM model improved further, with the MCC rising from 0.31 to 0.45, when multiple sequence alignment in the form of PSSM profiles was used as input to the SVM; this is far better than the maximum MCC achieved by previous methods (0.41) on the same dataset. In addition, SVM models were also developed on an alternative dataset that contained 107 RBP chains. Utilizing PSSM as input information to the SVM, the training/testing on this alternate dataset achieved a maximum MCC of 0.32. In conclusion, the prediction performance of SVM models developed in this study is better than the existing methods on the same datasets. A web server 'Pprint' was also developed for predicting RNA binding residues in a protein sequence, which is freely available at http://www.imtech.res.in/raghava/pprint/.
Notes:
Sneh Lata, G P S Raghava (2008)  CytoPred: a server for prediction and classification of cytokines.   Protein Eng Des Sel 21: 4. 279-282 Apr  
Abstract: Cytokines are messengers of the immune system. They are small secreted proteins that mediate and regulate the immune system, inflammation and hematopoiesis. Recent studies have revealed important roles played by the cytokines in adjuvants as therapeutic targets and in cancer therapy. In this paper, an attempt has been made to predict this important class of proteins and further classify them into families and subfamilies. A PSI-BLAST+Support Vector Machine-based hybrid approach is adopted to develop the prediction methods. CytoPred is capable of predicting cytokines with an accuracy of 98.29%. The overall accuracy of classification of cytokines into four families and further classification into seven subfamilies is 99.77 and 97.24%, respectively. It has been shown by comparison that CytoPred performs better than the already existing CTKPred. A user-friendly server, CytoPred, has been developed and is available at http://www.imtech.res.in/raghava/cytopred.
Notes:
Ruchi Verma, Ajit Tiwari, Sukhwinder Kaur, Grish C Varshney, Gajendra P S Raghava (2008)  Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles.   BMC Bioinformatics 9: 04  
Abstract: The malaria parasite secretes various proteins into infected red blood cells (RBCs) for its growth and survival. Thus, identification of these secretory proteins is important for developing vaccines and drugs against malaria. The existing motif-based methods have had limited success owing to the lack of a universal motif in all secretory proteins of the malaria parasite.
Notes:
Deepti Sethi, Aarti Garg, G P S Raghava (2008)  DPROT: prediction of disordered proteins using evolutionary information.   Amino Acids 35: 3. 599-605 Oct  
Abstract: The association of structurally disordered proteins with a number of diseases has engendered enormous interest and therefore demands a prediction method that would facilitate their expeditious study at the molecular level. The present study describes the development of a computational method for predicting disordered proteins using sequence and profile compositions as input features for the training of SVM models. First, we developed the amino acid and dipeptide composition-based SVM modules, which yielded sensitivities of 75.6 and 73.2% along with Matthews correlation coefficient (MCC) values of 0.75 and 0.60, respectively. In addition, the use of predicted secondary structure content (coil, sheet and helices) in the form of composition values attained a sensitivity of 76.8% and an MCC value of 0.77. Finally, the training of SVM models using evolutionary information hidden in the multiple sequence alignment profile improved the prediction performance by achieving a sensitivity value of 78% and an MCC of 0.78. Furthermore, when evaluated on an independent dataset of partially disordered proteins, the same SVM module provided a correct prediction rate of 86.6%. Based on the above study, a web server ("DPROT") was developed for the prediction of disordered proteins, which is available at http://www.imtech.res.in/raghava/dprot/.
Notes:
Sneh Lata, Gajendra P S Raghava (2008)  PRRDB: a comprehensive database of pattern-recognition receptors and their ligands.   BMC Genomics 9: 04  
Abstract: Recently, a number of studies have demonstrated that the innate immune system does not merely act as the first line of defense but also provides critical signals for the development of a specific adaptive immune response. The innate immune system employs a set of receptors called pattern recognition receptors (PRRs) that recognize evolutionarily conserved patterns from pathogens called pathogen associated molecular patterns (PAMPs). In order to assist the scientific community, a database, PRRDB, has been developed that provides extensive information about pattern recognition receptors and their ligands.
Notes:
Mridul K Kalita, Umesh K Nandal, Ansuman Pattnaik, Anandhan Sivalingam, Gowthaman Ramasamy, Manish Kumar, Gajendra P S Raghava, Dinesh Gupta (2008)  CyclinPred: a SVM-based method for predicting cyclin protein sequences.   PLoS One 3: 7. 07  
Abstract: Functional annotation of protein sequences with low similarity to well characterized protein sequences is a major challenge of computational biology in the post genomic era. The cyclin protein family is one such important family of proteins; it consists of sequences with low sequence similarity, making the discovery of novel cyclins and the establishment of orthologous relationships amongst the cyclins a difficult task. The currently identified cyclin motifs and cyclin associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from Leave-One-Out cross validation or jackknife test, self consistency and holdout tests show that the SVM classifier trained with features of the PSSM profile was more accurate than the classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results indicate that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
Notes:
Manish Kumar, Varun Thakur, Gajendra P S Raghava (2008)  COPid: composition based protein identification.   In Silico Biol 8: 2. 121-128  
Abstract: In the past, a large number of methods have been developed for predicting various characteristics of a protein from its composition. In order to exploit the full potential of protein composition, we developed the web-server COPid to assist researchers in annotating the function of a protein from its composition using the whole or part of the protein. COPid has three modules called search, composition and analysis. The search module allows searching of protein sequences in six different databases. Search results list database proteins in ascending order of Euclidean distance or descending order of compositional similarity with the query sequence. The composition module allows calculation of the composition of a sequence and the average composition of a group of sequences. The composition module also allows computing the composition of various types of amino acids (e.g. charged, polar, hydrophobic residues). The analysis module provides the following options: i) comparing the composition of two classes of proteins, ii) creating a phylogenetic tree based on the composition and iii) generating input patterns for machine learning techniques. We have evaluated the performance of composition-based (or alignment-free) similarity search in the subcellular localization of proteins. It was found that the alignment-free method performs reasonably well in predicting certain classes of proteins. The COPid web-server is available at http://www.imtech.res.in/raghava/copid/.
Notes:
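The composition-based search described in the COPid abstract above (ranking database proteins in ascending order of Euclidean distance from the query composition) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the percentage normalization, the decision to ignore non-standard residues, and the toy database are all hypothetical and are not taken from the COPid implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Percentage of each standard amino acid in seq (20 values).

    Non-standard characters are ignored (an assumption of this sketch).
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * counts[a] / total for a in AMINO_ACIDS]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_composition(query, database):
    """Return (distance, name) pairs, nearest composition first."""
    q = composition(query)
    scored = [(euclidean_distance(q, composition(s)), name)
              for name, s in database.items()]
    return sorted(scored)


if __name__ == "__main__":
    # Toy sequences, invented purely for demonstration.
    toy_db = {"protA": "MKKLLAAGG", "protB": "WWWFFFYYY", "protC": "MKKLAAGGS"}
    for dist, name in rank_by_composition("MKKLLAAGS", toy_db):
        print(name, round(dist, 2))
```

Because only residue frequencies are compared, this kind of search needs no alignment and runs in time linear in the sequence lengths, at the cost of discarding all information about residue order.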
Aarti Garg, Gajendra P S Raghava (2008)  ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins.   BMC Bioinformatics 9: 11  
Abstract: The expansion of raw protein sequence databases in the post-genomic era and the availability of freshly annotated sequences for major localizations motivated us to introduce an improved version of our earlier eukaryotic subcellular localization prediction method, "ESLpred". Because the subcellular localization of a protein offers essential clues about its function, the availability of a localization predictor would aid and expedite protein characterization studies. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein attributes; hence, it becomes imperative to improve the performance of the presently available method using the latest dataset and crucial input features.
Notes:
Aarti Garg, Gajendra P S Raghava (2008)  A machine learning based method for the prediction of secretory proteins using amino acid composition, their order and similarity-search.   In Silico Biol 8: 2. 129-140  
Abstract: Most of the prediction methods for secretory proteins require the presence of a correct N-terminal end of the preprotein for correct classification. As large scale genome sequencing projects sometimes assign the 5'-end of genes incorrectly, many proteins are encoded without the correct N-terminus, leading to incorrect prediction. In this study, a systematic attempt has been made to predict secretory proteins irrespective of the presence or absence of N-terminal signal peptides (also known as classical and non-classical secreted proteins, respectively), using two machine-learning techniques: artificial neural network (ANN) and support vector machine (SVM). We trained and tested our methods on a dataset of 3321 secretory and 3654 non-secretory mammalian proteins using the five-fold cross-validation technique. First, ANN-based modules have been developed for predicting secretory proteins using 33 physico-chemical properties, amino acid composition and dipeptide composition and achieved accuracies of 73.1%, 76.1% and 77.1%, respectively. Similarly, SVM-based modules using 33 physico-chemical properties, amino acid, and dipeptide composition have been able to achieve accuracies of 77.4%, 79.4% and 79.9%, respectively. In addition, BLAST and PSI-BLAST modules designed for predicting secretory proteins based on similarity search achieved 23.4% and 26.9% accuracy, respectively. Finally, we developed a hybrid approach by integrating the amino acid and dipeptide composition based SVM modules and the PSI-BLAST module, which increased the accuracy to 83.2%, significantly better than the individual modules. Using the hybrid module, we also achieved a sensitivity of 60.4% at a low false positive rate of 5%. A web server, SRTpred, has been developed based on the above study for predicting classical and non-classical secreted proteins from the whole sequence of mammalian proteins, which is available from http://www.imtech.res.in/raghava/srtpred/.
Notes:
2007
Nitish Kumar Mishra, Manish Kumar, G P S Raghava (2007)  Support vector machine based prediction of glutathione S-transferase proteins.   Protein Pept Lett 14: 6. 575-580  
Abstract: Glutathione S-transferase (GST) proteins play a vital role in living organisms, including detoxification of exogenous and endogenous chemicals and survival under stress conditions. This paper describes a method developed for predicting GST proteins. We have used a dataset of 107 GST and 107 non-GST proteins for training, and the performance of the method was evaluated with the five-fold cross-validation technique. First, an SVM-based method was developed using amino acid and dipeptide composition, which achieved maximum accuracies of 91.59% and 95.79%, respectively. In addition, we developed an SVM-based method using tripeptide composition that achieved a maximum accuracy of 97.66%, which is better than the accuracy achieved by HMM-based searching (96.26%). Based on the above study, a web-server, GSTPred, has been developed (http://www.imtech.res.in/raghava/gstpred/).
Notes:
Jason A Greenbaum, Pernille Haste Andersen, Martin Blythe, Huynh-Hoa Bui, Raul E Cachau, James Crowe, Matthew Davies, A S Kolaskar, Ole Lund, Sherrie Morrison, Brendan Mumey, Yanay Ofran, Jean-Luc Pellequer, Clemencia Pinilla, Julia V Ponomarenko, G P S Raghava, Marc H V van Regenmortel, Erwin L Roggen, Alessandro Sette, Avner Schlessinger, Johannes Sollner, Martin Zand, Bjoern Peters (2007)  Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools.   J Mol Recognit 20: 2. 75-82 Mar/Apr  
Abstract: A B-cell epitope is the three-dimensional structure within an antigen that can be bound to the variable region of an antibody. The prediction of B-cell epitopes is highly desirable for various immunological applications, but has presented a set of unique challenges to the bioinformatics and immunology communities. Improving the accuracy of B-cell epitope prediction methods depends on a community consensus on the data and metrics utilized to develop and evaluate such tools. A workshop, sponsored by the National Institute of Allergy and Infectious Diseases (NIAID), was recently held in Washington, DC to discuss the current state of the B-cell epitope prediction field. Many of the currently available tools were surveyed and a set of recommendations was devised to facilitate improvements in the currently existing tools and to expedite future tool development. An underlying theme of the recommendations put forth by the panel is increased collaboration among research groups. By developing common datasets, standardized data formats, and the means with which to consolidate information, we hope to greatly enhance the development of B-cell epitope prediction tools.
Notes:
Sneh Lata, B K Sharma, G P S Raghava (2007)  Analysis and prediction of antibacterial peptides.   BMC Bioinformatics 8: 07  
Abstract: Antibacterial peptides are important components of the innate immune system, used by the host to protect itself from different types of pathogenic bacteria. Over the last few decades, the search for new drugs and drug targets has prompted an interest in these antibacterial peptides. We analyzed 486 antibacterial peptides, obtained from antimicrobial peptide database APD, in order to understand the preference of amino acid residues at specific positions in these peptides.
Notes:
Manoj Bhasin, G P S Raghava (2007)  A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes.   J Biosci 32: 1. 31-42 Jan  
Abstract: In the present study, a systematic attempt has been made to develop an accurate method for predicting MHC class I restricted T cell epitopes for a large number of MHC class I alleles. Initially, a quantitative matrix (QM)-based method was developed for 47 MHC class I alleles having at least 15 binders. A secondary artificial neural network (ANN)-based method was developed for 30 out of 47 MHC alleles having a minimum of 40 binders. Combination of these ANN- and QM-based prediction methods for 30 alleles improved the accuracy of prediction by 6% compared to each individual method. The average accuracy of the hybrid method for 30 MHC alleles is 92.8%. This method also allows prediction of binders for 20 additional alleles using QMs that have been reported in the literature, thus allowing prediction for 67 MHC class I alleles. The performance of the method was evaluated using the jack-knife validation test. The performance of the methods was also evaluated on blind or independent data. Comparison of our method with existing MHC binder prediction methods for alleles studied by both methods shows that our method is superior to other existing methods. This method also identifies proteasomal cleavage sites in antigen sequences by implementing the matrices described earlier. Thus, the method that we describe allows the identification of promiscuous MHC class I binders (peptides binding with many MHC alleles) having a proteasomal cleavage site at the C-terminus. The user-friendly result display format (HTML-II) can assist in locating the promiscuous MHC binding regions from an antigen sequence. The method is available on the web at www.imtech.res.in/raghava/nhlapred and its mirror site is available at http://bioinformatics.uams.edu/mirror/nhlapred/.
Notes:
Sudipto Saha, Gajendra P S Raghava (2007)  Predicting virulence factors of immunological interest.   Methods Mol Biol 409: 407-415  
Abstract: In this chapter, three prediction servers used for predicting virulence factors, bacterial toxins, and neurotoxins have been described. VICMpred server predicts the functional proteins of gram-negative bacteria that include virulence factors, information molecule, cellular process, and metabolism molecule. BTXpred server allows users to predict bacterial toxins, its release, and further classification of exotoxins. NTXpred server allows prediction of neurotoxins and further classifying them based on their function and source.
Notes:
Shilpy Srivastava, Mahender Kumar Singh, Gajendra P S Raghava, Grish C Varshney (2007)  Searching haptens, carrier proteins, and anti-hapten antibodies.   Methods Mol Biol 409: 125-139  
Abstract: Haptens are small molecules that are usually nonimmunogenic unless coupled to some carrier proteins. The generation of anti-hapten antibodies is important for the development of immunodiagnostics and therapeutics. Recently, our group has developed a database called HaptenDB, which provides comprehensive information about 1,087 haptens. In this chapter, we describe the following web tools integrated in HaptenDB: (i) a keyword search facility, which allows search on major fields, (ii) a browsing service, to display all haptens, carrier proteins and antibodies, and (iii) structure similarity search, which allows users to search their structure against hapten structures.
Notes:
Sudipto Saha, Gajendra P S Raghava (2007)  Searching and mapping of B-cell epitopes in Bcipep database.   Methods Mol Biol 409: 113-124  
Abstract: One of the major challenges in the field of subunit vaccine design is to identify the antigenic regions in an antigen that can activate B cells. These antigenic regions are called B-cell epitopes. In this chapter, we describe how to use Bcipep, which is a database of experimentally determined linear B-cell epitopes of varying immunogenicity collected from the literature and other publicly available databases. The current version of the Bcipep database contains 3,031 entries that include 763 immunodominant, 1,797 immunogenic, and 471 null-immunogenic epitopes. The database provides a set of tools for analysis and extraction of data that includes keyword search, peptide mapping, and BLAST search. The database is available at http://www.imtech.res.in/raghava/bcipep/.
Notes:
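The peptide mapping tool mentioned in the Bcipep abstract above locates known epitopes on a query sequence. As a rough illustration of the idea, the sketch below reports every exact occurrence of each known peptide in a query antigen. Exact string matching, the function name, and the toy peptides are assumptions of this example; the actual Bcipep tool may use a different matching scheme.

```python
def map_peptides(query, peptides):
    """Return sorted (start, end, peptide) hits for exact matches.

    Positions are 1-based and inclusive, as is usual for protein
    sequences. Overlapping occurrences are all reported.
    """
    query = query.upper()
    hits = []
    for pep in peptides:
        pep = pep.upper()
        if not pep:
            continue  # skip empty entries
        start = query.find(pep)
        while start != -1:
            hits.append((start + 1, start + len(pep), pep))
            # Continue one residue later so overlapping hits are found.
            start = query.find(pep, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Invented sequence and peptides, for demonstration only.
    for hit in map_peptides("MKTAYIAKQR", ["TAY", "AKQ", "WWW"]):
        print(hit)
```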
Sneh Lata, Manoj Bhasin, Gajendra P S Raghava (2007)  Application of machine learning techniques in predicting MHC binders.   Methods Mol Biol 409: 201-215  
Abstract: Machine learning techniques play a vital role in the field of immunoinformatics. In the past, a number of methods have been developed for predicting major histocompatibility complex (MHC)-binding peptides using machine learning techniques. These methods allow predicting MHC-binding peptides with high accuracy. In this chapter, we describe two machine learning technique-based methods, nHLAPred and MHC2Pred, developed for predicting MHC binders for class I and class II alleles, respectively. nHLAPred is a web server developed for predicting binders for 67 MHC class I alleles. This server has two methods: ANNPred and ComPred. ComPred allows predicting binders for 67 MHC class I alleles, using the combined method [artificial neural network (ANN) and quantitative matrix] for 30 alleles and the quantitative matrix-based method for 37 alleles. ANNPred allows prediction of binders for only 30 alleles purely based on the ANN. MHC2Pred is a support vector machine (SVM)-based method for prediction of promiscuous binders for 42 MHC class II alleles.
Notes:
Manoj Bhasin, Sneh Lata, G P S Raghava (2007)  TAPPred prediction of TAP-binding peptides in antigens.   Methods Mol Biol 409: 381-386  
Abstract: The transporter associated with antigen processing (TAP) plays a crucial role in the transport of the peptide fragments of the proteolysed antigenic or self-altered proteins to the endoplasmic reticulum, where the association between these peptides and the major histocompatibility complex (MHC) class I molecules takes place. Therefore, prediction of TAP-binding peptides is highly helpful in identifying MHC class I-restricted T-cell epitopes and hence in subunit vaccine design. In this chapter, we describe a support vector machine (SVM)-based method, TAPPred, that allows users to predict the TAP-binding affinity of peptides over the web. The server allows users to predict TAP binders using a simple SVM model or a cascade SVM model. The server also allows users to customize the display/output. It is freely available for academicians and noncommercial organizations at the address http://www.imtech.res.in/raghava/tappred.
Notes:
Sudipto Saha, Gajendra P S Raghava (2007)  Prediction methods for B-cell epitopes.   Methods Mol Biol 409: 387-394  
Abstract: In this chapter, two prediction servers for linear B-cell epitopes are described: (i) BcePred, based on physico-chemical properties that include hydrophilicity, flexibility/mobility, accessibility, polarity, exposed surface, turns, and antigenicity, and (ii) ABCpred, based on a recurrent neural network. Both servers assist in locating linear epitope regions in a protein.
Notes:
Manoj Bhasin, Sneh Lata, Gajendra P S Raghava (2007)  Searching and mapping of T-cell epitopes, MHC binders, and TAP binders.   Methods Mol Biol 409: 95-112  
Abstract: This chapter describes the searching and mapping tools of the MHCBN database, which is a curated database. It comprises over 23,000 peptide sequences whose binding affinity with major histocompatibility complex (MHC) or transporter associated with antigen processing (TAP) molecules has been assayed experimentally. Each entry of the database provides full information (such as sequence, its MHC- or TAP-binding specificity, and source protein) about a peptide whose binding affinity (IC50) and T-cell activity is experimentally determined. MHCBN has a number of web-based tools for analyzing and retrieving information. In this chapter, we describe how to use the web tools integrated in MHCBN, which include (i) mapping of experimentally determined antigenic regions on the query sequence, (ii) creation of allele-specific peptide data sets, and (iii) BLAST search against MHC or antigen databases.
Notes:
Sudipto Saha, Gajendra P S Raghava (2007)  BTXpred: prediction of bacterial toxins.   In Silico Biol 7: 4-5. 405-412  
Abstract: This paper describes a method developed for predicting bacterial toxins from their amino acid sequences. All the modules developed in this study were trained and tested on a non-redundant dataset of 150 bacterial toxins that included 77 exotoxins and 73 endotoxins. Firstly, support vector machine (SVM) based modules were developed for predicting the bacterial toxins using amino acid and dipeptide composition and achieved accuracies of 96.07% and 92.50%, respectively. Secondly, SVM based modules were developed for discriminating endotoxins and exotoxins using amino acid and dipeptide composition and achieved accuracies of 95.71% and 92.86%, respectively. In addition, modules have been developed for classifying the exotoxins (e.g. activate adenylate cyclase, activate guanylate cyclase, neurotoxins) using hidden Markov models (HMM), PSI-BLAST and a combination of the two and achieved overall accuracies of 95.75%, 97.87% and 100%, respectively. Based on the above study, a web server called 'BTXpred' has been developed, which is available at http://www.imtech.res.in/raghava/btxpred/. Supplementary information is available at http://www.imtech.res.in/raghava/btxpred/supplementary.html.
Notes:
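Amino acid and dipeptide composition recur as input features throughout the abstracts in this list. The following is a minimal sketch of how such feature vectors can be computed. It assumes the 20 standard residues and percentage scaling; the function names are illustrative and are not taken from BTXpred or any other server listed here.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-element list: percentage of each standard residue in seq."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * c / total for c in counts]


def dipeptide_composition(seq):
    """Return a 400-element list: percentage of each adjacent residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * counts[p] / total for p in DIPEPTIDES]
```

Vectors of this kind have a fixed length (20 or 400) regardless of protein length, which is what allows them to be passed to a classifier such as an SVM.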
Manish Kumar, Michael M Gromiha, Gajendra P S Raghava (2007)  Identification of DNA-binding proteins using support vector machines and evolutionary profiles.   BMC Bioinformatics 8: 11  
Abstract: Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene-regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.
Notes:
Harpreet Kaur, Aarti Garg, G P S Raghava (2007)  PEPstr: a de novo method for tertiary structure prediction of small bioactive peptides.   Protein Pept Lett 14: 7. 626-631  
Abstract: Among secondary structure elements, beta-turns are ubiquitous and major feature of bioactive peptides. We analyzed 77 biologically active peptides with length varying from 9 to 20 residues. Out of 77 peptides, 58 peptides were found to contain at least one beta-turn. Further, at the residue level, 34.9% of total peptide residues were found to be in beta-turns, higher than the number of helical (32.3%) and beta-sheet residues (6.9%). So, we utilized the predicted beta-turns information to develop an improved method for predicting the three-dimensional (3D) structure of small peptides. In principle, we built four different structural models for each peptide. The first 'model I' was built by assigning all the peptide residues an extended conformation (phi = Psi = 180 degrees ). Second 'model II' was built using the information of regular secondary structures (helices, beta-strands and coil) predicted from PSIPRED. In third 'model III', secondary structure information including beta-turn types predicted from BetaTurns method was used. The fourth 'model IV' had main-chain phi, Psi angles of model III and side chain angles assigned using standard Dunbrack backbone dependent rotamer library. These models were further refined using AMBER package and the resultant C(alpha) rmsd values were calculated. It was found that adding the beta-turns to the regular secondary structures greatly reduces the rmsd values both before and after the energy minimization. Hence, the results indicate that regular and irregular secondary structures, particularly beta-turns information can provide valuable and vital information in the tertiary structure prediction of small bioactive peptides. Based on the above study, a web server PEPstr (http://www.imtech.res.in/raghava/pepstr/) was developed for predicting the tertiary structure of small bioactive peptides.
Notes:
Anastas Pashov, Behjatolah Monzavi-Karbassi, Gajendra Raghava, Thomas Kieber-Emmons (2007)  Peptide mimotopes as prototypic templates of broad-spectrum surrogates of carbohydrate antigens for cancer vaccination.   Crit Rev Immunol 27: 3. 247-270  
Abstract: Mechanisms of broad cross-protection, as seen in viral infection and also applied to vaccines, emphasize preexisting antibodies, CD8+ memory T cells, and accelerated B-cell responses reactive with conserved regions in antigens. Another practical application to induce broad-spectrum responses is making use of multispecific antigen recognition by antibodies and T cells. Antibody polyreactivity can be related to the capacity of the antigen-combining site to accommodate a number of different small epitopes or to attain different conformations. A better understanding of the functionality of molecular interactions with graded specificity might help the design of polyreactive immunogens inducing antibody responses to a predefined set of target antigens. We have found this approach useful in targeting tumor-associated carbohydrate antigens in cancer vaccine development. Using combinatorial libraries and pharmacophore design principles, carbohydrate mimetic peptides were created that not only induce antibodies with multiple specificities, but also cellular responses that inhibit tumor growth in vivo.
Notes:
S Muthukrishnan, Aarti Garg, G P S Raghava (2007)  Oxypred: prediction and classification of oxygen-binding proteins.   Genomics Proteomics Bioinformatics 5: 3-4. 250-252 Dec  
Abstract: This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from http://www.imtech.res.in/raghava/oxypred/).
Notes:
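Many of the abstracts in this list report accuracy under five-fold cross-validation. As a rough illustration of that protocol, the sketch below splits sample indices into five disjoint folds and yields one train/test partition per fold. The shuffling seed and the function name are assumptions made for this example; the papers do not describe how their folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so a model is trained k times and every sample is predicted once by a model that did not see it during training.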
Sudipto Saha, Gajendra P S Raghava (2007)  Prediction of neurotoxins based on their function and source.   In Silico Biol 7: 4-5. 369-387  
Abstract: We have developed a method, NTXpred, for predicting neurotoxins and classifying them based on their function and origin. The dataset used in this study consists of 582 non-redundant, experimentally annotated neurotoxins obtained from Swiss-Prot. A number of modules have been developed for predicting neurotoxins using residue composition based on feed-forward neural network (FNN), recurrent neural network (RNN) and support vector machine (SVM), and achieved maximum accuracy of 84.19%, 92.75% and 97.72%, respectively. In addition, SVM modules have been developed for classifying neurotoxins based on their source (e.g., eubacteria, cnidarians, molluscs, arthropods and chordates) using amino acid composition and dipeptide composition and achieved maximum overall accuracy of 78.94% and 88.07%, respectively. The overall accuracy increased to 92.10% when the evolutionary information obtained from PSI-BLAST was combined with the SVM module of source classification. We have also developed SVM modules for classifying neurotoxins based on functions using amino acid and dipeptide composition and achieved overall accuracy of 83.11% and 91.10%, respectively. The overall accuracy of function classification improved to 95.11% when PSI-BLAST output was combined with the SVM module. All the modules developed in this study were evaluated using the five-fold cross-validation technique. NTXpred is available at www.imtech.res.in/raghava/ntxpred/ and a mirror site at http://bioinformatics.uams.edu/mirror/ntxpred.
Notes:
Mamoon Rashid, Sudipto Saha, Gajendra Ps Raghava (2007)  Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs.   BMC Bioinformatics 8: 09  
Abstract: In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.
Notes:
2006
G P S Raghava, Geoffrey J Barton (2006)  Quantification of the variation in percentage identity for protein sequence alignments.   BMC Bioinformatics 7: 09  
Abstract: Percentage Identity (PID) is frequently quoted in discussion of sequence alignments since it appears simple and easy to understand. However, although there are several different ways to calculate percentage identity and each may yield a different result for the same alignment, the method of calculation is rarely reported. Accordingly, quantification of the variation in PID caused by the different calculations would help in interpreting PID values in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparison of the protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of algorithm, and the combination of algorithm and PID method.
Notes:
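The point that one alignment can yield several PID values can be made concrete with a small sketch. The four denominators below (alignment columns, aligned residue pairs, length of the shorter sequence, mean sequence length) are commonly used choices and are assumptions of this example; the abstract itself does not list the definitions compared in the paper.

```python
def percentage_identity(aligned_a, aligned_b, gap="-"):
    """Return PID for one pairwise alignment under four different denominators."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    identical = sum(1 for x, y in columns if x == y)
    paired = sum(1 for x, y in columns if x != gap and y != gap)
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)
    if min(len_a, len_b) == 0 or paired == 0:
        raise ValueError("alignment has no aligned residue pairs")
    denominators = {
        "alignment_columns": len(columns),
        "aligned_pairs": paired,
        "shorter_sequence": min(len_a, len_b),
        "mean_length": (len_a + len_b) / 2,
    }
    return {name: 100.0 * identical / d for name, d in denominators.items()}
```

For the toy alignment `ACDEFG` / `ACD--G`, four residues are identical, and the four definitions give roughly 66.7%, 100%, 100% and 80%, which illustrates why the method of calculation should be reported alongside the value.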
Sudipto Saha, G P S Raghava (2006)  VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition.   Genomics Proteomics Bioinformatics 4: 1. 42-47 Feb  
Abstract: In this study, an attempt has been made to predict the major functions of gram-negative bacterial proteins from their amino acid sequences. The dataset used for training and testing consists of 670 non-redundant gram-negative bacterial proteins (255 of cellular process, 60 of information molecules, 285 of metabolism, and 70 of virulence factors). First we developed an SVM-based method using amino acid and dipeptide composition and achieved the overall accuracy of 52.39% and 47.01%, respectively. We introduced a new concept for the classification of proteins based on tetrapeptides, in which we identified the unique tetrapeptides significantly found in a class of proteins. These tetrapeptides were used as the input feature for predicting the function of a protein and achieved the overall accuracy of 68.66%. We also developed a hybrid method in which the tetrapeptide information was used with amino acid composition and achieved the overall accuracy of 70.75%. A five-fold cross validation was used to evaluate the performance of these methods. The web server VICMpred has been developed for predicting the function of gram-negative bacterial proteins (http://www.imtech.res.in/raghava/vicmpred/).
Notes:
Sudipto Saha, Jyoti Zack, Balvinder Singh, G P S Raghava (2006)  VGIchan: prediction and classification of voltage-gated ion channels.   Genomics Proteomics Bioinformatics 4: 4. 253-258 Nov  
Abstract: This study describes methods for predicting and classifying voltage-gated ion channels. Firstly, a standard support vector machine (SVM) method was developed for predicting ion channels by using amino acid composition and dipeptide composition, with an accuracy of 82.89% and 85.56%, respectively. The accuracy of this SVM method was improved from 85.56% to 89.11% when combined with PSI-BLAST similarity search. Then we developed an SVM method for classifying ion channels (potassium, sodium, calcium, and chloride) by using dipeptide composition and achieved an overall accuracy of 96.89%. We further achieved a classification accuracy of 97.78% by using a hybrid method that combines dipeptide-based SVM and hidden Markov model methods. A web server VGIchan has been developed for predicting and classifying voltage-gated ion channels using the above approaches.
Notes:
Harpreet Kaur, Gajendra Pal Singh Raghava (2006)  Prediction of C alpha-H...O and C alpha-H...pi interactions in proteins using recurrent neural network.   In Silico Biol 6: 1-2. 111-125  
Abstract: In this study, an attempt has been made to develop a method for predicting weak hydrogen bonding interactions, namely, C alpha-H...O and C alpha-H...pi interactions in proteins using artificial neural network. Both standard feed-forward neural network (FNN) and recurrent neural networks (RNN) have been trained and tested using five-fold cross-validation on a non-homologous dataset of 2298 protein chains where no pair of sequences has more than 25% sequence identity. It has been found that the prediction accuracy varies with the separation distance between donor and acceptor residues. The maximum sensitivity achieved with RNN for C alpha-H...O is 51.2% when donor and acceptor residues are four residues apart (i.e. at delta D-A = 4) and for C alpha-H...pi is 82.1% at delta D-A = 3. The performance of RNN is increased by 1-3% for both types of interactions when PSIPRED predicted protein secondary structure is used. Overall, RNN performs better than feed-forward networks at all separation distances between donor-acceptor pair for both types of interactions. Based on the observations, a web server CHpredict (available at http://www.imtech.res.in/raghava/chpredict/) has been developed for predicting donor and acceptor residues in C alpha-H...O and C alpha-H...pi interactions in proteins.
Notes:
Sudipto Saha, G P S Raghava (2006)  AlgPred: prediction of allergenic proteins and mapping of IgE epitopes.   Nucleic Acids Res 34: Web Server issue. W202-W209 Jul  
Abstract: In this study a systematic attempt has been made to integrate various approaches in order to predict allergenic proteins with high accuracy. The dataset used for testing and training consists of 578 allergens and 700 non-allergens obtained from A. K. Bjorklund, D. Soeria-Atmadja, A. Zorzet, U. Hammerling and M. G. Gustafsson (2005) Bioinformatics, 21, 39-50. First, we developed methods based on support vector machine using amino acid and dipeptide composition and achieved an accuracy of 85.02 and 84.00%, respectively. Second, a motif-based method has been developed using MEME/MAST software that achieved sensitivity of 93.94% with 33.34% specificity. Third, a database of known IgE epitopes was searched and this predicted allergenic proteins with 17.47% sensitivity at specificity of 98.14%. Fourth, we predicted allergenic proteins by performing BLAST search against allergen representative peptides. Finally, hybrid approaches have been developed, which combine two or more approaches. The performance of all these algorithms has been evaluated on an independent dataset of 323 allergens and on 101 725 non-allergens obtained from Swiss-Prot. A web server AlgPred has been developed for predicting allergenic proteins and for mapping IgE epitopes on allergenic proteins (http://www.imtech.res.in/raghava/algpred/).
Notes:
Rakesh Kaundal, Amar S Kapoor, Gajendra P S Raghava (2006)  Machine learning techniques in disease forecasting: a case study on rice blast prediction.   BMC Bioinformatics 7: 11  
Abstract: Diverse modeling approaches, viz. neural networks and multiple regression, have been followed to date for disease prediction in plant populations. However, due to their inability to predict the value of unknown data points and their longer training times, there is a need to exploit new prediction software for better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help plant researchers or farmers in the timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases.
Notes:
Sudipto Saha, G P S Raghava (2006)  Prediction of continuous B-cell epitopes in an antigen using recurrent neural network.   Proteins 65: 1. 40-48 Oct  
Abstract: B-cell epitopes play a vital role in the development of peptide vaccines, in diagnosis of diseases, and also for allergy research. Experimental methods used for characterizing epitopes are time consuming and demand large resources. The availability of epitope prediction method(s) can rapidly aid experimenters in simplifying this problem. The standard feed-forward (FNN) and recurrent neural network (RNN) have been used in this study for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set, which consists of 700 non-redundant B-cell epitopes obtained from Bcipep database and equal number of non-epitopes obtained randomly from Swiss-Prot database. The networks have been trained and tested at different input window length and hidden units. Maximum accuracy has been obtained using recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. It has been observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The webserver ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.
Notes:
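Sensitivity, specificity, positive prediction value, accuracy and Matthews correlation coefficient are the evaluation measures quoted across these abstracts. The sketch below computes them from the four cells of a binary confusion matrix using the standard definitions. The function name and the counts in the usage note are made up for illustration and are not results from any paper in this list.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return standard binary classification measures from confusion-matrix counts."""
    if (tp + fn) == 0 or (tn + fp) == 0 or (tp + fp) == 0:
        raise ValueError("need at least one positive, one negative and one positive call")
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
        "ppv": tp / (tp + fp),           # positive prediction value (precision)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # MCC is defined as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }
```

For example, `binary_metrics(tp=40, fp=5, tn=45, fn=10)` gives a sensitivity of 0.80, a specificity of 0.90 and an accuracy of 0.85.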
Manish Kumar, Ruchi Verma, Gajendra P S Raghava (2006)  Prediction of mitochondrial proteins using support vector machine and hidden Markov model.   J Biol Chem 281: 9. 5357-5363 Mar  
Abstract: Mitochondria are considered as one of the core organelles of eukaryotic cells hence prediction of mitochondrial proteins is one of the major challenges in the field of genome annotation. This study describes a method, MitPred, developed for predicting mitochondrial proteins with high accuracy. The data set used in this study was obtained from Guda, C., Fahy, E. & Subramaniam, S. (2004) Bioinformatics 20, 1785-1794. First support vector machine-based modules/methods were developed using amino acid and dipeptide composition of proteins and achieved accuracy of 78.37 and 79.38%, respectively. The accuracy of prediction further improved to 83.74% when split amino acid composition (25 N-terminal, 25 C-terminal, and remaining residues) of proteins was used. Then BLAST search and support vector machine-based method were combined to get 88.22% accuracy. Finally we developed a hybrid approach that combined hidden Markov model profiles of domains (exclusively found in mitochondrial proteins) and the support vector machine-based method. We were able to predict mitochondrial protein with 100% specificity at a 56.36% sensitivity rate and with 80.50% specificity at 98.95% sensitivity. The method estimated 9.01, 6.35, 4.84, 3.95, and 4.25% of proteins as mitochondrial in Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, mouse, and human proteomes, respectively. MitPred was developed on the above hybrid approach.
Notes:
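The split amino acid composition mentioned in the MitPred abstract (25 N-terminal residues, 25 C-terminal residues, and the remainder) can be sketched as three separate composition vectors concatenated into one 60-element vector. The ordering of the three blocks, the handling of short sequences and the function names are assumptions of this example, not details taken from the paper.

```python
# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(fragment):
    """Return 20 residue percentages for a fragment (all zeros if it is empty)."""
    counts = [fragment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


def split_composition(seq, n_term=25, c_term=25):
    """Return N-terminal, C-terminal and remaining compositions as one 60-element list."""
    seq = seq.upper()
    if len(seq) < n_term + c_term:
        # Assumed behaviour: refuse sequences too short to split without overlap.
        raise ValueError("sequence is shorter than the two terminal windows")
    head = seq[:n_term]
    tail = seq[len(seq) - c_term:]
    middle = seq[n_term:len(seq) - c_term]
    return composition(head) + composition(tail) + composition(middle)
```

Keeping the termini separate preserves signal that whole-sequence composition would average away, which is a plausible reading of why the abstract reports higher accuracy for this feature.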
Mahender Kumar Singh, Shilpy Srivastava, G P S Raghava, Grish C Varshney (2006)  HaptenDB: a comprehensive database of haptens, carrier proteins and anti-hapten antibodies.   Bioinformatics 22: 2. 253-255 Jan  
Abstract: The key requirement for successful immunochemical assay is the availability of antibodies with high specificity and desired affinity. Small molecules, when used as haptens, are not immunogenic. However, on conjugating with carrier molecule they elicit antibody response. The production of anti-hapten antibodies of desired specificity largely depends on the hapten design (preserving greatly the chemical structure and spatial conformation of target compound), selection of the appropriate carrier protein and the conjugation method. This manuscript describes a curated database HaptenDB, where information is collected from published literature and web resources. The current version of the database has 2021 entries for 1087 haptens and 25 carrier proteins, where each entry provides comprehensive details about (1) nature of the hapten, (2) 2D and 3D structures of haptens, (3) carrier proteins, (4) coupling method, (5) method of anti-hapten antibody production, (6) assay method (used for characterization) and (7) specificities of antibodies. The current version of HaptenDB covers a wide array of haptens including pesticides, herbicides, insecticides, drugs, vitamins, steroids, hormones, toxins, dyes, explosives, etc. It provides internal and external links to various databases/resources to obtain further information about the nature of haptens, carriers and respective antibodies. For structure similarity comparison of haptens, the database also integrates tools like JME Editor and JMOL for sketching, displaying and manipulating hapten 2D/3D structures online. So the database would be of great help in identifying functional group(s) in smaller molecules using antibodies as well as for the development of immunodiagnostics/therapeutics by providing data and procedures available so far for the generation of specific or cross-reactive antibodies. 
AVAILABILITY: HaptenDB is available on http://www.imtech.res.in/raghava/haptendb/ and http://bioinformatics.uams.edu/raghava/haptendb/ (Mirror site).
Notes:
2005
Sudipto Saha, Manoj Bhasin, Gajendra P S Raghava (2005)  Bcipep: a database of B-cell epitopes.   BMC Genomics 6: 05  
Abstract: Bcipep is a database of experimentally determined linear B-cell epitopes of varying immunogenicity collected from literature and other publicly available databases.
Notes:
Manoj Bhasin, G P S Raghava (2005)  GPCRsclass: a web tool for the classification of amine type of G-protein-coupled receptors.   Nucleic Acids Res 33: Web Server issue. W143-W147 Jul  
Abstract: The receptors of the amine subfamily are major drug targets for the therapy of nervous disorders and psychiatric diseases. The recognition of novel amine type of receptors and their cognate ligands is of paramount interest for pharmaceutical companies. In the past, Chou and co-workers have shown that different types of amine receptors are correlated with their amino acid composition and are predictable on its basis with considerable accuracy [Elrod and Chou (2002) Protein Eng., 15, 713-715]. This motivated us to develop a better method for the recognition of novel amine receptors and for their further classification. The method was developed on the basis of amino acid composition and dipeptide composition of proteins using support vector machine. The method was trained and tested on 167 proteins of the amine subfamily of G-protein-coupled receptors (GPCRs). The method discriminated the amine subfamily of GPCRs from globular proteins with Matthew's correlation coefficient of 0.98 and 0.99 using amino acid composition and dipeptide composition, respectively. In classifying different types of amine receptors using amino acid composition and dipeptide composition, the method achieved an accuracy of 89.8 and 96.4%, respectively. The performance of the method was evaluated using 5-fold cross-validation. The dipeptide composition based method predicted 67.6% of protein sequences with an accuracy of 100% at a reliability index ≥5. A web server GPCRsclass has been developed for predicting amine-binding receptors from their amino acid sequences [http://www.imtech.res.in/raghava/gpcrsclass/ and http://bioinformatics.uams.edu/raghava/gpersclass/ (mirror site)].
Notes:
Manish Kumar, Manoj Bhasin, Navjot K Natt, G P S Raghava (2005)  BhairPred: prediction of beta-hairpins in a protein from multiple alignment information using ANN and SVM techniques.   Nucleic Acids Res 33: Web Server issue. W154-W159 Jul  
Abstract: This paper describes a method for predicting a supersecondary structural motif, beta-hairpins, in a protein sequence. The method was trained and tested on a set of 5102 hairpins and 5131 non-hairpins, obtained from a non-redundant dataset of 2880 proteins using the DSSP and PROMOTIF programs. Two machine-learning techniques, an artificial neural network (ANN) and a support vector machine (SVM), were used to predict beta-hairpins. An accuracy of 65.5% was achieved using ANN when an amino acid sequence was used as the input. The accuracy improved from 65.5 to 69.1% when evolutionary information (PSI-BLAST profile), observed secondary structure and surface accessibility were used as the inputs. The accuracy of the method further improved from 69.1 to 79.2% when the SVM was used for classification instead of the ANN. The performances of the methods developed were assessed in a test case, where predicted secondary structure and surface accessibility were used instead of the observed structure. The highest accuracy achieved by the SVM based method in the test case was 77.9%. A maximum accuracy of 71.1% with Matthew's correlation coefficient of 0.41 in the test case was obtained on a dataset previously used by X. Cruz, E. G. Hutchinson, A. Shephard and J. M. Thornton (2002) Proc. Natl Acad. Sci. USA, 99, 11157-11162. The performance of the method was also evaluated on proteins used in the '6th community-wide experiment on the critical assessment of techniques for protein structure prediction (CASP6)'. Based on the algorithm described, a web server, BhairPred (http://www.imtech.res.in/raghava/bhairpred/), has been developed, which can be used to predict beta-hairpins in a protein using the SVM approach.
Notes:
Aarti Garg, Harpreet Kaur, G P S Raghava (2005)  Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure.   Proteins 61: 2. 318-324 Nov  
Abstract: The present study is an attempt to develop a neural network-based method for predicting the real value of solvent accessibility from the sequence using evolutionary information in the form of multiple sequence alignment. In this method, two feed-forward networks with a single hidden layer have been trained with standard back-propagation as a learning algorithm. The Pearson's correlation coefficient increases from 0.53 to 0.63, and mean absolute error decreases from 18.2 to 16% when multiple-sequence alignment obtained from PSI-BLAST is used as input instead of a single sequence. The performance of the method further improves from a correlation coefficient of 0.63 to 0.67 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields a mean absolute error value of 15.2% between the experimental and predicted values, when tested on two different nonhomologous and nonredundant datasets of varying sizes. The method consists of two steps: (1) in the first step, a sequence-to-structure network is trained with the multiple alignment profiles in the form of PSI-BLAST-generated position-specific scoring matrices, and (2) in the second step, the output obtained from the first network and PSIPRED-predicted secondary structure information is used as an input to the second structure-to-structure network. Based on the present study, a server SARpred (http://www.imtech.res.in/raghava/sarpred/) has been developed that predicts the real value of solvent accessibility of residues for a given protein sequence. We have also evaluated the performance of SARpred on 47 proteins used in CASP6 and achieved a correlation coefficient of 0.68 and a MAE of 15.9% between predicted and observed values.
Notes:
Gajendra P S Raghava, Joon H Han (2005)  Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein.   BMC Bioinformatics 6: 03  
Abstract: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies on the relation between expression level and composition of nucleotide/protein sequence using expression data. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition.
Notes:
Aarti Garg, Manoj Bhasin, Gajendra P S Raghava (2005)  Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search.   J Biol Chem 280: 15. 14427-14432 Apr  
Abstract: Here we report a systematic approach for predicting subcellular localization (cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracy of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant data base of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information and attained better accuracy of 84.9%. In addition, SVM modules based on a different higher order dipeptide i.e. i + 2, i + 3, and i + 4 were also constructed for the prediction of subcellular localization of human proteins, and overall accuracy of 79.7, 77.5, and 77.1% was accomplished, respectively. Furthermore, another SVM module hybrid2 was developed using traditional dipeptide (i + 1) and higher order dipeptide (i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed SVM module hybrid3 based on amino acid composition, traditional and higher order dipeptide compositions, and PSI-BLAST output and achieved an overall accuracy of 84.4%. A Web server HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/) has been designed to predict subcellular localization of human proteins using the above approaches.
Notes:
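The "higher order dipeptide" features in the HSLPred abstract pair residue i with residue i + 2, i + 3 or i + 4 instead of its immediate neighbour. A minimal sketch of such a gapped pair composition follows. It assumes percentage scaling over the 400 possible ordered pairs; the function name and the `offset` parameter are illustrative, not the paper's notation.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def gapped_pair_composition(seq, offset=1):
    """Return 400 percentages for pairs (residue i, residue i + offset).

    offset=1 is the traditional dipeptide composition; offset=2, 3 or 4
    corresponds to the higher order pairs described in the abstract.
    """
    if offset < 1:
        raise ValueError("offset must be a positive integer")
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - offset):
        pair = seq[i] + seq[i + offset]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is too short for this offset")
    return [100.0 * counts[p] / total for p in PAIRS]
```

Concatenating the vectors for offsets 1 to 4 would give a 1600-element input, which is one way to read the abstract's hybrid2 module; the paper's exact encoding may differ.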
Manoj Bhasin, G P S Raghava (2005)  Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences.   Nucleic Acids Res 33: Web Server issue. W202-W207 Jul  
Abstract: This manuscript describes a support vector machine based method for the prediction of constitutive as well as immunoproteasome cleavage sites in antigenic sequences. This method achieved Matthew's correlation coefficients of 0.54 and 0.43 on in vitro and major histocompatibility complex ligand data, respectively. This shows that the performance of our method is comparable to that of the NetChop method, which is currently considered to be the best method for proteasome cleavage site prediction. Based on the method, a web server, Pcleavage, has also been developed. This server accepts protein sequences in any standard format and presents results in a user-friendly format. The server is available for free use by all academic users at the URL http://www.imtech.res.in/raghava/pcleavage/ or http://bioinformatics.uams.edu/mirror/pcleavage/.
Notes:
Manoj Bhasin, Aarti Garg, G P S Raghava (2005)  PSLpred: prediction of subcellular localization of bacterial proteins.   Bioinformatics 21: 10. 2522-2524 May  
Abstract: SUMMARY: We developed a web server PSLpred for predicting subcellular localization of gram-negative bacterial proteins with an overall accuracy of 91.2%. PSLpred is a hybrid approach-based method that integrates PSI-BLAST and three SVM modules based on compositions of residues, dipeptides and physico-chemical properties. The prediction accuracies of 90.7, 86.8, 90.3, 95.2 and 90.6% were attained for cytoplasmic, extracellular, inner-membrane, outer-membrane and periplasmic proteins, respectively. Furthermore, PSLpred was able to predict approximately 74% of sequences with an average prediction accuracy of 98% at RI = 5. AVAILABILITY: PSLpred is available at http://www.imtech.res.in/raghava/pslpred/
Notes:
2004
Manoj Bhasin, G P S Raghava (2004)  SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence.   Bioinformatics 20: 3. 421-423 Feb  
Abstract: Prediction of peptides binding with MHC class II allele HLA-DRB1*0401 can effectively reduce the number of experiments required for identifying helper T cell epitopes. This paper describes a support vector machine (SVM) based method developed for identifying HLA-DRB1*0401 binding peptides in an antigenic sequence. The SVM was trained and tested on a large and clean data set consisting of 567 binders and an equal number of non-binders. The accuracy of the method was 86% when evaluated through the 5-fold cross-validation technique.
Notes:
Deepak Sharma, Biju Issac, G P S Raghava, R Ramaswamy (2004)  Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation.   Bioinformatics 20: 9. 1405-1412 Jun  
Abstract: Repetitive DNA sequences, besides having a variety of regulatory functions, are one of the principal causes of genomic instability. Understanding their origin and evolution is of fundamental importance for genome studies. The identification of repeats and their units helps in deducing the intra-genomic dynamics as an important feature of comparative genomics. A major difficulty in identification of repeats arises from the fact that the repeat units can be either exact or imperfect, in tandem or dispersed, and of unspecified length.
Notes:
Harpreet Kaur, G P S Raghava (2004)  A neural network method for prediction of beta-turn types in proteins using evolutionary information.   Bioinformatics 20: 16. 2751-2758 Nov  
Abstract: The prediction of beta-turns is an important element of protein secondary structure prediction. Recently, a highly accurate neural network based method Betatpred2 has been developed for predicting beta-turns in proteins using position-specific scoring matrices (PSSM) generated by PSI-BLAST and secondary structure information predicted by PSIPRED. However, the major limitation of Betatpred2 is that it predicts only beta-turn and non-beta-turn residues and does not provide any information of different beta-turn types. Thus, there is a need to predict beta-turn types using an approach based on multiple sequence alignment, which will be useful in overall tertiary structure prediction.
Notes:
Manoj Bhasin, G P S Raghava (2004)  ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.   Nucleic Acids Res 32: Web Server issue. W414-W419 Jul  
Abstract: Automated prediction of subcellular localization of proteins is an important step in the functional annotation of genomes. The existing subcellular localization prediction methods are based on either amino acid composition or N-terminal characteristics of the proteins. In this paper, support vector machine (SVM) has been used to predict the subcellular location of eukaryotic proteins from their different features such as amino acid composition, dipeptide composition and physico-chemical properties. The SVM module based on dipeptide composition performed better than the SVM modules based on amino acid composition or physico-chemical properties. In addition, PSI-BLAST was also used to search the query sequence against the dataset of proteins (experimentally annotated proteins) to predict its subcellular location. In order to improve the prediction accuracy, we developed a hybrid module using all features of a protein, which consisted of an input vector of 458 dimensions (400 dipeptide compositions, 33 properties, 20 amino acid compositions of the protein and 5 from PSI-BLAST output). Using this hybrid approach, the prediction accuracies of nuclear, cytoplasmic, mitochondrial and extracellular proteins reached 95.3, 85.2, 68.2 and 88.9%, respectively. The overall prediction accuracy of SVM modules based on amino acid composition, physico-chemical properties, dipeptide composition and the hybrid approach was 78.1, 77.8, 82.9 and 88.0%, respectively. The accuracy of all the modules was evaluated using a 5-fold cross-validation technique. Assigning a reliability index (reliability index ≥ 3), 73.5% of predictions can be made with an accuracy of 96.4%. Based on the above approach, an online web server ESLpred was developed, which is available at http://www.imtech.res.in/raghava/eslpred/.
Notes:
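As an illustration of the composition features named in the ESLpred abstract above, the following minimal sketch computes the 20 amino acid compositions and the 400 dipeptide compositions of a sequence. It is not the authors' implementation: the function names and the example sequence are assumptions made for illustration, and the 33 physico-chemical properties and 5 PSI-BLAST outputs that complete the 458-dimensional vector are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 percentages, one per standard amino acid."""
    seq = seq.upper()
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 percentages, one per ordered pair of residues."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing non-standard letters
            counts[pair] += 1
    total = len(seq) - 1  # number of overlapping dipeptides
    return [100.0 * counts[p] / total for p in pairs]


# Hypothetical example sequence (at least two residues are required).
example = "MKVLAAGIVK"
features = amino_acid_composition(example) + dipeptide_composition(example)
print(len(features))  # 420 = 20 + 400
```

A vector like this could be passed to any SVM package; the choice of kernel and parameters is not stated in the abstract and is left open here.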
Manoj Bhasin, G P S Raghava (2004)  Prediction of CTL epitopes using QM, SVM and ANN techniques.   Vaccine 22: 23-24. 3195-3204 Aug  
Abstract: Cytotoxic T lymphocyte (CTL) epitopes are potential candidates for subunit vaccine design for various diseases. Most of the existing T cell epitope prediction methods are indirect methods that predict MHC class I binders instead of CTL epitopes. In this study, a systematic attempt has been made to develop a direct method for predicting CTL epitopes from an antigenic sequence. This method is based on quantitative matrix (QM) and machine learning techniques such as Support Vector Machine (SVM) and Artificial Neural Network (ANN). This method has been trained and tested on a non-redundant dataset of T cell epitopes and non-epitopes that includes 1137 experimentally proven MHC class I restricted T cell epitopes. The accuracy of QM-, ANN- and SVM-based methods was 70.0, 72.2 and 75.2%, respectively. The performance of these methods has been evaluated through Leave One Out Cross-Validation (LOOCV) at a cutoff score where sensitivity and specificity were nearly equal. Finally, both machine-learning methods were used for consensus and combined prediction of CTL epitopes. The performances of these methods were evaluated on a blind dataset, where machine learning-based methods perform better than the QM-based method. We also demonstrated through subgroup analysis that our methods can discriminate between T-cell epitopes and MHC binders (non-epitopes). In brief, this method allows prediction of CTL epitopes using QM, SVM and ANN approaches. The method also facilitates prediction of MHC restriction in predicted T cell epitopes.
Notes:
Biju Issac, Gajendra Pal Singh Raghava (2004)  EGPred: prediction of eukaryotic genes using ab initio methods after combining with sequence similarity approaches.   Genome Res 14: 9. 1756-1766 Sep  
Abstract: EGPred is a Web-based server that combines ab initio methods and similarity searches to predict genes, particularly exon regions, with high accuracy. The EGPred program proceeds in the following steps: (1) an initial BLASTX search of genomic sequence against the RefSeq database is used to identify protein hits with an E-value <1; (2) a second BLASTX search of genomic sequence against the hits from the previous run with relaxed parameters (E-values <10) helps to retrieve all probable coding exon regions; (3) a BLASTN search of genomic sequence against the intron database is then used to detect probable intron regions; (4) the probable intron and exon regions are compared to filter/remove wrong exons; (5) the NNSPLICE program is then used to reassign splicing signal site positions in the remaining probable coding exons; and (6) finally ab initio predictions are combined with exons derived from the fifth step based on the relative strength of start/stop and splice signal sites as obtained from ab initio and similarity search. The combination method increases the exon level performance of five different ab initio programs by 4%-10% when evaluated on the HMR195 data set. Similar improvement is observed when ab initio programs are evaluated on the Burset/Guigo data set. Finally, EGPred is demonstrated on an approximately 95-Mbp fragment of human chromosome 13. The list of predicted genes from this analysis is available in the supplementary material. The EGPred program is computationally intensive due to multiple BLAST runs during each analysis. The EGPred server is available at http://www.imtech.res.in/raghava/egpred/.
Notes:
Manoj Bhasin, G P S Raghava (2004)  GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors.   Nucleic Acids Res 32: Web Server issue. W383-W389 Jul  
Abstract: G-protein coupled receptors (GPCRs) belong to one of the largest superfamilies of membrane proteins and are important targets for drug design. In this study, a support vector machine (SVM)-based method, GPCRpred, has been developed for predicting families and subfamilies of GPCRs from the dipeptide composition of proteins. The dataset used in this study for training and testing was obtained from http://www.soe.ucsc.edu/research/compbio/gpcr/. The method classified GPCRs and non-GPCRs with an accuracy of 99.5% when evaluated using 5-fold cross-validation. The method is further able to predict five major classes or families of GPCRs with an overall Matthew's correlation coefficient (MCC) and accuracy of 0.81 and 97.5% respectively. In recognizing the subfamilies of the rhodopsin-like family, the method achieved an average MCC and accuracy of 0.97 and 97.3% respectively. The method achieved overall accuracy of 91.3% and 96.4% at family and subfamily level respectively when evaluated on an independent/blind dataset of 650 GPCRs. A server for recognition and classification of GPCRs based on multiclass SVMs has been set up at http://www.imtech.res.in/raghava/gpcrpred/. We have also suggested subfamilies for 42 sequences which were previously identified as unclassified ClassA GPCRs. The supplementary information is available at http://www.imtech.res.in/raghava/gpcrpred/info.html.
Notes:
Navjyot K Natt, Harpreet Kaur, G P S Raghava (2004)  Prediction of transmembrane regions of beta-barrel proteins using ANN- and SVM-based methods.   Proteins 56: 1. 11-18 Jul  
Abstract: This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).
Notes:
Harpreet Kaur, G P S Raghava (2004)  Prediction of alpha-turns in proteins using PSI-BLAST profiles and secondary structure information.   Proteins 55: 1. 83-90 Apr  
Abstract: In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, including the statistical approach "Sequence Coupled Model" and the machine learning approaches (i) artificial neural network (ANN); (ii) Weka (Waikato Environment for Knowledge Analysis) Classifiers; and (iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSI-BLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods has been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position specific scoring matrices. The initial predictions obtained from the first network and PSIPRED predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and MCC of 0.16. A web server AlphaPred (http://www.imtech.res.in/raghava/alphapred/) has been developed based on this approach.
Notes:
Manoj Bhasin, Gajendra P S Raghava (2004)  Classification of nuclear receptors based on amino acid composition and dipeptide composition.   J Biol Chem 279: 22. 23262-23266 May  
Abstract: Nuclear receptors are key transcription factors that regulate crucial gene networks responsible for cell growth, differentiation, and homeostasis. Nuclear receptors form a superfamily of phylogenetically related proteins and control functions associated with major diseases (e.g. diabetes, osteoporosis, and cancer). In this study, a novel method has been developed for classifying the subfamilies of nuclear receptors. The classification was achieved on the basis of amino acid and dipeptide composition from a sequence of receptors using support vector machines. The training and testing was done on a non-redundant data set of 282 proteins obtained from the NucleaRDB data base (1). The performance of all classifiers was evaluated using a 5-fold cross validation test. In the 5-fold cross-validation, the data set was randomly partitioned into five equal sets and evaluated five times on each distinct set while keeping the remaining four sets for training. It was found that different subfamilies of nuclear receptors were quite closely correlated in terms of amino acid composition as well as dipeptide composition. The overall accuracy of amino acid composition-based and dipeptide composition-based classifiers were 82.6 and 97.5%, respectively. Therefore, our results prove that different subfamilies of nuclear receptors are predictable with considerable accuracy using amino acid or dipeptide composition. Furthermore, based on above approach, an online web service, NRpred, was developed, which is available at www.imtech.res.in/raghava/nrpred.
Notes:
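The 5-fold cross-validation procedure described in the abstract above (random partition into five sets, each set used once for testing while the other four are used for training) can be sketched as follows. This is an illustrative sketch only: the function name, the fixed random seed, and the use of plain index lists are assumptions, not details taken from the paper.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random partition, reproducible
    folds = [indices[i::k] for i in range(k)]  # k near-equal sized sets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 282 is the data set size quoted in the abstract.
splits = list(k_fold_splits(282, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Because 282 is not divisible by five, the folds here differ in size by at most one item; how the original study handled the remainder is not stated.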
Harpreet Kaur, G P S Raghava (2004)  Role of evolutionary information in prediction of aromatic-backbone NH interactions in proteins.   FEBS Lett 564: 1-2. 47-57 Apr  
Abstract: In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to-structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).
Notes:
Manoj Bhasin, G P S Raghava (2004)  Analysis and prediction of affinity of TAP binding peptides using cascade SVM.   Protein Sci 13: 3. 596-607 Mar  
Abstract: The generation of cytotoxic T lymphocyte (CTL) epitopes from an antigenic sequence involves a number of intracellular processes, including production of peptide fragments by proteasome and transport of peptides to endoplasmic reticulum through transporter associated with antigen processing (TAP). In this study, 409 peptides that bind to human TAP transporter with varying affinity were analyzed to explore the selectivity and specificity of TAP transporter. The abundance of each amino acid from P1 to P9 positions in high-, intermediate-, and low-affinity TAP binders was examined. The rules for predicting TAP binding regions in an antigenic sequence were derived from the above analysis. The quantitative matrix was generated on the basis of contribution of each position and residue in binding affinity. The correlation of r = 0.65 was obtained between experimentally determined and predicted binding affinity by using a quantitative matrix. Further, a support vector machine (SVM)-based method has been developed to model the TAP binding affinity of peptides. The correlation (r = 0.80) was obtained between the predicted and experimentally measured values by using sequence-based SVM. The reliability of prediction was further improved by cascade SVM that uses features of amino acids along with sequence. An extremely good correlation (r = 0.88) was obtained between measured and predicted values, when the cascade SVM-based method was evaluated through jackknife testing. A Web service, TAPPred (http://www.imtech.res.in/raghava/tappred/ or http://bioinformatics.uams.edu/mirror/tappred/), has been developed based on this approach.
Notes:
2003
Manoj Bhasin, Harpreet Singh, G P S Raghava (2003)  MHCBN: a comprehensive database of MHC binding and non-binding peptides.   Bioinformatics 19: 5. 665-666 Mar  
Abstract: MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptide on query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED.
Notes:
Harpreet Kaur, G P S Raghava (2003)  A neural-network based method for prediction of gamma-turns in proteins from multiple sequence alignment.   Protein Sci 12: 5. 923-929 May  
Abstract: In the present study, an attempt has been made to develop a method for predicting gamma-turns in proteins. First, we have implemented the commonly used statistical and machine-learning techniques in the field of protein structure prediction, for the prediction of gamma-turns. All the methods have been trained and tested on a set of 320 nonhomologous protein chains by a fivefold cross-validation technique. It has been observed that the performance of all methods is very poor, having a Matthews Correlation Coefficient (MCC) ≤ 0.06. Second, predicted secondary structure obtained from PSIPRED is used in gamma-turn prediction. It has been found that machine-learning methods outperform statistical methods and achieve an MCC of 0.11 when secondary structure information is used. The performance of gamma-turn prediction is further improved when multiple sequence alignment is used as the input instead of a single sequence. Based on this study, we have developed a method, GammaPred, for gamma-turn prediction (MCC = 0.17). GammaPred is a neural-network-based method, which predicts gamma-turns in two steps. In the first step, a sequence-to-structure network is used to predict the gamma-turns from multiple alignment of protein sequence. In the second step, it uses a structure-to-structure network in which input consists of predicted gamma-turns obtained from the first step and predicted secondary structure obtained from PSIPRED.
Notes:
Harpreet Kaur, Gajendra Pal Singh Raghava (2003)  Prediction of beta-turns in proteins from multiple alignment using neural network.   Protein Sci 12: 3. 627-634 Mar  
Abstract: A neural network-based method has been developed for the prediction of beta-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used where the first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Q(pred), Q(obs), and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published beta-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach.
Notes:
Manoj Bhasin, G P S Raghava (2003)  Prediction of promiscuous and high-affinity mutated MHC binders.   Hybrid Hybridomics 22: 4. 229-234 Aug  
Abstract: The identification of peptides in an antigenic sequence that can bind with high affinity to a wide range of MHC alleles is one of the challenges in subunit vaccine design. The mutation of natural peptides is an alternative way of obtaining peptides that can bind to a wide range of MHC alleles with high affinity. A large number of experiments are typically necessary to identify mutations that define high-affinity binding peptides. Therefore there is a need to develop a computational method for detecting amino acid mutations that turn a peptide into a high-affinity or promiscuous MHC binder. This report describes a high-throughput computer driven solution for the identification of promiscuous and high-affinity mutated binders of 47 MHC class I alleles by introducing mutations in an antigenic sequence. The method implements quantitative matrices for creating optimal mutations in an antigenic sequence. It has two major options: (i) prediction of promiscuous MHC binders and (ii) prediction of high-affinity binders. In the case of prediction of promiscuous binders, the server allows a user to select (i) permissible mutations in a peptide; (ii) MHC alleles to which it should bind; and (iii) positions at which mutation is allowed. In the case of prediction of high-affinity binders, the server allows users to specify the positions that should be conserved in the native protein. In both cases, the method computes the type of mutations and position of mutations in 9-mer peptides required to have the desired results. The web server MMBPred is available at www.imtech.res.in/raghava/mmbpred/.
Notes:
Harpreet Kaur, G P S Raghava (2003)  BTEVAL: a server for evaluation of beta-turn prediction methods.   J Bioinform Comput Biol 1: 3. 495-504 Oct  
Abstract: This paper describes a web server, BTEVAL, developed for assessing the performance of a newly developed beta-turn prediction method and its ranking with respect to other existing beta-turn prediction methods. Evaluation of a method can be carried out on a single protein or a number of proteins. It consists of a clean data set of 426 non-homologous proteins with seven subsets of these proteins. Users can evaluate their method on any subset or a complete set of data. The method is assessed at amino acid level and performance is evaluated in terms of Qtotal, Qpredicted, Qobserved and MCC measures. The server also compares the performance of the method with other existing beta-turn prediction methods such as Chou-Fasman algorithm, Thornton's algorithm, GORBTURN, 1-4 and 2-3 Correlation model, Sequence coupled model and BTPRED. The server is accessible from http://imtech.res.in/raghava/bteval/
Notes:
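The four residue-level measures named in the BTEVAL abstract above can be computed from a confusion matrix. The sketch below assumes the definitions commonly used in the beta-turn prediction literature (Qtotal as the percentage of all residues predicted correctly, Qpredicted as the percentage of predicted turn residues that are correct, Qobserved as the percentage of observed turn residues that are predicted); the abstract itself does not spell them out, and the function name and example counts are hypothetical.

```python
import math


def turn_prediction_scores(tp, tn, fp, fn):
    """Return (Qtotal, Qpredicted, Qobserved, MCC) from residue counts.

    tp: turn residues predicted as turn      tn: non-turn predicted as non-turn
    fp: non-turn residues predicted as turn  fn: turn residues predicted as non-turn
    Assumes at least one predicted and one observed turn residue.
    """
    total = tp + tn + fp + fn
    q_total = 100.0 * (tp + tn) / total
    q_predicted = 100.0 * tp / (tp + fp)
    q_observed = 100.0 * tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return q_total, q_predicted, q_observed, mcc


# Hypothetical counts, chosen only to exercise the formulas.
scores = turn_prediction_scores(tp=60, tn=200, fp=40, fn=20)
print(scores)
```

Qtotal alone can look high on data where turns are the minority class, which is one reason MCC is reported alongside it.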
G P S Raghava, Stephen M J Searle, Patrick C Audley, Jonathan D Barber, Geoffrey J Barton (2003)  OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy.   BMC Bioinformatics 4: Oct  
Abstract: The alignment of two or more protein sequences provides a powerful guide in the prediction of the protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged.
Notes:
Jyoti Sarin, Gajendra P S Raghava, Pradip K Chakraborti (2003)  Intrinsic contributions of polar amino acid residues toward thermal stability of an ABC-ATPase of mesophilic origin.   Protein Sci 12: 9. 2118-2120 Sep  
Abstract: The nucleotide-binding subunit of phosphate-specific transporter (PstB) from mesophilic bacterium, Mycobacterium tuberculosis, is a unique ATP-binding cassette (ABC) ATPase because of its unusual ability to hydrolyze ATP at high temperature. In an attempt to define the basis of thermostability, we took a theoretical approach and compared amino acid composition of this protein to that of other PstBs from available bacterial genomes. Interestingly, based on the content of polar amino acids, this protein clustered with the thermophiles.
Notes:
Harpreet Singh, G P S Raghava (2003)  ProPred1: prediction of promiscuous MHC Class-I binding sites.   Bioinformatics 19: 8. 1009-1014 May  
Abstract: SUMMARY: ProPred1 is an on-line web tool for the prediction of peptide binding to MHC class-I alleles. This is a matrix-based method that allows the prediction of MHC binding sites in an antigenic sequence for 47 MHC class-I alleles. The server represents MHC binding regions within an antigenic sequence in user-friendly formats. These formats assist the user in the identification of promiscuous MHC binders in an antigen sequence that can bind to a large number of alleles. ProPred1 also allows the prediction of the standard proteasome and immunoproteasome cleavage sites in an antigenic sequence. This server allows identification of MHC binders that have the cleavage site at the C terminus. The simultaneous prediction of MHC binders and proteasome cleavage sites in an antigenic sequence leads to the identification of potential T-cell epitopes. AVAILABILITY: Server is available at http://www.imtech.res.in/raghava/propred1/. Mirror site of this server is available at http://bioinformatics.uams.edu/mirror/propred1/ Supplementary information: Matrices and document on server are available at http://www.imtech.res.in/raghava/propred1/page2.html
Notes:
2002
Biju Issac, G P S Raghava (2002)  GWFASTA: server for FASTA search in eukaryotic and microbial genomes.   Biotechniques 33: 3. 548-50, 552, 554-6 Sep  
Abstract: Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
Notes:
Harpreet Kaur, G P S Raghava (2002)  An evaluation of beta-turn prediction methods.   Bioinformatics 18: 11. 1508-1514 Nov  
Abstract: MOTIVATION: beta-turn is an important element of protein structure. In the past three decades, numerous beta-turn prediction methods have been developed based on various strategies. For a detailed discussion about the importance of beta-turns and a systematic introduction of the existing prediction algorithms for beta-turns and their types, please see a recent review (Chou, Analytical Biochemistry, 286, 1-16, 2000). However at present, it is still difficult to say which method is better than the other. This is because of the fact that these methods were developed on different sets of data. Thus, it is important to evaluate the performance of beta-turn prediction methods. RESULTS: We have evaluated the performance of six methods of beta-turn prediction. All the methods have been tested on a set of 426 non-homologous protein chains. It has been observed that the performance of the neural network based method, BTPRED, is significantly better than the statistical methods. One of the reasons for its better performance is that it utilizes the predicted secondary structure information. We have also trained, tested and evaluated the performance of all methods except BTPRED and GORBTURN, on new data set using a 7-fold cross-validation technique. There is a significant improvement in performance of all the methods when secondary structure information is incorporated. Moreover, after incorporating secondary structure information, the Sequence Coupled Model has yielded better results in predicting beta-turns as compared with other methods. In this study, both threshold dependent and independent (ROC) measures have been used for evaluation.
Notes:
Biju Issac, Harpreet Singh, Harpreet Kaur, G P S Raghava (2002)  Locating probable genes using Fourier transform approach.   Bioinformatics 18: 1. 196-197 Jan  
Abstract: FTG is a web server for analyzing nucleotide sequences to predict the genes using Fourier transform techniques. This server implements the existing Fourier transform algorithms for gene prediction and allows the rapid visualization of analysis by output in GIF format.
Notes:
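The FTG abstract above does not say which Fourier transform algorithms the server implements. A widely used idea in this family is that protein-coding regions tend to show a spectral peak at frequency 1/3, reflecting codon periodicity. The sketch below illustrates that idea only, under that assumption: the function names, the window size and the step are hypothetical and are not taken from FTG.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base
    indicator sequences and divided by the sequence length."""
    seq = seq.upper()
    power = 0.0
    for base in "ACGT":
        # Discrete Fourier coefficient of the 0/1 indicator sequence of `base`.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        power += abs(coeff) ** 2
    return power / len(seq)


def sliding_power(seq, window=300, step=3):
    """Return (start, power) for each window along the sequence."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# A perfectly 3-periodic toy sequence gives a strong signal; a 4-periodic one does not.
print(period3_power("ATG" * 50))
print(period3_power("ACGT" * 36))
```

On real genomic sequence the signal is much noisier than in these toy inputs, so windows are usually compared against a background level rather than a fixed cutoff.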
Harpreet Kaur, G P S Raghava (2002)  BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.   Bioinformatics 18: 3. 498-499 Mar  
Abstract: beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server.
Notes:
2001
H Singh, G P Raghava (2001)  ProPred: prediction of HLA-DR binding sites.   Bioinformatics 17: 12. 1236-1237 Dec  
Abstract: ProPred is a graphical web tool for predicting MHC class II binding regions in antigenic protein sequences. The server implements a matrix-based prediction algorithm, employing an amino-acid/position coefficient table deduced from the literature. The predicted binders can be visualized either as peaks in a graphical interface or as colored residues in an HTML interface. This server might be a useful tool in locating the promiscuous binding regions that can bind to several HLA-DR alleles. AVAILABILITY: The server is available at http://www.imtech.res.in/raghava/propred/ CONTACT: raghava@imtech.res.in SUPPLEMENTARY INFORMATION: http://www.imtech.res.in/raghava/propred/page3.html
Notes:
2000
G P Raghava, R J Solanki, V Soni, P Agrawal (2000)  Fingerprinting method for phylogenetic classification and identification of microorganisms based on variation in 16S rRNA gene sequences.   Biotechniques 29: 1. 108-12, 114-6 Jul  
Abstract: The paper describes a method for the classification and identification of microorganisms based on variations in 16S rRNA sequences. The 16S rRNA is one of the most conserved molecules within a cell. The nature of the variable and spacer regions has been found to be specific to a given organism. Thus, the method presented here can be very useful for the classification and identification of microorganisms for which very little information is available. To automate the method, a comprehensive computer program called FPMAP has been developed for the analysis of restriction fragment pattern data. The method involves the restriction digestion of genomic DNA, preferably using four-cutters that may recognize 6-9 sites within the 16S rDNA. The fragments are separated on a polyacrylamide gel along with a suitable marker, then transferred into a nylon membrane and hybridized with a radiolabeled 16S rDNA probe. After autoradiography, the fragment sizes are calculated, and the data are analyzed using the FPMAP software. We demonstrate that the method can be used for identification of strains of Streptomyces and mycobacteria. The software is available from our ftp site ftp://imtech.chd.nic.in/pub/com/fpmap/unix/.
Notes:
1997
D Nihalani, G P Raghava, G Sahni (1997)  Mapping of the plasminogen binding site of streptokinase with short synthetic peptides.   Protein Sci 6: 6. 1284-1292 Jun  
Abstract: Although several recent studies employing various truncated fragments of streptokinase (SK) have demonstrated that the high-affinity interactions of this protein with human plasminogen (HPG) to form activator complex (SK-HPG) are located in the central region of SK, the exact location and nature of such HPG interacting site(s) is still unclear. In order to locate the "core" HPG binding ability in SK, we focused on the primary structure of a tryptic fragment of SK derived from the central region (SK143-293) that could bind as well as activate HPG, albeit at reduced levels in comparison to the activity of the native, full-length protein. Because this fragment was refractory to further controlled proteolysis, we took recourse to a synthetic peptide approach wherein the HPG interacting properties of 16 overlapping 20-mer peptides derived from this region of SK were examined systematically. Only four peptides from this set, viz., SK234-253, SK254-273, SK274-293, and SK263-282, together representing the contiguous sequence SK234-293, displayed HPG binding ability. This was established by a specific HPG-binding ELISA as well as by dot blot assay using 125I-labeled HPG. These results showed that the minimal sequence with HPG binding function resided between residues 234 and 293. None of the synthetic SK peptides was found to activate HPG, either individually or in combination, but, in competition experiments where each of the peptides was added prior to complex formation between SK and HPG, three of the HPG binding peptides (SK234-253, SK254-273, and SK274-293) inhibited strongly the generation of a functional activator complex by SK and HPG. This indicated that residues 234-293 in SK participate directly in intermolecular contact formation with HPG during the formation of the 1:1 SK-HPG complex. 
Two of the three peptides (SK234-253 and SK274-293), apart from interfering in SK-HPG complex formation, also showed inhibition of the amidolytic activity of free HPN by increasing the K(m) by approximately fivefold. A similar increase in K(m) for amidolysis by HPN as a result of complexation with SK has been interpreted previously to arise from the steric hindrance at or near the active site due to the binding of SK in this region. Thus, our results suggest that SK234-253 and SK274-293 also, like SK, bound close to the active site of HPN, an event that was reflected in the observed alteration in its substrate accessibility. By contrast, whereas the intervening peptide (SK254-273) could not inhibit amidolysis by free HPN, it showed a marked inhibition of the activation of "substrate" PG (human or bovine plasminogen) by activator complex, indicating that this particular region is intimately involved in interaction of the SK-HPG activator complex with substrate plasminogen during the catalytic cycle. This finding provides a rational explanation for one of the most intriguing aspects of SK action, i.e., the ability of the SK-HPG complex to catalyze selectively the activation of substrate molecules of PG to PN, whereas free HPN alone cannot do so. Taken together, the results presented in this paper strongly support a model of SK action in which the segment 234-293 of SK, by virtue of the epitopes present in residues 234-253 and 274-293, binds close to the active center of HPN (or, a cryptic active site, in the case of HPG) during the intermolecular association of the two proteins to form the equimolar activator complex; the segment SK254-273 present in the center of the core region then imparts an ability to the activator complex to interact selectively with substrate PG molecules during each PG activation cycle.
Notes:
1995
G P Raghava (1995)  DNAOPT: a computer program to aid optimization of DNA gel electrophoresis and SDS-PAGE.   Biotechniques 18: 2. 274-8, 280 Feb  
Abstract: Several methods and computer programs have been developed for estimating the size of DNA fragments from gel electrophoresis. However, methods are lacking that may facilitate in optimization of gel conditions. In this article, a computer program called DNAOPT is described, which was developed to assist researchers in tuning the gel conditions of gel electrophoresis. The DNAOPT program fits the reciprocal of the migration distance vs. the size of the DNA fragments using the hyperbolic regression method and computes the hyperbolic parameters such as signal, flatness and capacity (optimization parameters). The program further manipulates these parameters obtained by running gel electrophoresis under various conditions (i) to determine the relationship between the gel conditions (temperature, buffer concentration, electric field strength, etc.) and optimization parameters; (ii) to demonstrate gel electrophoresis curves and optimization parameters graphically; and (iii) to represent the optimizing parameters at different gel conditions in tabular form. The above-mentioned program options aid the users in selecting optimum gel conditions by running the gel repeatedly under various conditions in which the agarose concentration, electric field strength, temperature, buffer concentration and so on are varied in a systematic way for each set of gel conditions. Similarly, this program can also be used to optimize gel conditions of sodium dodecyl sulfate polyacrylamide gel electrophoresis.
Notes:
1994
G P Raghava, J N Agrewala (1994)  Method for determining the affinity of monoclonal antibody using non-competitive ELISA: a computer program.   J Immunoassay 15: 2. 115-128 May  
Abstract: A simple and reliable method based upon the law of mass action for calculating the affinity of a monoclonal antibody using non-competitive ELISA is described. In this method, the binding of an antibody (Ab) with an antigen (Ag) is measured by ELISA using serial dilutions of both antigen (coated on the plate) and antibody. When the OD measured after the antigen-antibody interaction was plotted against the concentration of Ab added to the wells, a hyperbolic curve was obtained. The OD, at any point of the curve, was considered as a direct reflection of the amount of antibody bound to the antigen. The OD-100 denotes the occupancy of the maximum number of epitopes available on the antigen molecules, accessible to the antibody. The concentration of antibody (Ab, Ab') at corresponding levels of antigen concentration (Ag, Ag') represents the value obtained at OD-50. The [Ag] and [Ag'] are not the true antigen concentrations but are the measurement of antigen density on the plate. The affinity constant K(aff) was calculated by using the formula K(aff) = (n - 1)/2(n[Ab'] - [Ab]), derived from the law of mass action, where n = [Ag]/[Ag']. A computer program to calculate the affinity of antibody to the antigen using the method described in this manuscript has been developed and discussed.
Notes:
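The affinity formula in the abstract, K(aff) = (n - 1)/2(n[Ab'] - [Ab]) with n = [Ag]/[Ag'], can be evaluated as in the short sketch below. The function name `affinity_constant`, the argument names and the choice of mol/L as the concentration unit are assumptions made for illustration; the published program itself is not reproduced here.

```python
def affinity_constant(ab, ab_prime, ag, ag_prime):
    """K(aff) = (n - 1) / (2 * (n*[Ab'] - [Ab])), with n = [Ag]/[Ag'].

    ab, ab_prime: antibody concentrations giving OD-50 at the antigen
    coating levels ag and ag_prime (assumed here to be in mol/L, so the
    result is in L/mol). Only the ratio of ag to ag_prime is used.
    """
    n = ag / ag_prime
    denom = 2.0 * (n * ab_prime - ab)
    if denom == 0:
        raise ValueError("n*[Ab'] equals [Ab]; K(aff) is undefined")
    return (n - 1.0) / denom

# Hypothetical readings: antigen coated at a 2:1 ratio, OD-50 reached at
# 1e-9 M and 2e-9 M antibody respectively.
k_aff = affinity_constant(ab=1e-9, ab_prime=2e-9, ag=2.0, ag_prime=1.0)
```

With these made-up inputs n = 2 and the denominator is 2 x (4e-9 - 1e-9) = 6e-9, so K(aff) is about 1.67e8 L/mol.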
G P Raghava, A Goel, A M Singh, G C Varshney (1994)  A simple microassay for computing the hemolytic potency of drugs.   Biotechniques 17: 6. 1148-1153 Dec  
Abstract: A simple microassay and computer program are described for determining the erythrocyte hemolytic potency of drugs in vitro. This microassay is sensitive for both micro as well as macro ranges of hemoglobin concentration. An ELISA reader has been adapted to read erythrocyte lysis (hemolysis), which reduces the number and culture of replicates. A computer program was developed that calculates parameters such as C50 (concentration of drug causing 50% hemolysis), C100 (concentration of drug causing 100% hemolysis) and beta (slope of the curve) and graphically expresses the hemolytic patterns of various drugs simultaneously. The program can obtain optical densities directly from a 96-well plate ELISA reader by interfacing the microplate reader to the computer or by using a keyboard. This method is useful for screening a large number of hemolytic drugs and requires lower amounts of test compounds. It may also be applicable to quantitative functional assays, such as complement-mediated hemolysis and enumeration of antibody-secreting cells. The program can be obtained from the authors on request.
Notes:
G P Raghava (1994)  Improved estimation of DNA fragment length from gel electrophoresis data using a graphical method.   Biotechniques 17: 1. 100-104 Jul  
Abstract: A computer program has been developed for computing DNA fragment size from its electrophoretic mobility using a graphical method. The program uses DNA marker data and selects the semilogarithmic linear range (sl-range), i.e., the linear portion of the semilogarithmic curve (mobility vs. log of DNA fragment length). Over this range a linear interpolation is derived for calculating the size of a DNA fragment whose mobility falls in the sl-range. The program also derives a hyperbolic interpolation formula that covers the entire range for determining the size of a DNA fragment whose mobility is beyond the semilogarithmic linear range. The method described in this paper is sensitive, accurate and reliable. This program can also be used to compute protein or polypeptide size from sodium dodecyl sulfate polyacrylamide gel electrophoresis data. The DOS version of the DNASIZE program is freely available from Netserver at EMBL or from BioTechNet by EMail.
Notes:
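The semilogarithmic step described in the abstract can be sketched as a straight-line fit of log10(fragment length) against mobility, followed by a look-up for the unknown band. This is a minimal illustration, assuming the marker points passed in already lie inside the sl-range; the range selection and the hyperbolic formula used by DNASIZE are not shown, and the names `fit_semilog` and `estimate_size` are hypothetical.

```python
import math

def fit_semilog(markers):
    """Least-squares line log10(size) = a + b * mobility.

    markers: list of (mobility, size_bp) pairs assumed to lie inside the
    semilogarithmic linear range (at least two distinct mobilities).
    Returns the intercept a and slope b.
    """
    xs = [m for m, _ in markers]
    ys = [math.log10(s) for _, s in markers]
    n = len(markers)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def estimate_size(mobility, a, b):
    """Fragment size in bp for a band whose mobility is in the fitted range."""
    return 10 ** (a + b * mobility)

# Hypothetical marker ladder: mobility in mm, size in bp.
a, b = fit_semilog([(10.0, 10000.0), (20.0, 1000.0), (30.0, 100.0)])
unknown = estimate_size(25.0, a, b)
```

The same two functions apply unchanged to protein size estimation from SDS-PAGE marker data, which the abstract mentions as a second use.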
G P Raghava, G Sahni (1994)  GMAP: a multi-purpose computer program to aid synthetic gene design, cassette mutagenesis and the introduction of potential restriction sites into DNA sequences.   Biotechniques 16: 6. 1116-1123 Jun  
Abstract: A computer program called GMAP has been developed for i) mapping the potential restriction endonuclease (R.E.) sites that can be introduced in a nonambiguous DNA sequence; ii) predicting the mutations required to introduce unique R.E. sites in the nonambiguous DNA sequences; and iii) searching all R.E. sites in ambiguous DNA sequence obtained by reverse translation of a given amino acid sequence. This allows the design of synthetic genes as well as the modular redesign after introducing limited base pair mismatches in wild-type genes in order to adapt them for "cassette" mutagenesis. The GMAP program uses an algorithm based on set theory that reduces the degree of complexity from an exponential to linear function of sequence length. Therefore, the speed of searching for potential R.E. sites in reverse-translated gene sequences and the prediction of new R.E. sites in natural genes by mutations are rapid.
Notes:
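The abstract gives no detail of the set-theory algorithm in GMAP. As a rough illustration of the idea of searching an ambiguous sequence in a single linear pass, the sketch below treats every position of an IUPAC-coded sequence as a set of allowed bases and reports a potential site wherever each position of the recognition sequence shares at least one base with the corresponding set. The function name `potential_sites` is hypothetical, and treating positions independently is a simplification that can over-report sites for amino acids whose codons do not reduce to one IUPAC triplet.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def potential_sites(ambiguous_seq, site):
    """Start positions where `site` could be realised in `ambiguous_seq`.

    Both arguments are IUPAC strings. A position is compatible when the
    two base sets intersect. This is an illustrative simplification,
    not the published GMAP algorithm.
    """
    seq_sets = [set(IUPAC[c]) for c in ambiguous_seq.upper()]
    site_sets = [set(IUPAC[c]) for c in site.upper()]
    hits = []
    for start in range(len(seq_sets) - len(site_sets) + 1):
        if all(seq_sets[start + j] & site_sets[j]
               for j in range(len(site_sets))):
            hits.append(start)
    return hits

# Glu-Phe reverse-translates to GAR TTY, which can carry an EcoRI site.
hits = potential_sites("GARTTY", "GAATTC")
```

Here `hits` is `[0]`, because choosing GAA for Glu and TTC for Phe spells GAATTC without changing the encoded peptide.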
1993
J N Agrewala, G P Raghava, G C Mishra (1993)  Measurement and computation of murine interleukin-4 and interferon-gamma by exploiting the unique abilities of these lymphokines to induce the secretion of IgG1 and IgG2a.   J Immunoassay 14: 1-2. 83-97 Mar/Jun  
Abstract: A specific and new method for measuring Interleukin-4 and Interferon-gamma, based on the estimation of IgG1 and IgG2a isotype secretion from B cells, is described. An antagonizing effect of Interferon-gamma in the production of IgG1 induced by Interleukin-4 was neutralized by using antibody to Interferon-gamma. Similarly, the interference of Interleukin-4 in the Interferon-gamma mediated enhancement of IgG2a production was blocked by anti-Interleukin-4 antibody. High concentrations of Interleukin-4 and Interferon-gamma inhibited the secretion of IgG1 and IgG2a, respectively. Therefore, in the assay described, the samples containing the cytokines were diluted so that their activity fell into the non-inhibitory zone. A computer program has also been developed for determining the concentrations of lymphokines.
Notes:
1992
G P Raghava, A K Joshi, J N Agrewala (1992)  Calculation of antibody and antigen concentrations from ELISA data using a graphical method.   J Immunol Methods 153: 1-2. 263-264 Aug  
Abstract: A graphical method for determining the concentration of either the antibody or the antigen from ELISA data is presented in the form of a GWBASIC program. In the program, ELISAEQ, optical densities (OD) obtained from a 96-well ELISA plate can be input either directly by interfacing a microplate reader to the computer or manually. The program uses standard sample data, and selects the semilogarithmic linear range. Over this range, a least-squares method is used to determine the concentrations of interest. In addition, a hyperbolic interpolation formula is derived over the entire range for estimating the antibody or antigen concentration of the unknown samples whose OD is beyond the linear range.
Notes:
 