
Debasis Dash

G.N. Ramachandran Knowledge Centre for Genome Informatics
Institute of Genomics and Integrative Biology, CSIR,
Mall Road, Delhi - 110 007
INDIA
ddash@igib.res.in
RESEARCH INTEREST:
# Unraveling the functional significance of unfolded proteins
# Development of Polymorphic Markup Language with the help of international collaboration, Indian Genome Variation Database and Genotype to Phenotype Database
# Development of computer-based methods for genome analysis for applications such as:
(a) identifying peptides useful as drug targets, (b) assigning functions to hypothetical/orphan proteins, (c) retrieving unique peptides for diagnostics, and (d) identifying protein-coding genes
# Developed and commercialized the software packages PLHOST-FA, GenoCluster and HT-SSS, and the publicly available database CoPS (Comprehensive Peptide Signature Database)

Journal articles

2012
Amit Kumar Yadav, Dhirendra Kumar, Debasis Dash (2012)  Learning from decoys to improve the sensitivity and specificity of proteomics database search results.   PLoS One 7: 11. 11  
Abstract: The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between the true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. Modelling a simple linear regression on the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2% respectively. The approach is applicable to different search methodologies--separate as well as concatenated database search, high mass accuracy, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving the database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.
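Illustrative sketch (not from the paper): the abstract above builds on target-decoy FDR estimation. The Python below shows only the conventional fixed-threshold estimate (decoy hits divided by target hits above a score cutoff), not FlexiFDR's regression-based method. The function name `score_threshold_at_fdr`, the `(score, is_decoy)` input layout and the example scores are assumptions made for illustration.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as (#decoys) / (#targets) among all PSMs
    scoring at or above it. Tied scores are not treated specially here.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate the ratio once at least one target has been seen.
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical scores: four target hits and one decoy hit.
example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
print(score_threshold_at_fdr(example, max_fdr=0.25))
```

A per-charge-state or regression-derived threshold, as the abstract describes, would replace the single global cutoff used in this sketch.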
Meenakshi Anurag, Gajinder Pal Singh, Debasis Dash (2012)  Location of disorder in coiled coil proteins is influenced by its biological role and subcellular localization: a GO-based study on human proteome.   Mol Biosyst 8: 1. 346-352 Jan  
Abstract: Intrinsic disorder in proteins has been explored to study the lack-of-structure-function aspects of many proteins. The current study focuses on coiled coils, which are often linked to intrinsic disorder. We present a sequence level analysis of human coiled coils to find out if this is universally true for all coiled coils. When annotated coiled-coil regions were collected from UniProt and investigated with the disorder prediction tools IUPred and DISpro, three patterns were commonly observed: disordered coiled coils (DisCCs), ordered coiled coils (OCCs) and a third pattern with a disordered region outside the coiled-coil region (DOCCs). Differential enrichment in the gene ontology was seen in these three categories. We found that OCCs are enriched in structural components of the extracellular space including the fibrinogen complex and laminin complex. On the contrary, DisCCs were found to be exclusively over-represented in proteins involved in actin filament, lamellipodium, cell junction, macromolecule complexes, ciliary rootlet and nucleolus. DOCCs are found to be associated with many regulatory and adaptor functions including positive regulation of calcium ion transport via store-operated calcium channel activity, cytoskeletal adaptor activity etc. Other than the GO-based analysis, sequence level analysis showed that disordered coiled-coil regions bear a high proportion of low-complexity regions as compared to ordered coiled coils. The former also have a higher probability of forming a dimer as compared to their ordered counterparts. Our study shows that the in silico approach of mapping of disorder in or around coiled coils in other biological systems or organisms can be applied to understand and rationalize the mode of action of these dynamic motifs.
Rohit Vashisht, Anupam Kumar Mondal, Akanksha Jain, Anup Shah, Priti Vishnoi, Priyanka Priyadarshini, Kausik Bhattacharyya, Harsha Rohira, Ashwini G Bhat, Anurag Passi, Keya Mukherjee, Kumari Sonal Choudhary, Vikas Kumar, Anshula Arora, Prabhakaran Munusamy, Ahalyaa Subramanian, Aparna Venkatachalam, S Gayathri, Sweety Raj, Vijaya Chitra, Kaveri Verma, Salman Zaheer, J Balaganesh, Malarvizhi Gurusamy, Mohammed Razeeth, Ilamathi Raja, Madhumohan Thandapani, Vishal Mevada, Raviraj Soni, Shruti Rana, Girish Muthagadhalli Ramanna, Swetha Raghavan, Sunil N Subramanya, Trupti Kholia, Rajesh Patel, Varsha Bhavnani, Lakavath Chiranjeevi, Soumi Sengupta, Pankaj Kumar Singh, Naresh Atray, Swati Gandhi, Tiruvayipati Suma Avasthi, Shefin Nisthar, Meenakshi Anurag, Pratibha Sharma, Yasha Hasija, Debasis Dash, Arun Sharma, Vinod Scaria, Zakir Thomas, Nagasuma Chandra, Samir K Brahmachari, Anshu Bhardwaj (2012)  Crowd sourcing a new paradigm for interactome driven drug target identification in Mycobacterium tuberculosis.   PLoS One 7: 7. 07  
Abstract: A decade since the availability of the Mycobacterium tuberculosis (Mtb) genome sequence, no promising drug has seen the light of day. This not only indicates the challenges in discovering new drugs but also suggests a gap in our current understanding of Mtb biology. We attempt to bridge this gap by carrying out extensive re-annotation and constructing a systems level protein interaction map of Mtb with the objective of finding novel drug target candidates. Towards this, we synergized crowd sourcing and social networking methods through an initiative 'Connect to Decode' (C2D) to generate the first and largest manually curated interactome of Mtb termed 'interactome pathway' (IPW), encompassing a total of 1434 proteins connected through 2575 functional relationships. Interactions leading to gene regulation, signal transduction, metabolism and structural complex formation have been catalogued. In the process, we have functionally annotated 87% of the Mtb genome in the context of gene products. We further combine IPW with a STRING based network to report central proteins, which may be assessed as potential drug targets for development of drugs with the least possible side effects. The fact that five of the 17 predicted drug targets are already experimentally validated either genetically or biochemically lends credence to our unique approach.
Tim Beck, Sirisha Gollapudi, Søren Brunak, Norbert Graf, Heinz U Lemke, Debasis Dash, Iain Buchan, Carlos Díaz, Ferran Sanz, Anthony J Brookes (2012)  Knowledge engineering for health: a new discipline required to bridge the "ICT gap" between research and healthcare.   Hum Mutat 33: 5. 797-802 May  
Abstract: Despite the vast amount of money and research being channeled toward biomedical research, relatively little impact has been made on routine clinical practice. At the heart of this failure is the information and communication technology "chasm" that exists between research and healthcare. A new focus on "knowledge engineering for health" is needed to facilitate knowledge transmission across the research-healthcare gap. This discipline is required to engineer the bidirectional flow of data: processing research data and knowledge to identify clinically relevant advances and delivering these into healthcare use; conversely, making outcomes from the practice of medicine suitably available for use by the research community. This system will be able to self-optimize in that outcomes for patients treated by decisions that were based on the latest research knowledge will be fed back to the research world. A series of meetings, culminating in the "I-Health 2011" workshop, have brought together interdisciplinary experts to map the challenges and requirements for such a system. Here, we describe the main conclusions from these meetings. An "I4Health" interdisciplinary network of experts now exists to promote the key aims and objectives, namely "integrating and interpreting information for individualized healthcare," by developing the "knowledge engineering for health" domain.
2011
Deepak Singla, Meenakshi Anurag, Debasis Dash, Gajendra P S Raghava (2011)  A Web Server for Predicting Inhibitors against Bacterial Target GlmU Protein.   BMC Pharmacol 11: 1. Jul  
Abstract: BACKGROUND: The emergence of drug resistant tuberculosis poses a serious concern globally and researchers are in rigorous search for new drugs to fight against these dreadful bacteria. Recently, the bacterial GlmU protein, involved in peptidoglycan, lipopolysaccharide and teichoic acid synthesis, has been identified as an important drug target. A unique C-terminal disordered tail, essential for survival, and the absence of the gene in the host make GlmU a suitable target for inhibitor design. RESULTS: This study describes the models developed for predicting inhibitory activity (IC50) of chemical compounds against GlmU protein using QSAR and docking techniques. These models were trained on 84 diverse compounds (GlmU inhibitors) taken from PubChem BioAssay (AID 1376). These inhibitors were docked in the active site of the C-terminal domain of GlmU protein (2OI6) using AutoDock. A QSAR model was developed using docking energies as descriptors and achieved a maximum correlation of 0.35/0.12 (r/r2) between actual and predicted pIC50. Secondly, QSAR models were developed using molecular descriptors calculated using various software packages and achieved a maximum correlation of 0.77/0.60 (r/r2). Finally, hybrid models were developed using various types of descriptors and achieved a high correlation of 0.83/0.70 (r/r2) between predicted and actual pIC50. It was observed that some molecular descriptors used in this study had high correlation with pIC50. We screened chemical libraries using models developed in this study and predicted 40 potential GlmU inhibitors. These inhibitors could be used to develop drugs against Mycobacterium tuberculosis. CONCLUSION: These results demonstrate that docking energies can be used as descriptors for developing QSAR models. The current work suggests that docking energies based descriptors could be used along with commonly used molecular descriptors for predicting inhibitory activity (IC50) of molecules against GlmU. Based on this study an open source platform, http://crdd.osdd.net/raghava/gdoq, has been developed for predicting inhibitors of GlmU.
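Illustrative sketch (not from the paper): the abstract above reports model quality as r/r2 between actual and predicted pIC50. The Python below shows how such figures are commonly computed, assuming pIC50 is defined as -log10 of IC50 in mol/L and that r is the Pearson correlation coefficient. Whether the paper's r2 is exactly the square of Pearson r is an assumption. The function names and all numeric values are hypothetical.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured IC50 values (nM) and hypothetical model predictions.
actual = [pic50_from_nanomolar(v) for v in (1000.0, 250.0, 5000.0, 40.0)]
predicted = [5.8, 6.9, 5.1, 7.0]

r = pearson_r(actual, predicted)
print(f"r = {r:.2f}, r2 = {r * r:.2f}")
```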
Amit Kumar Yadav, Dhirendra Kumar, Debasis Dash (2011)  MassWiz: a novel scoring algorithm with target-decoy based analysis pipeline for tandem mass spectrometry.   J Proteome Res 10: 5. 2154-2160 May  
Abstract: Mass spectrometry has made rapid advances in the recent past and has become the preferred method for proteomics. Although many open source algorithms for peptide identification exist, such as X!Tandem and OMSSA, the field has largely been the domain of proprietary software. There is a need for better, freely available, and configurable algorithms that can help in identifying the correct peptides while keeping the false positives to a minimum. We have developed MassWiz, a novel empirical scoring function that gives appropriate weights to major ions, continuity of b-y ions, intensities, and the supporting neutral losses based on the instrument type. We tested MassWiz accuracy on 486,882 spectra from a standard mixture of 18 proteins generated on 6 different instruments downloaded from the Seattle Proteome Center public repository. We compared the MassWiz algorithm with Mascot, Sequest, OMSSA, and X!Tandem at 1% FDR. MassWiz outperformed all in the largest data set (AGILENT XCT) and was second only to Mascot in the other data sets. MassWiz showed good performance in the analysis of high confidence peptides, i.e., those identified by at least three algorithms. We also analyzed a yeast data set containing 106,133 spectra downloaded from the NCBI Peptidome repository and got similar results. The results demonstrate that MassWiz is an effective algorithm for high-confidence peptide identification without compromising on the number of assignments. MassWiz is open-source, versatile, and easily configurable.
Ankita Narang, Pankaj Jha, Vimal Rawat, Arijit Mukhopadhyay, Debasis Dash, Analabha Basu, Mitali Mukerji (2011)  Recent admixture in an Indian population of African ancestry.   Am J Hum Genet 89: 1. 111-120 Jul  
Abstract: Identification and study of genetic variation in recently admixed populations not only provides insight into historical population events but also is a powerful approach for mapping disease loci. We studied a population (OG-W-IP) that is of African-Indian origin and has resided in the western part of India for 500 years; members of this population are believed to be descendants of the Bantu-speaking population of Africa. We have carried out this study by using a set of 18,534 autosomal markers common between Indian, CEPH-HGDP, and HapMap populations. Principal-components analysis clearly revealed that the African-Indian population derives its ancestry from Bantu-speaking west-African as well as Indo-European-speaking north and northwest Indian population(s). STRUCTURE and ADMIXTURE analyses show that, overall, the OG-W-IPs derive 58.7% of their genomic ancestry from their African past and have very little inter-individual ancestry variation (8.4%). The extent of linkage disequilibrium also reveals that the admixture event has been recent. Functional annotation of genes encompassing the ancestry-informative markers that are closer in allele frequency to the Indian ancestral population revealed significant enrichment of biological processes, such as ion-channel activity, and cadherins. We briefly examine the implications of determining the genetic diversity of this population, which could provide opportunities for studies involving admixture mapping.
Pramod Gautam, Pankaj Jha, Dhirendra Kumar, Shivani Tyagi, Binuja Varma, Debasis Dash, Arijit Mukhopadhyay, Mitali Mukerji (2011)  Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity.   Hum Genet Jul  
Abstract: Copy number variations (CNVs) have provided a dynamic aspect to the apparently static human genome. We have analyzed CNVs larger than 100 kb in 477 healthy individuals from 26 diverse Indian populations of different linguistic, ethnic and geographic backgrounds. These CNVRs were identified using the Affymetrix 50K Xba 240 Array. We observed 1,425 and 1,337 CNVRs in the deletion and amplification sets, respectively, after pooling data from all the populations. More than 50% of the genes encompassed entirely in CNVs had both deletions and amplifications. There was wide variability across populations not only with respect to CNV extent (ranging from 0.04-1.14% of genome under deletion and 0.11-0.86% under amplification) but also in terms of functional enrichments of processes like keratinization, serine proteases and their inhibitors, cadherins, homeobox, olfactory receptors etc. These did not correlate with linguistic, ethnic, geographic backgrounds and size of populations. Certain processes were near exclusive to deletion (serine proteases, keratinization, olfactory receptors, GPCRs) or duplication (homeobox, serine protease inhibitors, embryonic limb morphogenesis) datasets. Populations having same enriched processes were observed to contain genes from different genomic loci. Comparison of polymorphic CNVRs (5% or more) with those cataloged in Database of Genomic Variants revealed that 78% (2473) of the genes in CNVRs in Indian populations are novel. Validation of CNVs using Sequenom MassARRAY revealed extensive heterogeneity in CNV boundaries. Exploration of CNV profiles in such diverse populations would provide a widely valuable resource for understanding diversity in phenotypes and disease.
Dhanashree S Kelkar, Dhirendra Kumar, Praveen Kumar, Lavanya Balakrishnan, Babylakshmi Muthusamy, Amit Kumar Yadav, Priyanka Shrivastava, Arivusudar Marimuthu, Sridhar Anand, Hema Sundaram, Reena Kingsbury, H C Harsha, Bipin Nair, T S Keshava Prasad, Devendra Singh Chauhan, Kiran Katoch, Vishwa Mohan Katoch, Prahlad Kumar, Raghothama Chaerkady, Srinivasan Ramachandran, Debasis Dash, Akhilesh Pandey (2011)  Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry.   Mol Cell Proteomics Oct  
Abstract: The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and MS/MS levels. In all, we identified 3,176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search specific peptides (GSSPs), we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N-terminus-derived peptides confirmed 745 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalog of protein-coding genes.
Amit Kumar Yadav, Gourav Bhardwaj, Trayambak Basak, Dhirendra Kumar, Shadab Ahmad, Ruby Priyadarshini, Ashish Kumar Singh, Debasis Dash, Shantanu Sengupta (2011)  A systematic analysis of eluted fraction of plasma post immunoaffinity depletion: implications in biomarker discovery.   PLoS One 6: 9. 09  
Abstract: Plasma is the most easily accessible source for biomarker discovery in clinical proteomics. However, identifying potential biomarkers from plasma is a challenge given the large dynamic range of proteins. The potential biomarkers in plasma are generally present at very low abundance levels and hence identification of these low abundance proteins necessitates the depletion of highly abundant proteins. Sample pre-fractionation using immuno-depletion of high abundance proteins using multi-affinity removal system (MARS) has been a popular method to deplete multiple high abundance proteins. However, depletion of these abundant proteins can result in concomitant removal of low abundant proteins. Although there are some reports suggesting the removal of non-targeted proteins, the predominant view is that number of such proteins is small. In this study, we identified proteins that are removed along with the targeted high abundant proteins. Three plasma samples were depleted using each of the three MARS (Hu-6, Hu-14 and Proteoprep 20) cartridges. The affinity bound fractions were subjected to gelC-MS using an LTQ-Orbitrap instrument. Using four database search algorithms including MassWiz (developed in house), we selected the peptides identified at <1% FDR. Peptides identified by at least two algorithms were selected for protein identification. After this rigorous bioinformatics analysis, we identified 101 proteins with high confidence. Thus, we believe that for biomarker discovery and proper quantitation of proteins, it might be better to study both bound and depleted fractions from any MARS depleted plasma sample.
Anshu Bhardwaj, Vinod Scaria, Gajendra Pal Singh Raghava, Andrew Michael Lynn, Nagasuma Chandra, Sulagna Banerjee, Muthukurussi V Raghunandanan, Vikas Pandey, Bhupesh Taneja, Jyoti Yadav, Debasis Dash, Jaijit Bhattacharya, Amit Misra, Anil Kumar, Srinivasan Ramachandran, Zakir Thomas, Samir K Brahmachari (2011)  Open source drug discovery- A new paradigm of collaborative research in tuberculosis drug development.   Tuberculosis (Edinb) Jul  
Abstract: It is being realized that the traditional closed-door and market driven approaches for drug discovery may not be the best suited model for the diseases of the developing world such as tuberculosis and malaria, because most patients suffering from these diseases have poor paying capacity. To ensure that new drugs are created for patients suffering from these diseases, it is necessary to formulate an alternate paradigm of the drug discovery process. The current model, constrained by limitations on collaboration and on sharing of resources with confidentiality, hampers the opportunities for bringing in expertise from diverse fields. These limitations hinder the possibilities of lowering the cost of drug discovery. The Open Source Drug Discovery project initiated by the Council of Scientific and Industrial Research, India has adopted an open source model to power wide participation across geographical borders. Open Source Drug Discovery emphasizes integrative science through collaboration, open-sharing, taking up multi-faceted approaches and accruing benefits from advances on different fronts of new drug discovery. Because the open source model is based on community participation, it has the potential to self-sustain continuous development by generating a storehouse of alternatives towards continued pursuit for new drug discovery. Since the inventions are community generated, the new chemical entities developed by Open Source Drug Discovery will be taken up for clinical trial in a non-exclusive manner by participation of multiple companies with majority funding from Open Source Drug Discovery. This will ensure availability of drugs through a lower cost community driven drug discovery process for diseases afflicting people with poor paying capacity. Hopefully, what Linux and the World Wide Web have done for information technology, Open Source Drug Discovery will do for drug discovery.
2010
Ankita Narang, Rishi Das Roy, Amit Chaurasia, Arijit Mukhopadhyay, Mitali Mukerji, Debasis Dash (2010)  IGVBrowser--a genomic variation resource from diverse Indian populations.   Database (Oxford) 2010: 09  
Abstract: The Indian Genome Variation Consortium (IGVC) project, an initiative of the Council for Scientific and Industrial Research, has been the first large-scale comprehensive study of the Indian population. One of the major aims of the project is to study and catalog the variations in nearly a thousand candidate genes related to diseases and drug response for predictive marker discovery and founder identification, and also to address questions related to ethnic diversity, migrations, and the extent of relatedness with other world populations. Phase I of the project aimed at providing a set of reference populations that would represent the entire genetic spectrum of India in terms of language, ethnicity and geography, and Phase II at providing variation data on candidate genes and genome wide neutral markers on this reference set of populations. We report here the development of the IGVBrowser, which provides allele and genotype frequency data generated in the IGVC project. The database harbors 4229 SNPs from more than 900 candidate genes in contrasting Indian populations. Analysis shows that most of the markers are from genic regions. Further, a large fraction of genes are implicated in cardiovascular, metabolic, cancer and immune system-related diseases. Thus, the IGVC data provide basal level variation data in the Indian population to study genetic diseases and pharmacology. Additionally, it also houses data on ∼50,000 (Affy 50 K array) genome wide neutral markers in these reference populations. In IGVBrowser one can analyze and compare genomic variations in the Indian population with those reported in HapMap along with annotation information from various primary data sources. Database URL: http://igvbrowser.igib.res.in.
2009
Meenakshi Anurag, Debasis Dash (2009)  Unraveling the potential of intrinsically disordered proteins as drug targets: application to Mycobacterium tuberculosis.   Mol Biosyst 5: 12. 1752-1757 Dec  
Abstract: Many eukaryotic and prokaryotic proteins remain disordered under physiological conditions and often acquire a stable secondary structure on binding to their cellular targets. Though the process of binding is still under analysis, it has been found that the flexibility of proteins can add to their functionality. This motivated us to explore intrinsically disordered proteins (IDPs) as drug targets. In silico studies have been carried out on Mycobacterium tuberculosis, which, with emergence of hyper-virulent and drug resistant strains, XDRs and MDRs, is one of the most dreaded pathogens in the modern world. Our study reports 13 IDPs as potential drug targets, and three of them--FtsW (Rv2154c), GlmU (Rv1018c) and Obg (Rv2440c)--are chosen as key proteins and are described in detail. Future applications of this method can provide new insight into understanding the molecular mechanism of IDPs and their potential role as drug targets.
Anthony J Brookes, Heikki Lehvaslaiho, Juha Muilu, Yasumasa Shigemoto, Takashige Oroguchi, Takeshi Tomiki, Atsuhiro Mukaiyama, Akihiko Konagaya, Toshio Kojima, Ituro Inoue, Masako Kuroda, Hiroshi Mizushima, Gudmundur A Thorisson, Debasis Dash, Haseena Rajeevan, Matthew W Darlison, Mark Woon, David Fredman, Albert V Smith, Martin Senger, Kimitoshi Naito, Hideaki Sugawara (2009)  The phenotype and genotype experiment object model (PaGE-OM): a robust data structure for information related to DNA variation.   Hum Mutat 30: 6. 968-977 Jun  
Abstract: Torrents of genotype-phenotype data are being generated, all of which must be captured, processed, integrated, and exploited. To do this optimally requires the use of standard and interoperable "object models," providing a description of how to partition the total spectrum of information being dealt with into elemental "objects" (such as "alleles," "genotypes," "phenotype values," "methods") with precisely stated logical interrelationships (such as "A objects are made up from one or more B objects"). We herein propose the Phenotype and Genotype Experiment Object Model (PaGE-OM; www.pageom.org), which has been tested and implemented in conjunction with several major databases, and approved as a standard by the Object Management Group (OMG). PaGE-OM is open-source, ready for use by the wider community, and can be further developed as needs arise. It will help to improve information management, assist data integration, and simplify the task of informatics resource design and construction for genotype and phenotype data projects.
Gudmundur A Thorisson, Owen Lancaster, Robert C Free, Robert K Hastings, Pallavi Sarmah, Debasis Dash, Samir K Brahmachari, Anthony J Brookes (2009)  HGVbaseG2P: a central genetic association database.   Nucleic Acids Res 37: Database issue. D797-D802 Jan  
Abstract: The Human Genome Variation database of Genotype to Phenotype information (HGVbaseG2P) is a new central database for summary-level findings produced by human genetic association studies, both large and small. Such a database is needed so that researchers have an easy way to access all the available association study data relevant to their genes, genome regions or diseases of interest. Such a depository will allow true positive signals to be more readily distinguished from false positives (type I error) that fail to consistently replicate. In this paper we describe how HGVbaseG2P has been constructed, and how its data are gathered and organized. We present a range of user-friendly but powerful website tools for searching, browsing and visualizing G2P study findings. HGVbaseG2P is available at http://www.hgvbaseg2p.org.
2008
(2008)  Genetic landscape of the people of India: a canvas for disease gene exploration.   J Genet 87: 1. 3-20 Apr  
Abstract: Analyses of frequency profiles of markers on disease or drug-response related genes in diverse populations are important for the dissection of common diseases. We report the results of analyses of data on 405 SNPs from 75 such genes and a 5.2 Mb chromosome 22 genomic region in 1871 individuals from 55 diverse endogamous Indian populations. These include 32 large (>10 million individuals) and 23 isolated populations, representing a large fraction of the people of India. We observe high levels of genetic divergence between groups of populations that cluster largely on the basis of ethnicity and language. Indian populations not only overlap with the diversity of HapMap populations, but also contain population groups that are genetically distinct. These data and results are useful for addressing stratification and study design issues in complex traits, especially for heterogeneous populations.
Gajinder Pal Singh, Debasis Dash (2008)  How expression level influences the disorderness of proteins.   Biochem Biophys Res Commun 371: 3. 401-404 Jul  
Abstract: Intrinsically disordered proteins are very common in eukaryotes and thus understanding functional roles and factors which influence protein disorderness becomes very important. In this work, we ask whether global properties not directly related to the function of the proteins, like expression level and avoidance of aggregation, influence disorderness of proteins. We found that proteins expressed at higher levels tend to be less disordered, even within the same functional class. We also found that the correlation between expression level and evolutionary rate was significantly reduced for disordered proteins indicating the role of disorderness in preventing aggregation of highly expressed proteins, which are more susceptible to misfolding due to translational errors. We reconcile these seemingly opposing results based on the observation that the correlation between expression level and disorderness was significantly less for proteins involved in binding functions, suggesting that highly expressed proteins involved in binding functions utilize disordered regions to avoid aggregation. Our results show that disorderness is not just influenced by functional properties of proteins, but also by properties not directly related to their functions like expression level and avoidance of aggregation.
2007
Gajinder Pal Singh, Mythily Ganapathi, Debasis Dash (2007)  Role of intrinsic disorder in transient interactions of hub proteins.   Proteins 66: 4. 761-765 Mar  
Abstract: Hubs in the protein-protein interaction network have been classified as "party" hubs, which are highly correlated in their mRNA expression with their partners while "date" hubs show lesser correlation. In this study, we explored the role of intrinsic disorder in date and party hub interactions. The data reveals that intrinsic disorder is significantly enriched in date hub proteins when compared with party hub proteins. Intrinsic disorder has been largely implicated in transient binding interactions. The disorder to order transition, which occurs during binding interactions in disordered regions, renders the interaction highly reversible while maintaining the high specificity. The enrichment of intrinsic disorder in date hubs may facilitate transient interactions, which might be required for date hubs to interact with different partners at different times.
S M Vidyasagar, Mande, S Rajgopal, B Gopalkrishnan, S T P T Srinivas, C Uma, Maheswara Rao, T Kathiravan, K Mastanarao, S Narendranath, S Rohini, A Irshad, T Murali, C Subrahmanyam, T Mona, S Sankha, V Priya, D Suman, V V Raja Rao, P Nageswara Rao, R Issaac, H Yashodeep, B Arundhoti, G Nishant, S Jignesh, K S Chaitanya, S P V Prasad Reddy, P Chakraborty, S E Hasnain, S Mande, A Nagarajaram, A Ranjan, M S Acharya, M Anwaruddin, S K Arun, Gyanrajkumar, D Kumar, S Priya, S Ranjan, B R Reddi, J Seshadri, P SravanKumar, S Swaminathan, P Umadevi, V Vindal, S Vijaykrishnan, A K Saxena, A Dixit, P Prathipati, S K Kashaw, C Mandal, S Bag, N Balakrishnan, M Bansal, N R Chandra, M R N Murthy, S Ramakumar, K Sekar, N Srinivasan, K Suguna, S Vishveshwara, R Anandhi, Bhadra, S Das, P Hansia, S Hariharaputran, J Jeyakani, R Karthikeyan, R K Pandey, C S Swamy, B Vasanthakumar, P V Balaji, R Y Patel, B Jayaram, S A Shaikh, P P Chakrabarti, A Banerjee, A Chakrabarti, R L Karandikar, P Chaudhuri, G P S Raghava, A Ghosh, M Bansal, N Paramsivam, S K Brahmachari, D Dash, C Balasubramaniam, A Basu, P Biswas, M Hariharan, R Mathur, K S Sandhu, V Scaria, R Shankar, P J Narayanan, V Jain, Nirnimesh, S Krishnaswamy, V Alaguraj, R Marikkannu, A V S K Mohan Katta, N Krishnan, K V Srividhya, P J Eswari, P V Bharatam, P Iqbal, D Bhattacharyya, G R Desiraju, J J Kumar, M Ravikumar, M Gautham, P A Prasad, D Bharanidharan (2007)  BioSuite: A comprehensive bioinformatics software package (A unique industry–academia collaboration).   Current Science 92: 1. 29-38 Jan  
Gajinder Pal Singh, Debasis Dash (2007)  Intrinsic disorder in yeast transcriptional regulatory network.   Proteins 68: 3. 602-605 Aug  
Abstract: Intrinsic disorder has been shown to be important in mediating protein-protein and protein-DNA interactions. Proteins involved in regulatory functions are particularly enriched in intrinsic disorder. In this study we explored the role of intrinsic disorder in transcriptional regulatory network of yeast. Using disorder prediction programs we show that transcription factors (TFs) regulating large number of targets (transcriptional hubs) have significantly increased intrinsic disorder, though targets regulated by multiple TFs did not show increased intrinsic disorder. Intrinsic disorder may allow these transcriptional hubs to bind to diverse promoter regions of their targets in different contexts, and may also allow complex regulatory control of transcriptional hubs that are involved in coordinating different cellular processes.
Kuljeet Singh Sandhu, Debasis Dash (2007)  Dynamic alpha-helices: conformations that do not conform.   Proteins 68: 1. 109-122 Jul  
Abstract: Structural transitions are important for the stability and function of proteins, but these phenomena are poorly understood. An extensive analysis of Protein Data Bank entries reveals 103 regions in proteins with a tendency to transform from helical to nonhelical conformation and vice versa. We find that these dynamic helices, unlike other helices, are depleted in hydrophobic residues. Furthermore, the dynamic helices have higher surface accessibility and conformational mobility (P-value = 3.35e-07) than the rigid helices. Contact analyses show that these transitions result from protein-ligand, protein-nucleic acid, and crystal contacts. The immediate structural environment differs quantitatively (P-value = 0.003) as well as qualitatively in the two alternate conformations. Often, a dynamic helix experiences more contacts in its helical conformation than in its nonhelical counterpart (P-value = 0.001). There is a differential preference for the type of short contacts observed in the two conformational states. We also demonstrate that the regions in proteins that can undergo such large conformational transitions can be predicted with reasonable accuracy using a logistic regression model of supervised learning. Our findings have implications for understanding the molecular basis of structural transitions that are coupled with binding and are important for the function and stability of the protein. Based on our observations, we propose that several functionally relevant regions on the protein surface can switch their conformation from coil to helix and vice versa to regulate the recognition and binding of their partners, and hence these may work as "molecular switches" in proteins to regulate certain biological processes. Our results support the idea that the protein structure-function paradigm should transform from a static to a highly dynamic one.
2006
Kuljeet Singh Sandhu, Debasis Dash (2006)  Conformational flexibility may explain multiple cellular roles of PEST motifs.   Proteins 63: 4. 727-732 Jun  
Abstract: PEST sequences are one of the major motifs that serve as signals for protein degradation and are also involved in various cellular processes such as phosphorylation and protein-protein interaction. In our earlier study, we found that these motifs contribute largely to eukaryotic protein disorder. This observation led us to evaluate their conformational variability in the nonredundant Protein Data Bank (PDB) structures. For this purpose, crystallographic temperature factors, structural alignment of multiple NMR models, and dihedral angle order parameters have been used in this study. The study has revealed the hypermobility of PEST motifs as compared to other regions of the protein. Conformational flexibility may allow them to participate in a number of molecular interactions under different conditions. This analysis may explain the role of protein backbone flexibility in bringing about multiple cellular roles of PEST motifs.
Gajinder Pal Singh, Mythily Ganapathi, Kuljeet Singh Sandhu, Debasis Dash (2006)  Intrinsic unstructuredness and abundance of PEST motifs in eukaryotic proteomes.   Proteins 62: 2. 309-315 Feb  
Abstract: The study of unfolded protein regions has gained importance because of their prevalence and important roles in various cellular functions. These regions have characteristically high net charge and low hydrophobicity. The amino acid sequence determines the intrinsic unstructuredness of a region and, therefore, efforts are ongoing to delineate the sequence motifs, which might contribute to protein disorder. We find that PEST motifs are enriched in the characterized disordered regions as compared with globular ones. Analysis of representative PDB chains revealed very few structures containing PEST sequences and the majority of them lacked regular secondary structure. A proteome-wide study in completely sequenced eukaryotes with predicted unfolded and folded proteins shows that PEST proteins make up a large fraction of unfolded dataset as compared with the folded proteins. Our data also reveal the prevalence of PEST proteins in eukaryotic proteomes (approximately 25%). Functional classification of the PEST-containing proteins shows an over- and under-representation in proteins involved in regulation and metabolism, respectively. Furthermore, our analysis shows that predicted PEST regions do not exhibit any preference to be localized in the C terminals of proteins, as reported earlier.
2005
(2005)  The Indian Genome Variation database (IGVdb): a project overview.   Hum Genet 118: 1. 1-11 Oct  
Abstract: The Indian population, comprising more than a billion people, consists of 4693 communities with several thousands of endogamous groups, 325 functioning languages and 25 scripts. To address the questions related to ethnic diversity, migrations, founder populations, predisposition to complex disorders or pharmacogenomics, one needs to understand the diversity and relatedness at the genetic level in such a diverse population. Against this backdrop, six constituent laboratories of the Council of Scientific and Industrial Research (CSIR), with funding from the Government of India, initiated a network program on predictive medicine using repeats and single nucleotide polymorphisms. The Indian Genome Variation (IGV) consortium aims to provide data on validated SNPs and repeats, both novel and reported, along with gene duplications, in over a thousand genes, in 15,000 individuals drawn from Indian subpopulations. These genes have been selected on the basis of their relevance as functional and positional candidates in many common diseases, including genes relevant to pharmacogenomics. This is the first large-scale comprehensive study of the structure of the Indian population with wide-reaching implications. A comprehensive platform for Indian Genome Variation (IGV) data management, analysis and creation of the IGVdb portal has also been developed. The samples are being collected following the ethical guidelines of the Indian Council of Medical Research (ICMR) and the Department of Biotechnology (DBT), India. This paper reveals the structure of the IGV project, highlighting its various aspects such as genesis, objectives, strategies for selection of genes, identification of the Indian subpopulations, collection of samples, discovery and validation of genetic markers, data analysis and monitoring, as well as the project's data release policy.
Tulika Prakash, C Ramakrishnan, Debasis Dash, Samir K Brahmachari (2005)  Conformational analysis of invariant peptide sequences in bacterial genomes.   J Mol Biol 345: 5. 937-955 Feb  
Abstract: The functional significance of evolutionarily conserved motifs/patterns of short regions in proteins is well documented. Although a large number of sequences are conserved, only a small fraction of these are invariant across several organisms. Here, we have examined the structural features of the functionally important peptide sequences, which have been found invariant across diverse bacterial genera. Ramachandran angles (phi,psi) have been used to analyze the conformation, folding patterns and geometrical location (buried/exposed) of these invariant peptides in different crystal structures harboring these sequences. The analysis indicates that the peptides preferred a single conformation in different protein structures, with the exception of only a few longer peptides that exhibited some conformational variability. In addition, it is noticed that the variability of conformation occurs mainly due to flipping of peptide units about the virtual C(alpha)...C(alpha) bond. However, for a given invariant peptide, the folding patterns are found to be similar in almost all the cases. Over and above, such peptides are found to be buried in the protein core. Thus, we can safely conclude that these invariant peptides are structurally important for the proteins, since they acquire unique structures across different proteins and can act as structural determinants (SD) of the proteins. The location of these SD peptides on the protein chain indicated that most of them are clustered towards the N-terminal and middle region of the protein with the C-terminal region exhibiting low preference. Another feature that emerges out of this study is that some of these SD peptides can also play the roles of "fold boundaries" or "hinge nucleus" in the protein structure. The study indicates that these SD peptides may act as chain-reversal signatures, guiding the proteins to adopt appropriate folds. In some cases the invariant signature peptides may also act as folding nuclei (FN) of the proteins.
2004
Neeraj Pandey, Mythily Ganapathi, Kaushal Kumar, Dipayan Dasgupta, Sushanta Kumar Das Sutar, Debasis Dash (2004)  Comparative analysis of protein unfoldedness in human housekeeping and non-housekeeping proteins.   Bioinformatics 20: 17. 2904-2910 Nov  
Abstract: Absence of any regular structure is increasingly being observed in structural studies of proteins. These disordered regions or random coils, which have been observed under physiological conditions, are indicators of protein plasticity. The wide variety of interactions possible due to the flexibility of these 'natively disordered' regions confers functional advantage to the protein and the organism in general. This concept is underscored by the increasing proportion of intrinsically unstructured proteins seen with the ascension in the complexity of the organisms. The 'natively unfolded/disordered' state of the protein can be predicted utilizing Uversky's or Dunker's algorithm. We utilized Uversky's prediction scheme and based on the unique position of a protein in the charge-hydrophobicity plot, a derived net score was used to predict the overall disorder of the human housekeeping and non-housekeeping proteins. Substantial numbers of proteins in both the classes were predicted to be unfolded. However, comparative genomic analysis of predicted unfolded Homo sapiens proteins with homologues in Caenorhabditis elegans, Drosophila melanogaster and Mus musculus revealed significant increase in unfoldedness in non-housekeeping proteins in comparison with housekeeping proteins. Our analysis in the evolutionary context suggests addition or substitution of amino acid residues which favour unfoldedness in non-housekeeping proteins compared to housekeeping proteins.
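The charge-hydrophobicity classification described in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly cited form of Uversky's boundary, <R> = 2.785<H> - 1.151, where <H> is the mean Kyte-Doolittle hydropathy rescaled to 0-1 and <R> is the absolute mean net charge per residue at neutral pH. The function names are hypothetical, and the signed distance returned by `boundary_score` is only an illustrative stand-in; the abstract does not define the derived net score actually used in the paper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(seq):
    """Upper-case the sequence and keep only the 20 standard residues."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    residues = _standard_residues(seq)
    return sum((KD[aa] + 4.5) / 9.0 for aa in residues) / len(residues)


def mean_net_charge(seq):
    """Absolute mean net charge per residue (K, R = +1; D, E = -1)."""
    residues = _standard_residues(seq)
    positive = sum(aa in "KR" for aa in residues)
    negative = sum(aa in "DE" for aa in residues)
    return abs(positive - negative) / len(residues)


def boundary_score(seq):
    """Signed offset from the assumed boundary <R> = 2.785<H> - 1.151.

    Positive values fall on the 'natively unfolded' side of the plot.
    This score is an illustrative assumption, not the paper's net score.
    """
    return mean_net_charge(seq) - (2.785 * mean_hydropathy(seq) - 1.151)


def is_predicted_unfolded(seq):
    """True when the sequence lies on the unfolded side of the boundary."""
    return boundary_score(seq) > 0
```

For example, a highly charged, weakly hydrophobic sequence such as a run of glutamates lands on the unfolded side, whereas a run of leucines lands on the folded side.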
Rachit Bakshi, Tulika Prakash, Debasis Dash, Vani Brahmachari (2004)  In silico characterization of the INO80 subfamily of SWI2/SNF2 chromatin remodeling proteins.   Biochem Biophys Res Commun 320: 1. 197-204 Jul  
Abstract: Proteins belonging to the SNF2 family of DNA-dependent ATPases are important members of the chromatin remodeling complexes that are implicated in epigenetic control of gene expression. The yeast Ino80, the catalytic ATPase subunit of the INO80 complex, is the most recently described member of the SNF2 family. Outside the conserved ATPase domain, it has very little similarity with other well-characterized SNF2 proteins; hence it is believed to represent a new subfamily. We have identified new members of this subfamily in different organisms and have detected characteristic features of this subfamily. Using various data mining tools we have identified a new, previously undetected domain in all members of this subfamily. This domain, designated DBINO, is characteristic of the INO80 subfamily and is predicted to have DNA-binding function. The presence of this domain in all the INO80 subfamily proteins from different organisms suggests its conserved function in evolution.
Ramakant Sharma, Jitendra Kumar Maheshwari, Tulika Prakash, Debasis Dash, Samir K Brahmachari (2004)  Recognition and analysis of protein-coding genes in severe acute respiratory syndrome associated coronavirus.   Bioinformatics 20: 7. 1074-1080 May  
Abstract: MOTIVATION: The recent outbreak of severe acute respiratory syndrome (SARS) caused by SARS coronavirus (SARS-CoV) has necessitated an in-depth molecular understanding of the virus to identify new drug targets. The availability of the complete genome sequence of several strains of SARS virus provides the possibility of identifying protein-coding genes and defining their functions. A computational approach to identify protein-coding genes and their putative functions will help in designing experimental protocols. RESULTS: In this paper, a novel analysis of the SARS genome using the gene prediction method GeneDecipher, developed in our laboratory, is presented. Each of the 18 newly sequenced SARS-CoV genomes has been analyzed using GeneDecipher. In addition to polyprotein 1ab(1), polyprotein 1a and the four genes coding for the major structural proteins spike (S), small envelope (E), membrane (M) and nucleocapsid (N), six to eight additional proteins have been predicted depending upon the strain analyzed. Their lengths range between 61 and 274 amino acids. Our method also suggests that polyprotein 1ab, polyprotein 1a, S, M and N are proteins of viral origin and the others are of prokaryotic origin. Putative functions of all predicted protein-coding genes have been suggested using conserved peptides present in their open reading frames. AVAILABILITY: Detailed results of GeneDecipher analysis of all the 18 strains of SARS-CoV genomes are available at http://www.igib.res.in/sarsanalysis.html
Tulika Prakash, Mamta Khandelwal, Dipayan Dasgupta, Debasis Dash, Samir K Brahmachari (2004)  CoPS: Comprehensive Peptide Signature database.   Bioinformatics 20: 16. 2886-2888 Nov  
Abstract: We present the development of a Comprehensive database of 12 076 invariant Peptide Signatures (CoPS) derived from 52 bacterial genomes with a minimum occurrence in at least seven organisms. These peptides were observed in functionally similar proteins and are distributed over nearly 1250 different functional proteins. The database provides function, structure and occurrence in biochemical pathways of the proteins containing these signature peptides. It houses additional information on the signature peptides, such as identical match in other motif/pattern (e.g. PROSITE, BLOCKS, PRINTS and Pfam) databases and the database of interacting proteins, human proteome and mutation effect on these signature peptides. There is a wide applicability of this database in the identification of critical functional residues in proteins. The database also facilitates the identification of folding nucleus/structural determinants in proteins and functional assignment to yet unknown proteins. We demonstrate functional assignment to 2605 hypothetical proteins in bacterial genomes and 112 unknown proteins in human using this database. AVAILABILITY: The database can be freely accessed through the following URL: http://203.195.151.46/copsv2/index.html or http://203.90.127.70/copsv2/index.html
2003
Tannistha Nandi, Debasis Dash, Rohit Ghai, Chandrika B-Rao, K Kannan, Samir K Brahmachari, C Ramakrishnan, Srinivasan Ramachandran (2003)  A novel complexity measure for comparative analysis of protein sequences from complete genomes.   J Biomol Struct Dyn 20: 5. 657-668 Apr  
Abstract: Analysis of sequence complexities of proteins is an important step in the characterization and classification of new genomes. A new measure has been proposed to compute sequence complexity in protein sequences based on linguistic complexity. The algorithm requires a single parameter, is computationally simple and provides a framework for comparative genomic analysis. Protein sequences were classified into groups of high or low complexity based on a quantitative measure termed F(c), which is proportional to the fraction of low complexity sequence present in the protein. The algorithm was tested on sequences of 196 non-homologous proteins whose crystal structures are available at ≤2.0 Å resolution. Protein sequences of high complexity had 'globular' structures (95% agreement), whereas those of low complexity had non-globular structures (80% agreement). Application of this measure to proteins of unknown structure/function from different genomes revealed that the sequences of high complexity constitute the majority in all genomes (about 90% in Archaea, about 93% in Eubacteria, 89% in Saccharomyces cerevisiae and 90% in Caenorhabditis elegans). Aeropyrum pernix among Archaea and Deinococcus radiodurans among Eubacteria have the lowest fraction of high complexity proteins (75% and 80% respectively). Further, it was observed that a few bacterial pathogens (Mycobacterium tuberculosis, Pseudomonas aeruginosa) have a high fraction of low complexity proteins. The program ScanCom is available from the authors as a PERL script (UNIX system).
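To make the idea of a linguistic-complexity measure concrete, the following Python sketch scores a sequence window by the ratio of distinct k-mers observed to the maximum possible, and then reports the fraction of residues covered by low-complexity windows. This is not the ScanCom implementation and does not reproduce the paper's definition of F(c): the function names, the window length, the threshold and `max_k` are all assumptions chosen for illustration (the abstract states that the published algorithm needs only a single parameter).

```python
def linguistic_complexity(segment, max_k=3, alphabet_size=20):
    """Product over k = 1..max_k of (distinct k-mers seen) / (maximum possible).

    Returns 1.0 for a maximally diverse segment and approaches 0 for a
    repetitive one. This is a generic vocabulary-usage style measure.
    """
    complexity = 1.0
    for k in range(1, max_k + 1):
        n_positions = len(segment) - k + 1
        if n_positions <= 0:
            break
        observed = len({segment[i:i + k] for i in range(n_positions)})
        possible = min(alphabet_size ** k, n_positions)
        complexity *= observed / possible
    return complexity


def low_complexity_fraction(seq, window=20, threshold=0.5, max_k=3):
    """Fraction of residues lying in at least one low-complexity window.

    `window`, `threshold` and `max_k` are hypothetical parameters.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    window = min(window, len(seq))  # short sequences form a single window
    is_low = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if linguistic_complexity(segment, max_k) < threshold:
            for i in range(start, start + window):
                is_low[i] = True
    return sum(is_low) / len(seq)
```

A homopolymer such as thirty alanines yields a fraction of 1.0, while a sequence in which every window contains only distinct residues yields 0.0; real proteins fall in between, and a cutoff on this fraction could then separate "high" from "low" complexity sequences.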
Somdutta Sen, Debasis Dash, Santosh Pasha, Samir K Brahmachari (2003)  Role of histidine interruption in mitigating the pathological effects of long polyglutamine stretches in SCA1: A molecular approach.   Protein Sci 12: 5. 953-962 May  
Abstract: Polyglutamine expansions, leading to aggregation, have been implicated in various neurodegenerative disorders. The range of repeats observed in normal individuals in most of these diseases is 19-36, whereas mutant proteins carry 40-81 repeats. In one such disorder, spinocerebellar ataxia (SCA1), it has been reported that certain individuals with expanded polyglutamine repeats in the disease range (Q(12)HQHQ(12)HQHQ(14/15)) but with histidine interruptions were found to be phenotypically normal. To establish the role of histidine, a comparative study of conformational properties of model peptide sequences with (Q(12)HQHQ(12)HQHQ(12)) and without (Q(42)) interruptions is presented here. Q(12)HQHQ(12)HQHQ(12) displays greater solubility and lesser aggregation propensity compared to uninterrupted Q(42) as well as the much shorter Q(22). The solvent- and temperature-driven conformational transitions (beta structure <--> random coil --> alpha helix) displayed by these model polyQ stretches are also discussed in the present report. The study strengthens our earlier hypothesis of the importance of histidine interruptions in mitigating the pathogenicity of the expanded polyglutamine tract at the SCA1 locus. The relatively lower propensity for aggregation observed in the case of histidine-interrupted stretches, even in the disease range, suggests that at a very low concentration, protein aggregation in normal cells is possibly not initiated at all or the disease onset is significantly delayed. Our present study also reveals that besides histidine interruption, proline interruption in polyglutamine stretches can lower their aggregation propensity.
2002
Chitra Chauhan, Debasis Dash, Deepak Grover, Jaya Rajamani, Mitali Mukerji (2002)  Origin and instability of GAA repeats: insights from Alu elements.   J Biomol Struct Dyn 20: 2. 253-263 Oct  
Abstract: Expansion of GAA repeats in the intron of the frataxin gene is involved in the autosomal recessive Friedreich's ataxia (FRDA). The GAA repeats arise from a stretch of adenine residues of an Alu element. These repeats have a size ranging from 7-38 in the normal population, and expand to thousands in the affected individuals. The mechanism of origin of GAA repeats, their polymorphism and stability are not well understood. In this study, we have carried out an extensive analysis of GAA repeats at several loci in humans. This analysis indicates the association of a majority of GAA repeats with the 3' end of an "A" stretch present in the Alu repeats. Further, the prevalence of GAA repeats correlates with the evolutionary age of Alu subfamilies as well as with their relative frequency in the genome. Our study on GAA repeat polymorphism at some loci in the normal population reveals that the length of the GAA repeats is determined by the relative length of the flanking A stretch. Based on these observations, a possible mechanism for origin of GAA repeats and modulatory effects of flanking sequences on repeat instability mediated by DNA triplex is proposed.
Ritushree Kukreti, Debasis Dash, K E Vineetha, Sanchita Chakravarty, Swapan Kr Das, Madhusnata De, Geeta Talukder (2002)  Spectrum of beta-thalassemia mutations and their association with allelic sequence polymorphisms at the beta-globin gene cluster in an Eastern Indian population.   Am J Hematol 70: 4. 269-277 Aug  
Abstract: In this report, the spectrum of beta-thalassemia mutations and genotype-to-phenotype correlations were defined in a large number of patients (beta-thalassemia carriers and major) with varying disease severity in an Eastern Indian population mainly from the state of West Bengal. The five most common beta-thalassemia mutations were detected, which included IVS1-5 (G-->C), codon 15 (G-->A), codon 26 (G-->A), codon 30 (G-->C), and codon 41/42 (-TCTT). These accounted for 85% in 80 beta-thalassemic alleles deciphered from 56 patients, including beta-thalassemia major and carriers, and 15% of alleles remained uncharacterized in these patients. Expression of the human beta-globin gene is regulated by an array of cis-acting DNA elements, including five DNase I hypersensitive sites (HSs) in the locus control region (LCR), promoters that incorporate certain silencer elements, and enhancers at 3' of the beta-globin gene. For detailed studies and to understand the molecular basis of beta-thalassemia, we studied two groups of subjects: a group of 12 patients from four families having beta-thalassemia major and carrier phenotype and a control group of 26 healthy individuals. In these two groups, we examined portions of the beta-globin gene locus control region HSs 1, 2, 3, and 4, which included the (CA)(x)(TA)(y) repeat motif, the (AT)(x)N(y)(AT)(z) repeat motif, the inverted repeat sequence TGGGGACCCCA, the promoter region of the (G)gamma-globin gene, an (AT)(x)(T)(y) repeat 5' of the silencer region, and the beta-globin gene and its 3' flanking region. We investigated the allelic sequence polymorphisms in these regions and their association with the beta-thalassemia mutations to determine the possible genotype-phenotype relationship in beta-thalassemia patients. An analysis of cis-acting regulatory regions showed varied sequence haplotypes associated with some frequent beta-thalassemia mutations in this Eastern Indian population.
2001
Q Saleem, D Dash, C Gandhi, A Kishore, V Benegal, T Sherrin, O Mukherjee, S Jain, S K Brahmachari (2001)  Association of CAG repeat loci on chromosome 22 with schizophrenia and bipolar disorder.   Mol Psychiatry 6: 6. 694-700 Nov  
Abstract: Chromosome 22 has been implicated in schizophrenia and bipolar disorder in a number of linkage, association and cytogenetic studies. Recent evidence has also implicated CAG repeat tract expansion in these diseases. In order to explore the involvement of CAG repeats on chromosome 22 in these diseases, we have created an integrated map of all CAG repeats ≥5 on this chromosome together with microsatellite markers associated with these diseases using the recently completed nucleotide sequence of chromosome 22. Of the 52 CAG repeat loci identified in this manner, four of the longest repeat stretches in regions previously implicated by linkage analyses were chosen for further study. Three of the four repeat-containing loci were found in the coding region, with the CAG repeats coding for glutamine, and were expressed in the brain. All the loci studied showed varying degrees of polymorphism, with one of the loci exhibiting two alleles of 7 and 8 CAG repeats. The 8-repeat allele at this locus was significantly overrepresented in both schizophrenia and bipolar patient groups when compared to ethnically matched controls, while alleles at the other three loci did not show any such difference. The repeat lies within a gene which shows homology to an androgen receptor related apoptosis protein in rat. We have also identified other candidate genes in the vicinity of this locus. Our results suggest that the repeats within this gene or other genes in the vicinity of this locus are likely to be implicated in bipolar disorder and schizophrenia.
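The repeat-mapping step described in the abstract above (locating all CAG tracts of at least five units in a nucleotide sequence) can be sketched in a few lines of Python. The function name and the return format are assumptions for illustration; the abstract does not describe the software actually used, and this sketch scans only the given strand and only perfect, uninterrupted repeats.

```python
import re


def find_cag_repeats(dna, min_repeats=5):
    """Return (start, n_units) for each uninterrupted (CAG)n run with n >= min_repeats.

    `start` is the 0-based position of the first C of the run.
    """
    # Non-capturing group repeated at least `min_repeats` times; greedy, so
    # each match spans the longest perfect run beginning at that position.
    pattern = re.compile(r"(?:CAG){%d,}" % min_repeats)
    return [
        (match.start(), len(match.group()) // 3)
        for match in pattern.finditer(dna.upper())
    ]


# Toy sequence: a 7-unit run, a 3-unit run (too short), then a 5-unit run.
toy = "TTT" + "CAG" * 7 + "AAA" + "CAG" * 3 + "G" + "CAG" * 5
repeats = find_cag_repeats(toy)
```

To cover both strands of a chromosome, the same function could be applied to the reverse complement (equivalently, searching for CTG runs), and interrupted tracts would need a more permissive pattern.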
S S Pataskar, D Dash, S K Brahmachari (2001)  Progressive myoclonus epilepsy [EPM1] repeat d(CCCCGCCCCGCG)n forms folded hairpin structures at physiological pH.   J Biomol Struct Dyn 19: 2. 293-305 Oct  
Abstract: The secondary structure of DNA has been shown to be an important component in the mechanism of expansion of the trinucleotide repeats that are associated with many neurodegenerative disorders. Recently, expansion of a dodecamer repeat, (CCCCGCCCCGCG)n, upstream of the cystatin B gene has been shown to be the most common mutation associated with Progressive Myoclonus Epilepsy (EPM1) of Unverricht-Lundborg type. We have investigated the structure of oligonucleotides containing one, two and three copies of the EPM1 repeat sequence at physiological pH. CD spectra and anomalously fast gel electrophoretic mobility indicate the formation of intramolecularly folded structures that form independently of concentration. Hydroxylamine probing allowed us to identify the C residues that are involved in C.G base pairing. P1 nuclease studies elucidated the presence of unpaired regions in the folded-back structures. UV melting studies show biphasic melting curves for the oligonucleotides containing two and three EPM1 repeats. Our data suggest multiple hairpin structures for oligonucleotides containing two and three repeats. In this paper we show that oligonucleotides containing the EPM1 repeat adopt secondary structures that may facilitate strand slippage, thereby causing the expansion.
S S Pataskar, D Dash, S K Brahmachari (2001)  Intramolecular i-motif structure at acidic pH for progressive myoclonus epilepsy (EPM1) repeat d(CCCCGCCCCGCG)n.   J Biomol Struct Dyn 19: 2. 307-313 Oct  
Abstract: The most common mutation associated with Progressive Myoclonus Epilepsy (EPM1) of Unverricht-Lundborg type is the expansion of a dodecamer repeat, d(CCCCGCCCCGCG)n. We show that the C-rich strand of this repeat (2-3 copies) forms an intercalated i-motif structure at acidic pH, as judged by CD spectroscopy and anomalous gel electrophoretic mobility. The stability of the structure increases with the length of the repeat. Transient formation of a stable, folded-back structure such as the i-motif could play an important role in the mechanism of expansion of this repeat.