
Thorsten Schmidt


dr.thorsten.schmidt@googlemail.com

Journal articles

2010
Tobias B Haack, Katharina Danhauser, Birgit Haberberger, Jonathan Hoser, Valentina Strecker, Detlef Boehm, Graziella Uziel, Eleonora Lamantea, Federica Invernizzi, Joanna Poulton, Boris Rolinski, Arcangela Iuso, Saskia Biskup, Thorsten Schmidt, Hans-Werner Mewes, Ilka Wittig, Thomas Meitinger, Massimo Zeviani, Holger Prokisch (2010)  Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency.   Nat Genet Nov  
Abstract: An isolated defect of respiratory chain complex I activity is a frequent biochemical abnormality in mitochondrial disorders. Despite intensive investigation in recent years, in most instances, the molecular basis underpinning complex I defects remains unknown. We report whole-exome sequencing of a single individual with severe, isolated complex I deficiency. This analysis, followed by filtering with a prioritization of mitochondrial proteins, led us to identify compound heterozygous mutations in ACAD9, which encodes a poorly understood member of the mitochondrial acyl-CoA dehydrogenase protein family. We demonstrated the pathogenic role of the ACAD9 variants by the correction of the complex I defect on expression of the wildtype ACAD9 protein in fibroblasts derived from affected individuals. ACAD9 screening of 120 additional complex I-defective index cases led us to identify two additional unrelated cases and a total of five pathogenic ACAD9 alleles.
2009
Thorsten Schmidt, Hans-Werner Mewes, Volker Stümpflen (2009)  A novel putative miRNA target enhancer signal.   PLoS One 4: 7. 07  
Abstract: It is known that miRNA target sites are very short and the effect of miRNA-target site interaction alone appears as being unspecific. Recent experiments suggest further context signals involved in miRNA target site recognition and regulation. Here, we present a novel GC-rich RNA motif downstream of experimentally supported miRNA target sites in human mRNAs with no similarity to previously reported functional motifs. We demonstrate that the novel motif can be found in at least one third of all transcripts regulated by miRNAs. Furthermore, we show that motif occurrence and the frequency of miRNA target sites as well as the stability of their duplex structures correlate. The finding, that the novel motif is significantly associated with miRNA target sites, suggests a functional role of the motif in miRNA target site biology. Beyond, the novel motif has the impact to improve prediction of miRNA target sites significantly.
Petr O Ilyinskii, Thorsten Schmidt, Dmitry Lukashev, Anatoli B Meriin, Galini Thoidis, Dmitrij Frishman, Alexander M Shneider (2009)  Importance of mRNA secondary structural elements for the expression of influenza virus genes.   OMICS 13: 5. 421-430 Oct  
Abstract: Development of novel vaccines and therapeutics often requires efficient expression of recombinant viral proteins. Here we show that mutations in essential functional regions of conserved influenza proteins NP and NS1, lead to reduced expression of these genes in vitro. According to in silico analysis, these mRNA regions possess distinct secondary structures sensitive to mutations. We identified a novel structural feature within a region in NS1 mRNA that encodes amino acids essential for NS1 function. Mutations altering this mRNA element lead to significantly reduced protein expression. Conversely, expression was not affected by mutations resulting in amino acid substitutions, when they were designed to preserve this secondary RNA structural element. Furthermore, altering this structure significantly reduced RNA transcription without affecting mRNA stability. Therefore, distinct internal secondary structures of viral mRNA may be important for viral gene expression. If such elements encode amino acids essential for the protein function, then early selection against mutations in this region will be beneficial for the virus. This might point at yet another mechanism of viral evolution, especially for RNA viruses. Finally, introducing mutations into viral genes while preserving their secondary RNA structure, suggests a new method for the generation of efficiently expressed recombinants of viral proteins.
2008
Yasushi Ishihama, Thorsten Schmidt, Juri Rappsilber, Matthias Mann, F Ulrich Hartl, Michael J Kerner, Dmitrij Frishman (2008)  Protein abundance profiling of the Escherichia coli cytosol.   BMC Genomics 9: 02  
Abstract: BACKGROUND: Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. RESULTS: Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach which takes into account the number of sequenced peptides per protein. The values of abundance are within a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be predominantly essential. Additionally we observe a significant correlation between protein and mRNA abundance in E. coli cells. CONCLUSION: Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.
Martin Irmler, Daniela Hartl, Thorsten Schmidt, Johannes Schuchhardt, Christiane Lach, Helmut E Meyer, Martin Hrabé de Angelis, Joachim Klose, Johannes Beckers (2008)  An approach to handling and interpretation of ambiguous data in transcriptome and proteome comparisons.   Proteomics 8: 6. 1165-1169 Mar  
Abstract: A major challenge towards a comprehensive analysis of biological systems is the integration of data from different "omics" sources and their interpretation at a functional level. Here we address this issue by analysing transcriptomic and proteomic datasets from mouse brain tissue at embryonic days 9.5 and 13.5. We observe a high concordance between transcripts and their corresponding proteins when they were compared at the level of expression ratios between embryonic stages. Absolute expression values show marginal correlation. We show in examples, that poor concordance between protein and transcript expression is in part explained by the fact, that single genes give rise to multiple transcripts and protein variants. The integration of transcriptomic and proteomic data therefore requires proper handling of such ambiguities. A closer inspection of such cases in our datasets suggests, that comparing gene expression at exon level instead of gene level could improve the comparability. To address the biological relevance of differences in expression profiles, literature-data mining and analysis of gene ontology terms are widely used. We show here, that this can be complemented by the inspection of physical properties of genes, transcripts, and proteins.
Brigitte Waegele, Thorsten Schmidt, H Werner Mewes, Andreas Ruepp (2008)  OREST: the online resource for EST analysis.   Nucleic Acids Res 36: Web Server issue. W140-W144 Jul  
Abstract: The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigate organisms, if no genome sequence is available. OREST (http://mips.gsf.de/genre/proj/orest/index.html) is a server-based EST analysis pipeline, which allows the rapid analysis of large amounts of ESTs or cDNAs from mammalia and fungi. In order to assign the ESTs to genes or proteins OREST maps DNA sequences to reference datasets of gene products and in a second step to complete genome sequences. Mapping against genome sequences recovers additional 13% of EST data, which otherwise would escape further analysis. To enable functional analysis of the datasets, ESTs are functionally annotated using the hierarchical FunCat annotation scheme as well as GO annotation terms. OREST also allows to predict the association of gene products and diseases by Morbid Map (OMIM) classification. A statistical analysis of the results of the dataset is possible with the included PROMPT software, which provides information about enrichment and depletion of functional and disease annotation terms. OREST was successfully applied for the identification and functional characterization of more than 3000 EST sequences of the common marmoset monkey (Callithrix jacchus) as part of an international collaboration.
Andreas Ruepp, Barbara Brauner, Irmtraud Dunger-Kaltenbach, Goar Frishman, Corinna Montrone, Michael Stransky, Brigitte Waegele, Thorsten Schmidt, Octave Noubibou Doudieu, Volker Stümpflen, H Werner Mewes (2008)  CORUM: the comprehensive resource of mammalian protein complexes.   Nucleic Acids Res 36: Database issue. D646-D650 Jan  
Abstract: Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by critical reading of the scientific literature from expert annotators. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue that enables to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes.
Thorsten Schmidt, Dmitrij Frishman (2008)  Assignment of isochores for all completely sequenced vertebrate genomes using a consensus.   Genome Biol 9: 6. 06  
Abstract: We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
Philip Wong, Sonja Althammer, Andrea Hildebrand, Andreas Kirschner, Philipp Pagel, Bernd Geissler, Pawel Smialowski, Florian Blöchl, Matthias Oesterheld, Thorsten Schmidt, Normann Strack, Fabian J Theis, Andreas Ruepp, Dmitrij Frishman (2008)  An evolutionary and structural characterization of mammalian protein complex organization.   BMC Genomics 9: 12  
Abstract: BACKGROUND: We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. RESULTS: As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tends to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. CONCLUSION: We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.
Alexey V Antonov, Thorsten Schmidt, Yu Wang, Hans W Mewes (2008)  ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data.   Nucleic Acids Res 36: Web Server issue. W347-W351 Jul  
Abstract: ProfCom is a web-based tool for the functional interpretation of a gene list that was identified to be related by experiments. A trait which makes ProfCom a unique tool is an ability to profile enrichments of not only available Gene Ontology (GO) terms but also of 'complex functions'. A 'Complex function' is constructed as Boolean combination of available GO terms. The complex functions inferred by ProfCom are more specific in comparison to single terms and describe more accurately the functional role of genes. ProfCom provides a user friendly dialog-driven web page submission available for several model organisms and supports most available gene identifiers. In addition, the web service interface allows the submission of any kind of annotation data. ProfCom is freely available at http://webclu.bio.wzw.tum.de/profcom/.
2007
M Louise Riley, Thorsten Schmidt, Irena I Artamonova, Christian Wagner, Andreas Volz, Klaus Heumann, Hans-Werner Mewes, Dmitrij Frishman (2007)  PEDANT genome database: 10 years online.   Nucleic Acids Res 35: Database issue. D354-D357 Jan  
Abstract: The PEDANT genome database provides exhaustive annotation of 468 genomes by a broad set of bioinformatics algorithms. We describe recent developments of the PEDANT Web server. The all-new Graphical User Interface (GUI) implemented in Java™ allows for more efficient navigation of the genome data, extended search capabilities, user customization and export facilities. The DNA and Protein viewers have been made highly dynamic and customizable. We also provide Web Services to access the entire body of PEDANT data programmatically. Finally, we report on the application of association rule mining for automatic detection of potential annotation errors. PEDANT is freely accessible to academic users at http://pedant.gsf.de.
2006
Thorsten Schmidt, Dmitrij Frishman (2006)  PROMPT: a protein mapping and comparison tool.   BMC Bioinformatics 7: 07  
Abstract: BACKGROUND: Comparison of large protein datasets has become a standard task in bioinformatics. Typically researchers wish to know whether one group of proteins is significantly enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. In order to conduct such comparisons it is often required to integrate molecular sequence data and experimental information from disparate incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available. RESULTS: PROMPT is a comprehensive bioinformatics software environment which enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. It allows automatic retrieval and integration of data from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures to address the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another one, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized in the form of plots and spreadsheets and exported in various formats, including Microsoft Excel. CONCLUSION: PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse in analysing and mining protein sequences and associated annotation. The availability of the Java Application Programming Interface and scripting capabilities on one hand, and the intuitive Graphical User Interface with context-sensitive help system on the other, make it equally accessible to professional bioinformaticians and biologically-oriented users. PROMPT is freely available for academic users from http://webclu.bio.wzw.tum.de/prompt/.
Pawel Smialowski, Thorsten Schmidt, Jürgen Cox, Andreas Kirschner, Dmitrij Frishman (2006)  Will my protein crystallize? A sequence-based predictor.   Proteins 62: 2. 343-355 Feb  
Abstract: We propose a machine-learning approach to sequence-based prediction of protein crystallizability in which we exploit subtle differences between proteins whose structures were solved by X-ray analysis [or by both X-ray and nuclear magnetic resonance (NMR) spectroscopy] and those proteins whose structures were solved by NMR spectroscopy alone. Because the NMR technique is usually applied on relatively small proteins, sequence length distributions of the X-ray and NMR datasets were adjusted to avoid predictions biased by protein size. As feature space for classification, we used frequencies of mono-, di-, and tripeptides represented by the original 20-letter amino acid alphabet as well as by several reduced alphabets in which amino acids were grouped by their physicochemical and structural properties. The classification algorithm was constructed as a two-layered structure in which the output of primary support vector machine classifiers operating on peptide frequencies was combined by a second-level Naive Bayes classifier. Due to the application of metamethods for cost sensitivity, our method is able to handle real datasets with unbalanced class representation. An overall prediction accuracy of 67% [65% on the positive (crystallizable) and 69% on the negative (noncrystallizable) class] was achieved in a 10-fold cross-validation experiment, indicating that the proposed algorithm may be a valuable tool for more efficient target selection in structural genomics. A Web server for protein crystallizability prediction called SECRET is available at http://webclu.bio.wzw.tum.de:8080/secret.
2005
M Louise Riley, Thorsten Schmidt, Christian Wagner, Hans-Werner Mewes, Dmitrij Frishman (2005)  The PEDANT genome database in 2005.   Nucleic Acids Res 33: Database issue. D308-D310 Jan  
Abstract: The PEDANT genome database (http://pedant.gsf.de) contains pre-computed bioinformatics analyses of publicly available genomes. Its main mission is to provide robust automatic annotation of the vast majority of amino acid sequences, which have not been subjected to in-depth manual curation by human experts in high-quality protein sequence databases. By design PEDANT annotation is genome-oriented, making it possible to explore genomic context of gene products, and evaluate functional and structural content of genomes using a category-based query mechanism. At present, the PEDANT database contains exhaustive annotation of over 1,240,000 proteins from 270 eubacterial, 23 archeal and 41 eukaryotic genomes.

Conference papers

2007