
Joerg Schultz

Prof. J. Schultz
Dept. of Bioinformatics
Biozentrum, Am Hubland
Universität Würzburg
D-97074 Wuerzburg
Germany
Joerg.Schultz@biozentrum.uni-wuerzburg.de

Journal articles

2010
Stefan Pinkert, Jörg Schultz, Jörg Reichardt (2010)  Protein interaction networks--more than mere modules.   PLoS Comput Biol 6: 1. 01  
Abstract: It is widely believed that the modular organization of cellular function is reflected in a modular structure of molecular networks. A common view is that a "module" in a network is a cohesively linked group of nodes, densely connected internally and sparsely interacting with the rest of the network. Many algorithms try to identify functional modules in protein-interaction networks (PIN) by searching for such cohesive groups of proteins. Here, we present an alternative approach independent of any prior definition of what actually constitutes a "module". In a self-consistent manner, proteins are grouped into "functional roles" if they interact in similar ways with other proteins according to their functional roles. Such grouping may well result in cohesive modules again, but only if the network structure actually supports this. We applied our method to the PIN from the Human Protein Reference Database (HPRD) and found that a representation of the network in terms of cohesive modules, at least on a global scale, does not optimally represent the network's structure because it focuses on finding independent groups of proteins. In contrast, a decomposition into functional roles is able to depict the structure much better as it also takes into account the interdependencies between roles and even allows groupings based on the absence of interactions between proteins in the same functional role. This, for example, is the case for transmembrane proteins, which could never be recognized as a cohesive group of nodes in a PIN. When mapping experimental methods onto the groups, we identified profound differences in the coverage suggesting that our method is able to capture experimental bias in the data, too. For example yeast-two-hybrid data were highly overrepresented in one particular group. 
Thus, there is more structure in protein-interaction networks than cohesive modules alone and we believe this finding can significantly improve automated function prediction algorithms.
Alexander Keller, Frank Förster, Tobias Müller, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2010)  Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees.   Biol Direct 5: 01  
Abstract: BACKGROUND: In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking. RESULTS: This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergency levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 of the ribosomal cistron as an exemplary marker region. Simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence only data, whereas a doubled marker size only accounts for robustness. CONCLUSIONS: Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods such as maximum likelihood, Bayesian inference or maximum parsimony may equally profit from secondary structure inclusion. REVIEWERS: This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin. For the full reviews, please go to the Reviewers' comments section.
2009
Seidl, Schultz (2009)  Evolutionary flexibility of protein complexes.   BMC Evol Biol 9: 1. Jul  
Abstract: BACKGROUND: Proteins play a key role in cellular life. They do not act alone but are organised in complexes. Throughout the life of a cell, complexes are dynamic in their composition due to attachments and shared components. Experimental and computational evidence indicate that consecutive addition and secondary losses of components played a major role in the evolution of some complexes, mostly without affecting the core function. Here, we analysed in a large scale approach whether this flexibility in evolution is only limited to a distinct number of complexes or represents a more general trend. RESULTS: Focussing on human protein complexes, we based our analysis on a manually curated dataset from HPRD. In total, 1,060 complexes with 6,136 proteins from 2,187 unique genes were considered. We computed interologs in 25 different species and predicted the composition of complexes. Over the analysed species, the composition of most complexes was highly flexible and only 25% of all genes were never lost. Even if one component was lost at a particular point in time, the fraction of observed second, independent losses of additional components was high (75% of all complexes affected). Still, loss of whole complexes happened rarely. This biological signal deviated significantly from random models. We exemplified this trend on the anaphase promoting complex (APC) where a core is highly conserved throughout all metazoans, but flexibility in certain components is observable. CONCLUSIONS: Consecutive additions and losses of distinct units is a fundamental process in the evolution of protein complexes. These evolutionary events affecting genes coding for units in human protein complexes showed a significantly different phylogenetic pattern compared to randomly selected genes. Determination of taxon specific attachments or losses might be linked to specific cellular or morphological features. 
Thus, protein complexes contain not only structural and functional, but also evolutionary cores.
Alexander Keller, Tina Schleicher, Jörg Schultz, Tobias Müller, Thomas Dandekar, Matthias Wolf (2009)  5.8S-28S rRNA interaction and HMM-based ITS2 annotation.   Gene 430: 1-2. 50-57 Feb  
Abstract: The internal transcribed spacer 2 (ITS2) of the nuclear ribosomal repeat unit is one of the most commonly applied phylogenetic markers. It is a fast evolving locus, which makes it appropriate for studies at low taxonomic levels, whereas its secondary structure is well conserved, and tree reconstructions are possible at higher taxonomic levels. However, annotation of start and end positions of the ITS2 differs markedly between studies. This is a severe shortcoming, as prediction of a correct secondary structure by standard ab initio folding programs requires accurate identification of the marker in question. Furthermore, the correct structure is essential for multiple sequence alignments based on individual structural features. The present study describes a new tool for the delimitation and identification of the ITS2. It is based on hidden Markov models (HMMs) and verifies annotations by comparison to a conserved structural motif in the 5.8S/28S rRNA regions. Our method was able to identify and delimit the ITS2 in more than 30000 entries lacking start and end annotations in GenBank. Furthermore, 45000 ITS2 sequences with a questionable annotation were re-annotated. Approximately 30000 entries from the ITS2-DB, that uses a homology-based method for structure prediction, were re-annotated. We show that the method is able to correctly annotate an ITS2 as small as 58 nt from Giardia lamblia and an ITS2 as large as 1160 nt from humans. Thus, our method should be a valuable guide during the first and crucial step in any ITS2-based phylogenetic analysis: the delineation of the correct sequence. Sequences can be submitted to the following website for HMM-based ITS2 delineation: http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Schwalie, Schultz (2009)  Positive Selection in Tick Saliva Proteins of the Salp15 Family.   J Mol Evol Jan  
Abstract: When taking their blood meal on the mammalian host, ticks transfer a multitude of different proteins from their saliva into the host. Some of these proteins are hijacked by pathogens for their own purposes. Borrelia burgdorferi, the Lyme disease agent, is critically dependent on the presence of the tick protein Salp15 when infecting the host. Similarly, Anaplasma phagocytophilum, which causes anaplasmosis, needs Salp16, a homologue of Salp15, to get transferred from the host into the tick. Here we analyzed whether adaptive evolution has shaped the Salp15 protein family. Using site-specific estimates of K(A)/K(S) ratios, we identified different positions within the Salp15 protein family which have undergone a phase of positive selection. Additionally, we analyzed the B. burgdorferi protein interacting with Salp15, OspC. Again, sites showing signs of positive selection were identified, although they are more likely a result of the antigenic features of OspC than of the influence of Salp15. The identification of probably functionally relevant sites in the Salp15 family might direct the detailed experimental analysis of their interaction with human and bacterial proteins.
Kroiss, Fischer, Schultz (2009)  When one plus one equals three: Biochemistry and bioinformatics combine to answer complex questions.   Fly (Austin) 3: 3. Jul  
Abstract: The availability of whole genome assemblies from evolutionarily distant species and iterative search algorithms has boosted ortholog analyses. However, orthology per se is not a sufficient predictor of a specific function. In a recent study, we have combined bioinformatic analysis and biochemistry to study the evolution of the multi-component SMN (Survival Motor Neuron)-complex. This macromolecular machinery performs essential steps during the assembly of spliceosomal UsnRNPs. By orthology, many factors constituting the SMN-complex in humans developed early in evolution. Some were secondarily lost in Drosophila. Compositional investigation of the Drosophila SMN-complex by biochemistry revealed the absence of two predicted orthologs although the complex was functional. Their bioinformatical re-assessment showed rapid sequence divergence indicating loss of evolutionary pressure in Drosophila. As a tool to better understand the function of individual proteins in multimeric molecular machineries, we therefore advocate iterative combination of bioinformatics with biochemical or functional assessment.
Jörg Schultz, Matthias Wolf (2009)  ITS2 sequence-structure analysis in phylogenetics: a how-to manual for molecular systematics.   Mol Phylogenet Evol 52: 2. 520-523 Aug  
Abstract: The information that can be obtained from the secondary structure of the nuclear ribosomal internal transcribed spacer 2 (ITS2) is substantial, and yet many studies exploit this information inconsistently or inappropriately. This review introduces a remedy in the form of a flowchart where we detail the steps involved in estimating structure-based phylogenetic trees from ITS2 data. The pipeline described consists of the ITS2 Database, 4SALE, the CBCAnalyzer, and ProfDistS. Based on these tools, we describe how to utilize ITS2 sequence and secondary structure information together with an ITS2 specific scoring matrix and an ITS2 specific substitution model. The phylogenetic results thus obtained have been shown to be more reliable than approaches based on primary sequence data alone. Moreover, compensatory base changes (CBCs) in ITS2 sequence-structure pairs are identified as a possible marker for distinguishing species.
Koetschan, Förster, Keller, Schleicher, Ruderisch, Schwarz, Müller, Wolf, Schultz (2009)  The ITS2 Database III--sequences and structures for phylogeny.   Nucleic Acids Res Nov  
Abstract: The internal transcribed spacer 2 (ITS2) is a widely used phylogenetic marker. In the past, it has mainly been used for species level classifications. Nowadays, a wider applicability becomes apparent. Here, the conserved structure of the RNA molecule plays a vital role. We have developed the ITS2 Database (http://its2.bioapps.biozentrum.uni-wuerzburg.de) which holds information about sequence, structure and taxonomic classification of all ITS2 in GenBank. In the new version, we use Hidden Markov models (HMMs) for the identification and delineation of the ITS2 resulting in a major redesign of the annotation pipeline. This allowed the identification of more than 160 000 correct full length and more than 50 000 partial structures. In the web interface, these can now be searched with a modified BLAST considering both sequence and structure, enabling rapid taxon sampling. Novel sequences can be annotated using the HMM based approach and modelled according to multiple template structures. Sequences can be searched for known and newly identified motifs. Together, the database and the web server build an exhaustive resource for ITS2 based phylogenetic analyses.
Roland Schwarz, Philipp N Seibel, Sven Rahmann, Christoph Schoen, Mirja Huenerberg, Clemens Müller-Reible, Thomas Dandekar, Rachel Karchin, Jörg Schultz, Tobias Müller (2009)  Detecting species-site dependencies in large multiple sequence alignments.   Nucleic Acids Res 37: 18. 5959-5968 Oct  
Abstract: Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize site conservations and sequence-based methods like clustering approaches detect relationships between sequences, both types of methods fail at revealing informational elements of MSAs at the level of sequence-site interactions, i.e. finding clusters of sequences and sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present here a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our methods are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer and on the vitamin K epoxide reductase family to distinguish between evolutionary and functional signals.
Benjamin Georgi, Jörg Schultz, Alexander Schliep (2009)  Partially-supervised protein subclass discovery with simultaneous annotation of functional residues.   BMC Struct Biol 9: 10  
Abstract: BACKGROUND: The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level. RESULTS: We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. We discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal dependent decarboxylases, WW and SH3 domains to demonstrate the usefulness of our approach. CONCLUSION: The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains and the predicted functional residues provide insights into the basis of the different substrate specificities.
Felix Bemm, Roland Schwarz, Frank Förster, Jörg Schultz (2009)  A kinome of 2600 in the ciliate Paramecium tetraurelia.   FEBS Lett 583: 22. 3589-3592 Nov  
Abstract: Protein kinases play a crucial role in the regulation of cellular processes. Most eukaryotes reserve about 2.5% of their genes for protein kinases. We analysed the genome of the single-celled ciliate Paramecium tetraurelia and identified 2606 kinases, about 6.6% of its genes, representing the largest kinome to date. A gene tree combined with human kinases revealed a massive expansion of the calcium calmodulin regulated subfamily, underlining the importance of calcium in the physiology of P. tetraurelia. The kinases are embedded in only 40 domain architectures, contrasting 134 in human. This might indicate different mechanisms to achieve target specificity.
2008
Blenk, Engelmann, Pinkert, Weniger, Schultz, Rosenwald, Mueller-Hermelink, Muller, Dandekar (2008)  Explorative data analysis of MCL reveals gene expression networks implicated in survival and prognosis supported by explorative CGH analysis.   BMC Cancer 8: 1. Apr  
Abstract: BACKGROUND: Mantle cell lymphoma (MCL) is an incurable B cell lymphoma and accounts for 6% of all non-Hodgkin's lymphomas. On the genetic level, MCL is characterized by the hallmark translocation t(11;14) that is present in most cases with few exceptions. Both gene expression and comparative genomic hybridization (CGH) data vary considerably between patients with implications for their prognosis. METHODS: We compare patients over and below the median of survival. Exploratory principal component analysis of gene expression data showed that the second principal component correlates well with patient survival. Explorative analysis of CGH data shows the same correlation. RESULTS: On chromosomes 7 and 9, specific genes and bands are delineated which improve prognosis prediction independent of the previously described proliferation signature. We identify a compact survival predictor of seven genes for MCL patients. After extensive re-annotation using GEPAT, we established protein networks correlating with prognosis. Well known genes (CDC2, CCND1) and further proliferation markers (WEE1, CDC25, aurora kinases, BUB1, PCNA, E2F1) form a tight interaction network, but also non-proliferative genes (SOCS1, TUBA1B, CEBPB) are shown to be associated with prognosis. Furthermore, we show that aggressive MCL implicates a gene network shift to higher expressed genes in late cell cycle states and refine the set of non-proliferative genes implicated with bad prognosis in MCL. CONCLUSIONS: The results from explorative data analysis of gene expression and CGH data are complementary to each other. Including further tests such as the Wilcoxon rank test, we point both to proliferative and non-proliferative gene networks implicated in inferior prognosis of MCL and identify suitable markers both in gene expression and CGH data.
Matthias Kroiss, Jörg Schultz, Julia Wiesner, Ashwin Chari, Albert Sickmann, Utz Fischer (2008)  Evolution of an RNP assembly system: a minimal SMN complex facilitates formation of UsnRNPs in Drosophila melanogaster.   Proc Natl Acad Sci U S A 105: 29. 10045-10050 Jul  
Abstract: In vertebrates, assembly of spliceosomal uridine-rich small nuclear ribonucleoproteins (UsnRNPs) is mediated by the SMN complex, a macromolecular entity composed of the proteins SMN and Gemins 2-8. Here we have studied the evolution of this machinery using complete genome assemblies of multiple model organisms. The SMN complex has gained complexity in evolution by a blockwise addition of Gemins onto an ancestral core complex composed of SMN and Gemin2. In contrast to this overall evolutionary trend to more complexity in metazoans, orthologs of most Gemins are missing in dipterans. In accordance with these bioinformatic data a previously undescribed biochemical purification strategy elucidated that the dipteran Drosophila melanogaster contains an SMN complex of remarkable simplicity. Surprisingly, this minimal complex not only mediates the assembly reaction in a manner very similar to its vertebrate counterpart, but also prevents misassembly onto nontarget RNAs. Our data suggest that only a minority of Gemins are required for the assembly reaction per se, whereas others may serve additional functions in the context of UsnRNP biogenesis. The evolution of the SMN complex is an interesting example of how the simplification of a biochemical process contributes to genome compaction.
Matthias Wolf, Benjamin Ruderisch, Thomas Dandekar, Jörg Schultz, Tobias Müller (2008)  ProfDistS: (profile-) distance based phylogeny on sequence--structure alignments.   Bioinformatics 24: 20. 2401-2402 Oct  
Abstract: MOTIVATION: The Profile Neighbor Joining (PNJ) algorithm as implemented in the software ProfDist is computationally efficient in reconstructing very large trees. Besides the huge amount of sequence data, the structure is important in RNA alignment analysis and phylogenetic reconstruction. RESULTS: For this, ProfDistS provides a phylogenetic workflow that uses individual RNA secondary structures in reconstructing phylogenies based on sequence-structure alignments, using PNJ with manual or iterative and automatic profile definition. Moreover, ProfDistS can also deal with protein sequences.
Christian Selig, Matthias Wolf, Tobias Müller, Thomas Dandekar, Jörg Schultz (2008)  The ITS2 Database II: homology modelling RNA structure for molecular systematics.   Nucleic Acids Res 36: Database issue. D377-D380 Jan  
Abstract: An increasing number of phylogenetic analyses are based on the internal transcribed spacer 2 (ITS2). They mainly use the fast evolving sequence for low-level analyses. When considering the highly conserved structure, the same marker could also be used for higher level phylogenies. Furthermore, structural features of the ITS2 allow distinguishing different species from each other. Despite its importance, the correct structure is only rarely found by standard RNA folding algorithms. To overcome this hindrance for a wider application of the ITS2, we have developed a homology modelling approach to predict the structure of RNA and present the results of modelling the ITS2 in the ITS2 Database. Here, we describe the database and the underlying algorithms which allowed us to predict the structure for 86 784 sequences, which is more than 55% of all GenBank entries concerning the ITS2. These are not equally distributed over all genera. There is a substantial amount of genera where the structure of nearly all sequences is predicted whereas for others no structure at all was found despite high sequence coverage. These genera might have evolved an ITS2 structure diverging from the standard one. The current version of the ITS2 Database can be accessed via http://its2.bioapps.biozentrum.uni-wuerzburg.de.
2007
Markus Weniger, Julia C Engelmann, Jörg Schultz (2007)  Genome Expression Pathway Analysis Tool--analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context.   BMC Bioinformatics 8: 06  
Abstract: BACKGROUND: Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived out of microarray experiments is highly dimensional and often noisy, and interpretation of the results can get intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. RESULTS: We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, afterwards data annotation is performed. After import, GEPAT offers various statistical data analysis methods, as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT offers no linear work flow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for an easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at http://gepat.sourceforge.net. 
CONCLUSION: GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at http://gepat.bioapps.biozentrum.uni-wuerzburg.de.
Tobias Müller, Nicole Philippi, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2007)  Distinguishing species.   RNA 13: 9. 1469-1472 Sep  
Abstract: Given two organisms, how can one distinguish whether they belong to the same species or not? This might be straightforward for two divergent organisms, but can be extremely difficult and laborious for closely related ones. A molecular marker giving a clear distinction would therefore be of immense benefit. The internal transcribed spacer 2 (ITS2) has been widely used for low-level phylogenetic analyses. Case studies revealed that a compensatory base change (CBC) in the helix II or helix III ITS2 secondary structure between two organisms correlated with sexual incompatibility. We analyzed more than 1300 closely related species to test whether this correlation is generally applicable. In 93%, where a CBC was found between organisms classified within the same genus, they belong to different species. Thus, a CBC in an ITS2 sequence-structure alignment is a sufficient condition to distinguish even closely related species.
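The CBC criterion named in this abstract can be illustrated with a minimal Python sketch. It is not the CBCAnalyzer implementation: it assumes two gap-aligned RNA sequences that share a single consensus secondary structure in dot-bracket notation, and the function names `base_pairs` and `find_cbcs` are hypothetical. A CBC is counted where both bases of a pair differ between the two sequences while each sequence still forms a canonical (Watson-Crick or G-U wobble) pair.

```python
# Hypothetical helper names; a simplified illustration, not the CBCAnalyzer code.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            pairs.append((stack.pop(), idx))
    return pairs


def find_cbcs(seq_a, seq_b, structure):
    """List paired positions where both bases changed but pairing is retained.

    Assumes seq_a, seq_b and structure have equal length (aligned), and that
    one consensus structure applies to both sequences.
    """
    cbcs = []
    for i, j in base_pairs(structure):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_canonical = pair_a in CANONICAL and pair_b in CANONICAL
        both_sides_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_canonical and both_sides_changed:
            cbcs.append((i, j))
    return cbcs


if __name__ == "__main__":
    structure = "(((....)))"
    reference = "GGCAAAAGCC"
    full_cbc = "GACAAAAGUC"   # G-C pair at (1, 8) replaced by A-U
    hemi_cbc = "GGCAAAAGUC"   # only one side changed (G-C -> G-U): not a CBC
    print(find_cbcs(reference, full_cbc, structure))
    print(find_cbcs(reference, hemi_cbc, structure))
```

Gap characters never match a canonical pair, so gapped positions are skipped. One-sided changes such as G-C to G-U are deliberately not counted, since only changes on both sides of a pair qualify as compensatory.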
Zemojtel, Penzkofer, Schultz, Dandekar, Badge, Vingron (2007)  Exonization of active mouse L1s: a driver of transcriptome evolution?   BMC Genomics 8: 1. Oct  
Abstract: BACKGROUND: Long interspersed nuclear elements (LINE-1s, L1s) have been recently implicated in the regulation of mammalian transcriptomes. RESULTS: Here, we show that members of the three active mouse L1 subfamilies (A, GF and TF) contain, in addition to those on their sense strands, conserved functional splice sites on their antisense strands, which trigger multiple exonization events. The latter is particularly intriguing in the light of the strong antisense orientation bias of intronic L1s, implying that the toleration of antisense insertions results in an increased potential for exonization. CONCLUSIONS: In a genome-wide analysis, we have uncovered evidence suggesting that the mobility of the large number of retrotransposition-competent mouse L1s (~2400 potentially active L1s in NCBIm35) has significant potential to shape the mouse transcriptome by continuously generating insertions into transcriptional units.
Matthias Wolf, Christian Selig, Tobias Müller, Nicole Philippi, Thomas Dandekar, Jörg Schultz (2007)  Placozoa: at least two   Biologia 62: 6. 641-645  
Abstract: It was shown that compensatory base changes (CBCs) in internal transcribed spacer 2 (ITS2) sequence-structure alignments can be used for distinguishing species. Using the ITS2 Database in combination with 4SALE - a tool for synchronous RNA sequence and secondary structure alignment and editing - in this study we present an in-depth CBC analysis for placozoan ITS2 sequences and their respective secondary structures. This analysis indicates at least two distinct species in Trichoplax (Placozoa) supporting a recently suggested hypothesis, that Placozoa is "no longer a phylum of one".
2006
T K B Gandhi, Jun Zhong, Suresh Mathivanan, L Karthick, K N Chandrika, S Sujatha Mohan, Salil Sharma, Stefan Pinkert, Shilpa Nagaraju, Balamurugan Periaswamy, Goparani Mishra, Kannabiran Nandakumar, Beiyi Shen, Nandan Deshpande, Rashmi Nayak, Malabika Sarker, Jef D Boeke, Giovanni Parmigiani, Jörg Schultz, Joel S Bader, Akhilesh Pandey (2006)  Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets.   Nat Genet 38: 3. 285-293 Mar  
Abstract: We present the first analysis of the human proteome with regard to interactions between proteins. We also compare the human interactome with the available interaction datasets from yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans) and fly (Drosophila melanogaster). Of >70,000 binary interactions, only 42 were common to human, worm and fly, and only 16 were common to all four datasets. An additional 36 interactions were common to fly and worm but were not observed in humans, although a coimmunoprecipitation assay showed that 9 of the interactions do occur in humans. A re-examination of the connectivity of essential genes in yeast and humans indicated that the available data do not support the presumption that the number of interaction partners can accurately predict whether a gene is essential. Finally, we found that proteins encoded by genes mutated in inherited genetic disorders are likely to interact with proteins known to cause similar disorders, suggesting the existence of disease subnetworks. The human interaction map constructed from our analysis should facilitate an integrative systems biology approach to elucidating the cellular networks that contribute to health and disease states.
Ivica Letunic, Richard R Copley, Birgit Pils, Stefan Pinkert, Jörg Schultz, Peer Bork (2006)  SMART 5: domains in the context of genomes and networks.   Nucleic Acids Res 34: Database issue. D257-D260 Jan  
Abstract: The Simple Modular Architecture Research Tool (SMART) is an online resource (http://smart.embl.de/) used for protein domain identification and the analysis of protein domain architectures. Many new features were implemented to make SMART more accessible to scientists from different fields. The new 'Genomic' mode in SMART makes it easy to analyze domain architectures in completely sequenced genomes. Domain annotation has been updated with a detailed taxonomic breakdown and a prediction of the catalytic activity for 50 SMART domains is now available, based on the presence of essential amino acids. Furthermore, intrinsically disordered protein regions can be identified and displayed. The network context is now displayed in the results page for more than 350 000 proteins, enabling easy analyses of domain interactions.
Jörg Schultz, Tobias Müller, Marco Achtziger, Philipp N Seibel, Thomas Dandekar, Matthias Wolf (2006)  The internal transcribed spacer 2 database--a web server for (not only) low level phylogenetic analyses.   Nucleic Acids Res 34: Web Server issue. W704-W707 Jul  
Abstract: The internal transcribed spacer 2 (ITS2) is a phylogenetic marker which has been of broad use in generic and infrageneric level classifications, as its sequence evolves comparatively fast. Only recently did it become clear that the ITS2 might be useful even for higher level systematic analyses. As the secondary structure is highly conserved within all eukaryotes, it serves as a valuable template for the construction of highly reliable sequence-structure alignments, which form a foundation for subsequent analyses. Thus, any phylogenetic study using ITS2 has to consider both sequence and structure. We have integrated a homology based RNA structure prediction algorithm into a web server, which allows the detection and secondary structure prediction for ITS2 in any given sequence. Furthermore, the resource contains more than 25,000 pre-calculated secondary structures for the currently known ITS2 sequences. These can be taxonomically searched and browsed. Thus, our resource could become a starting point for ITS2-based phylogenetic analyses and is therefore complementary to databases of other phylogenetic markers, which focus on higher level analyses. The current version of the ITS2 database can be accessed via http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Philipp N Seibel, Tobias Müller, Thomas Dandekar, Jörg Schultz, Matthias Wolf (2006)  4SALE--a tool for synchronous RNA sequence and secondary structure alignment and editing.   BMC Bioinformatics 7: 11  
Abstract: BACKGROUND: In sequence analysis the multiple alignment forms the foundation of all subsequent analyses. Errors in an alignment could strongly influence all succeeding analyses and therefore could lead to wrong predictions. Hand-crafted and hand-improved alignments are necessary and have become good common practice. For RNA sequences often the primary sequence as well as a secondary structure consensus is well known, e.g., the cloverleaf structure of the tRNA. Recently, some alignment editors have been proposed that are able to include and model both kinds of information. However, with the advent of a large amount of reliable RNA sequences together with their solved secondary structures (available from e.g. the ITS2 Database), we are faced with the problem of handling sequences and their associated secondary structures synchronously. RESULTS: 4SALE fills this gap. The application allows a fast sequence and synchronous secondary structure alignment for large data sets and for the first time synchronous manual editing of aligned sequences and their secondary structures. This study describes an algorithm for the synchronous alignment of sequences and their associated secondary structures as well as the main features of 4SALE used for further analyses and editing. 4SALE builds an optimal and unique starting point for every RNA sequence and structure analysis. CONCLUSION: 4SALE, which provides a user-friendly and intuitive interface, is a comprehensive toolbox for RNA analysis based on sequence and secondary structure information. The program connects sequence and structure databases like the ITS2 Database to phylogeny programs such as the CBCAnalyzer. 4SALE is written in Java and is therefore platform independent. The software is freely available and distributed from the website at http://4sale.bioapps.biozentrum.uni-wuerzburg.de.
Torben Friedrich, Birgit Pils, Thomas Dandekar, Jörg Schultz, Tobias Müller (2006)  Modelling interaction sites in protein domains with interaction profile hidden Markov models.   Bioinformatics 22: 23. 2851-2857 Dec  
Abstract: MOTIVATION: Due to the growing number of completely sequenced genomes, functional annotation of proteins becomes a more and more important issue. Here, we describe a method for the prediction of sites within protein domains, which are part of protein-ligand interactions. As recently demonstrated, these sites are not trivial to detect because of a varying degree of conservation of their location and type within a domain family. RESULTS: The developed method for the prediction of protein-ligand interaction sites is based on a newly defined interaction profile hidden Markov model (ipHMM) topology that takes structural and sequence data into account. It is based on a homology search via a posterior decoding algorithm that yields probabilities for interacting sequence positions and inherits the efficiency and the power of the profile hidden Markov model (pHMM) methodology. The algorithm enhances the quality of interaction site predictions and is a suitable tool for large scale studies, which was already demonstrated for pHMMs. AVAILABILITY: The MATLAB-files are available on request from the first author.
2005
Birgit Pils, Richard R Copley, Jörg Schultz (2005)  Variation in structural location and amino acid conservation of functional sites in protein domain families.   BMC Bioinformatics 6: 08  
Abstract: BACKGROUND: The functional sites of a protein present important information for determining its cellular function and are fundamental in drug design. Accordingly, accurate methods for the prediction of functional sites are of immense value. Most available methods are based on a set of homologous sequences and structural or evolutionary information, and assume that functional sites are more conserved than the average. In the analysis presented here, we have investigated the conservation of location and type of amino acids at functional sites, and compared the behaviour of functional sites between different protein domains. RESULTS: Functional sites were extracted from experimentally determined structural complexes from the Protein Data Bank harbouring a conserved protein domain from the SMART database. In general, functional (i.e. interacting) sites whose location is more highly conserved are also more conserved in their type of amino acid. However, even highly conserved functional sites can present a wide spectrum of amino acids. The degree of conservation strongly depends on the function of the protein domain and ranges from highly conserved in location and amino acid to very variable. Differentiation by binding partner shows that ion binding sites tend to be more conserved than functional sites binding peptides or nucleotides. CONCLUSION: The results gained by this analysis will help improve the accuracy of functional site prediction and facilitate the characterization of unknown protein sequences.
Konrad Büssow, Christoph Scheich, Volker Sievert, Ulrich Harttig, Jörg Schultz, Bernd Simon, Peer Bork, Hans Lehrach, Udo Heinemann (2005)  Structural genomics of human proteins--target selection and generation of a public catalogue of expression clones.   Microb Cell Fact 4: Jul  
Abstract: BACKGROUND: The availability of suitable recombinant protein is still a major bottleneck in protein structure analysis. The Protein Structure Factory, part of the international structural genomics initiative, targets human proteins for structure determination. It has implemented high throughput procedures for all steps from cloning to structure calculation. This article describes the selection of human target proteins for structure analysis, our high throughput cloning strategy, and the expression of human proteins in Escherichia coli host cells. RESULTS AND CONCLUSION: Protein expression and sequence data of 1414 E. coli expression clones representing 537 different proteins are presented. 139 human proteins (18%) could be expressed and purified in soluble form and with the expected size. All E. coli expression clones are publicly available to facilitate further functional characterisation of this set of human proteins.
Matthias Wolf, Marco Achtziger, Jörg Schultz, Thomas Dandekar, Tobias Müller (2005)  Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures.   RNA 11: 11. 1616-1623 Nov  
Abstract: Structural genomics meets phylogenetics and vice versa: Knowing rRNA secondary structures is a prerequisite for constructing rRNA alignments for inferring phylogenies, and inferring phylogenies is a precondition to understand the evolution of such rRNA secondary structures. Here, both scientific worlds go together. The rRNA internal transcribed spacer 2 (ITS2) region is a widely used phylogenetic marker. Because of its high variability at the sequence level, correct alignments have to take into account structural information. In this study, we examine the extent of the conservation in structure. We present (1) the homology modeled secondary structure of more than 20,000 ITS2 covering about 14,000 species; (2) a computational approach for homology modeling of rRNA structures, which additionally can be applied to other RNA families; and (3) a database providing about 25,000 ITS2 sequences with their associated secondary structures, a refined ITS2 specific general time reversible (GTR) substitution model, and a scoring matrix, available at http://its2.bioapps.biozentrum.uni-wuerzburg.de.
Jörg Schultz, Stefanie Maisel, Daniel Gerlach, Tobias Müller, Matthias Wolf (2005)  A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the Eukaryota.   RNA 11: 4. 361-364 Apr  
Abstract: The ongoing characterization of novel species creates the need for a molecular marker which can be used for species- and, simultaneously, for mega-systematics. Recently, the use of the internal transcribed spacer 2 (ITS2) sequence was suggested, as it shows a high divergence in sequence with an assumed conservation in structure. This hypothesis was mainly based on small-scale analyses, comparing a limited number of sequences. Here, we report a large-scale analysis of more than 54,000 currently known ITS2 sequences with the goal to evaluate the hypothesis of a conserved structural core and to assess its use for automated large-scale phylogenetics. Structure prediction revealed that the previously described core structure can be found for more than 5000 sequences in a wide variety of taxa within the eukaryotes, indicating that the core secondary structure is indeed conserved. This conserved structure allowed an automated alignment of extremely divergent sequences as exemplified for the ITS2 sequences of a ctenophorean eumetazoon and a volvocalean green alga. All classified sequences, together with their structures can be accessed at http://www.biozentrum.uni-wuerzburg.de/bioinformatik/projects/ITS2.html. Furthermore, we found that, although sample sequences are known for most major taxa, there exists a profound divergence in coverage, which might become a hindrance for general usage. In summary, our analysis strengthens the potential of ITS2 as a general phylogenetic marker and provides a data source for further ITS2-based analyses.
2004
Birgit Pils, Jörg Schultz (2004)  Inactive enzyme-homologues find new function in regulatory processes.   J Mol Biol 340: 3. 399-404 Jul  
Abstract: Although the catalytic center of an enzyme is usually highly conserved, there have been a few reports of proteins with substitutions at essential catalytic positions, which convert the enzyme into a catalytically inactive form. Here, we report a large-scale analysis of substitutions at enzymes' catalytic sites in order to gain insight into the function and evolution of inactive enzyme-homologues. Our analysis revealed that inactive enzyme-homologues are not an exception only found in single enzyme families, but that they are represented in a large variety of enzyme families and conserved among metazoan species. Even though they have lost their catalytic activity, they have adopted new functions and are now mainly involved in regulatory processes, as shown by several case studies. This modification of existing modules is an efficient mechanism to evolve new functions. The invention of inactive enzyme-homologues in metazoa has thereby led to an enhancement of complexity of regulatory networks.
Jörg Schultz (2004)  HTTM, a horizontally transferred transmembrane domain.   Trends Biochem Sci 29: 1. 4-7 Jan  
Abstract: Sequence analysis of vitamin K-dependent gamma-carboxylases (VKGC) has revealed the presence of a novel domain, HTTM (for horizontally transferred transmembrane) in its N terminus. In contrast to most known domains, HTTM contains four transmembrane regions. Its occurrence in eukaryotes, bacteria and archaea is probably caused by horizontal gene transfer rather than by early evolution. The conservation of VKGC catalytic sites also indicates an enzymatic function for the other family members.
Benjamin Schuster-Böckler, Jörg Schultz, Sven Rahmann (2004)  HMM Logos for visualization of protein families.   BMC Bioinformatics 5: Jan  
Abstract: BACKGROUND: Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research. Up to now, however, there exists no method to visualize all of their central aspects graphically in an intuitively understandable way. RESULTS: We present a visualization method that incorporates both emission and transition probabilities of the pHMM, thus extending sequence logos introduced by Schneider and Stephens. For each emitting state of the pHMM, we display a stack of letters. The stack height is determined by the deviation of the position's letter emission frequencies from the background frequencies. The stack width visualizes both the probability of reaching the state (the hitting probability) and the expected number of letters the state emits during a pass through the model (the state's expected contribution). A web interface offering online creation of HMM Logos and the corresponding source code can be found at the Logos web server of the Max Planck Institute for Molecular Genetics http://logos.molgen.mpg.de. CONCLUSIONS: We demonstrate that HMM Logos can be a useful tool for the biologist: We use them to highlight differences between two homologous subfamilies of GTPases, Rab and Ras, and we show that they are able to indicate structural elements of Ras.
Birgit Pils, Jörg Schultz (2004)  Evolution of the multifunctional protein tyrosine phosphatase family.   Mol Biol Evol 21: 4. 625-631 Apr  
Abstract: The protein tyrosine phosphatase (PTP) family plays a central role in signal transduction pathways by controlling the phosphorylation state of serine, threonine, and tyrosine residues. PTPs can be divided into dual specificity phosphatases and the classical PTPs, which can comprise one or two phosphatase domains. We studied amino acid substitutions at functional sites in the phosphatase domain and identified putative noncatalytic phosphatase domains in all subclasses of the PTP family. The presence of inactive phosphatase domains in all subclasses indicates that they were invented multiple times in evolution. Depending on the domain composition, loss of catalytic activity can result in different consequences for the function of the protein. Inactive single-domain phosphatases can still specifically bind substrate and protect it from dephosphorylation by other phosphatases. The inactive domains of tandem phosphatases can be further subdivided. The first class is more conserved, still able to bind phosphorylated tyrosine residues and might recruit multiphosphorylated substrates for the adjacent active domain. The second has accumulated several variable amino acid substitutions in the catalytic center, indicating a complete loss of tyrosine-binding capabilities. To study the impact of substitutions in the catalytic center on the evolution of the whole domain, we examined the evolutionary rates for each individual site and compared them between the classes. This analysis revealed a release of evolutionary constraint for multiple sites surrounding the catalytic center only in the second class, emphasizing its difference in function compared with the first class. Furthermore, we found a region of higher conservation common to both domain classes, suggesting a new regulatory center. We discuss the influence of evolutionary forces on the development of the phosphatase domain, which has led to additional functions, such as the specific protection of phosphorylated tyrosine residues, substrate recruitment, and regulation of the catalytic activity of adjacent domains.
Ivica Letunic, Richard R Copley, Steffen Schmidt, Francesca D Ciccarelli, Tobias Doerks, Jörg Schultz, Chris P Ponting, Peer Bork (2004)  SMART 4.0: towards genomic data integration.   Nucleic Acids Res 32: Database issue. D142-D144 Jan  
Abstract: SMART (Simple Modular Architecture Research Tool) is a web tool (http://smart.embl.de/) for the identification and annotation of protein domains, and provides a platform for the comparative study of complex domain architectures in genes and proteins. The January 2004 release of SMART contains 685 protein domains. New developments in SMART are centred on the integration of data from completed metazoan genomes. SMART now uses predicted proteins from complete genomes in its source sequence databases, and integrates these with predictions of orthology. New visualization tools have been developed to allow analysis of gene intron-exon structure within the context of protein domain structure, and to align these displays to provide schematic comparisons of orthologous genes, or multiple transcripts from the same gene. Other improvements include the ability to query SMART by Gene Ontology terms, improved structure database searching and batch retrieval of multiple entries.
2002
Tobias Doerks, Richard R Copley, Jörg Schultz, Chris P Ponting, Peer Bork (2002)  Systematic identification of novel protein domain families associated with nuclear functions.   Genome Res 12: 1. 47-56 Jan  
Abstract: A systematic computational analysis of protein sequences containing known nuclear domains led to the identification of 28 novel domain families. This represents a 26% increase in the starting set of 107 known nuclear domain families used for the analysis. Most of the novel domains are present in all major eukaryotic lineages, but 3 are species specific. For about 500 of the 1200 proteins that contain these new domains, nuclear localization could be inferred, and for 700, additional features could be predicted. For example, we identified a new domain, likely to have a role downstream of the unfolded protein response; a nematode-specific signalling domain; and a widespread domain, likely to be a noncatalytic homolog of ubiquitin-conjugating enzymes.
Ivica Letunic, Leo Goodstadt, Nicholas J Dickens, Tobias Doerks, Joerg Schultz, Richard Mott, Francesca Ciccarelli, Richard R Copley, Chris P Ponting, Peer Bork (2002)  Recent improvements to the SMART domain-based sequence annotation resource.   Nucleic Acids Res 30: 1. 242-244 Jan  
Abstract: SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users' documents. A SMART mirror has been created at http://smart.ox.ac.uk.
Robert H Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, Josep F Abril, Pankaj Agarwal, Richa Agarwala, Rachel Ainscough, Marina Alexandersson, Peter An, Stylianos E Antonarakis, John Attwood, Robert Baertsch, Jonathon Bailey, Karen Barlow, Stephan Beck, Eric Berry, Bruce Birren, Toby Bloom, Peer Bork, Marc Botcherby, Nicolas Bray, Michael R Brent, Daniel G Brown, Stephen D Brown, Carol Bult, John Burton, Jonathan Butler, Robert D Campbell, Piero Carninci, Simon Cawley, Francesca Chiaromonte, Asif T Chinwalla, Deanna M Church, Michele Clamp, Christopher Clee, Francis S Collins, Lisa L Cook, Richard R Copley, Alan Coulson, Olivier Couronne, James Cuff, Val Curwen, Tim Cutts, Mark Daly, Robert David, Joy Davies, Kimberly D Delehaunty, Justin Deri, Emmanouil T Dermitzakis, Colin Dewey, Nicholas J Dickens, Mark Diekhans, Sheila Dodge, Inna Dubchak, Diane M Dunn, Sean R Eddy, Laura Elnitski, Richard D Emes, Pallavi Eswara, Eduardo Eyras, Adam Felsenfeld, Ginger A Fewell, Paul Flicek, Karen Foley, Wayne N Frankel, Lucinda A Fulton, Robert S Fulton, Terrence S Furey, Diane Gage, Richard A Gibbs, Gustavo Glusman, Sante Gnerre, Nick Goldman, Leo Goodstadt, Darren Grafham, Tina A Graves, Eric D Green, Simon Gregory, Roderic Guigó, Mark Guyer, Ross C Hardison, David Haussler, Yoshihide Hayashizaki, LaDeana W Hillier, Angela Hinrichs, Wratko Hlavina, Timothy Holzer, Fan Hsu, Axin Hua, Tim Hubbard, Adrienne Hunt, Ian Jackson, David B Jaffe, L Steven Johnson, Matthew Jones, Thomas A Jones, Ann Joy, Michael Kamal, Elinor K Karlsson, Donna Karolchik, Arkadiusz Kasprzyk, Jun Kawai, Evan Keibler, Cristyn Kells, W James Kent, Andrew Kirby, Diana L Kolbe, Ian Korf, Raju S Kucherlapati, Edward J Kulbokas, David Kulp, Tom Landers, J P Leger, Steven Leonard, Ivica Letunic, Rosie Levine, Jia Li, Ming Li, Christine Lloyd, Susan Lucas, Bin Ma, Donna R Maglott, Elaine R Mardis, Lucy Matthews, Evan Mauceli, John H Mayer, Megan McCarthy, W Richard McCombie, Stuart McLaren, Kirsten McLay, John D McPherson, Jim Meldrim, Beverley Meredith, Jill P Mesirov, Webb Miller, Tracie L Miner, Emmanuel Mongin, Kate T Montgomery, Michael Morgan, Richard Mott, James C Mullikin, Donna M Muzny, William E Nash, Joanne O Nelson, Michael N Nhan, Robert Nicol, Zemin Ning, Chad Nusbaum, Michael J O'Connor, Yasushi Okazaki, Karen Oliver, Emma Overton-Larty, Lior Pachter, Genís Parra, Kymberlie H Pepin, Jane Peterson, Pavel Pevzner, Robert Plumb, Craig S Pohl, Alex Poliakov, Tracy C Ponce, Chris P Ponting, Simon Potter, Michael Quail, Alexandre Reymond, Bruce A Roe, Krishna M Roskin, Edward M Rubin, Alistair G Rust, Ralph Santos, Victor Sapojnikov, Brian Schultz, Jörg Schultz, Matthias S Schwartz, Scott Schwartz, Carol Scott, Steven Seaman, Steve Searle, Ted Sharpe, Andrew Sheridan, Ratna Shownkeen, Sarah Sims, Jonathan B Singer, Guy Slater, Arian Smit, Douglas R Smith, Brian Spencer, Arne Stabenau, Nicole Stange-Thomann, Charles Sugnet, Mikita Suyama, Glenn Tesler, Johanna Thompson, David Torrents, Evanne Trevaskis, John Tromp, Catherine Ucla, Abel Ureta-Vidal, Jade P Vinson, Andrew C Von Niederhausern, Claire M Wade, Melanie Wall, Ryan J Weber, Robert B Weiss, Michael C Wendl, Anthony P West, Kris Wetterstrand, Raymond Wheeler, Simon Whelan, Jamey Wierzbowski, David Willey, Sophie Williams, Richard K Wilson, Eitan Winter, Kim C Worley, Dudley Wyman, Shan Yang, Shiaw-Pyng Yang, Evgeny M Zdobnov, Michael C Zody, Eric S Lander (2002)  Initial sequencing and comparative analysis of the mouse genome.   Nature 420: 6915. 520-562 Dec  
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
George Dimopoulos, George K Christophides, Stephan Meister, Jörg Schultz, Kevin P White, Carolina Barillas-Mury, Fotis C Kafatos (2002)  Genome expression analysis of Anopheles gambiae: responses to injury, bacterial challenge, and malaria infection.   Proc Natl Acad Sci U S A 99: 13. 8814-8819 Jun  
Abstract: The complex gene expression responses of Anopheles gambiae to microbial and malaria challenges, injury, and oxidative stress (in the mosquito and/or a cultured cell line) were surveyed by using cDNA microarrays constructed from an EST-clone collection. The expression profiles were broadly subdivided into induced and down-regulated gene clusters. Gram+ and Gram- bacteria and microbial elicitors up-regulated a diverse set of genes, many belonging to the immunity class, and the response to malaria partially overlapped with this response. Oxidative stress activated a distinctive set of genes, mainly implicated in oxidoreductive processes. Injury up- and down-regulated gene clusters also were distinctive, prominently implicating glycolysis-related genes and citric acid cycle/oxidative phosphorylation/redox-mitochondrial functions, respectively. Cross-comparison of in vivo and in vitro responses indicated the existence of tightly coregulated gene groups that may correspond to gene pathways.
Anne-Claude Gavin, Markus Bösche, Roland Krause, Paola Grandi, Martina Marzioch, Andreas Bauer, Jörg Schultz, Jens M Rick, Anne-Marie Michon, Cristina-Maria Cruciat, Marita Remor, Christian Höfert, Malgorzata Schelder, Miro Brajenovic, Heinz Ruffner, Alejandro Merino, Karin Klein, Manuela Hudak, David Dickson, Tatjana Rudi, Volker Gnau, Angela Bauch, Sonja Bastuck, Bettina Huhse, Christina Leutwein, Marie-Anne Heurtier, Richard R Copley, Angela Edelmann, Erich Querfurth, Vladimir Rybin, Gerard Drewes, Manfred Raida, Tewis Bouwmeester, Peer Bork, Bertrand Seraphin, Bernhard Kuster, Gitte Neubauer, Giulio Superti-Furga (2002)  Functional organization of the yeast proteome by systematic analysis of protein complexes.   Nature 415: 6868. 141-147 Jan  
Abstract: Most cellular processes are carried out by multiprotein complexes. The identification and analysis of their components provides insight into how the ensemble of expressed proteins (proteome) is organized into functional units. We used tandem-affinity purification (TAP) and mass spectrometry in a large-scale approach to characterize multiprotein complexes in Saccharomyces cerevisiae. We processed 1,739 genes, including 1,143 human orthologues of relevance to human biology, and purified 589 protein assemblies. Bioinformatic analysis of these assemblies defined 232 distinct multiprotein complexes and proposed new cellular roles for 344 proteins, including 231 proteins with no previous functional annotation. Comparison of yeast and human complexes showed that conservation across species extends from single proteins to their molecular environment. Our analysis provides an outline of the eukaryotic proteome as a network of protein complexes at a level of organization beyond binary interactions. This higher-order map contains fundamental biological information and offers the context for a more reasoned and informed approach to drug discovery.
Jörg Schultz, Birgit Pils (2002)  Prediction of structure and functional residues for O-GlcNAcase, a divergent homologue of acetyltransferases.   FEBS Lett 529: 2-3. 179-182 Oct  
Abstract: N-Acetyl-beta-D-glucosaminidase (O-GlcNAcase) is a key enzyme in the posttranslational modification of intracellular proteins by O-linked N-acetylglucosamine (O-GlcNAc). Here, we show that this protein contains two catalytic domains, one homologous to bacterial hyaluronidases and one belonging to the GCN5-related family of acetyltransferases (GNATs). Using sequence and structural information, we predict that the GNAT homologous region contains the O-GlcNAcase activity. Thus, O-GlcNAcase is the first member of the GNAT family not involved in transfer of acetyl groups, adding a new mode of evolution to this large protein family. Comparison with solved structures of different GNATs led to a reliable structure prediction and mapping of residues involved in binding of the GlcNAc-modified proteins and catalysis.
Richard Mott, Jörg Schultz, Peer Bork, Chris P Ponting (2002)  Predicting protein cellular localization using a domain projection method.   Genome Res 12: 8. 1168-1174 Aug  
Abstract: We investigate the co-occurrence of domain families in eukaryotic proteins to predict protein cellular localization. Approximately half (300) of SMART domains form a "small-world network", linked by no more than seven degrees of separation. Projection of the domains onto two-dimensional space reveals three clusters that correspond to cellular compartments containing secreted, cytoplasmic, and nuclear proteins. The projection method takes into account the existence of "bridging" domains, that is, instances where two domains might not occur with each other but frequently co-occur with a third domain; in such circumstances the domains are neighbors in the projection. While the majority of domains are specific to a compartment ("locale"), and hence may be used to localize any protein that contains such a domain, a small subset of domains either are present in multiple locales or occur in transmembrane proteins. Comparison with previously annotated proteins shows that SMART domain data used with this approach can predict, with 92% accuracy, the localizations of 23% of eukaryotic proteins. The coverage and accuracy will increase with improvements in domain database coverage. This method is complementary to approaches that use amino-acid composition or identify sorting sequences; these methods may be combined to further enhance prediction accuracy.
2001
E S Lander, L M Linton, B Birren, C Nusbaum, M C Zody, J Baldwin, K Devon, K Dewar, M Doyle, W FitzHugh, R Funke, D Gage, K Harris, A Heaford, J Howland, L Kann, J Lehoczky, R LeVine, P McEwan, K McKernan, J Meldrim, J P Mesirov, C Miranda, W Morris, J Naylor, C Raymond, M Rosetti, R Santos, A Sheridan, C Sougnez, N Stange-Thomann, N Stojanovic, A Subramanian, D Wyman, J Rogers, J Sulston, R Ainscough, S Beck, D Bentley, J Burton, C Clee, N Carter, A Coulson, R Deadman, P Deloukas, A Dunham, I Dunham, R Durbin, L French, D Grafham, S Gregory, T Hubbard, S Humphray, A Hunt, M Jones, C Lloyd, A McMurray, L Matthews, S Mercer, S Milne, J C Mullikin, A Mungall, R Plumb, M Ross, R Shownkeen, S Sims, R H Waterston, R K Wilson, L W Hillier, J D McPherson, M A Marra, E R Mardis, L A Fulton, A T Chinwalla, K H Pepin, W R Gish, S L Chissoe, M C Wendl, K D Delehaunty, T L Miner, A Delehaunty, J B Kramer, L L Cook, R S Fulton, D L Johnson, P J Minx, S W Clifton, T Hawkins, E Branscomb, P Predki, P Richardson, S Wenning, T Slezak, N Doggett, J F Cheng, A Olsen, S Lucas, C Elkin, E Uberbacher, M Frazier, R A Gibbs, D M Muzny, S E Scherer, J B Bouck, E J Sodergren, K C Worley, C M Rives, J H Gorrell, M L Metzker, S L Naylor, R S Kucherlapati, D L Nelson, G M Weinstock, Y Sakaki, A Fujiyama, M Hattori, T Yada, A Toyoda, T Itoh, C Kawagoe, H Watanabe, Y Totoki, T Taylor, J Weissenbach, R Heilig, W Saurin, F Artiguenave, P Brottier, T Bruls, E Pelletier, C Robert, P Wincker, D R Smith, L Doucette-Stamm, M Rubenfield, K Weinstock, H M Lee, J Dubois, A Rosenthal, M Platzer, G Nyakatura, S Taudien, A Rump, H Yang, J Yu, J Wang, G Huang, J Gu, L Hood, L Rowen, A Madan, S Qin, R W Davis, N A Federspiel, A P Abola, M J Proctor, R M Myers, J Schmutz, M Dickson, J Grimwood, D R Cox, M V Olson, R Kaul, N Shimizu, K Kawasaki, S Minoshima, G A Evans, M Athanasiou, R Schultz, B A Roe, F Chen, H Pan, J Ramser, H Lehrach, R Reinhardt, W R McCombie, M de la Bastide, N Dedhia, H Blöcker, K Hornischer, G Nordsiek, R Agarwala, L Aravind, J A Bailey, A Bateman, S Batzoglou, E Birney, P Bork, D G Brown, C B Burge, L Cerutti, H C Chen, D Church, M Clamp, R R Copley, T Doerks, S R Eddy, E E Eichler, T S Furey, J Galagan, J G Gilbert, C Harmon, Y Hayashizaki, D Haussler, H Hermjakob, K Hokamp, W Jang, L S Johnson, T A Jones, S Kasif, A Kaspryzk, S Kennedy, W J Kent, P Kitts, E V Koonin, I Korf, D Kulp, D Lancet, T M Lowe, A McLysaght, T Mikkelsen, J V Moran, N Mulder, V J Pollara, C P Ponting, G Schuler, J Schultz, G Slater, A F Smit, E Stupka, J Szustakowski, D Thierry-Mieg, J Thierry-Mieg, L Wagner, J Wallis, R Wheeler, A Williams, Y I Wolf, K H Wolfe, S P Yang, R F Yeh, F Collins, M S Guyer, J Peterson, A Felsenfeld, K A Wetterstrand, A Patrinos, M J Morgan, P de Jong, J J Catanese, K Osoegawa, H Shizuya, S Choi, Y J Chen, J Szustakowki (2001)  Initial sequencing and analysis of the human genome.   Nature 409: 6822. 860-921 Feb  
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Notes:
J Schultz, T Jones, P Bork, D Sheer, S Blencke, S Steyrer, U Wellbrock, D Bevec, A Ullrich, C Wallasch (2001)  Molecular characterization of a cDNA encoding functional human CLK4 kinase and localization to chromosome 5q35 [correction of 4q35].   Genomics 71: 3. 368-370 Feb  
Abstract: Phosphorylated serine- and arginine-rich (SR) proteins play an important role in the formation of spliceosomes, possibly controlling the regulation of alternative splicing. Enzymes that phosphorylate the SR proteins belong to the family of CDC2/CDC28-like kinases (CLK). Employing nucleotide sequence comparison of human expressed sequence tag sequences to the murine counterpart, we identified, cloned, and recombinantly expressed the human orthologue to the murine CLK4 cDNA. When fused to glutathione S-transferase, the catalytically active human CLK4 is able to autophosphorylate and to phosphorylate myelin basic protein, but not histone H2B as a substrate. Inspection of mRNA accumulation demonstrated gene expression in all human tissues, with the most prominent abundance in liver, kidney, brain, and heart. Using fluorescence in situ hybridization, the human CLK4 cDNA was localized to band q35 on chromosome 5 [corrected].
Notes:
2000
J Schultz, R R Copley, T Doerks, C P Ponting, P Bork (2000)  SMART: a web-based tool for the study of genetically mobile domains.   Nucleic Acids Res 28: 1. 231-234 Jan  
Abstract: SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures (http://SMART.embl-heidelberg.de). More than 400 domain families found in signalling, extra-cellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.
Notes:
G Dimopoulos, T L Casavant, S Chang, T Scheetz, C Roberts, M Donohue, J Schultz, V Benes, P Bork, W Ansorge, M B Soares, F C Kafatos (2000)  Anopheles gambiae pilot gene discovery project: identification of mosquito innate immunity genes from expressed sequence tags generated from immune-competent cell lines.   Proc Natl Acad Sci U S A 97: 12. 6619-6624 Jun  
Abstract: Together with AIDS and tuberculosis, malaria is at the top of the list of devastating infectious diseases. However, molecular genetic studies of its major vector, Anopheles gambiae, are still quite limited. We have conducted a pilot gene discovery project to accelerate progress in the molecular analysis of vector biology, with emphasis on the mosquito's antimalarial immune defense. A total of 5,925 expressed sequence tags were determined from normalized cDNA libraries derived from immune-responsive hemocyte-like cell lines. The 3,242 expressed sequence tag-containing cDNA clones were grouped into 2,380 clone clusters, potentially representing unique genes. Of these, 1,118 showed similarities to known genes from other organisms, but only 27 were identical to previously known mosquito genes. We identified 38 candidate genes, based on sequence similarity, that may be implicated in immune reactions including antimalarial defense; 19 of these were shown experimentally to be inducible by bacterial challenge, lending support to their proposed involvement in mosquito immunity.
Notes:
J Schultz, T Doerks, C P Ponting, R R Copley, P Bork (2000)  More than 1,000 putative new human signalling proteins revealed by EST data mining.   Nat Genet 25: 2. 201-204 Jun  
Abstract: Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.
Notes:
1999
R R Copley, J Schultz, C P Ponting, P Bork (1999)  Protein families in multicellular organisms.   Curr Opin Struct Biol 9: 3. 408-415 Jun  
Abstract: The complete sequence of the nematode worm Caenorhabditis elegans contains the genetic machinery that is required to undertake the core biological processes of single cells. However, the genome also encodes proteins that are associated with multicellularity, as well as others that are lineage-specific expansions of phylogenetically widespread families and yet more that are absent in non-nematodes. Ongoing analysis is beginning to illuminate the similarities and differences among human proteins and proteins that are encoded by the genomes of the multicellular worm and the unicellular yeast, and will be essential in determining the reliability of transferring experimental data among phylogenetically distant species.
Notes:
C P Ponting, J Schultz, F Milpetz, P Bork (1999)  SMART: identification and annotation of domains from signalling and extracellular protein sequences.   Nucleic Acids Res 27: 1. 229-232 Jan  
Abstract: SMART is a simple modular architecture research tool and database that provides domain identification and annotation on the WWW (http://coot.embl-heidelberg.de/SMART). The tool compares query sequences with its databases of domain sequences and multiple alignments whilst concurrently identifying compositionally biased regions such as signal peptide, transmembrane and coiled coil segments. Annotated and unannotated regions of the sequence can be used as queries in searches of sequence databases. The SMART alignment collection represents more than 250 signalling and extracellular domains. Each alignment is curated to assign appropriate domain boundaries and to ensure its quality. In addition, each domain is annotated extensively with respect to cellular localisation, species distribution, functional class, tertiary structure and functionally important residues.
Notes:
C P Ponting, L Aravind, J Schultz, P Bork, E V Koonin (1999)  Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer.   J Mol Biol 289: 4. 729-745 Jun  
Abstract: Phyletic distributions of eukaryotic signalling domains were studied using recently developed sensitive methods for protein sequence analysis, with an emphasis on the detection and accurate enumeration of homologues in bacteria and archaea. A major difference was found between the distributions of enzyme families that are typically found in all three divisions of cellular life and non-enzymatic domain families that are usually eukaryote-specific. Previously undetected bacterial homologues were identified for plant pathogenesis-related proteins, Pad1, von Willebrand factor type A, src homology 3 and YWTD repeat-containing domains. Comparisons of the domain distributions in eukaryotes and prokaryotes enabled distinctions to be made between the domains originating prior to the last common ancestor of all known life forms and those apparently originating as consequences of horizontal gene transfer events. A number of transfers of signalling domains from eukaryotes to bacteria were confidently identified, in contrast to only a single case of apparent transfer from eukaryotes to archaea.
Notes:
1998
J Schultz, F Milpetz, P Bork, C P Ponting (1998)  SMART, a simple modular architecture research tool: identification of signaling domains.   Proc Natl Acad Sci U S A 95: 11. 5857-5864 May  
Abstract: Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 greater than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.
Notes:
1997
J Schultz, C P Ponting, K Hofmann, P Bork (1997)  SAM as a protein interaction domain involved in developmental regulation.   Protein Sci 6: 1. 249-253 Jan  
Abstract: More than 60 previously undetected SAM domain-containing proteins have been identified using profile searching methods. Among these are over 40 EPH-related receptor tyrosine kinases (RPTK), Drosophila bicaudal-C, a p53 from Loligo forbesi, and diacylglycerol-kinase isoform delta. This extended dataset suggests that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation of numerous developmental processes among diverse eukaryotes. A conserved tyrosine in the SAM sequences of the EPH-related RPTKs is likely to mediate cell-cell initiated signal transduction via the binding of SH2-containing proteins to phosphotyrosine.
Notes:
P Bork, J Schultz, C P Ponting (1997)  Cytoplasmic signalling domains: the next generation.   Trends Biochem Sci 22: 8. 296-298 Aug  
Abstract: Since the late 1980s, when Src-homology SH2 and SH3 domains were identified, the repertoire of non-catalytic signalling domains has increased to number over 30. As it is expected that further regulatory domains shall be found, unravelling the complex network of their interactions remains an on-going challenge.
Notes:
1996
P Bork, N P Brown, H Hegyi, J Schultz (1996)  The protein phosphatase 2C (PP2C) superfamily: detection of bacterial homologues.   Protein Sci 5: 7. 1421-1425 Jul  
Abstract: A thorough sequence analysis of the various members of the eukaryotic protein serine/threonine phosphatase 2C (PP2C) family revealed the conservation of 11 motifs. These motifs could be identified in numerous other sequences, including fungal adenylate cyclases that are predicted to contain a functionally active PP2C domain, and a family of prokaryotic serine/threonine phosphatases including SpoIIE. Phylogenetic analysis of all the proteins indicates a widespread sequence family for which a considerable number of isoenzymes can be inferred.
Notes:

Book chapters

2006