Abstract: Epidermolysis bullosa acquisita (EBA) is a chronic mucocutaneous autoimmune skin blistering disease. Several lines of evidence underscore the contribution of autoantibodies against type VII collagen (COL7) to the pathogenesis of EBA. Furthermore, EBA susceptibility is associated with the MHC haplotype in patients (HLA-DR2) and in immunization-induced EBA in mice (H2s). The latter study indicated an additional contribution of non-MHC genes to disease susceptibility. To identify non-MHC genes controlling EBA susceptibility, we intercrossed EBA-susceptible MRL/MpJ with EBA-resistant NZM2410/J and BXD2/TyJ as well as Cast mice. Mice of the fourth generation of this four-way autoimmune-prone advanced intercross line were immunized with a fragment of murine COL7 to induce EBA. Anti-COL7 autoantibodies were detected in 84% of mice, whereas deposition of complement at the dermal-epidermal junction (DEJ) was observed in 50% of the animals; 33% of immunized mice presented with overt clinical EBA. Onset of clinical disease was associated with several quantitative trait loci (QTLs) located on chromosomes 9, 12, 14, and 19, whereas maximum disease severity was linked to QTLs on chromosomes 1, 15, and 19. This more detailed insight into the pathogenesis of EBA may eventually lead to new treatment strategies for EBA and other autoantibody-mediated diseases. Journal of Investigative Dermatology advance online publication, 2 February 2012; doi:10.1038/jid.2011.466.
Abstract: SUMMARY: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily onto multi-core computers, clusters, and the cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types become available, xQTL workbench is quickly customized using the Molgenis software generator. AVAILABILITY: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. CONTACT: Morris Swertz (m.a.swertz@rug.nl).
Abstract: Background and aims Autoimmune pancreatitis (AIP) represents a rare but clinically relevant cause of pancreatic inflammation. Using MRL/Mp mice as a model of spontaneous AIP, the genetic basis of the disease was studied. Methods To identify quantitative trait loci (QTLs) of AIP, an advanced intercross line was studied, originating from MRL/MpJ parental mice and the following three mouse strains: Cast (healthy controls), BXD2 (susceptible to collagen-induced arthritis), and NZM (a model of lupus erythematosus). This concept was chosen to identify both general autoimmune disease-associated loci and AIP-specific QTLs. Therefore, generation G4 of outbred intercross mice was characterised phenotypically by scoring histopathological changes of the pancreas and genotyped with single nucleotide polymorphism (SNP) arrays. Data were analysed with the R implementation of HAPPY. Results Five QTLs, correlating with the severity of AIP, were identified. Two of them mapped to chromosome 4 and one each to chromosomes 2, 5, and 6. The QTL on chromosome 6 displays the highest LOD score (5.4) and contains the C-type lectin domain family 4 member a2 in its peak region, which encodes a receptor protein of dendritic cells that has previously been implicated in autoimmune diseases such as Sjogren's syndrome. AIP candidate genes of other QTLs include heterogeneous nuclear ribonucleoprotein A3; nuclear factor, erythroid derived 2, like 2; Sjogren syndrome antigen B; and ubiquitin protein ligase E3 component n-recognin 3. Conclusions This study has identified QTLs and putative candidate genes of murine AIP. Their functional role and relevance to human AIP will be studied further.
Abstract: During a meeting of the SYSGENET working group "Bioinformatics", currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a "cloud" should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.
Abstract: Some creatures living in extremely low temperatures can produce special materials called "antifreeze proteins" (AFPs), which can prevent the cell and body fluids from freezing. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, etc. Although AFPs have a common function, they show a high degree of diversity in sequences and structures. Therefore, sequence similarity based search methods often fail to predict AFPs from sequence databases. In this work, we report a random forest approach "AFP-Pred" for the prediction of antifreeze proteins from protein sequence. AFP-Pred was trained on a dataset containing 300 AFPs and 300 non-AFPs and tested on a dataset containing 181 AFPs and 9193 non-AFPs. AFP-Pred achieved 81.33% accuracy from training and 83.38% from testing. The performance of AFP-Pred was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that AFP-Pred can be a useful approach to identify antifreeze proteins from sequence information, irrespective of their sequence similarity.
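Because the test set described above is strongly imbalanced (181 AFPs against 9193 non-AFPs), overall accuracy is easier to interpret next to sensitivity and specificity. The following is a minimal sketch of how these three figures relate, not the evaluation code of AFP-Pred. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = AFP, 0 = non-AFP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity and specificity as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels, invented for illustration only (2 positives, 8 negatives).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy is high even though only half of the positives are recovered, which is why a single accuracy value on an imbalanced set should be read with care.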
Abstract: The Advanced Resource Connector (ARC) is a light-weight, non-intrusive, simple yet powerful Grid middleware capable of connecting highly heterogeneous computing and storage resources. ARC aims at providing general purpose, flexible, collaborative computing environments suitable for a range of uses, both in science and business. The server side offers the fundamental job execution management, information and data capabilities required for a Grid. Users are provided with an easy to install and use client which provides a basic toolbox for job- and data management. The KnowARC project developed the next-generation ARC middleware, implemented as Web Services with the aim of standard-compliant interoperability.
Abstract: The first scientific meeting of the newly established European SYSGENET network took place at the Helmholtz Centre for Infection Research (HZI) in Braunschweig, April 7-9, 2010. About 50 researchers working in the field of systems genetics using mouse genetic reference populations (GRP) participated in the meeting and exchanged their results, phenotyping approaches, and data analysis tools for studying systems genetics. In addition, the future of GRP resources and phenotyping in Europe was discussed.
Abstract: Eukaryotic protein secretion generally occurs via the classical secretory pathway that traverses the ER and Golgi apparatus. Secreted proteins usually contain a signal sequence with all the essential information required to target them for secretion. However, some proteins like fibroblast growth factors (FGF-1, FGF-2), interleukins (IL-1 alpha, IL-1 beta), galectins and thioredoxin are exported by an alternative pathway. This is known as leaderless or non-classical secretion and works without a signal sequence. Most computational methods for the identification of secretory proteins use the signal peptide as indicator and are therefore not able to identify substrates of non-classical secretion. In this work, we report a random forest method, SPRED, to identify secretory proteins from protein sequences irrespective of N-terminal signal peptides, thus also allowing correct classification of non-classical secretory proteins. Training was performed on a dataset containing 600 extracellular proteins and 600 cytoplasmic and/or nuclear proteins. The algorithm was tested on 180 extracellular proteins and 1380 cytoplasmic and/or nuclear proteins. We obtained 85.92% accuracy from training and 82.18% accuracy from testing. Since SPRED does not use N-terminal signals, it can detect non-classical secreted proteins by filtering out those secreted proteins with an N-terminal signal by using SignalP. SPRED predicted 15 out of 19 experimentally verified non-classical secretory proteins. By scanning the entire human proteome we identified 566 protein sequences potentially undergoing non-classical secretion. The dataset and standalone version of the SPRED software are available at http://www.inb.uni-luebeck.de/tools-demos/spred/spred.
Abstract: Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. The regular bioinformatics tools to predict the subcellular locations of such apoptotic proteins often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing both the prediction of their subcellular locations and the identification of the molecular properties that contribute to it. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence derived properties like frequency of amino acid groups, secondary structure, and physicochemical properties. GA is used for selecting a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by GA. Our models were examined by a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.
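One of the sequence-derived properties named above is the frequency of amino acid groups. The sketch below shows how such a feature could be computed from a raw sequence. The grouping by side-chain chemistry and the function name are assumptions made for illustration; the abstract does not specify which groups the authors used, and the GA and SVM steps are not reproduced here.

```python
# Hypothetical grouping of the 20 standard amino acids (an assumption,
# not the grouping used in the paper).
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def group_frequencies(sequence):
    """Return the fraction of residues that fall into each group.

    Characters outside the 20 standard amino acids are ignored.
    """
    residues = [
        aa for aa in sequence.upper()
        if any(aa in members for members in GROUPS.values())
    ]
    n = len(residues)
    if n == 0:
        return {name: 0.0 for name in GROUPS}
    return {
        name: sum(1 for aa in residues if aa in members) / n
        for name, members in GROUPS.items()
    }


# Toy sequence, invented for illustration.
features = group_frequencies("MKDEALLS")
print(features)
```

Each protein then contributes one fixed-length numeric vector regardless of its sequence length, which is the form a classifier such as an SVM expects as input.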
Abstract: Despite its generalized use as drug therapy for multiple sclerosis (MS), the molecular mechanisms of action of interferon beta (IFNB) are still poorly understood. IFNB therapy is long-term and clinical effects are not immediate; therefore, reliable early biomarkers for IFNB activity should maintain a differential expression over time, but longitudinal studies at a transcriptional level have been rare. Microarrays were used to monitor 18 IFNB1b treated MS patients at four time points spanning a period of 1 year. Genes showing in the majority of patients the greatest and most consistent changes in their expression levels were studied. Interferon regulated genes were significantly overrepresented. Fifteen markers were differentially expressed during all three time points and followed a consistent time course pattern: EIF2AK2, IFI6, IFI44, IFI44L, IFIH1, IFIT1, IFIT2, IFIT3, ISG15, MX1, OASL, RSAD2, SN, XAF1 and the marker 238704_at. Except for the last one, these biomarkers were all formerly identified as being indicative of IFNB activity. Expression changes were both detectable early and long lasting and could thus be optimal biomarkers for IFNB activity in long-term studies. Other known biomarkers of IFNB activity were found to be differentially expressed just for certain periods after therapy onset: Interleukin-8 was a short lasting marker and changes in STAT1 were detected with delay.
Abstract: Medical image processing is known as a computationally expensive and data intensive domain. It is thus well-suited for Grid computing. However, Grid computing usually requires the applications to be designed for parallel processing, which is a challenge for medical imaging researchers in hospitals, who are often not used to it. Making parallel programming methods easier to apply can promote Grid technologies in clinical environments. Readily available, functional tools with an intuitive interface are required to really promote healthgrids. Moreover, the tools need to be well integrated with the Grid infrastructure. To facilitate the adoption of Grids in the Geneva University Hospitals we have set up a development environment based on the Taverna workflow engine. Its usage with a medical imaging application on the hospitals' internal Grid cluster is presented in this paper.
Abstract: For the analysis of complex polygenic diseases, one does not expect all patients to share the same disease-associated alleles. Nor will disease-causing variations map to identical sets of genes across patients. However, one does expect overlaps in the sets of genes that are involved and even more so in their assigned molecular processes. Furthermore, the assignment of single nucleotide polymorphisms (SNPs) to genes is highly ambiguous for intergenic SNPs. The tool presented here hence adds external information, i.e. GeneOntology (GO) terms (Gene Ontology Consortium), to the analysis of SNP data. AVAILABILITY: A web interface and source code are offered at https://webtools.imbs.uni-luebeck.de/snptogo
Abstract: SUMMARY: This work presents two independent approaches for a seamless integration of computational grids with the bioinformatics workflow suite Taverna. Both are supported by a unique relational database that links applications with grid resources and presents them as workflow elements. A web portal facilitates its collaborative maintenance. The first approach implements a gateway service to handle authentication certificates and all communication with the grid. It reads the database to spawn web services for workflow elements which are in turn used by Taverna. The second approach lets Taverna communicate with the grid on its own, by means of a newly developed plug-in. It reads the database and executes the needed tasks directly on the grid. While the gateway service is non-intrusive, the plug-in has technical advantages, e.g. by allowing data to remain on the grid while being passed between workflow elements. AVAILABILITY: http://grid.inb.uni-luebeck.de/
Abstract: The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach to computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in the context of the state of the art in this area. We examine future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.
Abstract: Unbiased identification of susceptibility genes might provide new insights into pathogenic mechanisms that govern complex inflammatory diseases such as multiple sclerosis. In this study we fine mapped Eae18a, a region on rat chromosome 10 that regulates experimental autoimmune encephalomyelitis (EAE), an animal model for multiple sclerosis. We utilized two independent approaches: (1) in silico mapping based on sequence similarity between human multiple sclerosis susceptibility regions and rodent EAE quantitative trait loci and (2) linkage mapping in an F10 (DA x PVG.AV1) rat advanced intercrossed line. The linkage mapping defines Eae18a to a 5-Mb region, which overlaps one intergenomic consensus region identified in silico. The combined approach confirms experimentally, for the first time, the accuracy of the in silico method. Moreover, the shared intersection between the results of both mapping techniques defines a 1.06-Mb region containing 13 candidate genes for the regulation of neuroinflammation in humans, rats, and mice.
Abstract: Multiple sclerosis (MS) is the most prevalent chronic autoimmune, neurodegenerative disorder of the central nervous system (CNS). Despite substantial progress, treatment of MS and other autoimmune diseases is only moderately effective. It is anticipated that the treatment of autoimmune diseases with single drugs or biological approaches will in the future be complemented, or even replaced, by combination therapies, which include immunomodulation, elimination of infectious triggers and tissue repair. One proclaimed goal of biomedical research and clinical practice is the discovery of sets of genes with expression that correlates with successful outcomes of drug therapy, or with unfortunate side effects. Such information has direct consequences for selection, refinement or development of treatments and will soon be translated into clinical trials. The genome-wide RNA profile of an individual represents one complement to the comprehensive determination of disease- or drug response-related elements; comparable to a 'sentinel' method, it serves as a large-scale approach to MS biology. This work reviews the state of the art in MS research at the transcriptome level applying genomewide screening methods. It discusses implications in understanding disease pathogenicity, diagnostic markers, the identification of new therapeutic targets and a classification of patients towards the advent of tailored therapies.
Abstract: The generation of advanced intercross lines (AIL) is a powerful approach for high-resolution fine mapping of quantitative trait loci (QTLs), because they accumulate many more recombination events than conventional F2 intercross and N2 backcross. However, the application of this approach is severely hampered by the requirements of excessive resources to maintain such crosses, i.e., in terms of animal care, space, and time. Therefore, in this study, we produced an AIL to fine map collagen-induced arthritis (CIA) QTLs using comparatively limited resources. We used only 308 (DBA/1 x FVB/N)F11/12 AIL mice to refine QTLs controlling the severity and onset of arthritis as well as the Ab response and T cell subset in CIA, namely Cia2, Cia27, and Trmq3. These QTLs were originally identified in (DBA/1 x FVB/N)F2 progeny. The confidence intervals of the three QTLs were refined from 40, 43, and 48 Mb to 12, 4.1, and 12 Mb, respectively. The data were complemented by the use of another QTL fine-mapping approach, haplotype analysis, to further refine Cia2 into a 2-Mb genomic region. To aid in the search for candidate genes for the QTLs, genome-wide expression profiling was performed to identify strain-specific differentially expressed genes within the confidence intervals. Of the 1396 strain-specific differentially expressed genes, 3, 3, and 12 genes were within the support intervals of Cia2, Cia27, and Trmq3, respectively. In addition, this study revealed that Cia27 and Trmq3, controlling anti-CII IgG2a Ab and CD4:CD8 T cell ratio, respectively, also regulated CIA clinical phenotypes.
Abstract: The complete DNA sequences of the human genome and of several related mammals are now available, due to the investment of enormous resources and advances in sequencing technology. Novel technologies have been developed to compare multiple genomes with each other, thus specifying regions of sequence similarity among mammals and with their pathogens. Larger blocks of sequence similarity (syntenic regions) have been determined and made publicly available. In many ways, novel insights can be gained from such data when combining external genetic or clinical information for these syntenic loci. These novel tools have proven to be successful in inferring functional equivalence between loci of multiple genomes. This review reports on the role of comparative genomics in research on autoimmune diseases, a field with strong dependencies on animal models of human diseases and the problem of an adequate information transfer between multiple organisms and research areas.
Abstract: One of the major quantitative trait loci (QTLs) associated with arthritis in crosses between B10.RIII and RIIIS/J mice is the Cia5 on chromosome 3. Early in the congenic mapping process it was clear that the locus was complex, consisting of several subloci with small effects. Therefore, we developed two novel strategies to dissect a QTL: the partial advanced inter-cross (PAI) strategy, with which we recently found the Cia5 region to consist of three loci, Cia5, Cia21 and Cia22, and now we introduce the QTL-chip strategy, where we have combined congenic mapping with a QTL-restricted expression profiling using a novel microarray design. The expression of QTL genes was compared between parental and congenic mice in lymph node, spleen and paw samples in five biological replicates and in dye-swapped experiments at three time points during the induction phase of arthritis. The QTL chip approach revealed 4 genes located in Cia21, differently expressed in lymph nodes, and 14 genes in Cia22, located within two clusters. One cluster contains six genes, differently expressed in spleen, and the second cluster contains eight genes, differently expressed in paws. We conclude the QTL-chip strategy to be valuable in the selection of candidate genes to be prioritized for further investigation.
Abstract: The existence of a soluble splice variant for a gene encoding a transmembrane protein suggests that this gene plays a role in intercellular signalling, particularly in immunological processes. Also, the absence of a splice variant of a reported soluble variant suggests exclusive control of the solubilisation by proteolytic cleavage. Soluble splice variants of membrane proteins may also be interesting targets for crystallisation as their structure may be expected to preserve, at least partially, their function as integral membrane proteins, whose structures are most difficult to determine. This paper presents a dataset derived from the literature in an attempt to collect all reported soluble variants of membrane proteins, be they splice variants or shed. A list of soluble variants is derived in silico from Ensembl. These are checked for their presence in multiple organisms and their number of membrane-spanning regions is inspected. The findings are then confirmed by a comparison with identified proteins of a recent global proteomics study of human blood plasma. Finally, a tool to determine novel soluble variants by proteomics is provided.
Abstract: Genetic linkage and association studies define quantitative trait loci (QTLs) and susceptibility loci (SLs) that influence the phenotype of polygenic traits. A web-accessible application was created to identify intergenomic consensuses to fine map QTLs and SLs in silico and select particularly promising candidate genes for such traits. Furthermore, this approach offers an empirical evaluation of animal models for their applicability to the study of human traits. AVAILABILITY: http://qtl.pzr.uni-rostock.de/qtlmix.php CONTACT: serrano@pzr.uni-rostock.de.
Abstract: Common complex polygenic diseases such as autoimmune diseases have not been completely understood on a molecular level. While many genes are known to be involved in the pathways responsible for the phenotype, explicit causes for the susceptibility to the disease remain to be elucidated. The susceptibility to disease is thought to be the result of genetic epistatic interactions between common polymorphic genes. This polymorphism is mostly caused by single nucleotide polymorphisms (SNPs). Human subpopulations are known to differ in the susceptibility to the diseases and generally in the distribution of single nucleotide polymorphisms. The approach presented here retrieves SNPs with the most divergent frequencies for selected human subpopulations to help define properties for the experimental verification of SNPs within defined regions. A web-accessible program implementing this approach was evaluated for multiple sclerosis (MS), a common human polygenic disease. A link to a summary of data from "The SNP Consortium" (TSC) with sex-dependencies of SNPs is available. Associations of SNPs to genes, genetic markers and chromosomal loci are retrieved from the Ensembl project. This tool is recommended to be used in conjunction with microarray analyses or marker association studies that link genes or chromosomal loci to particular diseases.
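The idea of retrieving SNPs with the most divergent frequencies across subpopulations can be sketched as a simple ranking. The divergence measure used here (largest minus smallest allele frequency), the function names, the SNP identifiers and the frequencies are all assumptions made for illustration; the abstract does not state which measure the tool applies.

```python
def frequency_spread(freqs_by_population):
    """Difference between the largest and smallest allele frequency."""
    values = list(freqs_by_population.values())
    return max(values) - min(values)


def most_divergent(snps, top_n=2):
    """Return the identifiers of the top_n SNPs ranked by frequency spread."""
    ranked = sorted(
        snps.items(),
        key=lambda item: frequency_spread(item[1]),
        reverse=True,
    )
    return [snp_id for snp_id, _ in ranked[:top_n]]


# Toy allele frequencies, invented for illustration.
snps = {
    "snp_a": {"pop1": 0.10, "pop2": 0.15, "pop3": 0.12},
    "snp_b": {"pop1": 0.05, "pop2": 0.80, "pop3": 0.40},
    "snp_c": {"pop1": 0.50, "pop2": 0.20, "pop3": 0.45},
}
top = most_divergent(snps)
print(top)
```

Other measures, such as a fixation index, could replace `frequency_spread` without changing the ranking logic.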
Abstract: Genetic linkage and association studies define chromosomal regions, quantitative trait loci (QTLs), which influence the phenotype of polygenic diseases. Here, we describe a global approach to determine intergenomic consensus of those regions in order to fine map QTLs and select particularly promising candidate genes for disease susceptibility or other polygenic traits. Exemplarily, human multiple sclerosis (MS) susceptibility regions were compared for sequence similarity with mouse and rat QTLs in its animal model experimental allergic encephalomyelitis (EAE). The number of intergenomic MS/EAE consensus genes (295) is significantly higher than expected if the animal model was unrelated to the human disease. Hence, this approach contributes to the empirical evaluation of animal models for their applicability to the study of human diseases.
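At its core, an intergenomic consensus is an overlap between candidate regions from different sources. The sketch below intersects two lists of regions and assumes, as a strong simplification, that the animal-model QTLs have already been projected onto human coordinates through syntenic blocks. That projection step, the function name and the toy coordinates are hypothetical and are not taken from the paper.

```python
def intersect(regions_a, regions_b):
    """Return overlaps between two lists of (chromosome, start, end) regions.

    Coordinates are treated as half-open intervals on a shared reference.
    """
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


# Toy regions, invented for illustration.
human_regions = [("chr1", 100, 500), ("chr2", 50, 80)]
projected_model_regions = [("chr1", 300, 700), ("chr2", 90, 120)]
consensus = intersect(human_regions, projected_model_regions)
print(consensus)
```

The consensus region is never larger than either input region, which is why the overlap can serve as a fine-mapping step; genes inside it would then be the candidates to prioritize.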
Abstract: A supervised nonlinear interpolation significantly improves the reliability of conversions from genetic distances to physical distances as compared with linear ones. A web-accessible application was created that addresses this question with a graphical presentation that may be wrapped by local installations. MOTIVATION: Genetic linkage maps and radiation hybrid (RH) maps are based on the rate of uncoupling between linked genetic markers. These are usually measured in centiMorgan (cM) when uncoupling originates from natural recombination or in centiRay (cR) for chromosomes that are irradiated artificially to separate the markers. Physical maps arise from genome-wide DNA sequencing and are measured in bp. This work was originally motivated as an extension of the software application Expressionview (Fischer et al., 2003), exploring its range of application in combining different mapping systems. The relationship between physical and genetic maps is known to be not always linear (Yu et al., 2001). The shift from the linear model seems to depend on local idiosyncrasies of the chromosomes and the kind of genetic map used. The present application addresses this problem for the first time. AVAILABILITY: http://qtl.pzr.uni-rostock.de/cartographer.php
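To illustrate why a single chromosome-wide scaling factor can mislead, the sketch below converts a genetic position to a physical one by interpolating between neighbouring anchor markers whose cM and bp positions are both known. This piecewise-linear scheme is a stand-in chosen for brevity and is not the supervised nonlinear interpolation described in the abstract. The function name and the anchor values are hypothetical.

```python
from bisect import bisect_right


def cm_to_bp(cm, anchors):
    """Estimate a physical position (bp) for a genetic position (cM).

    anchors: list of (cM, bp) pairs sorted by strictly increasing cM.
    The estimate is linear only between the two flanking anchors.
    """
    if len(anchors) < 2:
        raise ValueError("at least two anchor markers are required")
    cms = [a[0] for a in anchors]
    if not cms[0] <= cm <= cms[-1]:
        raise ValueError("position lies outside the anchored range")
    # Index of the left flanking anchor; clamp so a right neighbour exists.
    i = min(bisect_right(cms, cm) - 1, len(anchors) - 2)
    (cm0, bp0), (cm1, bp1) = anchors[i], anchors[i + 1]
    return bp0 + (cm - cm0) * (bp1 - bp0) / (cm1 - cm0)


# Toy anchors, invented for illustration: the first 10 cM span 30 Mb,
# the next 10 cM only 10 Mb (a region with more recombination per bp).
anchors = [(0.0, 0), (10.0, 30_000_000), (20.0, 40_000_000)]
estimate = cm_to_bp(15.0, anchors)
print(estimate)
```

With these toy anchors, a single global ratio of 2 Mb per cM would place the 15 cM position at 30 Mb, whereas the locally interpolated estimate is 35 Mb.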
Abstract: We present here a software tool for combined visualization of gene-expression data and quantitative trait loci (QTL). The application is implemented as an extension to the Ensembl project and caters for a direct transition from microarray experiments of gene or protein expression levels to the genomic context of individual genes and QTL. It supports the visualization of gene clusters and the selection of functional candidate genes in the context of research on complex traits.
Abstract: Hypoxia has a profound influence on progression and metastasis of malignant tumors. In the present report, we used the oligonucleotide microarray technique to identify new hypoxia-inducible genes in malignant melanoma with a special emphasis on angiogenesis factors. A commercially available Affymetrix gene chip system was used to analyze five melanoma cell lines of different aggressiveness. A total of 160 hypoxia-inducible genes were identified, clustering in four different functional clusters. In search of putative angiogenesis and tumor progression factors within these clusters, Cyr61, a recently discovered angiogenesis factor, was identified. Cyr61 was hypoxia-inducible in low aggressive melanoma cells; however, it showed constitutive high expression in highly aggressive melanoma cells. Further analyses of transcriptional mechanisms underlying Cyr61 gene expression under hypoxia demonstrated that an AP-1 binding motif within the Cyr61 promoter plays a central role in the hypoxic regulation of Cyr61. It could be shown by use of in vitro luciferase assays, electrophoretic mobility shift assays, and immunoprecipitation that hypoxia-inducible factor-1alpha interacts with c-Jun/AP-1 and may thereby contribute to Cyr61 transcriptional regulation under hypoxia. Taken together, the presented data show that Cyr61 is a hypoxia-inducible angiogenesis factor in malignant melanoma with tumor stage-dependent expression. This may argue for a hypoxia-induced selection process during tumor progression toward melanoma cells with constitutive high Cyr61 expression.
Abstract: The chromosome locations of 368 human Kruppel-type zinc finger (ZNF) PAC clones were physically mapped by FISH to human chromosomes in support of recent efforts of assigning KOX cDNAs (KOX1-KOX32) to zinc finger gene clusters. Recent mapping results were validated and confirmed by sequence comparisons to zinc finger gene sequences automatically annotated in EnsEMBL. In toto, 799 Kruppel-type zinc finger genes have been annotated in EnsEMBL of which 290 genes are found to encode KRAB domains. Sequence homologies of the zinc finger domains were used to establish phylogenic trees of KOX zinc finger genes as well as of all KRAB containing human zinc finger and KOX genes documenting the evolution of KRAB zinc finger genes late in primate evolution. A list of 368 assigned ZNF PAC clones is available under http://www.pzr.uni-rostock.de/supplements.
Abstract: G protein coupled receptors (GPCRs) are found in great numbers in most eukaryotic genomes. They are responsible for sensing a staggering variety of structurally diverse ligands, with their activation resulting in the initiation of a variety of cellular signalling cascades. The physiological response that is observed following receptor activation is governed by the guanine nucleotide-binding proteins (G proteins) to which a particular receptor chooses to couple. Previous investigations have demonstrated that the specificity of the receptor-G protein interaction is governed by the intracellular domains of the receptor. Despite many studies it has proven very difficult to predict de novo, from the receptor sequence alone, the G proteins to which a GPCR is most likely to couple. We have used a data-mining approach, combining pattern discovery with membrane topology prediction, to find patterns of amino acid residues in the intracellular domains of GPCR sequences that are specific for coupling to a particular functional class of G proteins. A prediction system was then built, being based upon these discovered patterns. We can report this approach was successful in the prediction of G protein coupling specificity of unknown sequences. Such predictions should be of great use in providing in silico characterisation of newly cloned receptor sequences and for improving the annotation of GPCRs stored in protein sequence databases. AVAILABILITY: http://www.ebi.ac.uk/~croning/coupling.html.
Abstract: MOTIVATION: A variety of tools are available to predict the topology of transmembrane proteins. To date no independent evaluation of the performance of these tools has been published. A better understanding of the strengths and weaknesses of the different tools would guide both the biologist and the bioinformatician to make better predictions of membrane protein topology. RESULTS: Here we present an evaluation of the performance of the currently best known and most widely used methods for the prediction of transmembrane regions in proteins. Our results show that TMHMM is currently the best performing transmembrane prediction program.
Abstract: Information agents integrate multiple distributed heterogeneous information sources. The challenging and still unsolved problem is to ensure the semantic consistency of the integrated data. In this paper we set out to develop a general approach to inconsistency management for information agents. It is implemented as part of the EDITtoTrEMBL system and applied to a large real-world problem in the domain of bioinformatics.
Abstract: SUMMARY: A collection of transmembrane proteins with annotated transmembrane regions, for which good experimental evidence exists, was created as a test or training set for algorithms to predict transmembrane regions in proteins.
Abstract: To cope with the increasing amount of sequence data, reliable automatic annotation tools are required. Together with SWISS-PROT, the TrEMBL database contains nearly all publicly available protein sequences but, in contrast to SWISS-PROT, only limited functional annotation. To improve this situation, we had to develop a method of automatic annotation that produces highly reliable functional predictions using the language and the syntax of SWISS-PROT.
Abstract: SUMMARY: Many databases in molecular biology face the problem that the ever-increasing rate of data production can no longer be handled by traditional methods, especially human curation. Therefore, a number of projects are currently investigating methods for automated sequence annotation. This paper describes the EBI's approach to this problem for protein sequences by integration of arbitrary analysis programs into a distributed and highly flexible environment. Our software framework allows individual treatment of sequences depending on their particular properties, which is achieved through a high-level description of the preconditions and capabilities of the analysis modules. This not only improves the overall performance of the annotation process, as unnecessary steps are avoided, but also enhances its quality, since dependencies between different modules are taken into account. We have implemented a prototype and use it in the production of TrEMBL releases. AVAILABILITY: Upon request.
Abstract: Motivation: High-throughput technologies, like gene expression
arrays and next-generation sequencing, provide enormous data
sets, which are too large to transfer or download quickly. The study
of such data, which for our application means explaining the measurements
with a molecular interpretation of disease etiology, requires
continuous updates and refinements as novel interpretations
are pursued. The complexity of the problem requires a diverse range
of expertise, so a shared view is crucial for successful collaboration
within and between institutions.
Web services and traditional web pages provide centralized data
storage and synchronized presentation. Relying on a single central
server, though, comes with its own flavor of reliability and performance
issues. Every time the server is busy solving a request, the
user is forced to wait. It is therefore very beneficial to combine the
integrity of web services and the share-ability of web pages with the
fluency of a desktop application. Increasing the interactivity of data
presentation to each individual user allows for a more interactive
knowledge exchange on the group scale.
Here, we present a combination of Open Source technologies for
distributed, synchronized and failure-resistant storage of huge data
sets as the technological basis for globally fast access to research
data. Accordingly, this work explores the derived possibilities
for interactive presentation to a group of locally distributed researchers,
as enabled by a problem-tailored web application. To aid in the
investigative work, the user interface shows minimal latencies. These
goals are achieved by capitalizing on related developments in
distributed data storage and asynchronous web technologies, most
notably the non-relational database Apache Cassandra and the
Google Web Toolkit. This combines efficient pre-processing with
parallelisation.
The developed web application resembles a typical desktop application
and is highly responsive, since it downloads needed data in
parallel while the user continues working. The researcher can prepare
different data set views for different aspects of their analysis,
which are immediately available to colleagues and collaborators. By
underpinning every decision and conference call with a synchronized
shared data set, group communication is greatly improved.
This work demonstrates that remote applications strengthen interactive work
on large data sets and that typical
"show next page" delays are overcome by employing the latest web
technologies. In this way, the strong server-user interaction allows for
seamless extension to serve additional users and thus supports
collaboration.
Availability: Source code for the web application and the data storage
back-end was released under the GNU Lesser General Public License
and is freely available for download from
http://github.com/fxtentacle/eQTL-GWT-Cassandra
Abstract: In the future, messages such as speech, text, or pictures will be transmitted digitally, since this is cheaper, more accurate, and more flexible. It is possible to hide other messages, which are necessarily much shorter, within such digitized messages so that they are nearly unrecognizable to outsiders. In this article we describe how computer-based steganography works and give a summary of the results of our implementation.
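The abstract above does not say which embedding technique the authors implemented. As an illustration only, the following minimal Python sketch shows one common approach, least-significant-bit (LSB) embedding, under the assumption that the cover message is a sequence of raw sample bytes (for example pixel or audio values). The function names `embed` and `extract` and the 4-byte length header are hypothetical choices for this sketch, not taken from the article.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of each cover byte."""
    # Assumed framing: a 4-byte big-endian length header precedes the secret,
    # so that extract() knows how many bytes to read back.
    payload = len(secret).to_bytes(4, "big") + secret
    # Flatten the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover byte, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def _bits_to_bytes(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def extract(stego: bytes) -> bytes:
    """Recover a secret previously hidden with embed()."""
    # The first 32 cover bytes carry the 4-byte length header.
    length = int.from_bytes(_bits_to_bytes([b & 1 for b in stego[:32]]), "big")
    body_bits = [b & 1 for b in stego[32:32 + 8 * length]]
    return _bits_to_bytes(body_bits)
```

Each payload bit consumes one cover byte, so the hidden message must be much shorter than the cover, which matches the abstract's remark that hidden messages are necessarily much shorter. Each cover byte changes by at most 1, which is why the modification is hard to notice in sampled data.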