Abstract: OBJECTIVE: To identify potential autoreactive B-cell and plasma-cell clones by quantitatively analysing the complete human B-cell receptor (BCR) repertoire in synovium and peripheral blood in early and established rheumatoid arthritis (RA). METHODS: The BCR repertoire was screened in synovium and blood of six patients with early RA (ERA) (<6 months) and six with established RA (ESRA) (>20 months). In two patients, the repertoires in different joints were compared. Repertoires were analysed by next-generation sequencing from mRNA, generating >10 000 BCR heavy-chain sequence reads per sample. For each clone, the degree of expansion was calculated as the percentage of the total number of reads encoding the specific clonal sequence. Clones with a frequency ≥0.5% were considered dominant. RESULTS: Multiple dominant clones were found in inflamed synovium but hardly any in blood. Within an individual patient, the same dominant clones were detected in different joints. The majority of the synovial clones were class-switched; however, the fraction of clones that expressed IgM was higher in ESRA than ERA patients. Dominant synovial clones showed autoreactive features: in ERA in particular the clones were enriched for immunoglobulin heavy chain gene segment V4-34 (IGHV4-34) and showed longer CDR3 lengths. Dominant synovial clones that did not encode IGHV4-34 also had longer CDR3s than peripheral blood. CONCLUSIONS: In RA, the synovium forms a niche where expanded, potentially autoreactive B cells and plasma cells reside. The inflamed target tissue, especially in the earliest phase of disease, seems to be the most promising compartment for studying autoreactive cells.
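The expansion measure described in the abstract above (a clone's reads as a percentage of all reads in the sample, with clones at ≥0.5% called dominant) can be illustrated with a minimal Python sketch. It assumes each read has already been assigned a clone label; the function names, the input format, and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

# Threshold stated in the abstract: clones at >= 0.5% of reads are "dominant".
DOMINANT_THRESHOLD_PCT = 0.5


def clone_frequencies(clone_ids):
    """Map each clone label to its percentage of the total reads.

    `clone_ids` is a hypothetical input: one clone label per sequence read.
    """
    counts = Counter(clone_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clone: 100.0 * n / total for clone, n in counts.items()}


def dominant_clones(clone_ids, threshold_pct=DOMINANT_THRESHOLD_PCT):
    """Return only the clones whose read percentage meets the threshold."""
    freqs = clone_frequencies(clone_ids)
    return {clone: pct for clone, pct in freqs.items() if pct >= threshold_pct}


if __name__ == "__main__":
    # Toy sample of 10,000 reads: one expanded clone, one borderline clone,
    # one small clone, and many singletons (illustrative data only).
    reads = (
        ["clone_A"] * 60
        + ["clone_B"] * 50
        + ["clone_C"] * 5
        + [f"singleton_{i}" for i in range(9885)]
    )
    for clone, pct in sorted(dominant_clones(reads).items()):
        print(f"{clone}: {pct:.2f}% of reads")
```

Expressing expansion as a percentage of each sample's own read total allows samples of different sequencing depth to be compared against the same 0.5% cut-off.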
Abstract: Large-scale population sequencing studies provide a complete picture of human genetic variation within the studied populations. A key challenge is to identify, among the myriad alleles, those variants that have an effect on molecular function, phenotypes, and reproductive fitness. Most non-neutral variation consists of deleterious alleles segregating at low population frequency due to incessant mutation. To date, studies characterizing selection against deleterious alleles have been based on allele frequency (testing for a relative excess of rare alleles) or ratio of polymorphism to divergence (testing for a relative increase in the number of polymorphic alleles). Here, starting from Maruyama's theoretical prediction (Maruyama T (1974), Am J Hum Genet USA 6:669-673) that a (slightly) deleterious allele is, on average, younger than a neutral allele segregating at the same frequency, we devised an approach to characterize selection based on allelic age. Unlike existing methods, it compares sets of neutral and deleterious sequence variants at the same allele frequency. When applied to human sequence data from the Genome of the Netherlands Project, our approach distinguishes low-frequency coding non-synonymous variants from synonymous and non-coding variants at the same allele frequency and discriminates between sets of variants independently predicted to be benign or damaging for protein structure and function. The results confirm the abundance of slightly deleterious coding variation in humans.
Abstract: Memory T cells form a highly specific defense layer against reinfection with previously encountered pathogens. In addition, memory T cells provide protection against pathogens that are similar, but not identical to the original infectious agent. This is because each T cell response harbors multiple clones with slightly different affinities, thereby creating T cell memory with a certain degree of diversity. Currently, the mechanisms that control size, diversity, and cross-reactivity of the memory T cell pool are incompletely defined. Previously, we established a role for apoptosis, mediated by the BH3-only protein Noxa, in controlling diversity of the effector T cell population. This function might positively or negatively impact T cell memory in terms of function, pool size, and cross-reactivity during recall responses. Therefore, we investigated the role of Noxa in T cell memory during acute and chronic infections. Upon influenza infection, Noxa(-/-) mice generate a memory compartment of increased size and clonal diversity. Reinfection resulted in an increased recall response, whereas cross-reactive responses were impaired. Chronic infection of Noxa(-/-) mice with mouse CMV resulted in enhanced memory cell inflation, but no obvious pathology. In contrast, in a model of continuous, high-level T cell activation, reduced apoptosis of activated T cells rapidly led to severe organ pathology and premature death in Noxa-deficient mice. These results establish Noxa as an important regulator of the number of memory cells formed during infection. Chronic immune activation in the absence of Noxa leads to excessive accumulation of primed cells, which may result in severe pathology.
Abstract: Nicolaides-Baraitser syndrome (NBS) is characterized by sparse hair, distinctive facial morphology, distal-limb anomalies and intellectual disability. We sequenced the exomes of ten individuals with NBS and identified heterozygous variants in SMARCA2 in eight of them. Extended molecular screening identified nonsynonymous SMARCA2 mutations in 36 of 44 individuals with NBS; these mutations were confirmed to be de novo when parental samples were available. SMARCA2 encodes the core catalytic unit of the SWI/SNF ATP-dependent chromatin remodeling complex that is involved in the regulation of gene transcription. The mutations cluster within sequences that encode ultra-conserved motifs in the catalytic ATPase region of the protein. These alterations likely do not impair SWI/SNF complex assembly but may be associated with disrupted ATPase activity. The identification of SMARCA2 mutations in humans provides insight into the function of the Snf2 helicase family.
Abstract: To profile quantitatively the T-cell repertoire in multiple joints and peripheral blood of patients with recent onset (early) or established rheumatoid arthritis (RA) using a novel next-generation sequencing protocol to identify potential autoreactive clones.
Abstract: CD8(+) T-cell responses against latent viruses can cover considerable portions of the CD8(+) T-cell compartment for many decades, yet their initiation and maintenance remains poorly characterized in humans. A key question is whether the clonal repertoire that is raised during the initial antiviral response can be maintained over these long periods. To investigate this we combined next-generation sequencing of the T-cell receptor repertoire with tetramer-sorting to identify, quantify and longitudinally follow virus-specific clones within the CD8(+) T-cell compartment. Using this approach we studied primary infections of human cytomegalovirus (hCMV) and Epstein Barr virus (EBV) in renal transplant recipients. For both viruses we found that nearly all virus-specific CD8(+) T-cell clones that appeared during the early phase of infection were maintained at high frequencies during the 5-year follow-up and hardly any new anti-viral clones appeared. Both in transplant recipients and in healthy carriers the clones specific for these latent viruses were highly dominant within the CD8(+) T-cell receptor Vβ repertoire. These findings suggest that the initial antiviral response in humans is maintained in a stable fashion without signs of contraction or changes of the clonal repertoire.
Abstract: Virus discovery combining sequence unbiased amplification with next generation sequencing is now state-of-the-art. We have previously determined that the performance of the unbiased amplification technique which is operational at our institute, VIDISCA-454, is efficient when respiratory samples are used as input. The performance of the assay is, however, not known for other clinical materials like blood or stool samples. Here, we investigated the sensitivity of VIDISCA-454 with feces-suspensions and serum samples that are positive and that have been quantified for norovirus and human immunodeficiency virus type 1, respectively. The performance of VIDISCA-454 in serum samples was equal to its performance in respiratory material, with an estimated lower threshold of 1,000 viral genome copies. The estimated threshold in feces-suspension is around 200,000 viral genome copies. The decreased sensitivity in feces suspension is mainly due to sequences that share no recognizable identity with known sequences. Most likely these sequences originate from bacteria and phages which are not completely sequenced.
Abstract: Studying transcriptomes by ultra deep sequencing provides an in-depth picture of transcriptional regulation and it facilitates the detection of rare transcriptional events. Using ultra deep sequencing of amplicons we identified known isoforms and also various new low frequency variants. Most of these variants likely involve the splicing machinery except for two events that we named variations affecting multiple exons, which are mainly deletions affecting parts of adjacent exons and intra-exonic deletions. Both events involve short identical sequences of 1 to 8 nucleotides at the junction and canonical splice sites are missing. They were identified in different genes and species at very low frequencies. We excluded that they are an artifact of PCR, sequencing, or reverse transcription. We propose that these variants represent intramolecular slippage events that require short identical sequences for reannealing of dissociated transcripts.
Abstract: Neisseria meningitidis is an obligate human pathogen. While it is a frequent commensal of the upper respiratory tract, in some individuals the bacterium spreads to the bloodstream, causing meningitis and/or sepsis, which are serious conditions with high morbidity and mortality. Here we report the availability of the genome sequence of the widely used serogroup B laboratory strain H44/76.
Abstract: In 5-40% of respiratory infections in children, the diagnostics remain negative, suggesting that the patients might be infected with a yet unknown pathogen. Virus discovery cDNA-AFLP (VIDISCA) is a virus discovery method based on recognition of restriction enzyme cleavage sites, ligation of adaptors and subsequent amplification by PCR. However, direct discovery of unknown pathogens in nasopharyngeal swabs is difficult due to the high concentration of ribosomal RNA (rRNA) that acts as competitor. In the current study we optimized VIDISCA by adjusting the reverse transcription enzymes and decreasing rRNA amplification in the reverse transcription, using hexamer oligonucleotides that do not anneal to rRNA. Residual cDNA synthesis on rRNA templates was further reduced with oligonucleotides that anneal to rRNA but cannot be extended due to 3'-dideoxy-C6-modification. With these modifications, a >90% reduction of rRNA amplification was achieved. Further improvement of the VIDISCA sensitivity was obtained by high-throughput sequencing (VIDISCA-454). Eighteen nasopharyngeal swabs were analysed, all containing known respiratory viruses. We could identify the proper virus in the majority of samples tested (11/18). The median load in the VIDISCA-454 positive samples was 7.2E5 viral genome copies/ml (range 1.4E3 to 7.7E6). Our results show that optimization of VIDISCA and subsequent high-throughput sequencing enhances sensitivity drastically and provides the opportunity to perform virus discovery directly in patient material.
Abstract: Myoclonus-dystonia (M-D) is a neurological movement disorder with involuntary jerky and dystonic movements as major symptoms. About 50% of M-D patients have a mutation in ε-sarcoglycan (SGCE), a maternally imprinted gene that is widely expressed. As little is known about SGCE function, one can only speculate about the pathomechanisms of the exclusively neurological phenotype in M-D. We characterized different SGCE isoforms in the human brain using ultra-deep sequencing. We show that a major brain-specific isoform is differentially expressed in the human brain with a notably high expression in the cerebellum, namely in the Purkinje cells and neurons of the dentate nucleus. Its expression was low in the globus pallidus and moderate to low in caudate nucleus, putamen and substantia nigra. Our data are compatible with a model in which dysfunction of the cerebellum is involved in the pathogenesis of M-D.
Abstract: The immune system is able to respond to millions of antigens using adaptive receptors, including the alphabeta-T-cell receptor (TCR). Upon antigen encounter a T-cell may proliferate to produce a clone of TCR-identical cells, which develop a memory phenotype. Previous studies suggested that most memory clones are clearly expanded. In accordance, the beta-chain repertoire of T-cell memory subsets was reported to be 10 times less diverse than that of naive subsets, reflecting stringent selection. However, due to technological limitations detailed information was lacking regarding the size of clonal expansions and the diversity of the TCR-repertoire in naive and memory T-cell populations. Here, using high-throughput sequencing, we show that the memory repertoire in human peripheral blood contains only a few expanded clones and consists mainly of low-frequency clones. Additionally, the memory repertoire is much more diverse than expected. In two healthy persons we observed that only 2-7% of the CD4 and CD8 memory clones found were clearly expanded. In line with this observation we show that the beta-chain repertoire size of the CD4 memory compartment is only two times smaller, and that of the CD8 memory compartment is only 3-10 times smaller, than that of the corresponding naive compartments. Our results show that the T-cell memory compartment has a very different distribution of clones than anticipated. This has important implications for the current dogma of immunological memory, and changes the interpretation of repertoire aberrations in (patho-)physiological situations such as ageing and auto-immunity. It raises new questions on the factors that steer maturation of memory phenotype and determine the size of memory clones.
Abstract: Bioinformatics is confronted with a new data explosion due to the availability of high-throughput DNA sequencers. Data storage and analysis become a problem on local servers, making it necessary to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users.
Abstract: Gene-oriented sequence clusters (transcriptional units) have found many applications in genomics research including the construction of transcriptome maps and identification of splice variants. We developed a new method to construct transcriptional units that uses the genomic sequence as a template. We present and discuss our method in detail together with an evaluation of the transcriptional units for human. We constructed 33,007 and 27,792 transcriptional units for human and mouse, respectively. The sensitivity (81%) and specificity (90%) of our method compare favorably to other established methods. We evaluated the representation of experimentally validated and predicted intergenic spliced transcripts in humans and show that we correctly represent a large fraction of these cases by single transcriptional units. Our method performs well, but the evaluation of the final set of transcriptional units shows that improvements to the algorithm are still possible. However, because the precise number and types of errors are difficult to track, it is not obvious how to significantly improve the algorithm. We believe that ongoing research efforts are necessary to further improve current methods. This should include detailed documentation, comparison, and evaluation of current methods.
Abstract: The chromosomal gene expression profiles established by the Human Transcriptome Map (HTM) revealed a clustering of highly expressed genes in about 30 domains, called ridges. To physically characterize ridges, we constructed a new HTM based on the draft human genome sequence (HTMseq). Expression of 25,003 genes can be analyzed online in a multitude of tissues (http://bioinfo.amc.uva.nl/HTMseq). Ridges are found to be very gene-dense domains with a high GC content, a high SINE repeat density, and a low LINE repeat density. Genes in ridges have significantly shorter introns than genes outside of ridges. The HTMseq also identifies a significant clustering of weakly expressed genes in domains with fully opposite characteristics (antiridges). Both types of domains are open to tissue-specific expression regulation, but the maximal expression levels in ridges are considerably higher than in antiridges. Ridges are therefore an integral part of a higher order structure in the genome related to transcriptional regulation.
Abstract: Gain of chromosome 17q material is the most frequent genetic abnormality in neuroblastomas. The common region of gain is at least 375 cR large, which has precluded the identification of genes with a role in neuroblastoma pathogenesis. Neuroblastomas also frequently show amplification of the N-myc oncogene, which correlates closely with 17q gain. Both events are strong predictors of unfavorable prognosis. To identify genes that are part of the N-myc downstream pathway, we constructed SAGE libraries of an N-myc transfected and a control cell line. This identified the chromosome 17q genes nm23-H1 and nm23-H2 as being induced 6- to 10-fold in the N-myc expressing cells. Northern and Western blot analysis confirmed this up-regulation. A time-course experiment shows that both genes are induced within 4 h after N-myc is switched on. Furthermore, we demonstrate that c-myc can also up-regulate nm23-H1 and nm23-H2 expression. Neuroblastoma tumor and cell line panels reveal a striking correlation between N-myc amplification and mRNA and protein expression of both nm23 genes. We show that the nm23 genes are located at the edge of the common region of chromosome 17q gain previously described in neuroblastoma cell lines. Our findings suggest that nm23-H1 and nm23-H2 expression is increased by 17q gain in neuroblastoma and can be further up-regulated by myc overexpression. These observations suggest a major role for nm23-H1 and nm23-H2 in tumorigenesis of unfavorable neuroblastomas.
Abstract: Neuroblastoma is an embryonal tumor originating from neural crest-derived cells. Here we present the serendipitous cloning of amplified sequences of chromosome 2p15 in neuroblastoma cell line IMR32. The amplified region was analyzed for oncogene activation using a SAGE (serial analysis of gene expression) library of IMR32. SAGE permits a quantitative analysis of all transcripts of a tissue or cell line. The expression of genes and ESTs mapping within a 30-cR region covering the amplicon was compared to 4 additional SAGE libraries of neuroblastomas and 12 SAGE libraries of other tissues in the CGAP databases. The IMR32 SAGE database revealed increased expression of the MEIS1 oncogene, whereas other SAGE libraries showed little or no MEIS1 expression. MEIS1 turned out to be highly amplified and overexpressed in IMR32. Analysis of 24 neuroblastoma cell lines and 22 tumors showed high-level expression in about 25% of the cases. The MEIS1 homeobox protein forms a complex with the HOXA9 and PBX proteins that are implicated in human leukemia. MEIS1 is a target of retroviral insertion in murine leukemia. This is the first report of a MEIS1 amplification and high expression levels in human cancer and the first time that identification of a candidate target of amplification is facilitated by high-throughput mRNA expression profiling.
Abstract: The chromosomal position of human genes is rapidly being established. We integrated these mapping data with genome-wide messenger RNA expression profiles as provided by SAGE (serial analysis of gene expression). Over 2.45 million SAGE transcript tags, including 160,000 tags of neuroblastomas, are presently known for 12 tissue types. We developed algorithms to assign these tags to UniGene clusters and their chromosomal position. The resulting Human Transcriptome Map generates gene expression profiles for any chromosomal region in 12 normal and pathologic tissue types. The map reveals a clustering of highly expressed genes to specific chromosomal regions. It provides a tool to search for genes that are overexpressed or silenced in cancer.
Abstract: MOTIVATION: SAGE enables the determination of genome-wide mRNA expression profiles. A comprehensive analysis of SAGE data requires software that integrates (statistical) data analysis methods with a database system. Furthermore, to facilitate data sharing between users, the application should reside on a central server and be accessed via the internet. Since such an application was not available we developed the USAGE package. RESULTS: USAGE is a web-based application that comprises an integrated set of tools offering many functions for analysing and comparing SAGE data. Additionally, USAGE includes a statistical method for the planning of new SAGE experiments. USAGE is available in a multi-user environment giving users the option of sharing data. USAGE is interfaced to a relational database to store data and analysis results. The USAGE query editor allows the composition of queries for searching this database. Several database functions have been included which enable the selection and combination of data. USAGE provides the biologist with increased functionality and flexibility for analysing SAGE data. AVAILABILITY: USAGE is freely accessible for academic institutions at http://www.cmbi.kun.nl/usage/. The source code of USAGE is freely available for academic institutions on request from the first author.
Abstract: Scientific research has become very data and compute intensive because of the progress in data acquisition and measurement devices, which is particularly true in Life Sciences. To cope with this deluge of data, scientists use distributed computing and storage infrastructures. The use of such infrastructures introduces by itself new challenges to the scientists in terms of proper and efficient use. Scientific workflow management systems play an important role in facilitating the use of the infrastructure by hiding some of its complexity. Although most scientific workflow management systems are provenance-aware, not all of them come with provenance functionality out of the box. In this paper we describe the improvement and integration of a provenance system into an e-infrastructure for biomedical research based on the MOTEUR workflow management system. The main contributions of the paper are: presenting an OPM implementation using relational database backend for the provenance store, providing an e-infrastructure with a comprehensive provenance system, defining a generic approach to provenance implementation, potentially suitable for other workflow systems and application domains and demonstrating the value of this system based on use cases presenting the provenance data through a user-friendly web interface.