Abstract: BACKGROUND: The use of global gene expression profiling is a well-established approach to understanding biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), that are likely to be important for the expression of the gene signatures. RESULTS: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases contain all genes in the Santa Cruz database and the positions of all TFBS provided by the user as position weight matrices (PWMs). These databases are then used for two purposes: to identify significant TFBS in the promoters of sets of genes, and to identify clusters of co-occurring TFBS. We use two criteria for significance: significantly enriched TFBS, in terms of the total number of binding sites in the promoters, and significantly present TFBS, in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a re-sampling procedure in which the query gene set is compared with typically 100 000 gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes.
CONCLUSIONS: The presented approach to promoter analysis faithfully reproduces the results from several ChIP-Chip and MSigDB-derived gene sets and may therefore prove to be an important method for analyzing gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes.
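The re-sampling test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a hypothetical dictionary `site_counts` that maps each gene in the TFBS/promoter database to the number of binding sites for one TFBS in its promoter, and it uses the common add-one empirical p-value. The two statistics correspond to the two significance criteria: the total number of sites (enrichment) and the fraction of promoters with at least one site (presence).

```python
import random


def site_statistics(genes, site_counts):
    """Return (total binding sites, fraction of promoters with >= 1 site)."""
    counts = [site_counts[g] for g in genes]
    total = sum(counts)
    fraction_present = sum(1 for c in counts if c > 0) / len(counts)
    return total, fraction_present


def resampling_p_values(query, site_counts, n_draws=100_000, seed=0):
    """Empirical p-values for enrichment and presence of one TFBS.

    `site_counts` is a hypothetical gene -> site-count mapping covering the
    whole promoter database. Random gene lists of the same size as `query`
    are drawn without replacement from it.
    """
    rng = random.Random(seed)
    universe = list(site_counts)
    obs_total, obs_fraction = site_statistics(query, site_counts)

    hits_total = 0
    hits_fraction = 0
    for _ in range(n_draws):
        sample = rng.sample(universe, len(query))
        total, fraction = site_statistics(sample, site_counts)
        # Count random lists that are at least as extreme as the query set.
        if total >= obs_total:
            hits_total += 1
        if fraction >= obs_fraction:
            hits_fraction += 1

    # The add-one correction keeps the p-values strictly positive.
    p_enriched = (hits_total + 1) / (n_draws + 1)
    p_present = (hits_fraction + 1) / (n_draws + 1)
    return p_enriched, p_present


# Toy database: 10 of 100 genes carry 5 sites each (hypothetical data).
site_counts = {f"g{i}": (5 if i < 10 else 0) for i in range(100)}
query = [f"g{i}" for i in range(10)]
print(resampling_p_values(query, site_counts, n_draws=2000))
```

The null distribution is estimated from random gene lists of the same size, so the test makes no parametric assumption about how binding sites are distributed across promoters. The cost is that the smallest attainable p-value is 1/(n_draws + 1).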
Abstract: BACKGROUND: Urothelial carcinoma (UC) is characterized by nonrandom chromosomal aberrations, ranging from one or a few changes in early-stage and low-grade tumors to highly rearranged karyotypes in muscle-invasive lesions. Recent array-CGH analyses have shed further light on the genomic changes underlying the neoplastic development of UC and have facilitated the molecular delineation of amplified and deleted regions to the level of specific candidate genes. In the present investigation we combine detailed genomic information with expression information to identify putative target genes for genomic amplifications. METHODS: We analyzed 38 urothelial carcinomas by whole-genome tiling resolution array-CGH and high-density expression profiling to identify putative target genes in common genomic amplifications. When necessary, expression profiling was complemented with Q-PCR of individual genes. RESULTS: Three genomic segments were frequently and exclusively amplified in high-grade tumors: 1q23, 6p22 and 8q22. Detailed mapping of the 1q23 segment showed a heterogeneous amplification pattern and no obvious commonly amplified region. The 6p22 amplicon was defined by a 1.8 Mb core region present in all amplifications, flanked both distally and proximally by segments amplified to a lesser extent. By combining genomic profiles with expression profiles we could show that amplification of E2F3, CDKAL1, SOX4, and MBOAT1, as well as NUP153, AOF1, FAM8A1 and DEK, in 6p22 was associated with increased gene expression. Amplification of the 8q22 segment was primarily associated with YWHAZ (14-3-3-zeta) and POLR2K overexpression. The possible importance of the YWHA genes in the development of urothelial carcinomas was supported by another recurrent amplicon paralogous to 8q22, in 2p25, where increased copy numbers led to enhanced expression of YWHAQ (14-3-3-theta).
Homozygous deletions were identified at 10 different genomic locations, most frequently affecting CDKN2A/CDKN2B in 9p21 (32%). Notably, these 9p21 deletions were mutually exclusive with 6p22 amplifications. CONCLUSION: The presented data indicate that 6p22 is a composite amplicon with more than one possible target gene. The data also suggest that amplification of 6p22 and homozygous deletion of 9p21 may have complementary roles. Furthermore, the analysis of paralogous regions that showed genomic amplification indicated that altered expression of YWHA (14-3-3) genes is an important event in the development of UC.
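The abstract does not state how mutual exclusivity between 9p21 deletions and 6p22 amplifications was assessed. One common option, shown here only as an assumption, is a one-sided Fisher's exact test. Under independence, the number of tumors carrying both events follows a hypergeometric distribution, and a small lower-tail probability suggests that the events co-occur less often than expected by chance. The function name and the toy counts below are hypothetical and are not data from the study.

```python
from math import comb


def exclusivity_p_value(n_samples, n_with_a, n_with_b, n_both):
    """One-sided Fisher's exact test for mutual exclusivity.

    Returns P(X <= n_both), where X is hypergeometric: the number of samples
    with event A among the n_with_b samples carrying event B, assuming the
    two events are independent.
    """
    # Smallest overlap that the marginal totals allow.
    lowest = max(0, n_with_a + n_with_b - n_samples)
    if max(n_with_a, n_with_b) > n_samples or not lowest <= n_both <= min(n_with_a, n_with_b):
        raise ValueError("inconsistent counts")
    ways_total = comb(n_samples, n_with_b)
    # Sum the hypergeometric probabilities of overlaps up to n_both.
    ways_tail = sum(
        comb(n_with_a, k) * comb(n_samples - n_with_a, n_with_b - k)
        for k in range(lowest, n_both + 1)
    )
    return ways_tail / ways_total


# Hypothetical cohort: 20 tumors, 5 with each event, none with both.
print(exclusivity_p_value(20, 5, 5, 0))
```

An exact test is used here instead of a chi-square approximation because expected cell counts are small in cohorts of a few dozen tumors.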
Abstract: DNA methylation is an important epigenetic modification that regulates several genes crucial for tumor development. To identify epigenetically regulated genes in bladder cancer, we performed genome-wide expression analyses of eight bladder cancer cell lines treated with the demethylating agents 5-aza-2'-cytidine and zebularine. To identify methylated C residues, we sequenced cloned DNA fragments from bisulfite-treated genomic DNA. We identified a total of 1092 genes that showed ≥2-fold altered expression in at least one cell line; 710 showed up-regulation and 382 showed down-regulation. Extensive sequencing of promoters from 25 genes in eight cell lines showed an association between methylation pattern and expression in 13 genes, including both CpG island and non-CpG island genes. Overall, the methylation patterns had a patchy appearance, with short, highly methylated segments separated by longer unmethylated segments. This pattern was not associated with MeCP2 binding sites or with evolutionarily conserved sequences. The genes UBXD2, AQP11, and TIMP1 showed distinctive patchy methylation patterns. We found several high-scoring and evolutionarily conserved transcription factor binding sites affected by methylated C residues. Two of the genes, FGF18 and MMP11, were down-regulated in response to 5-aza-2'-cytidine and zebularine treatment and showed methylation at specific sites in the untreated cells, indicating an activating effect of methylation. Apart from identifying epigenetically regulated genes that may be important for bladder cancer development, including TGFBR1, NUPR1, FGF18, TIMP1, and MMP11, the presented data also highlight the organization of the modified segments in methylated promoters. This article contains supplementary material available via the Internet at http://www.interscience.wiley.com/jpages/1045-2257/suppmat.
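Bisulfite sequencing works because sodium bisulfite converts unmethylated cytosines to uracil, which is read as T after PCR and sequencing, whereas methylated cytosines remain C. The following is a minimal sketch of calling methylation from cloned, sequenced fragments. It assumes that each read has already been aligned to the top strand of the reference promoter without insertions or deletions. The function names and sequences are hypothetical, and this is not the authors' pipeline.

```python
def call_methylation(reference, read, cpg_only=True):
    """Call methylation at reference C positions from one bisulfite read.

    Assumes `read` is aligned base-to-base to the top strand of `reference`.
    Returns {position: "methylated" | "unmethylated" | "undetermined"}.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    ref, seq = reference.upper(), read.upper()
    calls = {}
    for i, base in enumerate(ref):
        if base != "C":
            continue
        is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        if cpg_only and not is_cpg:
            continue
        if seq[i] == "C":      # protected from conversion: methylated
            calls[i] = "methylated"
        elif seq[i] == "T":    # converted by bisulfite: unmethylated
            calls[i] = "unmethylated"
        else:                  # mismatch or sequencing error
            calls[i] = "undetermined"
    return calls


def methylation_fraction(reference, reads, cpg_only=True):
    """Fraction of informative clones that are methylated at each C position."""
    methylated, informative = {}, {}
    for read in reads:
        for pos, call in call_methylation(reference, read, cpg_only).items():
            if call == "undetermined":
                continue
            informative[pos] = informative.get(pos, 0) + 1
            if call == "methylated":
                methylated[pos] = methylated.get(pos, 0) + 1
    return {pos: methylated.get(pos, 0) / n for pos, n in sorted(informative.items())}


reference = "ACGTTCGACCG"                # hypothetical promoter fragment
clones = ["ACGTTTGATCG", "ATGTTTGATCG"]  # hypothetical bisulfite reads
print(methylation_fraction(reference, clones))
```

Plotting the per-position fractions along a promoter is one way to see patchy patterns: short runs of positions with high fractions separated by long runs near zero. Setting `cpg_only=False` also reports non-CpG cytosines, which are commonly used to check bisulfite conversion efficiency.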
Abstract: BACKGROUND: An alternative to standard approaches for uncovering biologically meaningful structures in microarray data is to treat the data as a blind source separation (BSS) problem. BSS refers to the problem of recovering source signals from several observed linear mixtures of them, that is, separating a mixture of signals into its different sources. In the context of microarray data, "sources" may correspond to specific cellular responses or to co-regulated genes. RESULTS: We applied independent component analysis (ICA) to three different microarray data sets: two tumor data sets and one time series experiment. To obtain reliable components, we used iterated ICA to estimate component centrotypes. We found that many of the low-ranking components may indeed show strong biological coherence and hence be of biological significance. In general, ICA achieved higher resolution than analyses based on correlated expression and yielded a larger number of gene clusters significantly enriched for Gene Ontology (GO) categories. In addition, components characteristic of molecular subtypes and of tumors with specific chromosomal translocations were identified. ICA also identified more than one gene cluster significant for the same GO categories and thereby revealed a higher level of biological heterogeneity, even within coherent groups of genes. CONCLUSION: Although the ICA approach primarily detects hidden variables, these surfaced as highly correlated genes in the time series data and, in one instance, in the tumor data. This further strengthens the biological relevance of latent variables detected by ICA.
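Iterated ICA yields many estimates of each component. A centrotype is a representative chosen from a cluster of similar estimates. One common definition, used for example in the Icasso approach, is the estimate with the largest summed similarity to the other members of its cluster. ICA components are only determined up to sign and scale, so similarity is taken here as the absolute Pearson correlation. The sketch assumes that the estimates have already been clustered, and it is not necessarily how the authors computed centrotypes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def centrotype(estimates):
    """Index of the estimate with the largest summed |r| to the others.

    `estimates` is one cluster of component estimates (e.g. gene loadings)
    collected from repeated ICA runs. Using the absolute correlation makes
    the choice invariant to the arbitrary sign of ICA components.
    """
    best_index, best_score = 0, float("-inf")
    for i, ei in enumerate(estimates):
        score = sum(abs(pearson(ei, ej)) for j, ej in enumerate(estimates) if j != i)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


cluster = [[1, 2, 3, 4, 5], [1, 2, 3, 5, 4], [2, 1, 3, 4, 5]]  # toy loadings
print(centrotype(cluster))
```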
Abstract: BACKGROUND: The use of global gene expression profiling to identify sets of genes with similar expression patterns is rapidly becoming a widespread approach for understanding biological processes. A logical and systematic approach to studying co-expressed genes is to analyze their promoter sequences to identify transcription factors that may be involved in establishing specific profiles and that can be investigated experimentally. RESULTS: We introduce promoter clustering, i.e. grouping of promoters with respect to their high-scoring motif content, and show that this approach greatly enhances the identification of common and significant transcription factor binding sites (TFBS) in co-expressed genes. We apply this method to two different datasets, one consisting of microarray data from 108 leukemias (AMLs) and a second from a time series experiment, and show that biologically relevant promoter patterns may be obtained using phylogenetic footprinting methodology. In addition, we found that 15% of the analyzed promoter regions contained transcription start sites for additional genes transcribed in the opposite direction. CONCLUSION: Promoter clustering based on global promoter features greatly improves the identification of shared TFBS in co-expressed genes. We believe that the outlined approach may be a useful first step in identifying transcription factors that contribute to specific features of gene expression profiles.
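The abstract does not specify the clustering algorithm used for promoter clustering. As an illustration only, the sketch below represents each promoter by the set of motifs with high-scoring hits and measures similarity with the Jaccard index. It then groups promoters by single-linkage clustering at a similarity threshold, which is equivalent to taking the connected components of the graph that links promoters whose similarity meets the threshold. Promoter and motif names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two motif sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_promoters(motif_sets, min_similarity):
    """Single-linkage clusters of promoters with Jaccard >= min_similarity.

    `motif_sets` maps promoter name -> set of high-scoring motif names.
    """
    names = list(motif_sets)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of promoters whose motif content is similar enough.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(motif_sets[a], motif_sets[b]) >= min_similarity:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


promoters = {  # hypothetical high-scoring motif content
    "p1": {"E2F", "SP1", "NFY"},
    "p2": {"E2F", "SP1"},
    "p3": {"ETS", "AP1"},
    "p4": {"ETS", "AP1", "CREB"},
    "p5": {"MYB"},
}
print(cluster_promoters(promoters, 0.5))
```

Single linkage is used here for brevity. It can chain dissimilar promoters together through intermediates, so average linkage or a motif-weighted similarity may be more appropriate in practice.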
Abstract: SUMMARY: The Array Clone Information Database is an online database of information about microarray cDNA clones. For each clone, the database contains the assigned UniGene cluster(s), the location in the full-length transcript, the assigned Gene Ontology terms and the position in the genome assembly. AVAILABILITY: http://bioinfo.thep.lu.se/acid.html