
Jaakko Hollmén

Helsinki University of Technology, Department of Information and Computer Science
Jaakko.Hollmen@hut.fi

Journal articles

2009
Michaela Wrage, Salla Ruosaari, Paul P Eijk, Jussuf T Kaifi, Jaakko Hollmén, Emre F Yekebas, Jakob R Izbicki, Ruud H Brakenhoff, Thomas Streichert, Sabine Riethdorf, Markus Glatzel, Bauke Ylstra, Klaus Pantel, Harriet Wikman (2009)  Genomic profiles associated with early micrometastasis in lung cancer: relevance of 4q deletion.   Clin Cancer Res 15: 5. 1566-1574 Mar  
Abstract: PURPOSE: Bone marrow is a common homing organ for early disseminated tumor cells (DTC) and their presence can predict the subsequent occurrence of overt metastasis and survival in lung cancer. It is still unclear whether the shedding of DTC from the primary tumor is a random process or a selective release driven by a specific genomic pattern. EXPERIMENTAL DESIGN: DTCs were identified in bone marrow from lung cancer patients by an immunocytochemical cytokeratin assay. Genomic aberrations and expression profiles of the respective primary tumors were assessed by microarrays and fluorescence in situ hybridization analyses. The most significant results were validated on an independent set of primary lung tumors and brain metastases. RESULTS: Combination of DNA copy number profiles (array comparative genomic hybridization) with gene expression profiles identified five chromosomal regions differentiating bone marrow-negative from bone marrow-positive patients (4q12-q32, 10p12-p11, 10q21-q22, 17q21, and 20q11-q13). Copy number changes of 4q12-q32 were the most prominent finding, containing the highest number of differentially expressed genes irrespective of chromosomal size (P=0.018). Fluorescence in situ hybridization analyses on further primary lung tumor samples confirmed the association between loss of 4q and bone marrow-positive status. In bone marrow-positive patients, 4q was frequently lost (37% versus 7%), whereas gains could be commonly found among bone marrow-negative patients (7% versus 17%). The same loss was also found to be common in brain metastases from both small and non-small cell lung cancer patients (39%). CONCLUSIONS: Thus, our data indicate, for the first time, that early hematogenous dissemination of tumor cells might be driven by a specific pattern of genomic changes.
2008
Salla Ruosaari, Tuija Hienonen-Kempas, Anne Puustinen, Virinder K Sarhadi, Jaakko Hollmén, Sakari Knuutila, Juha Saharinen, Harriet Wikman, Sisko Anttila (2008)  Pathways affected by asbestos exposure in normal and tumour tissue of lung cancer patients.   BMC Med Genomics 1: 11  
Abstract: BACKGROUND: Studies on asbestos-induced tumourigenesis have indicated the role of, e.g., reactive oxygen/nitrogen species, mitochondria, as well as NF-kappaB and MAPK signalling pathways. The exact molecular mechanisms contributing to asbestos-mediated carcinogenesis are, however, still to be characterized. METHODS: In this study, gene expression data analyses together with gene annotation data from the Gene Ontology (GO) database were utilized to identify pathways that are differentially regulated in lung and tumour tissues between asbestos-exposed and non-exposed lung cancer patients. Differentially regulated pathways were identified from gene expression data from 14 asbestos-exposed and 14 non-exposed lung cancer patients using custom-made software and Iterative Group Analysis (iGA). Western blotting was used to further characterize the findings, specifically to determine the protein levels of UBA1 and UBA7. RESULTS: Differences between asbestos-related and non-related lung tumours were detected in pathways associated with, e.g., ion transport, NF-kappaB signalling, DNA repair, as well as spliceosome and nucleosome complexes. A notable fraction of the pathways down-regulated in both normal and tumour tissue of the asbestos-exposed patients were related to protein ubiquitination, a versatile process regulating, for instance, DNA repair, cell cycle, and apoptosis, and thus being also a significant contributor of carcinogenesis. Even though UBA1 or UBA7, the early enzymes involved in protein ubiquitination and ubiquitin-like regulation of target proteins, did not underlie the exposure-related deregulation of ubiquitination, a difference was detected in the UBA1 and UBA7 levels between squamous cell carcinomas and respective normal lung tissue (p = 0.02 and p = 0.01) without regard to exposure status. CONCLUSION: Our results indicate alterations in protein ubiquitination related both to cancer type and asbestos. 
We present for the first time pathway analysis results on asbestos-associated lung cancer, providing important insight into the most relevant targets for future research.
Samuel Myllykangas, Jarkko Tikka, Tom Böhling, Sakari Knuutila, Jaakko Hollmén (2008)  Classification of human cancers based on DNA copy number amplification modeling.   BMC Med Genomics 1: 05  
Abstract: BACKGROUND: DNA amplifications alter gene dosage in cancer genomes by multiplying the gene copy number. Amplifications are quintessential in a considerable number of advanced cancers of various anatomical locations. The aims of this study were to classify human cancers based on their amplification patterns, explore the biological and clinical fundamentals behind their amplification-pattern based classification, and understand the characteristics in human genomic architecture that associate with amplification mechanisms. METHODS: We applied a machine learning approach to model DNA copy number amplifications using a data set of binary amplification records at chromosome sub-band resolution from 4400 cases that represent 82 cancer types. Amplification data was fused with background data: clinical, histological and biological classifications, and cytogenetic annotations. Statistical hypothesis testing was used to mine associations between the data sets. RESULTS: Probabilistic clustering of each chromosome identified 111 amplification models and divided the cancer cases into clusters. The distribution of classification terms in the amplification-model based clustering of cancer cases revealed cancer classes that were associated with specific DNA copy number amplification models. Amplification patterns - finite or bounded descriptions of the ranges of the amplifications in the chromosome - were extracted from the clustered data and expressed according to the original cytogenetic nomenclature. This was achieved by maximal frequent itemset mining using the cluster-specific data sets. The boundaries of amplification patterns were shown to be enriched with fragile sites, telomeres, centromeres, and light chromosome bands. CONCLUSIONS: Our results demonstrate that amplifications are non-random chromosomal changes and specifically selected in tumor tissue microenvironment. 
Furthermore, statistical evidence showed that specific chromosomal features co-localize with amplification breakpoints and link them in the amplification process.
Salla T Ruosaari, Penny E H Nymark, Mervi M Aavikko, Eeva Kettunen, Sakari Knuutila, Jaakko Hollmén, Hannu Norppa, Sisko L Anttila (2008)  Aberrations of chromosome 19 in asbestos-associated lung cancer and in asbestos-induced micronuclei of bronchial epithelial cells in vitro.   Carcinogenesis 29: 5. 913-917 May  
Abstract: Exposure to asbestos is known to induce lung cancer, and our previous studies have suggested that specific chromosomal regions, such as 19p13, are preferentially aberrant in lung tumours of asbestos-exposed patients. Here, we further examined the association between the 19p region and exposure to asbestos using array comparative genomic hybridization and fluorescence in situ hybridization (FISH) in lung tumours and FISH characterization of asbestos-induced micronuclei (MN) in human bronchial epithelial BEAS 2B cells in vitro. We detected an increased number of 19p losses in the tumours of asbestos-exposed patients in comparison with tumours from non-exposed subjects with similar distribution of tumour histology in both groups (13/33; 39% versus 3/25; 12%, P = 0.04). In BEAS 2B cells, a 48 h exposure to crocidolite asbestos (2.0 µg/cm²) was found to induce centromere-negative MN-harbouring chromosomal fragments. Furthermore, an increased frequency of rare MN containing a 19p fragment was observed after the crocidolite treatment in comparison with untreated controls (6/6000 versus 1/10 000, P = 0.01). The results suggest that 19p has significance in asbestos-associated carcinogenesis and that asbestos may be capable of inducing specific chromosome aberrations.
J Tikka, J Hollmen (2008)  Sequential input selection algorithm for long-term prediction of time series   NEUROCOMPUTING 71: 25. 2604-2615 AUG  
Abstract: In time series prediction, making accurate predictions is often the primary goal. At the same time, interpretability of the models would be desirable. For the latter goal, we have devised a sequential input selection algorithm (SISAL) to choose a parsimonious, or sparse, set of input variables. Our proposed algorithm is a sequential backward selection type algorithm based on a cross-validation resampling procedure. Our strategy is to use a filter approach in the prediction: first we select a sparse set of inputs using linear models and then the selected inputs are used in the nonlinear prediction conducted with multilayer-perceptron networks. Furthermore, we perform a sensitivity analysis by quantifying the importance of the individual input variables in the nonlinear models using a method based on partial derivatives. Experiments are done with the Santa Fe laser data set that exhibits very nonlinear behavior and a data set in a problem of electricity load prediction. The results in the prediction problems of varying difficulty highlight the range of applicability of our proposed algorithm. In summary, our SISAL yields accurate and parsimonious prediction models giving insight to the original problem. (C) 2008 Elsevier B.V. All rights reserved.
Notes: Times Cited: 1
Samuel Myllykangas, Siina Junnila, Arto Kokkola, Reija Autio, Ilari Scheinin, Tuula Kiviluoto, Marja-Liisa Karjalainen-Lindsberg, Jaakko Hollmén, Sakari Knuutila, Pauli Puolakkainen, Outi Monni (2008)  Integrated gene copy number and expression microarray analysis of gastric cancer highlights potential target genes.   Int J Cancer 123: 4. 817-825 Aug  
Abstract: We performed an integrated array comparative genomic hybridization (aCGH) and expression microarray analysis of 8 normal gastric tissues and 38 primary tumors, including 25 intestinal and 13 diffuse gastric adenocarcinomas to identify genes whose expression is deregulated in association with copy number alteration. Our aim was also to identify molecular genetic alterations that are specific to particular clinicopathological characteristics of gastric cancer. Distinct molecular genetic profiles were identified for intestinal and diffuse gastric cancers and for tumors obtained from 2 different locations of the stomach. Interestingly, the ERBB2 amplification and gains at 20q13.12-q13.33 almost exclusively discriminated intestinal cancers from the diffuse type. In addition, the 17q12-q25 gain was characteristic to cancers located in corpus and the 20q13.12-q13.13 gain was more common in the antrum. Statistical analysis was performed using integrated copy number and expression data to identify genes showing differential expression associated with a copy number alteration. Genes with the highest statistical significance included ERBB2, MUC1, GRB7, PPP1R1B and PPARBP with concomitant changes in copy number and expression. Immunohistochemical analysis of ERBB2 and MUC1 on a tissue microarray containing 78 independent gastric tissues showed statistically significant differences (p < 0.05 and <0.001) in immunopositivity in the intestinal (31 and 70%) and diffuse subtypes (14 and 41%), respectively. In conclusion, our results demonstrate that intestinal and diffuse type gastric cancers as well as cancers located in different sites of the stomach have distinct molecular profiles which may have clinical value.
2007
Harri Keski-Säntti, Timo Atula, Jarkko Tikka, Jaakko Hollmén, Antti A Mäkitie, Ilmo Leivo (2007)  Predictive value of histopathologic parameters in early squamous cell carcinoma of oral tongue.   Oral Oncol 43: 10. 1007-1013 Nov  
Abstract: The clinical course of early squamous cell carcinoma of oral tongue (OTSCC) is unpredictable and various histopathologic parameters of the primary tumour have been suggested as prognostic factors to be used in clinical decision-making. We reviewed clinicopathologic data of 73 patients diagnosed with Stage I-II OTSCC. Predictive value of pathological T-stage, depth of infiltration, grade, and mode of invasion with respect to local recurrences, occult cervical metastases, and disease specific survival (DSS) was analysed. Depth of infiltration and pT-stage significantly predicted occult nodal disease, while only pT-stage predicted local recurrence. Specific cut-off value for depth of infiltration separating high-risk and low-risk patients was not found. Significant correlations between the histopathologic parameters and DSS were not found. We conclude that depth of infiltration predicted occult nodal disease but its value in clinical decision-making is limited because of poor specificity when using a cut-off value that offers reasonable sensitivity for finding the patients with occult nodal disease. The risk for occult metastases and local recurrence was high in patients with pT2 tumours.
H Wikman, S Ruosaari, P Nymark, V K Sarhadi, J Saharinen, E Vanhala, A Karjalainen, J Hollmén, S Knuutila, S Anttila (2007)  Gene expression and copy number profiling suggests the importance of allelic imbalance in 19p in asbestos-associated lung cancer.   Oncogene 26: 32. 4730-4737 Jul  
Abstract: Asbestos is a pulmonary carcinogen known to give rise to DNA and chromosomal damage, but the exact carcinogenic mechanisms are still largely unknown. In this study, gene expression arrays were performed on lung tumor samples from 14 heavily asbestos-exposed and 14 non-exposed patients matched for other characteristics. Using a two-step statistical analysis, 47 genes were revealed that could differentiate the tumors of asbestos-exposed from those of non-exposed patients. To identify asbestos-associated regions with DNA copy number and expressional changes, the gene expression data were combined with comparative genomic hybridization microarray data. As a result, a combinatory profile of DNA copy number aberrations and expressional changes significantly associated with asbestos exposure was obtained. Asbestos-related areas were detected in 2p21-p16.3, 3p21.31, 5q35.2-q35.3, 16p13.3, 19p13.3-p13.1 and 22q12.3-q13.1. The most prominent of these, 19p13, was further characterized by microsatellite analysis in 62 patients for the differences in allelic imbalance (AI) between the two groups of lung tumors. 79% of the exposed and 45% of the non-exposed patients (P=0.008) were found to be carriers of AI in their lung tumors. In the exposed group, AI in 19p was prevalent regardless of the histological tumor type. In adenocarcinomas, AI in 19p appeared to occur independently of the asbestos exposure.
S Luyssaert, I A Janssens, M Sulkava, D Papale, A J Dolman, M Reichstein, J Hollmen, J G Martin, T Suni, T Vesala, D Loustau, B E Law, E J Moors (2007)  Photosynthesis drives anomalies in net carbon-exchange of pine forests at different latitudes   GLOBAL CHANGE BIOLOGY 13: 51. 2110-2127 OCT  
Abstract: The growth rate of atmospheric CO2 exhibits large temporal variation that is largely determined by year-to-year fluctuations in land-atmosphere CO2 fluxes. This land-atmosphere CO2-flux is driven by large-scale biomass burning and variation in net ecosystem exchange (NEE). Between- and within years, NEE varies due to fluctuations in climate. Studies on climatic influences on inter- and intra-annual variability in gross photosynthesis (GPP) and net carbon uptake in terrestrial ecosystems have shown conflicting results. These conflicts are in part related to differences in methodology and in part to the limited duration of some studies. Here, we introduce an observation-driven methodology that provides insight into the dependence of anomalies in CO2 fluxes on climatic conditions. The methodology was applied on fluxes from a boreal and two temperate pine forests. Annual anomalies in NEE were dominated by anomalies in GPP, which in turn were correlated with incident radiation and vapor pressure deficit (VPD). At all three sites positive anomalies in NEE (a reduced uptake or a stronger source than the daily sites specific long-term average) were observed on summer days characterized by low incident radiation, low VPD and high precipitation. Negative anomalies in NEE occurred mainly on summer days characterized by blue skies and mild temperatures. Our study clearly highlighted the need to use weather patterns rather than single climatic variables to understand anomalous CO2 fluxes. Temperature generally showed little direct effect on anomalies in NEE but became important when the mean daily air temperature exceeded 23 degrees C. On such days GPP decreased likely because VPD exceeded 2.0 kPa, inhibiting photosynthetic uptake. However, while GPP decreased, the high temperature stimulated respiration, resulting in positive anomalies in NEE. 
Climatic extremes in summer were more frequent and severe in the South than in the North, and had larger effects in the South because the criteria to inhibit photosynthesis are more often met.
Notes: Times Cited: 8
M Sulkava, S Luyssaert, P Rautio, I A Janssens, J Hollmen (2007)  Modeling the effects of varying data quality on trend detection in environmental monitoring   ECOLOGICAL INFORMATICS 2: 38. 167-176 JUN 1  
Abstract: Detection of changes in ecosystem characteristics is a principal tool for identifying and understanding the effects of anthropogenic activities on the condition and functioning of ecosystems. It is widely known that temporal trends can be blurred by the imprecision of the data. Research program managers are aware of the difficulties surrounding representative sampling and therefore enforce strict sampling protocols. Standardized sampling can be so effective that the initially much smaller uncertainty in the instrumental analysis becomes substantial. However, until now the effect of the quality of the instrumental analysis on the time required for trend detection has only rarely been quantified. In this paper, we present a novel technique and theoretical computations for the detection of trends in single and combined indices. The theory is clarified with examples from the International Co-operative Programme on Assessment and Monitoring of Air Pollution on Forests (ICP Forests). Moreover, the theoretical computations were made for normalized or scaled distributions and are therefore equally valid outside the field of environmental monitoring. The results show that, when sampling protocols largely reduce the variability of representative sampling, poor quality of the instrumental analysis blurs the data such that environmental monitoring or long-term ecological research programs can lose the ability to detect trends by causing up to three decades-long delay in detecting changes. We can thus conclude that high quality of the instrumental analysis is a prerequisite for a sensitive monitoring program.
Notes: Times Cited: 4
P Nymark, P M Lindholm, M V Korpela, L Lahti, S Ruosaari, S Kaski, J Hollmen, S Anttila, V L Kinnula, S Knuutila (2007)  Gene expression profiles in asbestos-exposed epithelial and mesothelial lung cell lines   BMC GENOMICS 8: 54. - MAR 1  
Abstract: Background: Asbestos has been shown to cause chromosomal damage and DNA aberrations. Exposure to asbestos causes many lung diseases, e.g. asbestosis, malignant mesothelioma, and lung cancer, but the disease-related processes are still largely unknown. We exposed the human cell lines A549, Beas-2B and Met5A to crocidolite asbestos and determined time-dependent gene expression profiles by using Affymetrix arrays. The hybridization data was analyzed by using an algorithm specifically designed for clustering of short time series expression data. A canonical correlation analysis was applied to identify correlations between the cell lines, and a Gene Ontology analysis method for the identification of enriched, differentially expressed biological processes. Results: We recognized a large number of previously known as well as new potential asbestos-associated genes and biological processes, and identified chromosomal regions enriched with genes potentially contributing to common responses to asbestos in these cell lines. These include genes such as the thioredoxin domain containing gene (TXNDC) and the potential tumor suppressor, BCL2/adenovirus E1B 19kD-interacting protein gene (BNIP3L), GO-terms such as "positive regulation of I-kappaB kinase/NF-kappaB cascade" and "positive regulation of transcription, DNA-dependent", and chromosomal regions such as 2p22, 9p13, and 14q21. We present the complete data sets as Additional files. Conclusion: This study identifies several interesting targets for further investigation in relation to asbestos-associated diseases.
Notes: Times Cited: 12
Rashi Gupta, Salla Ruosaari, Sangita Kulathinal, Jaakko Hollmén, Petri Auvinen (2007)  Microarray image segmentation using additional dye--an experimental study.   Mol Cell Probes 21: 5-6. 321-328 Oct/Dec  
Abstract: The DNA microarray technique allows monitoring the expression levels of thousands of genes simultaneously. A single DNA microarray experiment involves a number of error-prone manual and automated processes, which influence the results and have an impact on the subsequent stages of analysis. Typical problems of arrays are pinning errors while probe printing and the corruption of spots by noise patches. These errors should be detected at the time of image analysis in order to prevent the erroneous intensities from ending up in the analysis and inference stages. RESULTS: In this paper we introduce the concept (referred to as SybrSpot) of utilizing information provided by an additional dye, SYBR green RNA II, for segmentation of gene expression microarrays. Owing to the effective binding of the SYBR green RNA II to the array probes, an image with high signal-to-noise ratio is obtained. This image is used to learn about the spot quality and to flag spots which are not reliably hybridized and corrupted by noise. Further, we compare SybrSpot with GenePix and demonstrate that SybrSpot performs better than GenePix when flagging spots with no probes or weak probes. AVAILABILITY: The code is available upon request to authors.
Penny Nymark, Pamela M Lindholm, Mikko V Korpela, Leo Lahti, Salla Ruosaari, Samuel Kaski, Jaakko Hollmén, Sisko Anttila, Vuokko L Kinnula, Sakari Knuutila (2007)  Gene expression profiles in asbestos-exposed epithelial and mesothelial lung cell lines.   BMC Genomics 8: 03  
Abstract: BACKGROUND: Asbestos has been shown to cause chromosomal damage and DNA aberrations. Exposure to asbestos causes many lung diseases e.g. asbestosis, malignant mesothelioma, and lung cancer, but the disease-related processes are still largely unknown. We exposed the human cell lines A549, Beas-2B and Met5A to crocidolite asbestos and determined time-dependent gene expression profiles by using Affymetrix arrays. The hybridization data was analyzed by using an algorithm specifically designed for clustering of short time series expression data. A canonical correlation analysis was applied to identify correlations between the cell lines, and a Gene Ontology analysis method for the identification of enriched, differentially expressed biological processes. RESULTS: We recognized a large number of previously known as well as new potential asbestos-associated genes and biological processes, and identified chromosomal regions enriched with genes potentially contributing to common responses to asbestos in these cell lines. These include genes such as the thioredoxin domain containing gene (TXNDC) and the potential tumor suppressor, BCL2/adenovirus E1B 19kD-interacting protein gene (BNIP3L), GO-terms such as "positive regulation of I-kappaB kinase/NF-kappaB cascade" and "positive regulation of transcription, DNA-dependent", and chromosomal regions such as 2p22, 9p13, and 14q21. We present the complete data sets as Additional files. CONCLUSION: This study identifies several interesting targets for further investigation in relation to asbestos-associated diseases.
2006
A Rasinen, J Hollmen, H Mannila (2006)  Analysis of Linux evolution using aligned source code segments   DISCOVERY SCIENCE, PROCEEDINGS 4265: 12. 209-218  
Abstract: The Linux operating system embodies a development history of 15 years and the community effort of hundreds of voluntary developers. We examine the structure and evolution of the Linux kernel by considering the source code of the kernel as ordinary text without any regard to its semantics. After selecting three functionally central modules to study, we identified code segments using local alignments of source code from a reduced set of file comparisons. The further stages of the analyses take advantage of these identified alignments. We build module-specific visualizations, or descendant graphs, to visualize the overall code migration between versions and files. A more detailed view can be achieved with chain graphs, which show the time evolution of alignments between selected files. The methods used here may also prove useful in studying large collections of legacy code whose original maintainers are not available.
Notes: Times Cited: 0
J Tikka, A Lendasse, J Hollmen (2006)  Analysis of fast input selection : Application in time series prediction   ARTIFICIAL NEURAL NETWORKS - ICANN 2006, PT 2 4132: 9. 161-170  
Abstract: In time series prediction, accuracy of predictions is often the primary goal. At the same time, however, it would be very desirable if we could give interpretation to the system under study. For this goal, we have devised a fast input selection algorithm to choose a parsimonious, or sparse set of input variables. The method is an algorithm in the spirit of backward selection used in conjunction with the resampling procedure. In this paper, our strategy is to select a sparse set of inputs using linear models and after that the selected inputs are also used in the nonlinear prediction based on multi-layer perceptron networks. We compare the prediction accuracy of our parsimonious non-linear models with the linear models and the regularized non-linear perceptron networks. Furthermore, we quantify the importance of the individual input variables in the non-linear models using the partial derivatives. The experiments in a problem of electricity load prediction demonstrate that the fast input selection method yields accurate and parsimonious prediction models giving insight to the original problem.
Notes: Times Cited: 5
M Sulkava, J Tikka, J Hollmen (2006)  Sparse regression for analyzing the development of foliar nutrient concentrations in coniferous trees   ECOLOGICAL MODELLING 191: 19. 118-130 JAN 27  
Abstract: Analyzing and predicting the development of foliar nutrient concentrations are important and challenging tasks in environmental monitoring. This article presents how linear sparse regression models can be used to represent the relations between different foliar nutrient concentration measurements of coniferous trees in consecutive years. In the experiments the models proved to be capable of providing relatively good and reliable predictions of the development of foliage with a considerably small number of regressors. Two methods for estimating sparse models were compared to more conventional linear regression models. Differences in the prediction accuracies between the sparse and full models were minor, but the sparse models were found to highlight important dependencies between the nutrient measurements better than the other regression models. The use of sparse models is, therefore, advantageous in the analysis and interpretation of the development of foliar nutrient concentrations. (c) 2005 Elsevier B.V. All rights reserved.
Notes: Times Cited: 4
S Myllykangas, J Himberg, T Bohling, B Nagy, J Hollmen, S Knuutila (2006)  DNA copy number amplification profiling of human neoplasms   ONCOGENE 25: 41. 7324-7332 NOV  
Abstract: DNA copy number amplifications activate oncogenes and are hallmarks of nearly all advanced tumors. Amplified genes represent attractive targets for therapy, diagnostics and prognostics. To investigate DNA amplifications in different neoplasms, we performed a bibliomics survey using 838 published chromosomal comparative genomic hybridization studies and collected amplification data at chromosome band resolution from more than 4500 cases. Amplification profiles were determined for 73 distinct neoplasms. Neoplasms were clustered according to the amplification profiles, and frequently amplified chromosomal loci (amplification hot spots) were identified using computational modeling. To investigate the site specificity and mechanisms of gene amplifications, colocalization of amplification hot spots, cancer genes, fragile sites, virus integration sites and gene size cohorts were tested in a statistical framework. Amplification-based clustering demonstrated that cancers with similar etiology, cell-of-origin or topographical location have a tendency to obtain convergent amplification profiles. The identified amplification hot spots were colocalized with the known fragile sites, cancer genes and virus integration sites, but global statistical significance could not be ascertained. Large genes were significantly over-represented on the fragile sites and the reported amplification hot spots. These findings indicate that amplifications are selected in the cancer tissue environment according to the qualitative traits and localization of cancer genes.
Notes: Times Cited: 21
Penny Nymark, Harriet Wikman, Salla Ruosaari, Jaakko Hollmén, Esa Vanhala, Antti Karjalainen, Sisko Anttila, Sakari Knuutila (2006)  Identification of specific gene copy number changes in asbestos-related lung cancer.   Cancer Res 66: 11. 5737-5743 Jun  
Abstract: Asbestos is a well-known lung cancer-causing mineral fiber. In vitro and in vivo experiments have shown that asbestos can cause chromosomal damage and aberrations. Lung tumors, in general, have several recurrently amplified and deleted chromosomal regions. To investigate whether a distinct chromosomal aberration profile could be detected in the lung tumors of heavily asbestos-exposed patients, we analyzed the copy number profiles of 14 lung tumors from highly asbestos-exposed patients and 14 matched tumors from nonexposed patients using classic comparative genomic hybridization (CGH). A specific profile could lead to identification of the underlying genes that may act as mediators of tumor formation and progression. In addition, array CGH analyses on cDNA microarrays (13,000 clones) were carried out on 20 of the same patients. Classic CGH showed, on average, more aberrations in asbestos-exposed than in nonexposed patients, and an altered region in chromosome 2 seemed to occur more frequently in the asbestos-exposed patients. Array CGH revealed aberrations in 18 regions that were significantly associated with either of the two groups. The most significant regions were 2p21-p16.3, 5q35.3, 9q33.3-q34.11, 9q34.13-q34.3, 11p15.5, 14q11.2, and 19p13.1-p13.3 (P < 0.005). Furthermore, 11 fragile sites coincided with the 18 asbestos-associated regions (P = 0.08), which may imply preferentially caused DNA damage at these sites. Our findings are the first evidence, indicating that asbestos exposure may produce a specific DNA damage profile.
Notes:
S Myllykangas, J Himberg, T Böhling, B Nagy, J Hollmén, S Knuutila (2006)  DNA copy number amplification profiling of human neoplasms.   Oncogene 25: 55. 7324-7332 Nov  
Abstract: DNA copy number amplifications activate oncogenes and are hallmarks of nearly all advanced tumors. Amplified genes represent attractive targets for therapy, diagnostics and prognostics. To investigate DNA amplifications in different neoplasms, we performed a bibliomics survey using 838 published chromosomal comparative genomic hybridization studies and collected amplification data at chromosome band resolution from more than 4500 cases. Amplification profiles were determined for 73 distinct neoplasms. Neoplasms were clustered according to the amplification profiles, and frequently amplified chromosomal loci (amplification hot spots) were identified using computational modeling. To investigate the site specificity and mechanisms of gene amplifications, colocalization of amplification hot spots, cancer genes, fragile sites, virus integration sites and gene size cohorts were tested in a statistical framework. Amplification-based clustering demonstrated that cancers with similar etiology, cell-of-origin or topographical location have a tendency to obtain convergent amplification profiles. The identified amplification hot spots were colocalized with the known fragile sites, cancer genes and virus integration sites, but global statistical significance could not be ascertained. Large genes were significantly overrepresented on the fragile sites and the reported amplification hot spots. These findings indicate that amplifications are selected in the cancer tissue environment according to the qualitative traits and localization of cancer genes.
Notes:
2005
J Tikka, J Hollmen, A Lendasse (2005)  Input selection for long-term prediction of time series   COMPUTATIONAL INTELLIGENCE AND BIOINSPIRED SYSTEMS, PROCEEDINGS 3512: 10. 1002-1009  
Abstract: Prediction of time series is an important problem in many areas of science and engineering. Extending the horizon of predictions further into the future is the challenging and difficult task of long-term prediction. In this paper, we investigate the problem of selecting noncontiguous input variables for an autoregressive prediction model in order to improve the prediction ability. We present an algorithm in the spirit of backward selection which removes variables sequentially from the prediction models based on the significance of the individual regressors. We successfully test the algorithm with a non-linear system by selecting inputs with a linear model and finally train a non-linear predictor with the selected variables on the Santa Fe laser data set.
Notes: Times Cited: 3
M Sulkava, P Rautio, J Hollmen (2005)  Combining measurement quality into monitoring trends in foliar nutrient concentrations   ARTIFICIAL NEURAL NETWORKS : FORMAL MODELS AND THEIR APPLICATIONS - ICANN 2005, PT 2, PROCEEDINGS 3697: 6. 761-767  
Abstract: Quality of measurements is an important factor affecting the reliability of analyses in environmental sciences. In this paper we combine foliar measurement data from Finland and results of multiple measurement quality tests from different sources in order to study the effect of measurement quality on the reliability of foliar nutrient analysis. In particular, we study the use of weighted linear regression models in detecting trends in foliar time series data and show that the development of measurement quality has a clear effect on the significance of results.
Notes: Times Cited: 1
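As a rough illustration of the weighted linear regression mentioned in the abstract above, the following Python sketch fits a trend line in which each observation carries its own weight. The function name `weighted_line_fit`, the toy data and the 0/1 weights are assumptions made for illustration only; the weighting scheme actually used in the paper is not described here.

```python
def weighted_line_fit(x, y, w):
    """Fit y = intercept + slope * x by weighted least squares.

    w holds one non-negative weight per observation (hypothetical quality
    scores); larger weights pull the fitted line closer to that point.
    """
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Toy series: a steady decline followed by one suspicious measurement.
years = [0, 1, 2, 3, 4]
conc = [10.0, 9.0, 8.0, 7.0, 12.0]

equal = weighted_line_fit(years, conc, [1, 1, 1, 1, 1])
trusted = weighted_line_fit(years, conc, [1, 1, 1, 1, 0])  # distrust last point
```

With equal weights the single suspicious value turns the fitted slope positive, whereas giving it zero weight recovers the declining trend. This is the sense in which measurement quality can change the apparent significance of a trend.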
Sebastiaan Luyssaert, Mika Sulkava, Hannu Raitio, Jaakko Hollmén (2005)  Are N and S deposition altering the mineral composition of Norway spruce and Scots pine needles in Finland?   Environ Pollut 138: 1. 5-17 Nov  
Abstract: Data from a large-scale foliar survey were used to calculate the extent to which N and S deposition determined the mineral composition of Scots pine and Norway spruce needles in Finland. Foliar data were available from 367 needle samples collected on 36 plots sampled almost annually between 1987 and 2000. A literature study of controlled experiments revealed that acidifying deposition mediates increasing N and S concentrations, and decreasing Mg:N and Ca:Al ratios in the needles. When this fingerprint for N and S elevated deposition on tree foliage was observed simultaneously with increased N and S inputs, it was considered sufficient evidence for assuming that acidifying deposition had altered the mineral composition of tree needles on that plot in the given year. Evidence for deposition-induced changes in the mineral composition of tree foliage was calculated on the basis of a simple frequency model. In the late eighties the evidence was found on 43% of the Norway spruce and 27% of Scots pine plots. The proportion of changed needle mineral composition decreased to below 8% for both species in the late nineties.
Notes:
Peddinti V Gopalacharyulu, Erno Lindfors, Catherine Bounsaythip, Teemu Kivioja, Laxman Yetukuri, Jaakko Hollmén, Matej Oresic (2005)  Data integration and visualization system for enabling conceptual biology.   Bioinformatics 21 Suppl 1: i177-i185 Jun  
Abstract: MOTIVATION: Integration of heterogeneous data in life sciences is a growing and recognized challenge. The problem is not only to enable the study of such data within the context of a biological question but also more fundamentally, how to represent the available knowledge and make it accessible for mining. RESULTS: Our integration approach is based on the premise that relationships between biological entities can be represented as a complex network. The context dependency is achieved by a judicious use of distance measures on these networks. The biological entities and the distances between them are mapped for the purpose of visualization into the lower dimensional space using the Sammon's mapping. The system implementation is based on a multi-tier architecture using a native XML database and a software tool for querying and visualizing complex biological networks. The functionality of our system is demonstrated with two examples: (1) A multiple pathway retrieval, in which, given a pathway name, the system finds all the relationships related to the query by checking available metabolic pathway, transcriptional, signaling, protein-protein interaction and ontology annotation resources and (2) A protein neighborhood search, in which given a protein name, the system finds all its connected entities within a specified depth. These two examples show that our system is able to conceptually traverse different databases to produce testable hypotheses and lead towards answers to complex biological questions.
Notes:
E Kettunen, A G Nicholson, B Nagy, H Wikman, J K Seppänen, T Stjernvall, T Ollikainen, V Kinnula, S Nordling, J Hollmén, S Anttila, S Knuutila (2005)  L1CAM, INP10, P-cadherin, tPA and ITGB4 over-expression in malignant pleural mesotheliomas revealed by combined use of cDNA and tissue microarray.   Carcinogenesis 26: 1. 17-25 Jan  
Abstract: Malignant pleural mesothelioma (MM) is a rare tumour with high mortality, which can exhibit various morphologies classified as epithelioid, biphasic and sarcomatoid subtypes. To investigate the molecular changes in these tumours, we studied gene expression patterns by combined use of cDNA arrays and tumour tissue microarrays (TMA). Deregulation of the expression of 588 cancer-related genes was screened in 16 MM comprising all three subtypes and compared with references, i.e. normal mesothelial cell lines and pleural mesothelium. Array data were analysed using three statistical methods; principal component analysis (PCA), permutation test and receiver operating characteristic (ROC) curves. Eleven genes were verified by real-time RT-PCR. Genes encoding two adhesion molecules [COL1A2 and integrin beta4 (ITGB4)] and a chemokine (INP10) were up-regulated in MM compared with both the cell lines and pleural mesothelium. There was a type-specific up-regulation of semaphorin E, ITGB4 and P-cadherin in epithelioid MM, matrix metalloproteinase 9 (MMP9) and tissue-type plasminogen activator (tPA) in sarcomatoid MM and neural cell adhesion molecule L1 (L1CAM) and INP10 in biphasic MM. Immunohistochemistry on TMA containing 47 MM (26 epithelioid, 15 sarcomatoid and six biphasic) was performed for five proteins, ITGB4, P-cadherin, tPA, INP10 and L1CAM. INP10 expression was increased in MM in general compared with normal mesothelium, while increased expression of P-cadherin, L1CAM and ITGB4 was more specific in MMs exhibiting an epithelioid growth pattern. The over-expression of tPA was more frequent in epithelioid MM despite higher mRNA levels in sarcomatoid and biphasic MM. We conclude that several proteins, associated with cell adhesion either directly (ITGB4, L1CAM, P-cadherin) or as a regulatory factor (INP10), are differentially expressed in MM. In particular, INP10, ITGB4 and COL1A2 were up-regulated in MM compared with both reference sample types, suggesting a relationship with development of these tumours.
Notes:
2004
A Bykowski, J K Seppanen, J Hollmen (2004)  Model-independent bounding of the supports of Boolean formulae in binary data   DATABASE SUPPORT FOR DATA MINING APPLICATIONS : DISCOVERING KNOWLEDGE WITH INDUCTIVE QUERIES 2682: 17. 234-249  
Abstract: Data mining algorithms such as the Apriori method for finding frequent sets in sparse binary data can be used for efficient computation of a large number of summaries from huge data sets. The collection of frequent sets gives a collection of marginal frequencies about the underlying data set. Sometimes, we would like to use a collection of such marginal frequencies instead of the entire data set (e.g. when the original data is inaccessible for confidentiality reasons) to compute other interesting summaries. Using combinatorial arguments, we may obtain tight upper and lower bounds on the values of inferred summaries. In this paper, we consider a class of summaries wider than frequent sets, namely that of frequencies of arbitrary Boolean formulae. Given frequencies of a number of any different Boolean formulae, we consider the problem of finding tight bounds on the frequency of another arbitrary formula. We give a general formulation of the problem of bounding formula frequencies given some background information, and show how the bounds can be obtained by solving a linear programming problem. We illustrate the accuracy of the bounds by giving empirical results on real data sets.
Notes: Times Cited: 3
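The bounding problem described in the abstract above can be illustrated on a tiny example. The unknowns are the numbers of rows falling into each truth assignment ("world") of the variables, the known formula frequencies act as linear constraints, and the frequency of the target formula is minimized and maximized. The sketch below is an assumption-laden toy: it replaces the linear programming solver with brute-force enumeration of integer row counts, which is only feasible for very small inputs, and the function name and example counts are hypothetical.

```python
from itertools import product


def bound_formula_count(n_rows, n_vars, known, target):
    """Tight bounds on how many of n_rows rows can satisfy `target`.

    known  : list of (formula, count) pairs taken as given marginals
    target : formula whose count is to be bounded
    Each formula maps a tuple of booleans (one per variable) to a bool.
    Returns (lowest, highest), or (None, None) if the marginals conflict.
    """
    worlds = list(product([False, True], repeat=n_vars))
    best = [None, None]

    def search(idx, remaining, counts):
        if idx == len(worlds) - 1:
            full = counts + [remaining]  # last world takes the leftover rows
            for formula, count in known:
                got = sum(c for w, c in zip(worlds, full) if formula(w))
                if got != count:
                    return  # violates a known frequency
            value = sum(c for w, c in zip(worlds, full) if target(w))
            best[0] = value if best[0] is None else min(best[0], value)
            best[1] = value if best[1] is None else max(best[1], value)
            return
        for c in range(remaining + 1):
            search(idx + 1, remaining - c, counts + [c])

    search(0, n_rows, [])
    return best[0], best[1]


# 10 rows; item A occurs in 7 of them and item B in 6 (hypothetical counts).
known = [(lambda w: w[0], 7), (lambda w: w[1], 6)]
both = bound_formula_count(10, 2, known, lambda w: w[0] and w[1])
either = bound_formula_count(10, 2, known, lambda w: w[0] or w[1])
```

For two single-item marginals the result coincides with the classical Fréchet bounds, here between 3 and 6 rows for the conjunction. A real implementation would hand the same constraints to a linear programming solver, as the abstract describes.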
Eeva Kettunen, Sisko Anttila, Jouni K Seppänen, Antti Karjalainen, Henrik Edgren, Irmeli Lindström, Reijo Salovaara, Anna-Maria Nissén, Jarmo Salo, Karin Mattson, Jaakko Hollmén, Sakari Knuutila, Harriet Wikman (2004)  Differentially expressed genes in nonsmall cell lung cancer: expression profiling of cancer-related genes in squamous cell lung cancer.   Cancer Genet Cytogenet 149: 2. 98-106 Mar  
Abstract: The expression patterns of cancer-related genes in 13 cases of squamous cell lung cancer (SCC) were characterized and compared with those in normal lung tissue and 13 adenocarcinomas (AC), the other major type of nonsmall cell lung cancer (NSCLC). cDNA array was used to screen the gene expression levels and the array results were verified using a real-time reverse-transcriptase-polymerase chain reaction (RT-PCR). Thirty-nine percent of the 25 most upregulated and the 25 most downregulated genes were common to SCC and AC. Of these genes, DSP, HMGA1 (alias HMGIY), TIMP1, MIF, CCNB1, TN, MMP11, and MMP12 were upregulated and COPEB (alias CPBP), TYROBP, BENE, BMPR2, SOCS3, TIMP3, CAV1, and CAV2 were downregulated. The expression levels of several genes from distinct protein families (cytokeratins and hemidesmosomal proteins) were markedly increased in SCC compared with AC and normal lung. In addition, several genes, overexpressed in SCC, such as HMGA1, CDK4, IGFBP3, MMP9, MMP11, MMP12, and MMP14, fell into distinct chromosomal loci, which we have detected as gained regions on the basis of comparative genomic hybridization data. Our study revealed new candidate genes involved in NSCLC.
Notes:
Harriet Wikman, Jouni K Seppänen, Virinder K Sarhadi, Eeva Kettunen, Kaisa Salmenkivi, Eeva Kuosma, Katri Vainio-Siukola, Balint Nagy, Antti Karjalainen, Thanos Sioris, Jarmo Salo, Jaakko Hollmén, Sakari Knuutila, Sisko Anttila (2004)  Caveolins as tumour markers in lung cancer detected by combined use of cDNA and tissue microarrays.   J Pathol 203: 1. 584-593 May  
Abstract: To identify new potential diagnostic markers for lung cancer, the expression profiles of 37 lung tumours were analysed using cDNA arrays. Seven samples were from small-cell lung cancer (SCLC), two from large-cell neuroendocrine tumours (LCNEC), and 28 from other non-small-cell lung cancers (mainly squamous cell cancer and adenocarcinoma). Principal component analysis and the permutation test were used to detect differences in the gene expression profiles and a set of genes was found that distinguished high-grade neuroendocrine carcinomas (SCLC and LCNEC) from other lung cancers. In addition, several genes, such as caveolin-1 (CAV1) and caveolin-2 (CAV2), were constantly deregulated in all types of tumour sample, compared with normal tissue. The expression of these two genes was investigated further at the protein level on a tissue microarray containing tumours from 161 patients and normal tissues. Immunostaining for CAV1 was negative in 48% of tumours, whereas 28% of the tumours did not express CAV2. Lack of CAV1 protein expression was not caused by methylation or mutation. In stage I adenocarcinomas, CAV2 protein expression correlated with shorter survival. In conclusion, the present study was able to identify genes that have not previously been implicated in lung cancer by the combined use of two different array techniques. Some of these genes may provide novel diagnostic markers for lung cancer.
Notes:
Sebastiaan Luyssaert, Mika Sulkava, Hannu Raitio, Jaakko Hollmén (2004)  Evaluation of forest nutrition based on large-scale foliar surveys: are nutrition profiles the way of the future?   J Environ Monit 6: 2. 160-167 Feb  
Abstract: This paper introduces the use of nutrition profiles as a first step in the development of a concept that is suitable for evaluating forest nutrition on the basis of large-scale foliar surveys. Nutrition profiles of a tree or stand were defined as the nutrient status, which accounts for all element concentrations, contents and interactions between two or more elements. Therefore a nutrition profile overcomes the shortcomings associated with the commonly used concepts for evaluating forest nutrition. Nutrition profiles can be calculated by means of a neural network, i.e. a self-organizing map, and an agglomerative clustering algorithm with pruning. As an example, nutrition profiles were calculated to describe the temporal variation in the mineral composition of Scots pine and Norway spruce needles in Finland between 1987 and 2000. The temporal trends in the frequency distribution of the nutrition profiles of Scots pine indicated that, between 1987 and 2000, the N, S, P, K, Ca, Mg and Al decreased, whereas the needle mass (NM) increased or remained unchanged. As there were no temporal trends in the frequency distribution of the nutrition profiles of Norway spruce, the mineral composition of Norway spruce needles subsequently did not change. Interpretation of the (lack of) temporal trends was outside the scope of this example. However, nutrition profiles prove to be a new and better concept for the evaluation of the mineral composition of large-scale surveys only when a biological interpretation of the nutrition profiles can be provided.
Notes:
2002
S Ruosaari, J Hollmen (2002)  Image analysis for detecting faulty spots from microarray images   DISCOVERY SCIENCE, PROCEEDINGS 2534: 13. 259-266  
Abstract: Microarrays allow the monitoring of thousands of genes simultaneously. Before a measure of gene activity of an organism is obtained, however, many stages in the error-prone manual and automated process have to be performed. Without quality control, the resulting measures may, instead of being estimates of gene activity, be due to noise or systematic variation. We address the problem of detecting spots of low quality from the microarray images to prevent them from entering the subsequent analysis. We extract features describing spatial characteristics of the spots on the microarray image and train a classifier using a set of labeled spots. We assess the results for classification of individual spots using ROC analysis and for a compound classification using a non-symmetric cost structure for misclassifications.
Notes: Times Cited: 3
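To make the evaluation idea in the abstract above concrete, the following sketch computes ROC points for a scored set of spots and then picks the decision threshold that minimizes a non-symmetric misclassification cost. The scores, labels and cost values are invented for illustration, and the function names are hypothetical; the paper's own features, classifier and cost structure are not reproduced here.

```python
def roc_points(scores, labels):
    """Return (threshold, false_positive_rate, true_positive_rate) triples.

    A spot is flagged as faulty (label 1) when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def cheapest_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the threshold with the lowest total misclassification cost."""
    best = None
    for t in sorted(set(scores), reverse=True):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical classifier scores; label 1 marks a faulty spot.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

curve = roc_points(scores, labels)
miss_is_costly = cheapest_threshold(scores, labels, cost_fp=1, cost_fn=5)
alarm_is_costly = cheapest_threshold(scores, labels, cost_fp=5, cost_fn=1)
```

When missing a faulty spot is the expensive error, the chosen threshold drops so that more spots are flagged. When false alarms are expensive, it rises. The ROC curve itself is unchanged; only the operating point moves.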
T Niini, K Vettenranta, J Hollmén, M L Larramendy, Y Aalto, H Wikman, B Nagy, J K Seppänen, A Ferrer Salvador, H Mannila, U M Saarinen-Pihkala, S Knuutila (2002)  Expression of myeloid-specific genes in childhood acute lymphoblastic leukemia - a cDNA array study.   Leukemia 16: 11. 2213-2221 Nov  
Abstract: Several specific cytogenetic changes are known to be associated with childhood acute lymphoblastic leukemia (ALL), and many of them are important prognostic factors for the disease. Little is known, however, about the changes in gene expression in ALL. Recently, the development of cDNA array technology has enabled the study of expression of hundreds to thousands of genes in a single experiment. We used the cDNA array method to study the gene expression profiles of 17 children with precursor-B ALL. Normal B cells from adenoids were used as reference material. We discuss the 25 genes that were most over-expressed compared to the reference. These included four genes that are normally expressed only in the myeloid lineages of the hematopoietic cells: RNASE2, GCSFR, PRTN3 and CLC. We also detected over-expression of S100A12, expressed in nerve cells but also in myeloid cells. In addition to the myeloid-specific genes, other over-expressed genes included AML1, LCP2 and FGF6. In conclusion, our study revealed novel information about gene expression in childhood ALL. The data obtained may contribute to further studies of the pathogenesis and prognosis of childhood ALL.
Notes:
Harriet Wikman, Eeva Kettunen, Jouni K Seppänen, Antti Karjalainen, Jaakko Hollmén, Sisko Anttila, Sakari Knuutila (2002)  Identification of differentially expressed genes in pulmonary adenocarcinoma by using cDNA array.   Oncogene 21: 37. 5804-5813 Aug  
Abstract: No clear patterns in molecular changes underlying the malignant processes in lung cancer of different histological types have been found so far. To identify critical genes in lung cancer progression we compared the expression profile of cancer related genes in 14 pulmonary adenocarcinoma patients with normal lung tissue by using the cDNA array technique. Principal component analyses (PCA) and permutation test were used to detect the differentially expressed genes. The expression profiles of 10 genes were confirmed by semi-quantitative real-time RT-PCR. In tumour samples, as compared to normal lung tissue, the up-regulated genes included such known tumour markers as CCNB1, PLK, tenascin, KRT8, KRT19 and TOP2A. The down-regulated genes included caveolin 1 and 2, and TIMP3. We also describe, for the first time, down-regulation of the interesting SOCS2 and 3, DOC2 and gravin. We show that silencing of SOCS2 is not caused by methylation of exon 1 of the gene. In conclusion, by using the cDNA array technique we were able to reveal marked differences in the gene expression level between normal lung and tumour tissue and find possible new tumour markers for pulmonary adenocarcinoma.
Notes:
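The permutation test named in the abstract above can be sketched for a single gene as follows. The abstract does not state which test statistic was used, so this example assumes the absolute difference of group means and enumerates every relabeling exactly, which is practical only for small groups. The expression values and the function name are hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_gap(indices):
        # Absolute gap between the mean of the chosen rows and the rest.
        sum_a = sum(pooled[i] for i in indices)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(range(n_a))
    splits = list(combinations(range(len(pooled)), n_a))
    # Count relabelings at least as extreme as the observed grouping.
    extreme = sum(1 for s in splits if mean_gap(s) >= observed - 1e-12)
    return extreme / len(splits)


# Hypothetical expression levels of one gene in tumour and normal samples.
tumour = [10.0, 11.0, 12.0]
normal = [1.0, 2.0, 3.0]
p_value = exact_permutation_pvalue(tumour, normal)
```

With three samples per group there are only 20 possible relabelings, so the smallest attainable two-sided p-value is 0.1. Larger studies usually sample random relabelings instead of enumerating all of them.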
Ying Zhu, Jaakko Hollmén, Riikka Räty, Yan Aalto, Balint Nagy, Erkki Elonen, Juha Kere, Heikki Mannila, Kaarle Franssila, Sakari Knuutila (2002)  Investigatory and analytical approaches to differential gene expression profiling in mantle cell lymphoma.   Br J Haematol 119: 4. 905-915 Dec  
Abstract: Mantle cell lymphoma (MCL) is a non-Hodgkin's lymphoma of B-cell lineage. The blastoid variant of MCL, characterized by high mitotic rate, is clinically more aggressive than common MCL. We used the cDNA array technology to examine the gene expression profiles of both blastoid variant and common MCL. The data was analysed by regression analysis, principal component analysis and the naive Bayes' classifier. Eight genes were identified as differentially deregulated between the two groups. Oncogenes CMYC, BCL2 and PIM1 were upregulated more frequently in the blastoid variant than in common MCL. This implied that the gp130-mediated signal transducer and activator of transcription 3 (STAT3) signalling pathway was involved in the blastoid variant transformation of MCL. Other differentially deregulated genes were TOP1, CD23, CD45, CD70 and NFATC. By using the eight differentially deregulated genes, we created a classifier to distinguish the blastoid variant from common MCL with high accuracy. We also identified 18 genes that were deregulated in both groups. Among them, BCL1, CALLA/CD10 and GRN were suggested to be oncogenes. The products of RGS1, RGS2, ANX2 and CD44H were suggested to promote tumour metastasis. CD66D was suggested to be a tumour suppressor gene.
Notes:
1999
E Alhoniemi, J Hollmen, O Simula, J Vesanto (1999)  Process monitoring and modeling using the self-organizing map   INTEGRATED COMPUTER-AIDED ENGINEERING 6: 31. 3-14  
Abstract: The Self-Organizing Map (SOM) is a powerful neural network method for analysis and visualization of high-dimensional data. It maps nonlinear statistical dependencies between high-dimensional measurement data into simple geometric relationships on a usually two-dimensional grid. The mapping roughly preserves the most important topological and metric relationships of the original data elements and, thus, inherently clusters the data. The need for visualization and clustering occurs, for instance, in the analysis of various engineering problems. In this paper, the SOM has been applied in monitoring and modeling of complex industrial processes. Case studies, including pulp process, steel production, and paper industry are described.
Notes: Times Cited: 46
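For readers unfamiliar with the method summarized in the abstract above, one training step of a self-organizing map can be sketched as below. This is a minimal illustration on a one-dimensional grid with a Gaussian neighbourhood, not the software used in the paper; the function name, learning rate and radius are assumptions.

```python
import math


def som_update(weights, x, learning_rate, radius):
    """Apply one SOM training step in place and return the winner index.

    weights : list of prototype vectors, ordered along a 1-D grid
    x       : one input vector
    """
    # Best-matching unit: the prototype closest to x in squared distance.
    bmu = min(
        range(len(weights)),
        key=lambda i: sum((wi - xi) ** 2 for wi, xi in zip(weights[i], x)),
    )
    for i, w in enumerate(weights):
        # Units near the winner on the grid move more than distant ones.
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        weights[i] = [wi + learning_rate * h * (xi - wi)
                      for wi, xi in zip(w, x)]
    return bmu


prototypes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
winner = som_update(prototypes, [2.0, 2.0], learning_rate=0.5, radius=1.0)
```

Because grid neighbours of the winning unit are pulled toward the same input, nearby units end up representing similar data, which is what gives the map its ordering and its usefulness for visualization.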