Luciano Milanesi

Dr. Luciano Milanesi
CNR-ITB, Via Fratelli Cervi, 93
20090 Segrate (Milan), Italy
Cel. +39-3351050042 Fax. +390226422770
e-mail: luciano.milanesi@itb.cnr.it
Skype: luciano.milanesi
Professor Luciano Milanesi received the BS degree in Atomic Physics in 1981. In 1986 he received the Ph.D. degree in Health and Hospital Physics, participating in the development of the project "Cyclotron facility for Positron Emission Tomography application in Nuclear Medicine".
Since 1987 he has been a staff scientist at the Italian National Research Council (CNR), and since 2004 he has been head of the Bioinformatics Division for the "Centre for Bio-molecular Interdisciplinary Studies and Industrial applications".
In 2007 he became Director of the CNR Interdepartmental Bioinformatics Research Network among 12 CNR research Institutes in Life science, Medicine and ICT.
He has been principal investigator for several European projects: 1996-1999 European Commission "TRADAT" TRAnscription Databases and Analysis Tools, "O2I" Online Research Information Environment for the Life Sciences; 2002-04 European Commission ORIEL "an Online Research Information Environment for the Life Sciences"; 2003-2004 NATO Science Programme "computer modelling, 5'-UTR, co-expressed genes, macrophages, Gene networks, Epstein-Barr Virus, cis elements recognition, B-DNA conformation"; 2004-06 European Commission INTAS "Modelling and analysis of mammalian cell-cycle regulatory networks in normal and pathological states by bio- and chemoinformatics"; 2003-05 MIUR FIRB Post-Genomic "Bioinformatics for Genome and Proteome"; 2003-05 MIUR FIRB "GRID-IT: Enabling Platforms for high-performance computational GRIDS oriented to scalable virtual organisations".
He has been the coordinator of the European BIOINFOGRID project, “Bioinformatics Grid Applications for life science”, and of the MIUR-FIRB “Laboratory of Bioinformatics Technologies” project. He is the principal investigator for the “Enabling Grids for E-Science EGEE II and EGEE III” and pan-European Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) projects.
Since 2002 Professor Milanesi has taught Bioinformatics in the Medical Biotechnology course at the University of Milan, and since 2003 he has taught fundamentals of informatics in specialisation degree courses at the University of Milan. He is a cofounder of BITS Bioinformatics Italian Association and of SYSBIOHEALTH System Biology for Health. He is a member of the Board of the International Neuroinformatics INCF Secretariat, Karolinska Institutet, Sweden.
Professor Milanesi has published contributions in several books and scientific publications in Bioinformatics. He is the author of more than 260 publications in the field of Bioinformatics, Systems Biology and Medical Informatics.

Journal articles

2010
Giuseppe Vezzoli, Annalisa Terranegra, Teresa Arcidiacono, Giovanni Gambaro, Luciano Milanesi, Ettore Mosca, Laura Soldati (2010)  Calcium kidney stones are associated with a haplotype of the Calcium-sensing receptor gene regulatory region.   Nephrol Dial Transplant Jan  
Abstract: BACKGROUND: The calcium-sensing receptor gene (CaSR) is a candidate to explain susceptibility to calcium kidney stones. Thus, we studied CaSR gene single-nucleotide polymorphisms (SNPs) and haplotypes associated with stones. METHODS: Four hundred and sixty-three calcium stone formers and 213 healthy controls were genotyped for 21 SNPs mapping the whole CaSR gene. CaSR gene structure was studied. SNPs and haplotypes were analysed for association with stones. RESULTS: Three haplotype blocks were identified in the CaSR gene. The first block was characterized by six SNPs and included gene promoters. The rs7652589 and rs1501899 SNPs and the CATTCA haplotype of the first block were significantly more frequent in normocitraturic calcium kidney stone formers than controls. The risk of stones was increased in normocitraturic homozygous patients and heterozygotes for the CATTCA haplotype. The rate of stones was higher in stone formers with the CATTCA haplotype. In a three-generation family, calcium stones were associated with the CATTCA haplotype. The bioinformatic analysis identified a new site for the octamer-binding transcription factor 1 in the presence of the variant alleles at the rs7652589 and rs1501899 SNPs. This transcription factor may downregulate the transcription of vitamin D-dependent genes and the CaSR expression. CONCLUSION: SNPs and the CATTCA haplotype of the CaSR gene first block are associated with kidney stones in normocitraturic patients.
2009
Roberta Alfieri, Matteo Barberis, Ferdinando Chiaradonna, Daniela Gaglio, Luciano Milanesi, Marco Vanoni, Edda Klipp, Lilia Alberghina (2009)  Towards a systems biology approach to mammalian cell cycle: modeling the entrance into S phase of quiescent fibroblasts after serum stimulation.   BMC Bioinformatics 10 Suppl 12: 10  
Abstract: BACKGROUND: The cell cycle is a complex process that allows eukaryotic cells to replicate chromosomal DNA and partition it into two daughter cells. A relevant regulatory step is in the G0/G1 phase, a point called the restriction (R) point where intracellular and extracellular signals are monitored and integrated. Subcellular localization of cell cycle proteins is increasingly recognized as a major factor that regulates cell cycle transitions. Nevertheless, current mathematical models of the G1/S networks of mammalian cells do not consider this aspect. Hence, there is a need for a computational model that incorporates this regulatory aspect that has a relevant role in cancer, since altered localization of key cell cycle players, notably of inhibitors of cyclin-dependent kinases, has been reported to occur in neoplastic cells and to be linked to cancer aggressiveness. RESULTS: The network of the model components involved in the G1 to S transition process was identified through literature and web-based data mining, and the corresponding wiring diagram of the G1 to S transition was drawn with Cell Designer notation. The model has been implemented in Mathematica using Ordinary Differential Equations. Time-courses of level and of sub-cellular localization of key cell cycle players in mouse fibroblasts re-entering the cell cycle after serum starvation/re-feeding have been used to constrain network design and parameter determination. The model recapitulates events from growth factor stimulation to the onset of S phase. The R point estimated by simulation is consistent with the R point experimentally determined. CONCLUSION: The major element of novelty of our model of the G1 to S transition is the explicit modeling of cytoplasmic/nuclear shuttling of cyclins, cyclin-dependent kinases, their inhibitor and complexes. Sensitivity analysis of the network performance newly reveals that the biological effect brought about by Cki overexpression is strictly dependent on whether the Cki is promoting nuclear translocation of cyclin/Cdk containing complexes.
Luciano Milanesi, Paolo Romano, Gastone Castellani, Daniel Remondini, Petro Liò (2009)  Trends in modeling Biomedical Complex Systems.   BMC Bioinformatics 10 Suppl 12: 10  
Abstract: In this paper we provide an introduction to the techniques for multi-scale complex biological systems, from the single bio-molecule to the cell, combining theoretical modeling, experiments, informatics tools and technologies suitable for biological and biomedical research, which are becoming increasingly multidisciplinary, multidimensional and information-driven. The most important concepts on mathematical modeling methodologies and statistical inference, bioinformatics and standards tools to investigate complex biomedical systems are discussed, and the prominent literature useful to both the practitioner and the theoretician is presented.
Ivan Merelli, Andrea Caprera, Alessandra Stella, Marcello Del Corvo, Luciano Milanesi, Barbara Lazzari (2009)  The Human EST Ontology Explorer: a tissue-oriented visualization system for ontologies distribution in human EST collections.   BMC Bioinformatics 10 Suppl 12: 10  
Abstract: BACKGROUND: The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which make it possible to investigate library-specific gene expression levels and to make comparisons among libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE) http://www.itb.cnr.it/ptp/human_est_explorer, a web facility for comparison of expression levels among libraries from several healthy and diseased tissues. RESULTS: The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable for data-parallel tasks. Relying on the achieved annotation, library-specific distributions of ESTs in the GO Graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface offers the possibility to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user-selected libraries can also be computed. CONCLUSION: The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview on healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is intended to increase.
Federica Chiappori, Pasqualina D'Ursi, Ivan Merelli, Luciano Milanesi, Ermanna Rovida (2009)  In silico saturation mutagenesis and docking screening for the analysis of protein-ligand interaction: the Endothelial Protein C Receptor case study.   BMC Bioinformatics 10 Suppl 12: 10  
Abstract: BACKGROUND: The design of mutants in protein functional regions, such as the ligand binding sites, is a powerful approach to recognize the determinants of specific protein activities in cellular pathways. For an exhaustive analysis of selected positions of protein structure, large-scale mutagenesis techniques are often employed, with laborious and time-consuming experimental set-up. 'In silico' mutagenesis and screening simulation represent a valid alternative to laboratory methods to drive the 'in vivo' testing toward more focused objectives. RESULTS: We present here a high performance computational procedure for large-scale mutant modelling and subsequent evaluation of the effect on ligand binding affinity. The mutagenesis was performed with a 'saturation' approach, where all 20 natural amino acids were tested in positions involved in ligand binding sites. Each modelled mutant was subjected to molecular docking simulation and stability evaluation. The simulated protein-ligand complexes were screened for their impairment of binding ability based on change of calculated Ki compared to the wild-type. An example of application to the Endothelial Protein C Receptor residues involved in lipid binding is reported. CONCLUSION: The computational pipeline presented in this work is a useful tool for the design of structurally stable mutants with altered affinity for ligand binding, considerably reducing the number of mutants to be experimentally tested. The saturation mutagenesis procedure does not require previous knowledge of the functional role of the residues involved and allows extensive exploration of all possible substitutions and their pairwise combinations. Mutants are screened by docking simulation and stability evaluation followed by a rationally driven selection of those presenting the required characteristics. The method can be employed in molecular recognition studies and as a preliminary approach to select models for experimental testing.
Ettore Mosca, Gloria Bertoli, Eleonora Piscitelli, Laura Vilardo, Rolland A Reinbold, Ileana Zucchi, Luciano Milanesi (2009)  Identification of functionally related genes using data mining and data integration: a breast cancer case study.   BMC Bioinformatics 10 Suppl 12: 10  
Abstract: BACKGROUND: The identification of the organisation and dynamics of molecular pathways is crucial for the understanding of cell function. In order to reconstruct the molecular pathways in which a gene of interest is involved in regulating a cell, it is important to identify the set of genes with which it interacts to determine cell function. In this context, the mining and the integration of a large amount of publicly available data, regarding the transcriptome and the proteome states of a cell, are a useful resource to complement biological research. RESULTS: We describe an approach for the identification of genes that interact with each other to regulate cell function. The strategy relies on the analysis of gene expression profile similarity, considering large datasets of expression data. During the similarity evaluation, the methodology determines the most significant subset of samples in which the evaluated genes are highly correlated. Hence, the strategy enables the exclusion of samples that are not relevant for each gene pair analysed. This feature is important when considering a large set of samples characterised by heterogeneous experimental conditions where different pools of biological processes can be active across the samples. The putative partners of the studied gene are then further characterised, analysing the distribution of the Gene Ontology terms and integrating the protein-protein interaction (PPI) data. The strategy was applied for the analysis of the functional relationships of a gene of known function, Pyruvate Kinase, and for the prediction of functional partners of the human transcription factor TBX3. In both cases the analysis was done on a dataset composed of breast primary tumour expression data derived from the literature. Integration and analysis of PPI data confirmed the prediction of the methodology, since the genes identified to be functionally related were associated with proteins close in the PPI network. Two genes among the predicted putative partners of TBX3 (GLI3 and GATA3) were confirmed by in vivo binding assays (crosslinking immunoprecipitation, X-ChIP) in which the putative DNA enhancer sequence sites of GATA3 and GLI3 were found to be bound by the Tbx3 protein. CONCLUSION: The presented strategy is demonstrated to be an effective approach to identify genes that establish functional relationships. The methodology identifies and characterises genes with a similar expression profile, through data mining and integrating data from publicly available resources, to contribute to a better understanding of gene regulation and cell function. The prediction of the TBX3 target genes GLI3 and GATA3 was experimentally confirmed.
Roberto Barbera, Giacinto Donvito, Alberto Falzone, Giuseppe La Rocca, Luciano Milanesi, Giorgio Pietro Maggi, Saverio Vicario (2009)  The GENIUS Grid Portal and robot certificates: a new tool for e-Science.   BMC Bioinformatics 10 Suppl 6: 06  
Abstract: BACKGROUND: Grid technology is the computing model which allows users to share a wide plethora of distributed computational resources regardless of their geographical location. Up to now, the high security policy requested in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. METHODS: Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful, for instance, to automate grid service monitoring, data processing production and distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. RESULTS: The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can thus be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness among a wide number of potential users. CONCLUSION: The adoption of Grid portals extended with robot certificates can really contribute to creating transparent access to computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.
Barbara Lazzari, Andrea Caprera, Alessandro Cestaro, Ivan Merelli, Marcello Del Corvo, Paolo Fontana, Luciano Milanesi, Riccardo Velasco, Alessandra Stella (2009)  Ontology-oriented retrieval of putative microRNAs in Vitis vinifera via GrapeMiRNA: a web database of de novo predicted grape microRNAs.   BMC Plant Biol 9: 06  
Abstract: BACKGROUND: Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at http://www.itb.cnr.it/ptp/grapemirna/. DESCRIPTION: The program FindMiRNA was used to detect putative microRNA genes in the grape genome. A very high number of predictions was retrieved, calling for validation. Nine parameters were calculated and, based on the grape microRNAs dataset available at miRBase, thresholds were defined and applied to FindMiRNA predictions having targets in gene exons. In the resulting subset, predictions were ranked according to precursor positions and sequence similarity, and to target identity. To further validate FindMiRNA predictions, comparisons to the Arabidopsis genome, to the grape Genoscope genome, and to the grape EST collection were performed. Results were stored in a MySQL database and a web interface was prepared to query the database and retrieve predictions of interest. CONCLUSION: The GrapeMiRNA database encompasses 5,778 microRNA predictions spanning the whole grape genome. Predictions are integrated with information that can be of use in selection procedures. Tools added in the web interface also allow predictions to be inspected according to gene ontology classes and metabolic pathways of targets. The GrapeMiRNA database can be of help in selecting candidate microRNA genes to be validated.
Pasqualina D'Ursi, Federica Chiappori, Ivan Merelli, Paolo Cozzi, Ermanna Rovida, Luciano Milanesi (2009)  Virtual screening pipeline and ligand modelling for H5N1 neuraminidase.   Biochem Biophys Res Commun 383: 4. 445-449 Jun  
Abstract: The H5N1 virus neuraminidase structure was solved in two different conformations depending on the inhibitor concentration. In the absence of oseltamivir or at a low concentration, the neuraminidase structure assumes an open form that closes at a high oseltamivir concentration due to the shift of the so-called 150-loop near the active site. Although the closed conformation is similar to all the other structurally known neuraminidase types, it does not appear to be the most likely physiological condition for N1. To investigate the specific ligand binding properties of the open form, we screened a large dataset of ligands by docking simulation and compared the results with the closed form. The virtual screening procedure was implemented in a docking pipeline that also performs a step-by-step, target-specific filtering approach for data reduction. The selected ligands display binding ability involving multiple sites of interaction including the active site and an adjacent cavity made available by the 150-loop shift. Two ligands are especially interesting and are proposed as substituents to design oseltamivir derivatives specifically suited for the open conformation.
2008
Federica Viti, Ivan Merelli, Andrea Caprera, Barbara Lazzari, Alessandra Stella, Luciano Milanesi (2008)  Ontology-based, Tissue MicroArray oriented, image centered tissue bank.   BMC Bioinformatics 9 Suppl 4: 04  
Abstract: BACKGROUND: The Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. RESULTS: In this work we propose a web-oriented Tissue MicroArray system that supports researchers in managing bio-samples and, through the use of ontologies, enables tissue sharing aimed at the design of Tissue MicroArray experiments and results evaluation. Indeed, our system provides ontological description both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, by working on well-defined terms, it is possible to query web resources for literature articles to integrate both pathology and bioinformatics data. CONCLUSIONS: Using this system, users associate an ontology-based description to each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue. Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies about correlation between the analyzed pathology and the most commonly related biological processes.
Mirko Francesconi, Daniel Remondini, Nicola Neretti, John M Sedivy, Leon N Cooper, Ettore Verondini, Luciano Milanesi, Gastone Castellani (2008)  Reconstructing networks of pathways via significance analysis of their intersections.   BMC Bioinformatics 9 Suppl 4: 04  
Abstract: BACKGROUND: Significance analysis at single gene level may suffer from the limited number of samples and experimental noise that can severely limit the power of the chosen statistical test. This problem is typically approached by applying post hoc corrections to control the false discovery rate, without taking into account prior biological knowledge. Pathway or gene ontology analysis can provide an alternative way to relax the significance threshold applied to single genes and may lead to a better biological interpretation. RESULTS: Here we propose a new analysis method based on the study of networks of pathways. These networks are reconstructed considering both the significance of single pathways (network nodes) and the intersection between them (links). We apply this method for the reconstruction of networks of pathways to two gene expression datasets: the first one obtained from a c-Myc rat fibroblast cell line expressing a conditional Myc-estrogen receptor oncoprotein; the second one obtained from the comparison of Acute Myeloid Leukemia and Acute Lymphoblastic Leukemia derived from bone marrow samples. CONCLUSION: Our method extends statistical models that have been recently adopted for the significance analysis of functional groups of genes to infer links between these groups. We show that groups of genes at the interface between different pathways can be considered as relevant even if the pathways they belong to are not significant by themselves.
Chanchal K Mitra, Luciano Milanesi (2008)  An unusual distribution of 6-nt sequences near the transcription start site.   J Integr Bioinform 5: 2. 08  
Abstract: A new look at the transcription start is presented in which we can see transcription factors binding to both sides of the TSS as an essential requirement. Naturally the factor binding to the downstream region must be removed so that the transcription process can continue. The presence of a number of distinct transcription factors can also be used to explain selective activation of various genes. The transcription start site by itself plays only a minor role in the whole process. We also suggest that mutations close to the TSS on the coding side can be fatal even if they preserve the codon table.
Barbara Lazzari, Andrea Caprera, Alberto Vecchietti, Ivan Merelli, Francesca Barale, Luciano Milanesi, Alessandra Stella, Carlo Pozzi (2008)  Version VI of the ESTree db: an improved tool for peach transcriptome analysis.   BMC Bioinformatics 9 Suppl 2: 03  
Abstract: BACKGROUND: The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to implement the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, which allows querying each of the database features. RESULTS: A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the ontologies distribution in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. CONCLUSIONS: Version VI of the ESTree db offers a broad overview on peach gene expression. Sequence analysis results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. Flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention.
Francesca Panzitta, Andrea Caprera, Ivan Merelli, Luciano Milanesi, John L Williams, Barbara Lazzari, Alessandra Stella (2008)  Mining the bovine genome with the "Bovine SNP Retriever".   J Hered 99: 6. 696-698 Nov/Dec  
Abstract: Online resources for bovine genome analysis are provided at the most important Web sites. Nonetheless, retrieval of single-nucleotide polymorphism (SNP)-related information is not always easy when searches must focus on complementary features. In this work, we present the Bovine SNP Retriever: a user-friendly tool for bovine SNP retrieval that also facilitates the retrieval of SNP-related information within user-selected quantitative trait loci regions and reverse electronic polymerase chain reaction analysis on the bovine genome. The Bovine SNP Retriever is available at http://www.itb.cnr.it/ptp/bovine_snp_retriever/.
Alessandro Orro, Guia Guffanti, Erika Salvi, Fabio Macciardi, Luciano Milanesi (2008)  SNPLims: a data management system for genome wide association studies.   BMC Bioinformatics 9 Suppl 2: 03  
Abstract: BACKGROUND: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of this large amount of genotypic data facilitates the whole genome search for the genetic basis of diseases. We need a suitable information management system to efficiently manage the data flow produced by whole genome genotyping and to make it available for further analyses. RESULTS: We have developed an information system mainly devoted to the storage and management of SNP genotype data produced by the Illumina platform from the raw outputs of genotyping into a relational database. The relational database can be accessed in order to import any existing data and export user-defined formats compatible with many different genetic analysis programs. After calculating family-based or case-control association study data, the results can be imported in SNPLims. One of the main features is to allow the user to rapidly identify and annotate statistically relevant polymorphisms from the large volume of data analyzed. Results can be easily visualized either graphically or by creating ASCII comma-separated format output files, which can be used as input to further analyses. CONCLUSIONS: The proposed infrastructure makes it possible to manage a relatively large amount of genotypes for each sample and an arbitrary number of samples and phenotypes. Moreover, it enables the users to control the quality of the data, to perform the most common screening analyses and to identify genes that become "candidates" for the disease under consideration.
Roberta Alfieri, Ivan Merelli, Ettore Mosca, Luciano Milanesi (2008)  The cell cycle DB: a systems biology approach to cell cycle analysis.   Nucleic Acids Res 36: Database issue. D641-D645 Jan  
Abstract: The cell cycle database is a biological resource that collects the most relevant information related to genes and proteins involved in human and yeast cell cycle processes. The database, which is accessible at the web site http://www.itb.cnr.it/cellcycle, has been developed in a systems biology context, since it also stores the cell cycle mathematical models published in recent years, with the possibility to simulate them directly. The aim of our resource is to give an exhaustive view of the cell cycle process starting from its building blocks, genes and proteins, toward the pathway they create, represented by the models.
Notes:
2007
Fedor Kolpakov, Vladimir Poroikov, Ruslan Sharipov, Yury Kondrakhin, Alexey Zakharov, Alexey Lagunin, Luciano Milanesi, Alexander Kel (2007)  CYCLONET--an integrated database on cell cycle regulation and carcinogenesis.   Nucleic Acids Res 35: Database issue. D550-D556 Jan  
Abstract: Computational modelling of mammalian cell cycle regulation is a challenging task, which requires comprehensive knowledge on many interrelated processes in the cell. We have developed a web-based integrated database on cell cycle regulation in mammals in normal and pathological states (Cyclonet database). It integrates data obtained by 'omics' sciences and chemoinformatics on the basis of systems biology approach. Cyclonet is a specialized resource, which enables researchers working in the field of anticancer drug discovery to analyze the wealth of currently available information in a systematic way. Cyclonet contains information on relevant genes and molecules; diagrams and models of cell cycle regulation and results of their simulation; microarray data on cell cycle and on various types of cancer, information on drug targets and their ligands, as well as extensive bibliography on modelling of cell cycle and cancer-related gene expression data. The Cyclonet database is also accessible through the BioUML workbench, which allows flexible querying, analyzing and editing the data by means of visual modelling. Cyclonet aims to predict promising anticancer targets and their agents by application of Prediction of Activity Spectra for Substances. The Cyclonet database is available at http://cyclonet.biouml.org.
Notes:
Pasqualina D'Ursi, Francesca Marino, Andrea Caprera, Luciano Milanesi, Elena M Faioni, Ermanna Rovida (2007)  ProCMD: a database and 3D web resource for protein C mutants.   BMC Bioinformatics 8 Suppl 1: 03  
Abstract: BACKGROUND: Activated Protein C (ProC) is an anticoagulant plasma serine protease which also plays an important role in controlling inflammation and cell proliferation. Several mutations of the gene are associated with phenotypic functional deficiency of protein C, and with the risk of developing venous thrombosis. Structure prediction and computational analysis of the mutants have proven to be a valuable aid in understanding the molecular aspects of clinical thrombophilia. RESULTS: We have built a specialized relational database and a search tool for natural mutants of protein C. It contains 195 entries that include 182 missense and 13 stop mutations. A menu-driven search engine allows the user to retrieve stored information for each variant, including genetic as well as structural data and a multiple alignment highlighting the substituted position. Molecular models of variants can be visualized with interactive tools; PDB coordinates of the models are also available for further analysis. Furthermore, an automatic modelling interface allows the user to generate multiple alignments and 3D models of new variants. CONCLUSION: ProCMD is an up-to-date interactive mutant database that integrates phenotypical descriptions with functional and structural data obtained by computational approaches. It will be useful in the research and clinical fields to help elucidate the chain of events leading from a molecular defect to the related disease. It is available for academics at the URL http://www.itb.cnr.it/procmd/.
Notes:
Emanuela Merelli, Giuliano Armano, Nicola Cannata, Flavio Corradini, Mark d'Inverno, Andreas Doms, Phillip Lord, Andrew Martin, Luciano Milanesi, Steffen Möller, Michael Schroeder, Michael Luck (2007)  Agents in bioinformatics, computational and systems biology.   Brief Bioinform 8: 1. 45-59 Jan  
Abstract: The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach to computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in the context of the state of the art in this area. We investigate future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.
Notes:
Gabriele A Trombetti, Raoul J P Bonnal, Ermanno Rizzi, Gianluca De Bellis, Luciano Milanesi (2007)  Data handling strategies for high throughput pyrosequencers.   BMC Bioinformatics 8 Suppl 1: 03  
Abstract: BACKGROUND: New high throughput pyrosequencers such as the 454 Life Sciences GS 20 are capable of massively parallelizing DNA sequencing, providing an unprecedented rate of output data as well as potentially reducing costs. However, these new pyrosequencers bear a different error profile and provide shorter reads than those of a more traditional Sanger sequencer. These facts pose new challenges regarding how the data are handled and analyzed; in addition, the steep increase in sequencer throughput calls for much computation power at a low cost. RESULTS: To address these challenges, we created an automated multi-step computation pipeline integrated with a database storage system. This allowed us to store, handle, index and search (1) the output data from the GS20 sequencer, (2) analysis projects, possibly multiple on every dataset, (3) final results of analysis computations and (4) intermediate results of computations (these allow hand-made comparisons and hence further searches by the biologists). Repeatability of computations was also a requirement. In order to access the needed computation power, we ported the pipeline to the European Grid: a large community of clusters, load balanced as a whole. In order to better achieve this Grid port we created Vnas: an innovative Grid job submission, virtual sandbox manager and job callback framework. After some runs of the pipeline aimed at tuning the parameters and thresholds for optimal results, we successfully analyzed 273 sequenced amplicons from a cancerous human sample and correctly found point mutations confirmed by either Sanger resequencing or NCBI dbSNP. The sequencing was performed with our 454 Life Sciences GS 20 pyrosequencer. CONCLUSION: We handled the steep increase in throughput from the new pyrosequencer by building an automated computation pipeline associated with database storage, and by leveraging the computing power of the European Grid. The Grid platform offers a very cost effective choice for uneven workloads, typical in many scientific research fields, provided its peculiarities can be accepted (these are discussed). The mentioned infrastructure was used to analyze human amplicons for mutations. More analyses will be performed in the future.
Notes:
Ezio Bartocci, Diletta Cacciagrano, Nicola Cannata, Flavio Corradini, Emanuela Merelli, Luciano Milanesi, Paolo Romano (2007)  An agent-based multilayer architecture for bioinformatics grids.   IEEE Trans Nanobioscience 6: 2. 142-148 Jun  
Abstract: Due to the huge volume and complexity of biological data available today, a fundamental component of biomedical research is now in silico analysis. This includes modelling and simulation of biological systems and processes, as well as automated bioinformatics analysis of high-throughput data. The quest for bioinformatics resources (including databases, tools, and knowledge) becomes therefore of extreme importance. Bioinformatics itself is in rapid evolution and dedicated Grid cyberinfrastructures already offer easier access and sharing of resources. Furthermore, the concept of the Grid is progressively interleaving with those of Web Services, semantics, and software agents. Agent-based systems can play a key role in learning, planning, interaction, and coordination. Agents constitute also a natural paradigm to engineer simulations of complex systems like the molecular ones. We present here an agent-based, multilayer architecture for bioinformatics Grids. It is intended to support both the execution of complex in silico experiments and the simulation of biological systems. In the architecture a pivotal role is assigned to an "alive" semantic index of resources, which is also expected to facilitate users' awareness of the bioinformatics domain.
Notes:
Roberta Alfieri, Ivan Merelli, Ettore Mosca, Luciano Milanesi (2007)  A data integration approach for cell cycle analysis oriented to model simulation in systems biology.   BMC Syst Biol 1: 08  
Abstract: BACKGROUND: The cell cycle is one of the biological processes most frequently investigated in systems biology studies and it involves the knowledge of a large number of genes and networks of protein interactions. A deep knowledge of the molecular aspect of this biological process can contribute to making cancer research more accurate and innovative. In this context the mathematical modelling of the cell cycle has a relevant role to quantify the behaviour of each component of the systems. The mathematical modelling of a biological process such as the cell cycle allows a systemic description that helps to highlight some features such as emergent properties which could be hidden when the analysis is performed only from a reductionist point of view. Moreover, in modelling complex systems, a complete annotation of all the components is equally important to understand the interaction mechanism inside the network: for this reason data integration of the model components has high relevance in systems biology studies. DESCRIPTION: In this work, we present a resource, the Cell Cycle Database, intended to support systems biology analysis on the Cell Cycle process, based on two organisms, yeast and mammalian. The database integrates information about genes and proteins involved in the cell cycle process, stores complete models of the interaction networks and allows the mathematical simulation over time of the quantitative behaviour of each component. To accomplish this task, we developed a web interface for browsing information related to cell cycle genes, proteins and mathematical models. In this framework, we have implemented a pipeline which allows users to deal with the mathematical part of the models, in order to solve, using different variables, the ordinary differential equation systems that describe the biological process. CONCLUSION: This integrated system is freely available in order to support systems biology research on the cell cycle and it aims to become a useful resource for collecting all the information related to actual and future models of this network. The flexibility of the database allows the addition of mathematical data which are used for simulating the behavior of the cell cycle components in the different models. The resource deals with two relevant problems in systems biology: data integration and mathematical simulation of a crucial biological process related to cancer, such as the cell cycle. In this way the resource is useful both to retrieve information about cell cycle model components and to analyze their dynamical properties. The Cell Cycle Database can be used to find system-level properties, such as stable steady states and oscillations, by coupling structure and dynamical information about models.
Notes:
Angelo Boccia, Gianluca Busiello, Luciano Milanesi, Giovanni Paolella (2007)  A fast job scheduling system for a wide range of bioinformatic applications.   IEEE Trans Nanobioscience 6: 2. 149-154 Jun  
Abstract: Bioinformatic tools are often used by researchers through interactive Web interfaces, resulting in a strong demand for computational resources. The tools are of different kinds and range from simple, quick tasks, to complex analyses requiring minutes to hours of processing time and often longer than that. Batteries of computational nodes, such as those found in parallel clusters, provide a platform of choice for this application, especially when a relatively large number of concurrent requests is expected. Here, we describe a scheduling architecture operating at the application level, able to distribute jobs over a large number of hierarchically organized nodes. While not conflicting with low-level scheduling software and coexisting peacefully with it, the system takes advantage of tools, such as SQL servers, commonly used in Web applications, to produce low latency and performance which compares well with, and often surpasses, that of more traditional, dedicated schedulers. The system provides the basic functionality necessary for node selection, task execution and service management and monitoring, and may combine loosely linked computational resources, such as those located in geographically distinct sites.
Notes:
Elizabeth van der Wath, Loukas Moutsianas, Richard van der Wath, Alet Visagie, Luciano Milanesi, Pietro Liò (2007)  Grid methodology for identifying co-regulated genes and transcription factor binding sites.   IEEE Trans Nanobioscience 6: 2. 162-167 Jun  
Abstract: The identification of the genes that are coordinately regulated is an important and challenging task of bioinformatics and represents a first step in the elucidation of the topology of transcriptional networks. We first compare the performances, in a grid setting, of the Markov clustering algorithm with respect to the k-means using microarray test data sets. The gene expression information of the clustered genes can be used to annotate transcription binding sites upstream of co-regulated genes. The methodology uses a regression model that relates gene expression levels to the matching scores of nucleotide patterns, allowing us to identify DNA-binding sites from a collection of noncoding DNA sequences from co-regulated genes. Here we discuss extending the approach to multiple species exploiting the grid framework.
Notes:
Ivan Merelli, Giulia Morra, Luciano Milanesi (2007)  Evaluation of a grid based molecular dynamics approach for polypeptide simulations.   IEEE Trans Nanobioscience 6: 3. 229-234 Sep  
Abstract: Molecular dynamics is very important for biomedical research because it makes possible simulation of the behavior of a biological macromolecule in silico. However, molecular dynamics is computationally rather expensive: the simulation of some nanoseconds of dynamics for a large macromolecule such as a protein takes a very long time, due to the high number of operations that are needed for solving Newton's equations in the case of a system of thousands of atoms. In order to obtain biologically significant data, it is desirable to use high-performance computation resources to perform these simulations. Recently, a distributed computing approach based on replacing a single long simulation with many independent short trajectories has been introduced, which in many cases provides valuable results. This study concerns the development of an infrastructure to run molecular dynamics simulations on a grid platform in a distributed way. The implemented software allows the parallel submission of different simulations that are individually short but together bring important biological information. Moreover, each simulation is divided into a chain of jobs to avoid data loss in case of system failure and to contain the dimension of each data transfer from the grid. The results confirm that the distributed approach on grid computing is particularly suitable for molecular dynamics simulations thanks to the elevated scalability.
Notes:
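As an illustration of the "chain of jobs" idea described in the abstract above, the following is a minimal sketch and not the authors' software. It assumes a toy one-dimensional harmonic oscillator in place of a real molecular system; the names `run_job` and `run_chain` and the JSON checkpoint layout are hypothetical. Each job restarts from the checkpoint written by the previous one, so a failure loses at most one job's worth of work and each transfer stays small.

```python
import json

def run_job(checkpoint, steps, dt=0.01):
    """Run one short job: load a checkpoint, integrate, return a new checkpoint."""
    state = json.loads(checkpoint)
    x, v = state["x"], state["v"]
    for _ in range(steps):
        # Velocity Verlet for a unit-mass harmonic oscillator (force = -x).
        v_half = v + 0.5 * dt * (-x)
        x = x + dt * v_half
        v = v_half + 0.5 * dt * (-x)
    done = state["steps_done"] + steps
    return json.dumps({"x": x, "v": v, "steps_done": done})

def run_chain(n_jobs, steps_per_job):
    """Chain n_jobs short jobs, passing only the small checkpoint between them."""
    checkpoint = json.dumps({"x": 1.0, "v": 0.0, "steps_done": 0})
    for _ in range(n_jobs):
        checkpoint = run_job(checkpoint, steps_per_job)
    return json.loads(checkpoint)

if __name__ == "__main__":
    print(run_chain(3, 100))
```

In this toy setting, three chained jobs of 100 steps end in the same state as one job of 300 steps, which is the property that makes splitting a trajectory safe.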
J Y Bansard, D Rebholz-Schuhmann, G Cameron, D Clark, E van Mulligen, E Beltrame, E Barbolla, F Del Hoyo Martin-Sanchez, L Milanesi, I Tollis, J van der Lei, J L Coatrieux (2007)  Medical informatics and bioinformatics: a bibliometric study.   IEEE Trans Inf Technol Biomed 11: 3. 237-243 May  
Abstract: This paper reports on an analysis of the bioinformatics and medical informatics literature with the objective to identify upcoming trends that are shared among both research fields to derive benefits from potential collaborative initiatives for their future. Our results present the main characteristics of the two fields and show that these domains are still relatively separated.
Notes:
Alessandro Orro, Luciano Milanesi (2007)  An agent approach for protein function analysis in a grid infrastructure.   Stud Health Technol Inform 126: 314-321  
Abstract: Many tasks in bioinformatics can be faced only using a combination of computational tools. In particular, functional annotation of gene products can be a very expensive task that may require the application of many analyses together with manual intervention by biologists. In this area, phylogenomic inference is one of the most accurate analysis methodologies for functional annotation, but it is not yet widely used due to the computational cost of some steps in its protocol. This paper discusses the implementation and deployment of such an analysis protocol in a distributed grid environment using an agent architecture in order to simplify the interaction between users and the grid.
Notes:
Paolo Romano, Ezio Bartocci, Guglielmo Bertolini, Flavio De Paoli, Domenico Marra, Giancarlo Mauri, Emanuela Merelli, Luciano Milanesi (2007)  Biowep: a workflow enactment portal for bioinformatics applications.   BMC Bioinformatics 8 Suppl 1: 03  
Abstract: BACKGROUND: The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these to benefit from new technologies is still missing. RESULTS: We designed biowep, a web based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. Biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. Main workflows' processing steps are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. CONCLUSION: We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis software and the creation of effective workflows can significantly improve automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is further being developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO.
Notes:
Dietrich Rebholz-Schuhmann, Graham Cameron, Dominic Clark, Erik van Mulligen, Jean-Louis Coatrieux, Eva Del Hoyo Barbolla, Fernando Martin-Sanchez, Luciano Milanesi, Ivan Porro, Francesco Beltrame, Ioannis Tollis, Johan Van der Lei (2007)  SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.   BMC Bioinformatics 8 Suppl 1: 03  
Abstract: BACKGROUND: The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. RESULTS: This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. CONCLUSION: We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.
Notes:
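The bigram analysis described in the SYMBIOmatics abstract above can be sketched as follows. This is an illustrative toy under stated assumptions, not the procedure used in the paper: the function names `bigrams` and `emerging`, the `min_recent` threshold and the sample texts are all hypothetical. An "emerging" bigram is taken here to be one that appears repeatedly in the recent corpus and never in the past corpus.

```python
from collections import Counter
import re

def bigrams(text):
    """Return the adjacent word pairs of a lower-cased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))

def emerging(past_docs, recent_docs, min_recent=2):
    """Bigrams seen at least min_recent times recently and never in the past."""
    past = Counter(b for doc in past_docs for b in bigrams(doc))
    recent = Counter(b for doc in recent_docs for b in bigrams(doc))
    return sorted(b for b, n in recent.items()
                  if n >= min_recent and past[b] == 0)

# Made-up example corpora, for illustration only.
past_docs = ["sequence alignment of protein sequences"]
recent_docs = ["text mining of abstracts",
               "text mining and sequence alignment"]

if __name__ == "__main__":
    print(emerging(past_docs, recent_docs))
```

A real study would need journal-scale corpora, stop-word handling and a frequency normalisation between periods; none of that is attempted here.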
Gabriele A Trombetti, Ivan Merelli, Alessandro Orro, Luciano Milanesi (2007)  BGBlast: a BLAST grid implementation with database self-updating and adaptive replication.   Stud Health Technol Inform 126: 23-30  
Abstract: BLAST is probably the most used application in bioinformatics teams. BLAST complexity tends to be a concern when the query sequence sets and reference databases are large. Here we present BGBlast: an approach for handling the computational complexity of large BLAST executions by porting BLAST to the Grid platform, leveraging the power of the thousands of CPUs which compose the EGEE infrastructure. BGBlast provides innovative features for efficiently managing BLAST databases in the distributed Grid environment. The system (1) keeps the databases constantly up to date while still allowing the user to regress to earlier versions, (2) stores the older versions of databases on the Grid with a time and space efficient delta encoding and (3) manages the number of replicas for each database over the Grid with an adaptive algorithm, dynamically balancing between execution parallelism and storage costs.
Notes:
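The BGBlast abstract above mentions keeping older database versions through a space-efficient delta encoding. The abstract does not specify the encoding, so the following is only a minimal sketch of one possible scheme, assuming each database version is a mapping from sequence identifier to sequence; `reverse_delta` and `regress` are hypothetical names. The newest version is kept in full, and each older version is stored as the small set of changes needed to go back to it.

```python
def reverse_delta(new, old):
    """Describe how to turn the `new` version back into the `old` one."""
    return {
        # Entries that were changed or deleted since `old`.
        "restore": {k: s for k, s in old.items() if new.get(k) != s},
        # Entries that were added since `old`.
        "remove": [k for k in new if k not in old],
    }

def regress(new, delta):
    """Rebuild the older version from the newer one plus its reverse delta."""
    db = {k: s for k, s in new.items() if k not in delta["remove"]}
    db.update(delta["restore"])
    return db

# Made-up toy versions, for illustration only.
v1 = {"a": "ACGT", "b": "GGCC"}
v2 = {"a": "ACGT", "b": "GGCA", "c": "TTAA"}

if __name__ == "__main__":
    delta = reverse_delta(v2, v1)
    print(delta)
    print(regress(v2, delta) == v1)
```

Storing reverse deltas keeps access to the current version cheap, at the cost of applying one delta per step when a user regresses to an earlier version.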
Roberta Alfieri, Ettore Mosca, Ivan Merelli, Luciano Milanesi (2007)  Parameter estimation for cell cycle ordinary differential equation (ODE) models using a grid approach.   Stud Health Technol Inform 126: 93-102  
Abstract: The cell cycle is one of the biological processes that has been investigated the most in recent years, owing to its importance in cancer studies and drug discovery. The complexity of this biological process is revealed every time a mathematical simulation of the process is carried out. We propose an automated approach that mathematically simulates the cell cycle process with the aim of describing the best estimation of the model. We have implemented a system that, starting from a cell cycle model, is capable of retrieving from a specific database, called Cell Cycle Database, the necessary mathematical information to perform simulation using a grid approach and identify the best model related to a specific dataset of experimental results from the real biological system. Our system allows the visualization of mathematical expressions, such as the kinetic rate law of a reaction, and the direct simulation of the models with the aim of giving the user the possibility to interact with the simulation system. The parameter estimation process usually implies time-consuming computations due to algorithms of linear regression and stochastic methods. In particular, in the case of a stochastic approach based on evolutionary algorithms, the iterative selection process implies many different computations. Therefore, a large number of ODE system simulations are required: the grid infrastructure allows these simulations to be distributed in order to obtain the model that best fits the experimental data. The computation of many ODE systems can be distributed on different grid nodes so that the execution time for the estimation of the best model is reduced. This system will be useful for the comparison of models with different initial conditions related to normal and deregulated cell cycles.
Notes:
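To make the evolutionary parameter estimation described in the abstract above concrete, here is a minimal sketch under strong simplifying assumptions: a single first-order decay equation dx/dt = -k*x stands in for a cell cycle ODE system, Euler integration replaces a proper solver, and the functions `simulate`, `error` and `estimate` are hypothetical, not part of the authors' system. Every call to `error` is an independent ODE simulation, which is the unit of work that a grid could distribute across nodes.

```python
import random

def simulate(k, x0=1.0, dt=0.1, steps=50):
    """Euler integration of dx/dt = -k*x; returns the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (-k * xs[-1]))
    return xs

def error(k, data):
    """Sum of squared differences between a simulation and the data."""
    return sum((a - b) ** 2 for a, b in zip(simulate(k), data))

def estimate(data, generations=60, pop_size=20, keep=5, seed=0):
    """Simple elitist evolutionary search for the rate constant k."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.0, 2.0) for _ in range(pop_size)]
    sigma = 0.2
    for _ in range(generations):
        # Selection: keep the candidates whose simulations fit best.
        parents = sorted(pop, key=lambda k: error(k, data))[:keep]
        # Mutation: each parent yields several perturbed children.
        children = [p + rng.gauss(0.0, sigma)
                    for p in parents
                    for _ in range(pop_size // keep - 1)]
        pop = parents + children
        sigma *= 0.95  # shrink the mutation step over time
    return min(pop, key=lambda k: error(k, data))

if __name__ == "__main__":
    data = simulate(0.7)  # synthetic "experimental" data with k = 0.7
    print(estimate(data))
```

Because the best candidates are carried over unchanged, the fit never gets worse from one generation to the next; the population size, mutation schedule and search range are arbitrary choices made for this example.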
Federica Viti, Ivan Merelli, Antonella Galizia, Daniele D'Agostino, Andrea Clematis, Luciano Milanesi (2007)  Tissue MicroArray: a distributed Grid approach for image analysis.   Stud Health Technol Inform 126: 291-298  
Abstract: The Tissue MicroArray (TMA) technique is assuming ever more importance. Digital image acquisition becomes fundamental to provide an automatic system for subsequent analysis. The accuracy of the results depends on the image resolution, which has to be very high in order to provide as many details as possible. Lossless formats are more suitable to carry information, but data file size becomes a critical factor researchers have to deal with. This affects not only storage methods but also computing times and performances. Pathologists and researchers who work with biological tissues, in particular with the TMA technique, need to consider a large number of case studies to formulate and validate their hypotheses. The importance of image sharing between different institutes worldwide, to increase the amount of interesting data to work with, is clear. In this context, preserving the security of sensitive data is a fundamental issue. In most cases, copying patient data to places other than the original database is forbidden by the owner institutes. Storage, computing and security are key problems of TMA methodology. In our system we tackle all these aspects using the EGEE (Enabling Grids for E-sciencE) Grid infrastructure. The Grid platform provides good storage, performance in image processing and safety of sensitive patient information: this architecture offers hundreds of Storage and Computing Elements and enables users to handle images without copying them to physical disks other than where they have been archived by the owner, giving back to end-users only the processed anonymous images. The efficiency of the TMA analysis process is obtained by implementing algorithms based on functions provided by the Parallel IMAge processing Genoa Library (PIMA(GE)2 Lib). The acquisition of remotely distributed TMA images is made using specialized I/O functions based on the Grid File Access Library (GFAL) API. In our opinion this approach may represent an important contribution to tele-pathology development.
Notes:
Barbara Lazzari, Andrea Caprera, Cristian Cosentino, Alessandra Stella, Luciano Milanesi, Angelo Viotti (2007)  ESTuber db: an online database for Tuber borchii EST sequences.   BMC Bioinformatics 8 Suppl 1: 03  
Abstract: BACKGROUND: The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. RESULTS: Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. CONCLUSION: The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
Notes:
2006
Hurng-Chun Lee, Jean Salzemann, Nicolas Jacq, Hsin-Yen Chen, Li-Yung Ho, Ivan Merelli, Luciano Milanesi, Vincent Breton, Simon C Lin, Ying-Ta Wu (2006)  Grid-enabled high-throughput in silico screening against influenza A neuraminidase.   IEEE Trans Nanobioscience 5: 4. 288-295 Dec  
Abstract: Encouraged by the success of the first EGEE biomedical data challenge against malaria (WISDOM), the second data challenge battling avian flu was kicked off in April 2006 to identify new drugs for the potential variants of the influenza A virus. Mobilizing thousands of CPUs on the Grid, the six-week-long high-throughput screening activity has fulfilled over 100 CPU years of computing power and produced around 600 gigabytes of results on the Grid for further biological analysis and testing. In the paper, we demonstrate the impact of a worldwide Grid infrastructure to efficiently deploy large-scale virtual screening to speed up the drug design process. Lessons learned through the data challenge activity are also discussed.
Notes:
Giuliana Saltini, Roberto Dominici, Carlo Lovati, Monica Cattaneo, Stefania Michelini, Giulia Malferrari, Andrea Caprera, Luciano Milanesi, Dario Finazzi, Pierluigi Bertora, Elio Scarpini, Daniela Galimberti, Eliana Venturelli, Massimo Musicco, Fulvio Adorni, Claudio Mariani, Ida Biunno (2006)  A novel polymorphism in SEL1L confers susceptibility to Alzheimer's disease.   Neurosci Lett 398: 1-2. 53-58 May  
Abstract: Alzheimer's disease (AD) is considered to be a conformational disease arising from the accumulation of misfolded and unfolded proteins in the endoplasmic reticulum (ER). SEL1L is a component of the ER stress degradation system, which serves to remove unfolded proteins by retrograde degradation using the ubiquitin-proteasome system. In order to identify genetic variations possibly involved in the disease, we analysed the entire SEL1L gene sequence in Italian sporadic AD patients. Here we report on the identification of a new polymorphism within the SEL1L intron 3 (IVS3-88 A>G), which contains potential binding sites for transcription factors involved in ER-induced stress. Our statistical analysis shows a possible role of the novel polymorphism as an independent susceptibility factor for Alzheimer's dementia.
Notes:
Marco Severgnini, Silvio Bicciato, Eleonora Mangano, Francesca Scarlatti, Alessandra Mezzelani, Michela Mattioli, Riccardo Ghidoni, Clelia Peano, Raoul Bonnal, Federica Viti, Luciano Milanesi, Gianluca De Bellis, Cristina Battaglia (2006)  Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment.   Anal Biochem 353: 1. 43-56 Jun  
Abstract: Meta-analysis of microarray data is increasingly important, considering both the availability of multiple platforms using disparate technologies and the accumulation in public repositories of data sets from different laboratories. We addressed the issue of comparing gene expression profiles from two microarray platforms by devising a standardized investigative strategy. We tested this procedure by studying MDA-MB-231 cells, which undergo apoptosis on treatment with resveratrol. Gene expression profiles were obtained using high-density, short-oligonucleotide, single-color microarray platforms: GeneChip (Affymetrix) and CodeLink (Amersham). Interplatform analyses were carried out on 8414 common transcripts represented on both platforms, as identified by LocusLink ID, representing 70.8% and 88.6% of annotated GeneChip and CodeLink features, respectively. We identified 105 differentially expressed genes (DEGs) on CodeLink and 42 DEGs on GeneChip. Among them, only 9 DEGs were commonly identified by both platforms. Multiple analyses (BLAST alignment of probes with target sequences, gene ontology, literature mining, and quantitative real-time PCR) permitted us to investigate the factors contributing to the generation of platform-dependent results in single-color microarray experiments. An effective approach to cross-platform comparison involves microarrays of similar technologies, samples prepared by identical methods, and a standardized battery of bioinformatic and statistical analyses.
Notes:
L Milanesi, I Merelli (2006)  High performance GRID based implementation for genomics and protein analysis.   Stud Health Technol Inform 120: 374-380  
Abstract: Starting from genomic and proteomic sequence data, a complex computational infrastructure has been established with the objective of developing a GRID-based system to automate the analysis, prediction and annotation of genomic DNA. To support this type of analysis, several algorithms have been used to recognize biological signals involved in the identification of genes and proteins. The system can be used to analyse the content of a large number of genomic sequences. For this reason, it is capable of using a computational architecture specifically designed for intensive computing, based on GRID technologies developed throughout the BIOINFOGRID European project. We developed a GRID-based workflow to correlate different kinds of bioinformatics data, going from the genomic nucleotide sequence to the protein sequence. The first step in the workflow consists of submitting a nucleotide sequence, which is processed by gene prediction software. In particular, this tool searches the nucleotide sequence for the key components of a gene. The predicted gene is then translated into the corresponding protein sequence. Based on the protein sequence, it is then possible to identify the domains that characterize the protein's functionality using specific domain prediction tools. Protein domain classification is very important in the analysis of macromolecular functionality. Analyzing a whole protein family from the large genomes of various organisms means processing a large amount of data, which requires huge computational resources. To analyze all these data we suggest the use of a high performance platform based on grid technology. We have implemented our applications on a wide area grid platform for scientific applications [http://www.grid.it and http://grid-it.cnaf.infn.it] composed of about 1000 CPUs. The grid infrastructure consists of a collection of computing elements and storage elements that jointly define a platform for high performance processing. In this study a grid-based application is presented to compute the protein domain analysis in a distributed way. This approach achieves high performance because the protein domains are checked with different software in parallel at different grid sites.
Notes:
2005
Giuliano Armano, Gianmaria Mancosu, Luciano Milanesi, Alessandro Orro, Massimiliano Saba, Eloisa Vargiu (2005)  A hybrid genetic-neural system for predicting protein secondary structure.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: Due to the strict relation between protein function and structure, the prediction of protein 3D-structure has become one of the most important tasks in bioinformatics and proteomics. In fact, notwithstanding the increase of experimental data on protein structures available in public databases, the gap between known sequences and known tertiary structures is constantly increasing. The need for automatic methods has led to the development of several prediction and modelling tools, but a general methodology able to solve the problem has not yet been devised, and most methodologies concentrate on the simplified task of predicting secondary structure. RESULTS: In this paper we concentrate on the problem of predicting secondary structures by adopting a technology based on multiple experts. The system performs an overall processing based on two main steps: first, a "sequence-to-structure" prediction is enforced by resorting to a population of hybrid (genetic-neural) experts, and then a "structure-to-structure" prediction is performed by resorting to an artificial neural network. Experiments performed on sequences taken from well-known protein databases reached an accuracy of about 76%, which is comparable to that obtained by state-of-the-art predictors. CONCLUSION: The adoption of a hybrid technique, which encompasses genetic and neural technologies, has proved to be a promising approach to the task of protein secondary structure prediction.
Notes:
Linda Pattini, Ivan Merelli, Sergio Cerutti, Luciano Milanesi (2005)  Representation and modeling of protein surface determinants.   IEEE Trans Nanobioscience 4: 4. 301-305 Dec  
Abstract: Surface characterization of peptides may provide useful information about functionality and potential interactions with other molecules. A description of a protein site through a surface that models the shape conferred by the exposed residues is an effective tool for the analysis and the modeling of proteins that may highlight similarities and relationships not detectable through comparisons at the level of primary, secondary, and tertiary structure. This study concerns the development of a tool that extracts the residues that contribute to the shape modeling of the surface of a protein or a portion of it. This task is accomplished without taking into account the order of the amino acids in the primary structure, but only according to the selection of a portion of the protein indicated through geometric parameters or an explicit list of amino acids belonging to the site of interest. Both in the case of an entire protein and in the case of a portion of it, the method provides the mesh that models the surface described by the exposed residues that constitute the external envelope. The developed tool, which allows the extraction of the exposed residues, and thus of the potential function determinants, is applied to identify the amino acids that contribute to the structural interaction in several protein complexes.
Notes:
Paolo Romano, Domenico Marra, Luciano Milanesi (2005)  Web services and workflow management for biological resources.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: The completion of the Human Genome Project has resulted in large quantities of biological data which are proving difficult to manage and integrate effectively. There is a need for a system that is able to automate accesses to remote sites and to "understand" the information that it is managing in order to link data properly. Workflow management systems combined with Web Services are promising Information and Communication Technologies (ICT) tools. Some have already been proposed and are being increasingly applied to the biomedical domain, especially as many biology-related Web Services are now becoming available. Information on biological resources and on genomic sequences mutations are two examples of very specialized datasets that are useful for specific research domains. RESULTS: The architecture of a system that is able to access and execute predefined workflows is presented in this paper. Web Services allowing access to the IARC TP53 Mutation Database and CABRI catalogues of biological resources have been implemented and are available on-line. Example workflows which retrieve data from these Web Services have also been created and are available on-line. CONCLUSION: We present a general architecture and some building blocks for the implementation of a system that is able to remotely execute workflows of biomedical interest and show how this approach can effectively produce useful outputs. The further development and implementation of Web Services allowing access to an exhaustive set of biomedical databases and the creation of effective and useful workflows will improve the automation of in-silico analysis.
Notes:
Luciano Milanesi, Mauro Petrillo, Leandra Sepe, Angelo Boccia, Nunzio D'Agostino, Myriam Passamano, Salvatore Di Nardo, Gianluca Tasco, Rita Casadio, Giovanni Paolella (2005)  Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, and apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinases only after computational studies. RESULTS: A systematic in silico study performed on the human genome led to the identification of 5 genes, on chromosomes 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases by NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinases may code for alternatively spliced forms or be incorrectly annotated. An InterProScan automated analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cysteine propensity to participate in a disulfide bridge. CONCLUSION: The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase protein domains as defined in multiple sources, together with transmembrane alpha helix and signal peptide prediction, provides hints for function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various levels of magnification up to gene organization on the full chromosome set.
Notes:
Igor B Rogozin, Boris A Malyarchuk, Youri I Pavlov, Luciano Milanesi (2005)  From context-dependence of mutations to molecular mechanisms of mutagenesis.   Pac Symp Biocomput 409-420  
Abstract: Mutation frequencies vary significantly along nucleotide sequences such that mutations often concentrate at certain positions called hotspots. Mutation hotspots in DNA reflect intrinsic properties of the mutation process, such as sequence specificity, that manifests itself at the level of interaction between mutagens, DNA, and the action of the repair and replication machineries. The nucleotide sequence context of mutational hotspots is a fingerprint of interactions between DNA and repair/replication/modification enzymes, and the analysis of hotspot context provides evidence of such interactions. The hotspots might also reflect structural and functional features of the respective DNA sequences and provide information about natural selection. We discuss analysis of 8-oxoguanine-induced mutations in pro- and eukaryotic genes, polymorphic positions in the human mitochondrial DNA and mutations in the HIV-1 retrovirus. Comparative analysis of 8-oxoguanine-induced mutations and spontaneous mutation spectra suggested that a substantial fraction of spontaneous A·T→C·G mutations is caused by 8-oxoGTP in nucleotide pools. In the case of human mitochondrial DNA, significant differences between molecular mechanisms of mutations in hypervariable segments and coding part of DNA were detected. Analysis of mutations in the HIV-1 retrovirus suggested a complex interplay between molecular mechanisms of mutagenesis and natural selection.
Notes:
Pasqualina D'Ursi, Erika Salvi, Paola Fossa, Luciano Milanesi, Ermanna Rovida (2005)  Modelling the interaction of steroid receptors with endocrine disrupting chemicals.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: Organic polychlorinated compounds such as dichlorodiphenyltrichloroethane, with its metabolites, and polychlorinated biphenyls are a class of highly persistent environmental contaminants. They have been recognized to have detrimental health effects on both wildlife and humans, acting as endocrine disrupters due to their ability to mimic the action of the steroid hormones and thus interfere with hormone response. There is experimental evidence that they bind and activate human steroid receptors. However, despite the growing concern about the toxicological activity of endocrine disrupters, molecular data on the interaction of these compounds with biological targets are still lacking. RESULTS: We have used a flexible docking approach to characterize the molecular interaction of seven endocrine disrupting chemicals (EDCs) with estrogen, progesterone and androgen receptors in the ligand-binding domain. All ligands docked in the buried hydrophobic cavity corresponding to the hormone steroid pocket. The interaction was characterized by multiple hydrophobic contacts involving a different number of residues facing the binding pocket, depending on ligand orientation. The EDC ligands did not display a unique binding mode, probably due to their lipophilicity and flexibility, which conferred on them great adaptability to the large hydrophobic binding pocket of steroid receptors. CONCLUSION: Our results are in agreement with toxicological data on binding and make it possible to describe a pattern of interactions for a group of EDCs with steroid receptors, suggesting the requirement of a hydrophobic cavity to accommodate these chlorine-carrying compounds. Although the affinity is lower than for hormones, their action can be brought about by a possible synergistic effect.
Notes:
Barbara Lazzari, Andrea Caprera, Alberto Vecchietti, Alessandra Stella, Luciano Milanesi, Carlo Pozzi (2005)  ESTree db: a tool for peach functional genomics.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: The ESTree db (http://www.itb.cnr.it/estree/) represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. RESULTS: The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. CONCLUSION: The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Notes:
Ivan Merelli, Giulia Morra, Daniele D'Agostino, Andrea Clematis, Luciano Milanesi (2005)  High performance workflow implementation for protein surface characterization using grid technology.   BMC Bioinformatics 6 Suppl 4: Dec  
Abstract: BACKGROUND: This study concerns the development of a high performance workflow that, using grid technology, correlates different kinds of Bioinformatics data, starting from the base pairs of the nucleotide sequence to the exposed residues of the protein surface. The implementation of this workflow is based on the Italian Grid.it project infrastructure, which is a network of several computational resources and storage facilities distributed at different grid sites. METHODS: Workflows are very common in Bioinformatics because they allow large quantities of data to be processed by delegating the management of resources to the information streaming. Grid technology optimizes the computational load during the different workflow steps, dividing the more expensive tasks into a set of small jobs. RESULTS: Grid technology allows efficient database management, a crucial problem for obtaining good results in Bioinformatics applications. The proposed workflow is implemented to integrate huge amounts of data, and the results themselves must be stored in a relational database, which constitutes the added value to global knowledge. CONCLUSION: A web interface has been developed to make this technology accessible to grid users. Once the workflow has started, by means of the simplified interface, it is possible to follow all the different steps throughout the data processing. Finally, when the workflow has terminated, the different features of the protein, like the amino acids exposed on the protein surface, can be compared with the data present in the output database.
Notes:
Giuliano Armano, Luciano Milanesi, Alessandro Orro (2005)  Multiple alignment through protein secondary-structure information.   IEEE Trans Nanobioscience 4: 3. 207-211 Sep  
Abstract: It is well known that protein secondary-structure information can help the process of performing multiple alignment, in particular when the amount of similarity among the involved sequences moves toward the "twilight zone" (less than 30% of pairwise similarity). In this paper, a multiple alignment algorithm is presented, explicitly designed for exploiting any available secondary-structure information. A layered architecture with two interacting levels has been defined for dealing with both primary- and secondary-structure information of target sequences. Secondary structure (either available or predicted by resorting to a technique based on multiple experts) is used to calculate an initial alignment at the secondary level, to be arranged by locally scoped operators devised to refine the alignment at the primary level. Aimed at evaluating the impact of secondary information on the quality of alignments, in particular alignments with a low degree of similarity, the technique has been implemented and assessed on relevant test cases.
Notes:
2003
Igor B Rogozin, Vladimir N Babenko, Luciano Milanesi, Youri I Pavlov (2003)  Computational analysis of mutation spectra.   Brief Bioinform 4: 3. 210-227 Sep  
Abstract: Mutation frequencies vary along a nucleotide sequence, and nucleotide positions with an exceptionally high mutation frequency are called hotspots. Mutation hotspots in DNA often reflect intrinsic properties of the mutation process, such as the specificity with which mutagens interact with nucleic acids and the sequence-specificity of DNA repair/replication enzymes. They might also reflect structural and functional features of target protein or RNA sequences in which they occur. The determinants of mutation frequency and specificity are complex and there are many analytical methods for their study. This paper discusses computational approaches to analysing mutation spectra (distribution of mutations along the target genes) that include many detectable (mutable) positions. The following methods are reviewed: mutation hotspot prediction; pairwise and multiple comparisons of mutation spectra; derivation of a consensus sequence; and analysis of correlation between nucleotide sequence features and mutation spectra. Spectra of spontaneous and induced mutations are used for illustration of the complexities and pitfalls of such analyses. In general, the DNA sequence context of mutation hotspots is a fingerprint of interactions between DNA and DNA repair/replication/modification enzymes, and the analysis of hotspot context provides evidence of such interactions.
Notes:
Luciano Milanesi, Igor B Rogozin (2003)  ESTMAP: a system for expressed sequence tags mapping on genomic sequences.   IEEE Trans Nanobioscience 2: 2. 75-78 Jun  
Abstract: The completion of a number of large genome sequencing projects emphasizes the importance of protein-coding gene predictions. Most of the problems associated with gene prediction are caused by the complex exon-intron structures commonly found in eukaryotic genomes. However, information from homologous sequences can significantly improve the accuracy of the prediction. In particular, expressed sequence tags (ESTs) are very useful for this purpose, since currently existing EST collections are very large. We developed an ESTMAP system, which utilizes homology searches against a database of repetitive elements using the RepeatView program and the EST Division of GenBank using the BLASTN program. ESTMAP extracts "exact" matches with EST sequences (> 95% of homology) from BLASTN output file and predicts introns in DNA comparing ESTs and a query sequence. ESTMAP is implemented as a part of the WebGene system (http://www.cnr.it/webgene).
Notes:
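The filtering and intron-inference step described in the ESTMAP abstract can be sketched roughly as follows. This is a minimal illustration, not the ESTMAP implementation: it assumes the BLASTN hits are available in the standard 12-column tabular format with the genomic sequence as the query and the ESTs as subjects, and the function names `parse_hits` and `putative_introns` as well as the `min_gap` threshold are hypothetical.

```python
from collections import defaultdict


def parse_hits(tabular_text, min_identity=95.0):
    """Group near-exact BLASTN hits by EST identifier.

    Assumes 12-column tabular BLAST output: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Returns {est_id: [(genomic_start, genomic_end), ...]}.
    """
    hits = defaultdict(list)
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        est_id = fields[1]
        identity = float(fields[2])
        # Sort the coordinates so that reverse-strand hits are handled too.
        start, end = sorted((int(fields[6]), int(fields[7])))
        if identity > min_identity:
            hits[est_id].append((start, end))
    return hits


def putative_introns(segments, min_gap=20):
    """Report genomic gaps between consecutive matched segments of one EST.

    min_gap is an illustrative threshold (an assumption, not from the paper)
    used to ignore small alignment gaps.
    """
    ordered = sorted(segments)
    introns = []
    for (_, end_a), (start_b, _) in zip(ordered, ordered[1:]):
        if start_b - end_a - 1 >= min_gap:
            introns.append((end_a + 1, start_b - 1))
    return introns


example = "\n".join([
    "genomic\tEST1\t98.5\t100\t1\t0\t101\t200\t1\t100\t1e-50\t180",
    "genomic\tEST1\t99.0\t80\t1\t0\t501\t580\t101\t180\t1e-40\t150",
    "genomic\tEST2\t90.0\t100\t10\t0\t300\t399\t1\t100\t1e-20\t100",
])
est_hits = parse_hits(example)
introns_est1 = putative_introns(est_hits["EST1"])
```

In this toy input, EST2 falls below the 95% threshold and is discarded, while the two EST1 segments leave a single genomic gap that is reported as a candidate intron. A real pipeline would also check that the EST coordinates on either side of the gap are contiguous and that the gap is flanked by plausible splice signals.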
2002
Vadim P Valuev, Dmitry A Afonnikov, Mikhail P Ponomarenko, Luciano Milanesi, Nikolay A Kolchanov (2002)  ASPD (Artificially Selected Proteins/Peptides Database): a database of proteins and peptides evolved in vitro.   Nucleic Acids Res 30: 1. 200-202 Jan  
Abstract: ASPD is a new curated database that incorporates data on full-length proteins, protein domains and peptides that were obtained through in vitro directed evolution processes (mainly by means of phage display). At present, the ASPD database contains data on 195 selection experiments, which were described in 112 original papers. For each experiment, the following information is given: (i) description of the target for binding, (ii) description of the protein or peptide which serves as the template for library construction and description of the native protein which binds the target, (iii) links to the major proteomic databases (SWISS-PROT, PDB, PROSITE and ENZYME), (iv) keywords referring to the biological significance of the experiment, (v) aligned sequences of proteins or peptides retrieved through in vitro evolution and relevant native or constructed sequences, (vi) the number of rounds of selection/amplification and (vii) the number of occurrences of clones with each sequence. The literature data include a full reference, a link to the MEDLINE database and the name of the corresponding author with his email address. ASPD has a user-friendly interface which allows for simple queries using the names of proteins and ligands, as well as keywords describing the biological role of the interaction studied, and also for queries based on authors' names. It is also possible to access the database by means of the SRS system, allowing complex queries. There is a BLAST search tool against the ASPD for looking directly for homologous sequences. Research tools of the ASPD allow the analysis of pairwise correlations in the sequences of proteins and peptides selected against one target. The URL for the ASPD database is http://www.sgi.sscc.ru/mgs/gnw/aspd/.
Notes:
2001
I B Rogozin, A V Kochetov, F A Kondrashov, E V Koonin, L Milanesi (2001)  Presence of ATG triplets in 5' untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon.   Bioinformatics 17: 10. 890-900 Oct  
Abstract: MOTIVATION: The context of the start codon (typically, AUG) and the features of the 5' Untranslated Regions (5' UTRs) are important for understanding translation regulation in eukaryotic mRNAs and for accurate prediction of the coding region in genomic and cDNA sequences. The presence of AUG triplets in 5' UTRs (upstream AUGs) might affect the initiation rate and, in the context of gene prediction, could reduce the accuracy of the identification of the authentic start. To reveal potential connections between the presence of upstream AUGs and other features of 5' UTRs, such as their length and the start codon context, we undertook a systematic analysis of the available eukaryotic 5' UTR sequences. RESULTS: We show that a large fraction of 5' UTRs in the available cDNA sequences, 15-53% depending on the organism, contain upstream ATGs. A negative correlation was observed between the information content of the translation start signal and the length of the 5' UTR. Similarly, a negative correlation exists between the 'strength' of the start context and the number of upstream ATGs. Typically, cDNAs containing long 5' UTRs with multiple upstream ATGs have a 'weak' start context, and in contrast, cDNAs containing short 5' UTRs without ATGs have 'strong' starts. These counter-intuitive results may be interpreted in terms of upstream AUGs having an important role in the regulation of translation efficiency by ensuring low basal translation level via double negative control and creating the potential for additional regulatory mechanisms. One such mechanism, supported by experimental studies of some mRNAs, includes removal of the AUG-containing portion of the 5' UTR by alternative splicing. AVAILABILITY: An ATG_EVALUATOR program is available upon request or at www.itba.mi.cnr.it/webgene. CONTACT: rogozin@ncbi.nlm.nih.gov, milanesi@itba.mi.cnr.it.
Notes:
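The basic quantity behind the statistics in this abstract, the number of upstream ATG triplets in a 5' UTR, can be computed with a few lines of code. The sketch below is only an illustration and is not the ATG_EVALUATOR program; the function names are hypothetical, and it counts ATG triplets in every reading frame, which is an assumption about how upstream ATGs are tallied.

```python
def count_upstream_atgs(utr):
    """Count ATG triplets in a 5' UTR, in any frame (RNA 'U' is read as 'T')."""
    seq = utr.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG")


def fraction_with_upstream_atg(utrs):
    """Fraction of 5' UTRs that contain at least one upstream ATG."""
    utrs = list(utrs)
    if not utrs:
        return 0.0
    return sum(1 for utr in utrs if count_upstream_atgs(utr) > 0) / len(utrs)


toy_utrs = ["GGATGCC", "CCCGGG", "AUGAUG", ""]
toy_fraction = fraction_with_upstream_atg(toy_utrs)
```

Applied to a per-organism collection of 5' UTRs, `fraction_with_upstream_atg` gives the kind of percentage quoted in the RESULTS section; the toy sequences here are made up and carry no biological meaning.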
2000
I B Rogozin, V I Mayorov, M V Lavrentieva, L Milanesi, L R Adkison (2000)  Prediction and phylogenetic analysis of mammalian short interspersed elements (SINEs).   Brief Bioinform 1: 3. 260-274 Sep  
Abstract: The presence of repetitive elements can create serious problems for sequence analysis, especially in the case of homology searches in nucleotide sequence databases. Repetitive elements should be treated carefully by using special programs and databases. In this paper, various aspects of SINE (short interspersed repetitive element) identification, analysis and evolution are discussed.
Notes:
1999
A S Frolov, S V Lavriushev, D A Grigorovich, A E Kel, A A Ptitsyn, N A Kolchanov, N L Podkolodnyĭ, V V Solov'ev, L Milanesi, P Bourne (1999)  WWWMGS: an integrated server for molecular-genetic studies   Biofizika 44: 5. 832-836 Sep/Oct  
Abstract: We report an integrative technology for molecular biology studies in the field of transcription regulation using the Internet. A set of databases, programs, and systems is included in the WWWMGS Web server. For example, the use of TRRD database information for site prediction is described. Using this method, the computer system SeqAnn was developed. The system performs "real time" searches to predict the position of the transcription initiation site according to database information. WWWMGS is available at URL: http://wwwmgs.bionet.nsc.ru/.
Notes:
N A Kolchanov, M P Ponomarenko, A S Frolov, E A Ananko, F A Kolpakov, E V Ignatieva, O A Podkolodnaya, T N Goryachkovskaya, I L Stepanenko, T I Merkulova, V V Babenko, Y V Ponomarenko, A V Kochetov, N L Podkolodny, D V Vorobiev, S V Lavryushev, D A Grigorovich, Y V Kondrakhin, L Milanesi, E Wingender, V Solovyev, G C Overton (1999)  Integrated databases and computer systems for studying eukaryotic gene expression.   Bioinformatics 15: 7-8. 669-686 Jul/Aug  
Abstract: MOTIVATION: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences. RESULTS: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins. AVAILABILITY: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.
Notes:
I B Rogozin, D D'Angelo, L Milanesi (1999)  Protein-coding regions prediction combining similarity searches and conservative evolutionary properties of protein-coding sequences.   Gene 226: 1. 129-137 Jan  
Abstract: Gene identification in a completely new gene with no good homology to known protein sequences can be a very complex task. In order to identify the protein-coding region, a new method, 'SYNCOD', based on the analysis of conservative evolutionary properties of coding regions, has been developed. This program is able to identify and use the coding region homologies of the non-annotated (unknown) protein-coding sequences already present in the nucleotide sequence databases by using the alignment produced by BLASTN. The ratio of the number of mismatches resulting in synonymous codons to the number of mismatches resulting in non-synonymous codons is estimated for each open reading frame. Monte Carlo simulations are then used to estimate the significance of the ratio deviation from random behavior. The SYNCOD program has been tested on generated random sequences and on different control sets. The high accuracy of predicting protein-coding regions (the correlation coefficient, CC, varies from 0.67 to 0.79) and the high specificity (the portion of wrong exons, WE, varies from 0.06 to 0.07) have proved to be important features of the suggested approach. The SYNCOD program is resident on the ITBA-CNR Web Server and can be used via the Internet (URL: www.itba.mi.cnr.it/webgene).
Notes:
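The SYNCOD abstract above describes counting synonymous versus non-synonymous mismatches in an aligned reading frame and testing the result by Monte Carlo simulation. The following is only a minimal sketch of that idea, not the SYNCOD implementation: the function names, the per-base classification of mismatches, and the null model (the same number of substitutions placed at random positions) are all assumptions made for illustration.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_syn_nonsyn(query, subject):
    """Count mismatches that keep (syn) or change (nonsyn) the query codon's amino acid.

    Assumption: both sequences are gap-free, in frame, and use only A/C/G/T.
    Each mismatching base is judged on its own against the query codon.
    """
    usable = min(len(query), len(subject)) // 3 * 3
    syn = nonsyn = 0
    for i in range(0, usable, 3):
        codon = query[i:i + 3]
        for j in range(3):
            if codon[j] != subject[i + j]:
                mutated = codon[:j] + subject[i + j] + codon[j + 1:]
                if CODON_TABLE[mutated] == CODON_TABLE[codon]:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def monte_carlo_p(query, subject, trials=1000, seed=0):
    """Estimate how often random substitutions give at least the observed syn count.

    Hypothetical null model: the observed number of mismatches is scattered
    uniformly over the query, each replaced by a random different base.
    """
    rng = random.Random(seed)
    usable = min(len(query), len(subject)) // 3 * 3
    query = query[:usable]
    syn_obs, nonsyn_obs = count_syn_nonsyn(query, subject)
    n_mismatch = syn_obs + nonsyn_obs
    if n_mismatch == 0:
        return 1.0
    hits = 0
    for _ in range(trials):
        mutated = list(query)
        for pos in rng.sample(range(usable), n_mismatch):
            mutated[pos] = rng.choice([b for b in BASES if b != mutated[pos]])
        syn_sim, _ = count_syn_nonsyn(query, "".join(mutated))
        if syn_sim >= syn_obs:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

A small p-value from `monte_carlo_p` would suggest an excess of synonymous mismatches, which is the kind of signal the abstract associates with protein-coding regions.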
L Milanesi, D D'Angelo, I B Rogozin (1999)  GeneBuilder: interactive in silico prediction of gene structure.   Bioinformatics 15: 7-8. 612-621 Jul/Aug  
Abstract: MOTIVATION: Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. RESULTS: We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. AVAILABILITY: The GeneBuilder system is implemented as a part of WebGene at the URL: http://www.itba.mi.cnr.it/webgene and the TRADAT (TRAnscription Databases and Analysis Tools) launcher at the URL: http://www.itba.mi.cnr.it/tradat.
Notes:
1998
M G Campi, P Romano, L Milanesi, D Marra, M A Manniello, B Iannotta, G Rondanina, E Grasso, T Ruzzon, L Santi (1998)  Molecular Probe Data Base (MPDB).   Nucleic Acids Res 26: 1. 145-147 Jan  
Abstract: In this paper, the current status of the Molecular Probe Data Base (http://www.biotech.ist.unige.it/interlab/mpdb.html) is briefly presented together with a short analysis of its activity during 1997. This has been performed by statistically evaluating the 'logs' of the Internet servers that are used for its distribution with reference to the geographical origin of the requests, the words that were utilized to carry out the searches and the oligonucleotides that were retrieved. Planned enhancements of this database are also described. They include a revision of its data structure and, even more relevant, of its data management procedures.
Notes:
N A Kolchanov, M P Ponomarenko, A E Kel, Kondrakhin YuV, A S Frolov, F A Kolpakov, T N Goryachkovsky, O V Kel, E A Ananko, E V Ignatieva, O A Podkolodnaya, V N Babenko, I L Stepanenko, A G Romashchenko, T I Merkulova, D G Vorobiev, S V Lavryushev, Ponomarenko YuV, A V Kochetov, G B Kolesov, V V Solovyev, L Milanesi, N L Podkolodny, E Wingender, T Heinemeyer (1998)  GeneExpress: a computer system for description, analysis, and recognition of regulatory sequences in eukaryotic genome.   Proc Int Conf Intell Syst Mol Biol 6: 95-104  
Abstract: The GeneExpress system has been designed to integrate description, analysis, and recognition of eukaryotic regulatory sequences. The system includes 5 basic units: (1) GeneNet contains an object-oriented database for accumulation of data on gene networks and signal transduction pathways and a Java-based viewer that allows exploration and visualization of the GeneNet information; (2) Transcription Regulation combines the database on transcription regulatory regions of eukaryotic genes (TRRD) and TRRD Viewer; (3) Transcription Factor Binding Site Recognition contains a compilation of transcription factor binding sites (TFBSC) and programs for their analysis and recognition; (4) mRNA Translation is designed for analysis of structural and contextual features of mRNA 5'UTRs and prediction of their translation efficiency; and (5) ACTIVITY is the module for analysis and site activity prediction of a given nucleotide sequence. Integration of the databases in GeneExpress is based on the Sequence Retrieval System (SRS) created at the European Bioinformatics Institute.
Notes:
G B Glazko, L Milanesi, I B Rogozin (1998)  The subclass approach for mutational spectrum analysis: application of the SEM algorithm.   J Theor Biol 192: 4. 475-487 Jun  
Abstract: Analysis and comparison of mutational spectra represents an important problem in molecular biology. To analyse mutational spectra we apply an algorithm based on the SEM subclass approach (Simulation, Expectation, Maximization). The algorithm tries to classify the mutational sites according to different mutation probabilities, and each site should belong to one class. Each class is approximated by a binomial distribution and thus any real mutational spectrum is regarded as a mixture of binomial distributions. The separation process runs iteratively. Each iteration includes the simulation, maximization and estimation procedures. To evaluate the quality of the classification results, the χ² test is used. The algorithm has been checked on random spectra with preset parameters and on real mutational spectra. As has been shown, 17 out of 19 analysed real mutational spectra can be divided into two or more classes of sites, of which one contains hotspots of mutation. For the G:C-->A:T mutational spectra induced by Sn1 alkylating mutagens (11 spectra) the classification accuracy was 0.95. To test different site volumes, each Sn1-induced spectrum was divided into the G-->A and C-->T spectra. The classification accuracy for these spectra was 0.96. From the analysis of classification errors it is possible to suggest that at least part of them cannot be ascribed to the faults of the algorithm but are caused by some special features of the mutagenesis itself. The results on the real data are in good agreement with existing knowledge. The approach we present is an attempt to formalize the concept of a "mutational hotspot". The program implementing the SEM algorithm is available on the Web server (http://www.itba.mi.cnr.it/webmutation).
Notes:
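The abstract above treats a mutational spectrum as a mixture of binomial distributions separated by iterated simulation, expectation and maximization steps. The sketch below is a generic stochastic EM for a binomial mixture, offered only as an illustration of that scheme and not as the published program: the function names, the initialisation, and the parameter `n_trials` (the number of chances each site had to mutate) are assumptions.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def sem_binomial_mixture(counts, n_trials, n_classes=2, iters=50, seed=0):
    """Assign each site to one class of a binomial mixture by stochastic EM.

    counts[i] is the number of mutations seen at site i (hypothetical input).
    Returns (weights, probs, labels); classes are ordered by initial rate.
    """
    rng = random.Random(seed)
    eps = 1e-6
    rates = [k / n_trials for k in counts]
    lo, hi = min(rates), max(rates)
    # Assumed initialisation: spread class rates evenly over the observed range.
    probs = [
        min(max(lo + (hi - lo) * (j + 0.5) / n_classes, eps), 1.0 - eps)
        for j in range(n_classes)
    ]
    weights = [1.0 / n_classes] * n_classes
    labels = [0] * len(counts)
    for _ in range(iters):
        for i, k in enumerate(counts):
            # Expectation: unnormalised posterior of each class for this site.
            post = [w * binom_pmf(k, n_trials, p) for w, p in zip(weights, probs)]
            # Simulation: draw a single class label from that posterior.
            labels[i] = rng.choices(range(n_classes), weights=post)[0]
        for j in range(n_classes):
            # Maximization: re-estimate each class from the sites assigned to it.
            members = [k for k, lab in zip(counts, labels) if lab == j]
            if members:  # an empty class keeps its previous parameters
                weights[j] = len(members) / len(counts)
                rate = sum(members) / (n_trials * len(members))
                probs[j] = min(max(rate, eps), 1.0 - eps)
    return weights, probs, labels
```

With counts such as `[0, 1, 0, 1, 2, 1, 0, 1, 20, 22]` and `n_trials=100`, the two heavily mutated sites end up in the class with the higher estimated rate, which corresponds to the "hotspot" class in the abstract's terminology.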
1997
I Biunno, I B Rogozin, V Appierto, L Milanesi, M Mostardini, S Mumm, R Pergolizzi, I Zucchi, G De Bellis (1997)  Sequence and gene content in 35 kb genomic clone mapping in the human Xq27.1 region.   DNA Seq 8: 1-2. 1-15  
Abstract: This paper presents a detailed analysis of the entire sequence of a cosmid clone, 26H7, containing 35 kb of human DNA. This cosmid resides on the q27.1 region of the human X chromosome between the DXS1232 and DXS119 loci. Novel potential small exons were detected for which conventional gene identification strategies (Northern blot analysis and extensive cDNA library screening) proved to be inefficient. Of the standard repetitive elements we found: 8 Alu elements making up 6.2% of the sequence; 10 MIR segments (4.1%); 5 LINE1 elements (4.8%); 3 MIR2 (1.0%); 2 MLT (2.9%); and 1 MSTA (0.7%), together representing about 20% of the total sequence. The overall GC content was rather low, only 42%, and no CpG island was detected using rare restriction enzymes. However, a CpG-rich region was identified. Computer-aided analysis of the sequence inferred the presence of three possible genes: one of them was found to be homologous to the U7 RNA family elements; a second is reported in this paper, although at the moment no significant homology has been found in the data bank. The third predicted gene has not as yet been found to be detectable by RT-PCR. We also report in this paper the identification of X-chromosome specific repeated sequences.
Notes:
M Mostardini, V Appierto, R Pergolizzi, I Zucchi, S Mumm, G DeBellis, L Milanesi, I B Rogozin, I Biunno (1997)  Identification of a U7snRNA homologue mapping to the human Xq27.1 region, between the DXS1232 and DXS119 loci.   Gene 187: 2. 221-224 Mar  
Abstract: To contribute to the identification and analysis of novel genes, we undertook the study of a cosmid clone in the Xq27 region of human DNA. The cloned fragment was previously observed to have a high number of evolutionarily conserved sequences. In this genomic stretch of DNA we have identified a sequence homologous to the U7 RNA gene including its potential regulatory elements. This paper describes the genomic organisation of this gene and its mapping to the Xq27.1 genomic sub-interval between the DXS1232 and DXS119 loci.
Notes:
I B Rogozin, L Milanesi (1997)  Analysis of donor splice sites in different eukaryotic organisms.   J Mol Evol 45: 1. 50-59 Jul  
Abstract: We present here a new algorithm for functional site analysis. It is based on four main assumptions: each variation of nucleotide composition makes a different contribution to the overall binding free energy of interaction between a functional site and another molecule; nonfunctioning site-like regions (pseudosites) are absent or rare in genomes; there may be errors in the sample of sites; and nucleotides of different site positions are considered to be mutually dependent. In this algorithm, the site set is divided into subsets, each described by a certain consensus. Donor splice sites of the human protein-coding genes were analyzed. Comparing the results with other methods of donor splice site prediction has demonstrated a more accurate prediction of consensus sequences AG/GU(A,G), G/GUnAG, /GU(A,G)AG, /GU(A,G)nGU, and G/GUA than is achieved by weight matrix and consensus (A,C)AG/GU(A,G)AGU with mismatches. The probability of the first type error, E1, for the obtained consensus set was about 0.05, and the probability of the second type error, E2, was 0.15. The analysis demonstrated that accuracy of the functional site prediction could be improved if one takes into account correlations between the site positions. The accuracy of prediction by using human consensus sequences was tested on sequences from different organisms. Some differences in consensus sequences for the plant Arabidopsis sp., the invertebrate Caenorhabditis sp., and the fungus Aspergillus sp. were revealed. For the yeast Saccharomyces sp. only one conservative consensus, /GUA(U,A,C)G(U,A,C), was revealed (E1 = 0.03, E2 = 0.03). Yeast is a very interesting model to use for analysis of molecular mechanisms of splicing.
Notes:
1996
L Milanesi, M Muselli, P Arrigo (1996)  Hamming-Clustering method for signals prediction in 5' and 3' regions of eukaryotic genes.   Comput Appl Biosci 12: 5. 399-404 Oct  
Abstract: MOTIVATION: Gene expression is regulated by different kinds of short nucleotide domains. These features can either activate or terminate the transcription process. To predict the signal sites in the 5' and 3' gene regions we applied the Hamming-Clustering network (HC) to the TATA box, to the transcription initiation site and to the poly(A) signal determination in DNA sequences. This approach employs a technique deriving from the synthesis of digital networks in order to generate prototypes, or rules, which can be directly analysed or used for the construction of a final neural network. RESULTS: More than 1000 poly-A signals have been extracted from EMBL database rel. 42 and used to build the training and the test set. A full set of the eukaryotic genes (1252 entries) from the Eukaryotic Promoter Database (EPD rel. 42) has been used for the TATA-box signal and transcription network approach. The results show the applicability of the Hamming-Clustering method to functional signal prediction.
Notes:
S Faranda, A Frattini, I Zucchi, C Patrosso, L Milanesi, C Montagna, P Vezzoni (1996)  Characterization and fine localization of two new genes in Xq28 using the genomic sequence/EST database screening approach.   Genomics 34: 3. 323-327 Jun  
Abstract: Two new genes were identified and mapped by searching the EST databases with genomic sequences obtained from putative CpG islands of the rodent-human hybrid X3000. Previous mapping of these CpG islands in the proximity of the host cell factor (HCFC1) and GdX genes automatically localized these two new genes to Xq28 in the interval between the L1 cell adhesion molecule (L1CAM) and the glucose-6-phosphate dehydrogenase (G6PD) loci. Both genes are relatively short, contain an ORF of 261 and 105 amino acids, respectively, and are ubiquitously expressed. Combining sequencing of selected CpG islands, derived from hybrids containing small portions of the human genome, with an EST database search is an easy method of identifying and mapping new genes to specific regions of the genome.
Notes:
I B Rogozin, L Milanesi, N A Kolchanov (1996)  Gene structure prediction using information on homologous protein sequence.   Comput Appl Biosci 12: 3. 161-170 Jun  
Abstract: In this paper a new approach for the prediction of protein coding gene structures is described. The principal scheme of prediction is as follows: first, the exons with the best potential are predicted in a sequence with unknown functions and a list of potential amino acid fragments coded by these exons is formed. Second, the homology between each amino acid fragment from the list and proteins from the SWISS-PROT database of amino acid sequences is tested. One protein with the best homology is chosen out of all the homologous sequences. Third, the exon-intron structure is reconstructed on the basis of its homology with the chosen protein sequence. The method was tested on an independent control set (20 genes). The results were as follows: 21% of real exons were lost and 3% of non-real exons were found. This system can be used to refine the results of gene prediction systems, especially if highly homologous proteins are found in the amino acid sequence database.
Notes:
1995
A E Kel, Y V Kondrakhin, Kolpakov PhA, O V Kel, A G Romashenko, E Wingender, L Milanesi, N A Kolchanov (1995)  Computer tool FUNSITE for analysis of eukaryotic regulatory genomic sequences.   Proc Int Conf Intell Syst Mol Biol 3: 197-205  
Abstract: We present the computer tool FUNSITE for description and analysis of regulatory sequences of eukaryotic genomes. The tool consists of the following main parts: 1) An integrated database for genomic regulatory sequences. The integrated database was designed on the basis of the databases TRANSFAC (Wingender 1994) and TRRD (Kel et al. 1995) that are currently under development. The following functions are performed: i) linkage to the EMBL database; ii) preparing samples of definite types of functional sites with their flanking sequences; iii) preparing samples of promoter sequences; iv) preparing samples of transcription factors classified with regard to structural and functional features of DNA binding and activating domains, functional families of the factors, their tissue specificity and other functional features; v) access to data on mutual disposition of cis-elements within the regulatory regions. 2) The second component of FUNSITE tool is the set of programs for analysis of the structural organization of regulatory sequences: i) Program for revealing of potential transcription factors binding sites based on their consensi; ii) program for revealing of the potential binding sites using homology search with nucleotide sequences of real binding sites; iii) program for analysis of oligonucleotide context features which are characteristic of flank sequences of the binding sites; iv) program for design of recognition method for the functional sites based on generalized weight matrix; v) program for revealing potential composite elements. The results of analysis of the promoter sequences of eukaryotic genes with the FUNSITE are presented, too.
Notes:
Y V Kondrakhin, A E Kel, N A Kolchanov, A G Romashchenko, L Milanesi (1995)  Eukaryotic promoter recognition by binding sites for transcription factors.   Comput Appl Biosci 11: 5. 477-488 Oct  
Abstract: A method for identification of eukaryotic promoters by localization of binding sites for transcription factors has been suggested. The binding sites for a range of transcription factors have been found to be distributed unevenly. Based on these distributions, we have constructed a weight matrix of binding site localization. On the basis of the weight matrix we have, in turn, designed an algorithm for promoter recognition. To increase the accuracy of the method, we have developed a routine that breaks any promoter sample into subsamples. The method to be reported on allows much better recognition accuracy than does the approach based on detection of the TATA box. In particular, the overprediction error is three times lower following our method. The program FunSiteP recognizes promoters from newly uncovered sequences and tentatively identifies the functional class the promoters must belong to. We have introduced the notion of 'regulatory potential' for the degree to which any region of the sequences is similar to the real eukaryotic promoter. By making use of the potential, we have revealed putative transcription start sites and extended regions of transcription regulation.
Notes:
1994
M C Patrosso, M Repetto, A Villa, L Milanesi, A Frattini, S Faranda, M Mancini, E Maestrini, D Toniolo, P Vezzoni (1994)  The exon-intron organization of the human X-linked gene (FLN1) encoding actin-binding protein 280.   Genomics 21: 1. 71-76 May  
Abstract: We have determined the exon-intron organization of the human X-linked gene (FLN1) encoding actin-binding protein 280 (filamin), a ubiquitous protein that plays an important role in the mechanochemical activities of cells through its association with actin filaments and membrane components. The gene is composed of 47 exons spanning approximately 26 kb. The first and part of the second exon are untranslated. The actin-binding domain at the N-terminus is encoded by exons 2 to 5. The 96-amino-acid repeats corresponding to the elongated rod backbone of the protein are encoded by the remaining 42 exons: size, location, and boundaries of the exons cannot be easily correlated with the repeated structure, while sequences interrupting the repeats (the two hinge segments preceding repeats 16 and 24 and the 8-amino-acid (aa) segment interrupting the 15th repeat) were encoded by separate exons, suggesting that they may be recent additions to the X-linked protein. The 8-aa segment is encoded by exon 29, which is alternatively spliced.
Notes:
V B Strelets, A A Ptitsyn, L Milanesi, H A Lim (1994)  Data bank homology search algorithm with linear computation complexity.   Comput Appl Biosci 10: 3. 319-322 Jun  
Abstract: A new algorithm for data bank homology search is proposed. The principal advantages of the new algorithm are: (i) linear computation complexity; (ii) low memory requirements; and (iii) high sensitivity to the presence of local region homology. The algorithm first calculates indicative matrices of k-tuple 'realization' in the query sequence and then searches for an appropriate number of matching k-tuples within a narrow range in database sequences. It does not require k-tuple coordinates tabulation and in-memory placement for database sequences. The algorithm is implemented in a program for execution on PC-compatible computers and tested on PIR and GenBank databases with good results. A few modifications designed to improve the selectivity are also discussed. As an application example, the search for homology of the mouse homeotic protein HOX 3.1 is given.
Notes:
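The homology search abstract above describes first recording which k-tuples occur in the query and then scanning database sequences for enough matching k-tuples within a narrow range. The sketch below illustrates one way such a linear-time scan could look. It is an assumption-laden illustration rather than the published algorithm: the names are hypothetical, a Python set stands in for the "indicative matrix", and the "narrow range" is modelled as a sliding window along the database sequence.

```python
def ktuple_presence(query, k):
    """Record which k-tuples occur in the query (stand-in for the indicative matrix)."""
    return {query[i:i + k] for i in range(len(query) - k + 1)}


def best_window_hits(query, target, k=3, window=20):
    """Return the largest number of query k-tuples found in any window of the target.

    The target is scanned once, so the cost grows linearly with its length,
    and no table of k-tuple coordinates is kept for the target.
    """
    present = ktuple_presence(query, k)
    # flags[i] is True when the k-tuple starting at target position i is in the query.
    flags = [target[i:i + k] in present for i in range(len(target) - k + 1)]
    current = best = sum(flags[:window])
    for i in range(window, len(flags)):
        # Slide the window by one position: add the new flag, drop the oldest.
        current += flags[i] - flags[i - window]
        best = max(best, current)
    return best
```

A database sequence would then be reported as a candidate when `best_window_hits` exceeds a chosen threshold; the threshold itself is not given in the abstract and would need to be tuned.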
1993
A E Kel, M P Ponomarenko, E A Likhachev, Orlov YuL, I V Ischenko, L Milanesi, N A Kolchanov (1993)  SITEVIDEO: a computer system for functional site analysis and recognition. Investigation of the human splice sites.   Comput Appl Biosci 9: 6. 617-627 Dec  
Abstract: We developed the computer system SITEVIDEO for analysis and recognition of the functional sites in DNA and RNA molecules. It reveals contextual features essential for site function and thus enables the user to design efficient methods for recognition of the functional sites. We mainly considered only quantitative characteristics reflecting the uneven distribution of oligonucleotides in the sequences of functional sites of interest. The approach suggested makes use of available information about the hierarchical organization of the functional sites, and ensures highly precise prediction of the sites. The present analysis is concerned with the human donor and acceptor splice sites. A method for recognizing these sites in the sequences with an accuracy of approximately 90% was developed.
Notes:
1992
V B Streletc, I N Shindyalov, N A Kolchanov, L Milanesi (1992)  Fast, statistically based alignment of amino acid sequences on the base of diagonal fragments of DOT-matrices.   Comput Appl Biosci 8: 6. 529-534 Dec  
Abstract: We present a new pairwise alignment algorithm that uses iterative statistical analysis of homologous subsequences. Apart from the classical conversion of the DOT-matrix characteristic of the Needleman-Wunsch algorithm (NW), we used only those matrix elements that corresponded to the most non-random subsequence homologies. The most reliable elements of the DOT-matrix are written to the compact competition matrices. The algorithm then searches for alignment on the base of only these matrix elements. Our algorithm has low storage and memory requirements, but provides a reliable alignment for the sequences of weak homology (or, at least for the homology regions). In such cases classical NW algorithms often produce unreliable results on the level of statistical noise due to accumulation of random matchings throughout the aligned sequences.
Notes:
C Tribioli, F Tamanini, C Patrosso, L Milanesi, A Villa, R Pergolizzi, E Maestrini, S Rivella, S Bione, M Mancini (1992)  Methylation and sequence analysis around EagI sites: identification of 28 new CpG islands in XQ24-XQ28.   Nucleic Acids Res 20: 4. 727-733 Feb  
Abstract: Thirty-two probes for CpG islands of the distal long arm of the human X chromosome have been identified. From a genomic library of DNA of the hamster-human cell hybrid X3000.1 digested with the rare cutter restriction enzyme EagI, 53 different human clones have been isolated and characterized by methylation and sequence analysis. The characteristic pattern of DNA methylation of CpG islands at the 5' end of genes of the X chromosome has been used to distinguish between EagI sites in CpG islands versus isolated EagI sites. The sequence analysis has confirmed and completed the characterization, showing that sequences at the 5' end of known genes were among the clones defined as CpG islands and that the non-CpG island clones were mostly repetitive sequences with a non-methylated or variably methylated EagI site. Thus, since clones corresponding to repetitive sequences can be easily identified by sequencing, such libraries are a very good source of CpG islands. The methylation analysis of 28 different new probes allows us to state that demethylation of CpG islands of the active X and methylation of those on the inactive X chromosome are the general rule. Moreover, the finding, in all instances, of methylation differences between male and female DNA is in very strong support of the notion that most genes of the distal long arm of the X chromosome are subject to X inactivation.
Notes:
1988
M C Gilardi, V Bettinardi, A Todd-Pokropek, L Milanesi, F Fazio (1988)  Assessment and comparison of three scatter correction techniques in single photon emission computed tomography.   J Nucl Med 29: 12. 1971-1979 Dec  
Abstract: The detection of scattered radiation is recognized as one of the major sources of error in single photon emission computed tomography (SPECT). In this work three scatter correction techniques have been assessed and compared. Scatter coefficients and parameters characteristic of each technique have been calculated through Monte Carlo simulations and experimentally measured for various source geometries. Their dependence on the source/matter distribution and their spatial non-stationarity have been described. Each of the three scatter correction methods has then been tested on several SPECT phantom studies. The three methods provided comparable results. Following scatter compensation, both image quality and quantitative accuracy improved. In particular a slight improvement in spatial resolution and a statistically significant increase in cold lesion contrast, hot lesion recovery coefficient, and signal/noise ratio have been demonstrated with all methods.
Notes:
1987
C Birattari, M Bonardi, A Ferrari, L Milanesi, M Silari (1987)  Biomedical applications of cyclotrons and review of commercially available models.   J Med Eng Technol 11: 4. 166-176 Jul/Aug  
Abstract: The growing use of cyclotrons in biomedicine, both for clinical and research purposes and in particular for the production of short-lived radionuclides which are extremely useful in nuclear medicine diagnosis, has reached a stage in which commercial companies are able to offer several models with different performances, in order to satisfy the demand of different users. Many of these commercially produced accelerators are installed all over the world and some of them have been operating for several years, demonstrating that this category of machine has reached a high degree of reliability. A brief description of the operating principle of the cyclotron is presented, together with an illustration of its possible applications in the medical field. A list of the models presently available on the market is given and the installation problems and the criteria to be followed in the choice of a model are discussed. Finally, likely future developments in the field are briefly discussed.
Notes:
1986
C Birattari, M Bonardi, A Ferrari, L Milanesi, M Silari (1986)  Cyclotrons in medicine. A survey of commercial models and their biomedical applications   Radiol Med (Torino) 72: 5. 316-327 May  
Abstract: At present in Italy there is a great interest in the use of cyclotrons for medical applications: according to a plan of CNR (National Research Council), accelerators of this kind are going to be installed in some hospitals. After the explanation of the cyclotron operation principles, an outline is given of the possible applications with particular care for the clinical ones. An up-to-date review of commercial models so far developed is reported and finally, after a short note concerning installation problems, some suggestions are given about criteria to be followed in the choice of a model, according to the foreseen scientific program.
Notes:
1984
F Fazio, P Gerundini, A Margonato, W Bencivelli, A Maseri, M C Gilardi, A Fregoso, L Milanesi (1984)  Quantitative radionuclide angiocardiography using gold-195m.   Am J Cardiol 53: 10. 1442-1446 May  
Abstract: A limitation of first-pass radionuclide angiocardiography is the limited repeatability because of the relatively long half-life of technetium-99m (Tc-99m). The feasibility, reproducibility and validity of multiple sequential quantitative first-pass studies were assessed in humans using the short-lived isotope gold-195m (Au-195m) (half-life of 30.6 seconds, 262 keV), which can be directly obtained from a generator made of its parent isotope, mercury-195m (half-life of 41.6 hours). Thirty-three subjects (13 normal volunteers and 20 cardiac patients) were studied using a large-field gamma camera equipped with a medium-energy collimator. After Au-195m intravenous injections, repeat first-pass studies were performed in the left anterior oblique projection. A left anterior oblique study was then obtained after i.v. injection of Tc-99m. Left ventricular ejection fraction calculations were performed separately by 2 observers. Reproducibility of Au-195m first-pass studies was excellent. The correlation coefficients for left ventricular ejection fraction from the first and the second Au-195m injections were 0.93 and 0.98 for observers 1 and 2, respectively. The correlation coefficients between Au-195m and Tc-99m first-pass studies were 0.95 and 0.98, respectively.
Notes:
1982