
Francisco J Azuaje


fj.azuaje@ieee.org

Books

2010
F Azuaje (2010)  Bioinformatics and biomarker discovery: “Omic” data analysis for personalised medicine   London: Wiley  
Notes: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-047074460X.html

Journal articles

2007
Francisco J Azuaje, Jose L Ramirez, Jose F Da Silveira (2007)  In silico, biologically-inspired modelling of genomic variation generation in surface proteins of Trypanosoma cruzi.   Kinetoplastid Biol Dis 6: 07  
Abstract: BACKGROUND: Protozoan parasites improve the likelihood of invading or adapting to the host through their capacity to present a large repertoire of surface molecules. The understanding of the mechanisms underlying the generation of antigenic diversity is crucial to aid in the development of therapies and the study of evolution. Despite advances driven by molecular biology and genomics, there is a need to gain a deeper understanding of key properties that may facilitate variation generation, models for explaining the role of genomic re-arrangements and the characterisation of surface protein families on the basis of their capacity to generate variation. Computer models may be implemented to explore, visualise and estimate the variation generation capacity of gene families in a dynamic fashion. In this paper we report the dynamic simulation of genomic variation using real T. cruzi coding sequences as inputs to a computational simulation system. The effects of random, multiple-point mutations and gene conversions on genomic variation generation were quantitatively estimated and visualised. Simulations were also implemented to investigate the potential role of pseudogenes as a source of antigenic variation in T. cruzi. RESULTS: Computational models of variation generation were applied to real coding sequences from surface proteins in T. cruzi: trans-sialidase-like proteins and putative surface protein dispersed gene family-1. In the simulations the sequences self-replicated, mutated and re-arranged during thousands of generations. Simulations were implemented for different mutation rates to estimate the relative robustness of the protein families in the face of DNA multiple-point mutations and sequence re-arrangements. The gene super-families and families showed distinguishing evolutionary responses, which may be used to characterise them on the basis of their capacity to generate variability. The simulations showed that sequences from T. cruzi nuclear genes tend to be relatively more robust against random, multiple-point mutations than those obtained from surface protein genes. Simulations also showed that a gene conversion model may act as an effective variation generation mechanism. Differential variation responses can be used to characterise the sequence groups under study. For example, unlike other families, sequences from the DGF1 family have the capacity to maximise variation at the amino acid level under relatively low mutation rates and through gene conversion. However, in relation to the other protein families, they exhibit more robust behaviour in response to more severe modifications through intra-family genomic sequence exchange. Independent simulations indicate that DGF1 pseudogenes might play a role in the generation of greater genomic variation in the DGF1 gene family through gene conversion under different experimental conditions. CONCLUSION: Digital, dynamic simulations may be implemented to characterise gene families on the basis of their capacity to generate variation in the face of genomic perturbations. Such simulations may be useful to explore antigenic variation mechanisms and hypotheses about robustness at the genomic level. This investigation illustrated how sequences derived from surface protein genes and computer simulations can be used to investigate variation generation mechanisms. Such in silico experiments of self-replicating sequences undergoing random mutations and genomic re-arrangements can offer insights into the diversity generation potential of the genes under study. Biologically-inspired simulations may support the study of genomic variation mechanisms in pathogens whose genomes have been recently sequenced.
Francisco Azuaje, José Luis Ramirez, José Franco Da Silveira (2007)  An exploration of the genetic robustness landscape of surface protein families in the human protozoan parasite Trypanosoma cruzi.   IEEE Trans Nanobioscience 6: 3. 223-228 Sep  
Abstract: The ability of genes to be robust to mutations at the codon level has been suggested as a key factor for understanding adaptation features. It has been proposed that genes relevant to host-parasite interactions will tend to exhibit high volatility or "antirobust" patterns, which may be related to the capacity of the parasite to evade the host immune system. We compared two superfamilies of surface proteins, trans-sialidase (TS)-like proteins and putative surface protein dispersed gene family-1 (DGF-1), in the parasite Trypanosoma cruzi in terms of a measure of gene volatility. We proposed alternative codon robustness indicators based on cross entropy and impurity of amino acids encoded by point-mutations, which were compared to a volatility estimator previously published. This allowed us to present a more detailed description of the differences between families. A significant difference was observed in terms of these scores, with the TS-MVar1 and the DGF-1 families showing the highest and lowest gene volatility values respectively. The cross entropy and impurity estimators suggest that the MVar1 levels of volatility are linearly correlated with their capacity to generate diverse sets of amino acids as a consequence of potential mutations. This study indicates the feasibility of applying different measures of genetic robustness to detect variations between potential drug targets at the protein level.
Ignacio Ponzoni, Francisco Azuaje, Juan Augusto, David Glass (2007)  Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning.   IEEE/ACM Trans Comput Biol Bioinform 4: 4. 624-634 Oct/Dec  
Abstract: There is a need to design computational methods to support the prediction of gene regulatory networks. Such models should offer both biologically-meaningful and computationally-accurate predictions, which in combination with other techniques may improve large-scale, integrative studies. This paper presents a new machine learning method for the prediction of putative regulatory associations from expression data, which exhibits properties never or only partially addressed by other recently published techniques. The method was tested on a Saccharomyces cerevisiae gene expression dataset. The results were statistically validated and compared with the relationships inferred by two machine learning approaches to gene regulatory network prediction. Furthermore, the resulting predictions were assessed using domain knowledge. The proposed algorithm may be able to accurately predict relevant biological associations between genes. One of the most relevant features of this new method is the prediction of adaptive regulation thresholds for the discretization of gene expression values, which is required prior to the rule association learning process. Moreover, an important advantage is its low computational cost to infer association rules. The proposed system may significantly support exploratory, large-scale studies of automated identification of potentially-relevant gene expression associations.
Anyela Camargo, Francisco Azuaje (2007)  Linking gene expression and functional network data in human heart failure.   PLoS ONE 2: 12. 12  
Abstract: BACKGROUND: Gene expression profiling and the analysis of protein-protein interaction (PPI) networks may support the identification of disease bio-markers and potential drug targets. Thus, a step forward in the development of systems approaches to medicine is the integrative analysis of these data sources in specific pathological conditions. We report such an integrative bioinformatics analysis in human heart failure (HF). A global PPI network in HF was assembled, which by itself represents a useful compendium of the current status of human HF-relevant interactions. This provided the basis for the analysis of interaction connectivity patterns in relation to a HF gene expression data set. RESULTS: Relationships between the significance of the differentiation of gene expression and connectivity degrees in the PPI network were established. In addition, relationships between gene co-expression and PPI network connectivity were analysed. Highly-connected proteins are not necessarily encoded by genes significantly differentially expressed. Genes that are not significantly differentially expressed may encode proteins that exhibit diverse network connectivity patterns. Furthermore, genes that were not defined as significantly differentially expressed may encode proteins with many interacting partners. Genes encoding network hubs may exhibit weak co-expression with the genes encoding their interacting protein partners. We also found that hubs and superhubs display a significant diversity of co-expression patterns in comparison to peripheral nodes. Gene Ontology (GO) analysis established that highly-connected proteins are likely to be engaged in higher level GO biological process terms, while low-connectivity proteins tend to be engaged in more specific disease-related processes. 
CONCLUSION: This investigation supports the hypothesis that the integrative analysis of differential gene expression and PPI network analysis may facilitate a better understanding of functional roles and the identification of potential drug targets in human heart failure.
Haiying Wang, Huiru Zheng, Francisco Azuaje (2007)  Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data.   IEEE/ACM Trans Comput Biol Bioinform 4: 2. 163-175 Apr/Jun  
Abstract: Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.
2006
Peyman Jafari, Francisco Azuaje (2006)  An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors.   BMC Med Inform Decis Mak 6: 06  
Abstract: BACKGROUND: The analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets. Results derived from such studies cannot be trusted unless they are adequately designed and reported. The purpose of this study is to assess current practices on the reporting of experimental design and statistical analyses in gene expression-based studies. METHODS: We reviewed hundreds of MEDLINE-indexed papers involving gene expression data analysis, which were published between 2003 and 2005. These papers were examined on the basis of their reporting of several factors, such as sample size, statistical power and software availability. RESULTS: Among the examined papers, we concentrated on 293 papers consisting of applications and new methodologies. These papers did not report approaches to sample size and statistical power estimation. Explicit statements on data transformation and descriptions of the normalisation techniques applied prior to data analyses (e.g. classification) were not reported in 57 (37.5%) and 104 (68.4%) of the methodology papers respectively. With regard to papers presenting biomedical-relevant applications, 41 (29.1%) of these papers did not report on data normalisation and 83 (58.9%) did not describe the normalisation technique applied. Clustering-based analysis, the t-test and ANOVA represent the most widely applied techniques in microarray data analysis. But remarkably, only 5 (3.5%) of the application papers included statements or references to assumptions about variance homogeneity for the application of the t-test and ANOVA. There is still a need to promote the reporting of software packages applied or their availability. CONCLUSION: Recently-published gene expression data analysis studies may lack key information required for properly assessing their design quality and potential impact. 
There is a need for more rigorous reporting of important experimental factors such as statistical power and sample size, as well as the correct description and justification of statistical methods applied. This paper highlights the importance of defining a minimum set of information required for reporting on statistical design and analysis of expression data. By improving practices of statistical analysis reporting, the scientific community can facilitate quality assurance and peer-review processes, as well as the reproducibility of results.
Nadia Bolshakova, F Azuaje (2006)  Estimating the number of clusters in DNA microarray data.   Methods Inf Med 45: 2. 153-157  
Abstract: OBJECTIVES: The main objective of this research is to apply clustering and cluster validity methods to estimate the number of clusters in cancer tumor datasets. A weighted voting technique is used to improve the prediction of the number of clusters based on different data mining techniques. These tools may be used for the identification of new tumour classes using DNA microarray datasets. This estimation approach may provide a useful tool to support biological and biomedical knowledge discovery. METHODS: Three clustering and two validation algorithms were applied to two cancer tumor datasets. Recent studies confirm that there is no universal pattern recognition and clustering model to predict molecular profiles across different datasets. Thus, it is useful not to rely on one single clustering or validation method, but to apply a variety of approaches. Therefore, a combination of these methods may be successfully used for the estimation of the number of clusters. RESULTS: The methods implemented in this research may contribute to the validation of clustering results and the estimation of the number of clusters. The results show that this estimation approach may represent an effective tool to support biomedical knowledge discovery and healthcare applications. CONCLUSION: The methods implemented in this research may be successfully used for the estimation of the number of clusters. These tools may be used for the identification of new tumour classes using gene expression profiles.
Francisco Azuaje, Fatima Al-Shahrour, Joaquin Dopazo (2006)  Ontology-driven approaches to analyzing data in functional genomics.   Methods Mol Biol 316: 67-86  
Abstract: Ontologies are fundamental knowledge representations that provide not only standards for annotating and indexing biological information, but also the basis for implementing functional classification and interpretation models. This chapter discusses the application of gene ontology (GO) for predictive tasks in functional genomics. It focuses on the problem of analyzing functional patterns associated with gene products. This chapter is divided into two main parts. The first part overviews GO and its applications for the development of functional classification models. The second part presents two methods for the characterization of genomic information using GO. It discusses methods for measuring functional similarity of gene products, and a tool for supporting gene expression clustering analysis and validation.
Haiying Wang, Huiru Zheng, David Simpson, Francisco Azuaje (2006)  Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data.   BMC Bioinformatics 7: 03  
Abstract: BACKGROUND: Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data. RESULTS: Based on the analysis of publicly-available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE), this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was firstly performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model) performed significantly better than relatively more complex algorithms, e.g. neural networks. 
To deal with the problem of functional class imbalance occurring in the dataset, two data re-sampling techniques were studied. A random over-sampling method supported the implementation of the most powerful prediction models. The KStar model was also able to achieve higher predictive sensitivities and specificities using random over-sampling techniques. CONCLUSION: The approaches assessed in this paper represent an efficient and relatively inexpensive in silico methodology for supporting large-scale analysis of photoreceptor gene expression by SAGE. They may be applied as complementary methodologies to support functional predictions before implementing more comprehensive, experimental prediction and validation methods. They may also be combined with other large-scale, data-driven methods to facilitate the inference of transcriptional regulatory networks in the developing retina. Furthermore, the methodology assessed may be applied to other data domains.
2005
F Azuaje (2005)  Integrative data analysis for functional prediction: a multi-objective optimization approach.   Bioinformatics 21: 9. 2099-2100 May  
Abstract: SUMMARY: An integrative classification system for functional genomics is introduced. A comparison with a previous study of the yeast mitochondrial proteome is presented. AVAILABILITY: A demonstration prototype, interSearch, is available on request. SUPPLEMENTARY INFORMATION: http://ijsr32.infj.ulster.ac.uk/~e10110731/interSearch.
Nadia Bolshakova, Francisco Azuaje, Pádraig Cunningham (2005)  A knowledge-driven approach to cluster validity assessment.   Bioinformatics 21: 10. 2546-2547 May  
Abstract: This paper presents an approach to assessing cluster validity based on similarity knowledge extracted from the Gene Ontology. AVAILABILITY: The program is freely available for non-profit use on request from the authors.
Kai-Bo Duan, Jagath C Rajapakse, Haiying Wang, Francisco Azuaje (2005)  Multiple SVM-RFE for gene selection in cancer classification with expression data.   IEEE Trans Nanobioscience 4: 3. 228-234 Sep  
Abstract: This paper proposes a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step, the proposed approach computes the feature ranking score from a statistical analysis of weight vectors of multiple linear SVMs trained on subsamples of the original training data. We tested the proposed method on four gene expression datasets for cancer classification. The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy. A Gene Ontology-based similarity assessment indicates that the selected subsets are functionally diverse, further validating our gene selection method. This investigation also suggests that, for gene expression-based cancer classification, average test error from multiple partitions of training and test sets can be recommended as a reference of performance quality.
Francisco Azuaje, Haiying Wang, Alban Chesneau (2005)  Non-linear mapping for exploratory data analysis in functional genomics.   BMC Bioinformatics 6: 01  
Abstract: BACKGROUND: Several supervised and unsupervised learning tools are available to classify functional genomics data. However, relatively less attention has been given to exploratory, visualisation-driven approaches. Such approaches should satisfy the following factors: Support for intuitive cluster visualisation, user-friendly and robust application, computational efficiency and generation of biologically meaningful outcomes. This research assesses a relaxation method for non-linear mapping that addresses these concerns. Its applications to gene expression and protein-protein interaction data analyses are investigated. RESULTS: Publicly available expression data originating from leukaemia, round blue-cell tumours and Parkinson disease studies were analysed. The method distinguished relevant clusters and critical analysis areas. The system does not require assumptions about the inherent class structure of the data, its mapping process is controlled by only one parameter and the resulting transformations offer intuitive, meaningful visual displays. Comparisons with traditional mapping models are presented. As a way of promoting potential, alternative applications of the methodology presented, an example of exploratory data analysis of interactome networks is illustrated. Data from the C. elegans interactome were analysed. Results suggest that this method might represent an effective solution for detecting key network hubs and for clustering biologically meaningful groups of proteins. CONCLUSION: A relaxation method for non-linear mapping provided the basis for visualisation-driven analyses using different types of data. This study indicates that such a system may represent a user-friendly and robust approach to exploratory data analysis. It may allow users to gain better insights into the underlying data structure, detect potential outliers and assess assumptions about the cluster composition of the data.
Nadia Bolshakova, Francisco Azuaje, Pádraig Cunningham (2005)  An integrated tool for microarray data clustering and cluster validity assessment.   Bioinformatics 21: 4. 451-455 Feb  
Abstract: SUMMARY: In this paper we present a data mining system, which allows the application of different clustering and cluster validity algorithms for DNA microarray data. This tool may improve the quality of the data analysis results, and may support the prediction of the number of relevant clusters in the microarray datasets. This systematic evaluation approach may significantly aid genome expression analyses for knowledge discovery applications. The developed software system may be effectively used for clustering and validating not only DNA microarray expression analysis applications but also other biomedical and physical data with no limitations. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html CONTACT: Nadia.Bolshakova@cs.tcd.ie.
2004
Haiying Wang, Francisco Azuaje, Norman Black (2004)  An integrative and interactive framework for improving biomedical pattern discovery and visualization.   IEEE Trans Inf Technol Biomed 8: 1. 16-27 Mar  
Abstract: Recent progress in medical sciences has led to an explosive growth of data. Due to its inherent complexity and diversity, mining such volumes of data to extract relevant knowledge represents an enormous challenge and opportunity. Interactive pattern discovery and visualization systems for biomedical data mining have received relatively little attention. Emphasis has been traditionally placed on automation and supervised classification problems. Based on self-adaptive neural networks and pattern-validation statistical tools, this paper presents a user-friendly platform to support biomedical pattern discovery and visualization. It has been tested on several types of biomedical data, such as dermatology and cardiology data sets. The results indicate that in comparison to traditional techniques, such as Kohonen Maps, this platform may significantly improve the effectiveness and efficiency of pattern discovery and classification tasks, including problems described by several classes. Furthermore, this study shows how the combination of graphical and statistical tools may make these patterns more meaningful.
2003
Haiying Wang, Francisco Azuaje, Benjamin Jung, Norman Black (2003)  A markup language for electrocardiogram data acquisition and analysis (ecgML).   BMC Med Inform Decis Mak 3: May  
Abstract: BACKGROUND: The storage and distribution of electrocardiogram data is based on different formats. There is a need to promote the development of standards for their exchange and analysis. Such models should be platform-, system- and application-independent, flexible and open to every member of the scientific community. METHODS: A minimum set of information for the representation and storage of electrocardiogram signals has been synthesised from existing recommendations. This specification is encoded into an XML vocabulary. The model may aid in a flexible exchange and analysis of electrocardiogram information. RESULTS: Based on advantages of XML technologies, ecgML has the ability to present a system-, application- and format-independent solution for representation and exchange of electrocardiogram data. The distinction between the proposal developed by the U.S. Food and Drug Administration and the ecgML model is given. A series of tools, which aim to facilitate ecgML-based applications, are presented. CONCLUSIONS: The models proposed here can facilitate the generation of a data format, which opens ways for better and clearer interpretation by both humans and machines. Its structured and transparent organisation will allow researchers to expand and test its capabilities in different application domains. The specification and programs for this protocol are publicly available.
F Azuaje (2003)  A computational evolutionary approach to evolving game strategy and cooperation.   IEEE Trans Syst Man Cybern B Cybern 33: 3. 498-503  
Abstract: This paper describes an approach to the co-evolution of competing virtual creatures. Pairs of individuals enter one-on-one contests in which they contend to establish contact with a common resource. The individuals are subjected to competitive fitness functions. In each tournament one type of organism implements a game rule based on a set of basic cognitive capabilities. The second type of contestant genetically determines its game strategy. Interesting strategy patterns are identified when this evolutionary process is simulated on populations of competing individuals. These experiments show how cooperation emerges in order to improve both individual and collective game performances.
Nadia Bolshakova, Francisco Azuaje (2003)  Machaon CVE: cluster validation for gene expression data.   Bioinformatics 19: 18. 2494-2495 Dec  
Abstract: SUMMARY: This paper presents a cluster validation tool for gene expression data. Machaon CVE (Clustering and Validation Environment) system aims to partition samples or genes into groups characterized by similar expression patterns, and to evaluate the quality of the clusters obtained. AVAILABILITY: The program is freely available for non-profit use on request at http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html SUPPLEMENTARY INFORMATION: http://www.cs.tcd.ie/Nadia.Bolshakova/Machaon.html
Francisco Azuaje (2003)  Clustering-based approaches to discovering and visualising microarray data patterns.   Brief Bioinform 4: 1. 31-42 Mar  
Abstract: This article focuses on clustering techniques for the analysis of microarray data and discusses contributions and applications for the implementation of intelligent diagnostic systems and therapy design studies. Approaches to validating and visualising expression clustering results and software and other relevant resources to support clustering-based analyses are reviewed. Finally, this paper addresses current limitations and problems that need to be investigated for the development of an advanced generation of pattern discovery tools.
Francisco Azuaje (2003)  Genomic data sampling and its effect on classification performance assessment.   BMC Bioinformatics 4: Jan  
Abstract: BACKGROUND: Supervised classification is fundamental in bioinformatics. Machine learning models, such as neural networks, have been applied to discover genes and expression patterns. This process is achieved by implementing training and test phases. In the training phase, a set of cases and their respective labels are used to build a classifier. During testing, the classifier is used to predict new cases. One approach to assessing its predictive quality is to estimate its accuracy during the test phase. Key limitations appear when dealing with small-data samples. This paper investigates the effect of data sampling techniques on the assessment of neural network classifiers. RESULTS: Three data sampling techniques were studied: Cross-validation, leave-one-out, and bootstrap. These methods are designed to reduce the bias and variance of small-sample estimations. Two prediction problems based on small-sample sets were considered: Classification of microarray data originating from a leukemia study and from small, round blue-cell tumours. A third problem, the prediction of splice-junctions, was analysed to perform comparisons. Different accuracy estimations were produced for each problem. The variations are accentuated in the small-data samples. The quality of the estimates depends on the number of train-test experiments and the amount of data used for training the networks. CONCLUSION: The predictive quality assessment of biomolecular data classifiers depends on the data size, sampling techniques and the number of train-test experiments. Conservative and optimistic accuracy estimations can be obtained by applying different methods. Guidelines are suggested to select a sampling technique according to the complexity of the prediction problem under consideration.
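The three sampling techniques named in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the classifier is a simple nearest-centroid model standing in for the neural networks used in the study, and every function name here (`k_fold_splits`, `leave_one_out_splits`, `bootstrap_splits`, `estimate`) is hypothetical.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            total[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

def predict(model, row):
    """Return the label whose centroid is closest to the row."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda lab: sq_dist(model[lab]))

def accuracy(X, y, train_idx, test_idx):
    """Train on train_idx, then score the fraction of correct test predictions."""
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)

def k_fold_splits(n, k, rng):
    """Cross-validation: each case is tested exactly once across k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def leave_one_out_splits(n):
    """Leave-one-out: n experiments, each testing a single held-out case."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def bootstrap_splits(n, rounds, rng):
    """Bootstrap: train on n cases drawn with replacement, test on the rest."""
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        test = [i for i in range(n) if i not in drawn]
        if test:  # skip the rare round in which every case was drawn
            yield train, test

def estimate(X, y, splits):
    """Average accuracy over all train-test experiments."""
    scores = [accuracy(X, y, train, test) for train, test in splits]
    return sum(scores) / len(scores)
```

For a dataset `X`, `y` with `n` cases, `estimate(X, y, k_fold_splits(n, 5, random.Random(0)))` gives a cross-validated estimate, and swapping in one of the other split generators changes only the sampling scheme, which is the comparison the abstract describes.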
2002
Francisco Azuaje (2002)  In silico approaches to microarray-based disease classification and gene function discovery.   Ann Med 34: 4. 299-305  
Abstract: The automated analysis of transcriptional profiling data promises a wealth of information that may be used to develop a more complete understanding of gene function and interactions. Moreover, it may improve the effectiveness of complex diagnostic tasks. This article discusses important data mining and management techniques to analyse genome-wide expression data. It reviews some of the major discovery goals, methods and applications in a number of biomedical domains. Finally, this paper highlights key problems that need to be approached by a new generation of computational solutions.
Haiying Wang, Francisco Azuaje, Norman Black (2002)  Improving biomolecular pattern discovery and visualization with hybrid self-adaptive networks.   IEEE Trans Nanobioscience 1: 4. 146-166 Dec  
Abstract: There is an increasing need to develop powerful techniques to improve biomedical pattern discovery and visualization. This paper presents an automated approach, based on hybrid self-adaptive neural networks, to pattern identification and visualization for biomolecular data. The methods are tested on two datasets: leukemia expression data and DNA splice-junction sequences. Several supervised and unsupervised models are implemented and compared. A comprehensive evaluation study of some of their intrinsic mechanisms is presented. The results suggest that these tools may be useful to support biological knowledge discovery based on advanced classification and visualization tasks.
F Azuaje (2002)  A cluster validity framework for genome expression data.   Bioinformatics 18: 2. 319-320 Feb  
Abstract: This paper presents a method for the assessment of expression cluster validity.
2001
F Azuaje (2001)  A computational neural approach to support the discovery of gene function and classes of cancer.   IEEE Trans Biomed Eng 48: 3. 332-339 Mar  
Abstract: Advances in molecular classification of tumours may play a central role in cancer treatment. Here, a novel approach to genome expression pattern interpretation is described and applied to the recognition of B-cell malignancies as a test set. Using cDNA microarray data generated by a previous study, a neural network model known as simplified fuzzy ARTMAP is able to identify normal and diffuse large B-cell lymphoma (DLBCL) patients. Furthermore, it discovers the distinction between patients with molecularly distinct forms of DLBCL without previous knowledge of those subtypes.
2000
F Azuaje, W Dubitzky, N Black, K Adamson (2000)  Discovering relevance knowledge in data: a growing cell structures approach.   IEEE Trans Syst Man Cybern B Cybern 30: 3. 448-460  
Abstract: Both information retrieval and case-based reasoning systems rely on effective and efficient selection of relevant data. Typically, relevance in such systems is approximated by similarity or indexing models. However, the definition of what makes data items similar or how they should be indexed is often nontrivial and time-consuming. Based on growing cell structure artificial neural networks, this paper presents a method that automatically constructs a case retrieval model from existing data. Within the case-based reasoning (CBR) framework, the method is evaluated for two medical prognosis tasks, namely, colorectal cancer survival and coronary heart disease risk prognosis. The results of the experiments suggest that the proposed method is effective and robust. To gain a deeper insight and understanding of the underlying mechanisms of the proposed model, a detailed empirical analysis of the model's structural and behavioral properties is also provided.
1999
F Azuaje, W Dubitzky, P Lopes, N Black, K Adamson, X Wu, J A White (1999)  Predicting coronary disease risk based on short-term RR interval measurements: a neural network approach.   Artif Intell Med 15: 3. 275-297 Mar  
Abstract: Coronary heart disease is a multifactorial disease and it remains the most common cause of death in many countries. Heart rate variability has been used for non-invasive measurement of parasympathetic activity and prediction of cardiac death. Patterns of heart rate variability associated with respiratory sinus arrhythmia have recently been considered as possible indicators of coronary heart disease risk in asymptomatic subjects. The aim of this work is to detect individuals at varying risk of coronary heart disease based on short-term heart rate variability measurements under controlled respiration. Artificial neural networks are used to recognise Poincaré-plot-encoded heart rate variability patterns related to coronary heart disease risk. The results indicate a relatively coarse binary representation of Poincaré plots could be superior to an analogue encoding which, in principle, carries more information.
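To make the "coarse binary representation" of a Poincaré plot concrete, the sketch below pairs each RR interval with its successor and marks which cells of a square grid contain at least one point. The abstract does not specify the encoding used in the paper, so the grid layout, the bounds and the function names (`poincare_points`, `binary_grid`) are all assumptions made for illustration.

```python
def poincare_points(rr):
    """Pair each RR interval with the next one: (RR_n, RR_n+1)."""
    return list(zip(rr[:-1], rr[1:]))

def binary_grid(rr, lo, hi, bins):
    """Hypothetical encoding: 1 where a grid cell holds any point, else 0.

    lo and hi bound both axes (same units as rr); points outside are ignored.
    Rows index RR_n+1 and columns index RR_n.
    """
    grid = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in poincare_points(rr):
        if lo <= x < hi and lo <= y < hi:
            col = min(int((x - lo) / width), bins - 1)
            row = min(int((y - lo) / width), bins - 1)
            grid[row][col] = 1
    return grid
```

Flattening the grid row by row would give a fixed-length binary vector that could be fed to a neural network, whereas an analogue encoding would keep per-cell point counts or densities instead of the 0/1 flag.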
F Azuaje, W Dubitzky, N Black, K Adamson (1999)  Improving clinical decision support through case-based data fusion.   IEEE Trans Biomed Eng 46: 10. 1181-1185 Oct  
Abstract: This paper presents an information fusion technique based on a knowledge discovery model, and the case-based reasoning decision framework. Using signal data and database records from the heart disease risk estimation domain, three data fusion methods are discussed. Two of these methods combine information at the retrieval-outcome level, and one method merges data at the discovery-input level. The results of these three models are compared and evaluated against the performance of single-source models. It is shown that the methods that fuse information at the retrieval-outcome level are significantly superior.
1997

Book chapters

2006
2005
2003
2000

Conference papers

2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1995

Editorials

2008
G B Fogel, E Bullinger, S F Su, F Azuaje (2008)  Special section on machine intelligence approaches to systems biology.   IEEE Trans Syst Man Cybern B Cybern 38: 1. 2-4  
Abstract: The three papers in this special section focus on machine intelligence approaches to systems biology.
2004
2003
2001

Peer-reviewed abstracts in journals and conferences

2010
2009
2008
2007
2006
2005
2004
2000
1998
1996
1995
1994

Book reviews and paper commentaries

2008
2006
2005
2004
2003
2002