
Chao Zhang


czhmu2010@gmail.com

Journal articles

Submitted
2012
Chao Zhang, Shunfu Xu, Dong Xu (2012)  Risk Assessment of Gastric Cancer Caused by Helicobacter pylori Using CagA Sequence Markers   PLoS ONE 7: 5. 05  
Abstract: Background: As a marker of Helicobacter pylori, Cytotoxin-associated gene A (cagA) has been revealed to be the major virulence factor causing gastroduodenal diseases. However, the molecular mechanisms that underlie the development of different gastroduodenal diseases caused by cagA-positive H. pylori infection remain unknown. Current studies are limited to the evaluation of the correlation between diseases and the number of Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs in the CagA strain. To further understand the relationship between CagA sequence and its virulence to gastric cancer, we proposed a systematic entropy-based approach to identify the cancer-related residues in the intervening regions of CagA and employed a supervised machine learning method for cancer and non-cancer cases classification. Methodology: An entropy-based calculation was used to detect key residues of CagA intervening sequences as the gastric cancer biomarker. For each residue, both combinatorial entropy and background entropy were calculated, and the entropy difference was used as the criterion for feature residue selection. The feature values were then fed into Support Vector Machines (SVM) with the Radial Basis Function (RBF) kernel, and two parameters were tuned to obtain the optimal F value by using grid search. Two other popular sequence classification methods, the BLAST and HMMER, were also applied to the same data for comparison. Conclusion: Our method achieved 76% and 71% classification accuracy for Western and East Asian subtypes, respectively, which performed significantly better than BLAST and HMMER. This research indicates that small variations of amino acids in those important residues might lead to the virulence variance of CagA strains resulting in different gastroduodenal diseases. This study provides not only a useful tool to predict the correlation between the novel CagA strain and diseases, but also a general new framework for detecting biological sequence biomarkers in population studies.
Chao Zhang, Kristina Hanspers, Allan Kuchinsky, Nathan Salomonis, Dong Xu, Alexander R Pico (2012)  Mosaic: Making Biological Sense of Complex Networks   BIOINFORMATICS  
Abstract: We present a Cytoscape plugin called Mosaic to support interactive network annotation, partitioning, layout and coloring based on Gene Ontology or other relevant annotations.
Chao Zhang, Guolu Zheng, Shunfu Xu, Dong Xu (2012)  Computational challenges in characterization of bacteria and bacteria-host interactions based on genomic data   Journal of Computer Science and Technology 27: 2. 225-239 3  
Abstract: With the rapid development of next-generation sequencing technologies, bacterial identification becomes a very important and essential step in processing genomic data, especially for metagenomic data. Many computational methods have been developed and some of them are widely used to address the problems in bacterial identification. In this article we review the algorithms of these methods, discuss their drawbacks, and propose future computational methods that use genomic data to characterize bacteria. In addition, we tackle two specific computational problems in bacterial identification, namely, the detection of host-specific bacteria and the detection of disease-associated bacteria, by offering potential solutions as a starting point for those who are interested in the area.
2011
Guan Lin, Chao Zhang, Dong Xu (2011)  Polytomy identification in microbial phylogenetic reconstruction   BMC Systems Biology 5: Suppl 3.  
Abstract: BACKGROUND: A phylogenetic tree, showing ancestral relations among organisms, is commonly represented as a rooted tree with sets of bifurcating branches (dichotomies) for simplicity, although polytomies (multifurcating branches) may reflect more accurate evolutionary relationships. To represent the true evolutionary relationships, it is important to systematically identify the polytomies from a bifurcating tree and generate a taxonomy-compatible multifurcating tree. For this purpose we propose a novel approach, "PolyPhy", which would classify a set of bifurcating branches of a phylogenetic tree into a set of branches with dichotomies and polytomies by considering genome distances among genomes and tree topological properties. RESULTS: PolyPhy employs a machine learning technique, BLR (Bayesian logistic regression) classifier, to identify possible bifurcating subtrees as polytomies from the trees resulted from ComPhy. Other than considering genome-scale distances between all pairs of species, PolyPhy also takes into account different properties of tree topology between dichotomy and polytomy, such as long-branch retraction and short-branch contraction, and quantifies these properties into comparable rates among different sub-branches. We extract three tree topological features, ’LR’ (Leaf rate), ’IntraR’ (Intra-subset branch rate) and ’InterR’ (Inter-subset branch rate), all of which are calculated from bifurcating tree branch sets for classification. We have achieved F-measure (balanced measure between precision and recall) of 81% with about 0.9 area under the curve (AUC) of ROC. CONCLUSIONS: PolyPhy is a fast and robust method to identify polytomies from phylogenetic trees based on genome-wide inference of evolutionary relationships among genomes. The software package and test data can be downloaded from http://digbio.missouri.edu/ComPhy/phyloTreeBiNonBi-1.0.zip.
2010
Shunfu Xu, Chao Zhang, Yi Miao, Jianjiong Gao, Dong Xu (2010)  Effector prediction in host-pathogen interaction based on a Markov model of a ubiquitous EPIYA motif.   BMC Genomics 11 Suppl 3: 12  
Abstract: Effector secretion is a common strategy of pathogen in mediating host-pathogen interaction. Eight EPIYA-motif containing effectors have recently been discovered in six pathogens. Once these effectors enter host cells through type III/IV secretion systems (T3SS/T4SS), tyrosine in the EPIYA motif is phosphorylated, which triggers effectors binding other proteins to manipulate host-cell functions. The objectives of this study are to evaluate the distribution pattern of EPIYA motif in broad biological species, to predict potential effectors with EPIYA motif, and to suggest roles and biological functions of potential effectors in host-pathogen interactions.
2008
Lourdes Peña-Castillo, Murat Tasan, Chad L Myers, Hyunju Lee, Trupti Joshi, Chao Zhang, Yuanfang Guan, Michele Leone, Andrea Pagnani, Wan Kyu Kim, Chase Krumpelman, Weidong Tian, Guillaume Obozinski, Yanjun Qi, Sara Mostafavi, Guan Ning Lin, Gabriel F Berriz, Francis D Gibbons, Gert Lanckriet, Jian Qiu, Charles Grant, Zafer Barutcuoglu, David P Hill, David Warde-Farley, Chris Grouios, Debajyoti Ray, Judith A Blake, Minghua Deng, Michael I Jordan, William S Noble, Quaid Morris, Judith Klein-Seetharaman, Ziv Bar-Joseph, Ting Chen, Fengzhu Sun, Olga G Troyanskaya, Edward M Marcotte, Dong Xu, Timothy R Hughes, Frederick P Roth (2008)  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence.   Genome Biol 9 Suppl 1: 06  
Abstract: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.
Trupti Joshi, Chao Zhang, Guan Ning Lin, Zhao Song, Dong Xu (2008)  GeneFAS: A tool for prediction of gene function using multiple sources of data.   Methods Mol Biol 439: 369-386  
Abstract: Characterizing gene function is one of the major challenging tasks in the postgenomic era. To address this challenge, we developed GeneFAS (gene function annotation system), a computer system with a graphical user interface for cellular function prediction by integrating information from protein-protein interactions, protein complexes, microarray gene expression profiles, and annotations of known proteins. GeneFAS can provide biologists a workspace for their organism of interest, to integrate different types of experimental data and annotation information, and facilitate biological discovery and hypothesis generation using all the information. It also provides testing and training capabilities for users to utilize and integrate their data more efficiently. GeneFAS is freely available for download at http://digbio.missouri.edu/genefas .
Chao Zhang, Trupti Joshi, Guan Ning Lin, Dong Xu (2008)  An integrated probabilistic approach for gene function prediction using multiple sources of high-throughput data.   Int J Comput Biol Drug Des 1: 3. 254-274  
Abstract: Characterising gene function is one of the major challenging tasks in the post-genomic era. Various approaches have been developed to integrate multiple sources of high-throughput data to predict gene function. Most of those approaches are used only for research purposes and have not been implemented as publicly available tools. Even among the implemented applications, almost all are still web-based 'prediction servers' that have to be managed by specialists. This paper introduces a systematic method for integrating various sources of high-throughput data to predict gene function, analyses the prediction results, and evaluates the method's performance based on the competition for mouse gene function prediction (MouseFunc). A stand-alone Java-based software package 'GeneFAS' is freely available at http://digbio.missouri.edu/genefas.
2006
Zhao Song, Luonan Chen, Chao Zhang, Dong Xu (2006)  Design and implementation of probability-based scoring function for peptide mass fingerprinting protein identification.   Conf Proc IEEE Eng Med Biol Soc 1: 4556-4559  
Abstract: Protein identification through high-throughput mass spectrum data is an important domain in proteomics. Peptide mass fingerprinting (PMF) is one of the major methods for protein identification using the mass-spec technology. We developed a software package called "ProteinDecision" for PMF protein identification, together with a user-friendly graphical interface. "ProteinDecision" can handle the issues of selecting peaks from mass spectrum, transforming database format, displaying the top ranks of identification result, and detailed information for each ranking. We used a novel scoring function by considering the distribution of matching a mass-to-charge and peak intensity in a database based on the MOWSE table. Our new scoring function is assessed better than existing ones by comparing the computational results using experimental PMF data. A standalone version of "ProteinDecision" is freely available upon request.

Book chapters

2009

Conference papers

2010