
Humberto González-Díaz

Researcher of Program IPP, Xunta de Galicia
&
Prof. of Bioinformatics, Faculty of Pharmacy,
University of Santiago de Compostela, 15782, Santiago de Compostela, Spain.
gonzalezdiazh@yahoo.es

Journal articles

2009
Humberto González-Díaz, Lázaro G Pérez-Montoto, Aliuska Duardo-Sanchez, Esperanza Paniagua, Severo Vázquez-Prieto, R Vilas, Maria Auxiliadora Dea-Ayuela, Francisco Bolas-Fernández, Cristian R Munteanu, Julián Dorado, J Costas, Florencio M Ubeira (2009)  Generalized lattice graphs for 2D-visualization of biological information.   J Theor Biol 261: 1. 136-147 Nov  
Abstract: Several graph representations have been introduced for different data in theoretical biology. For instance, complex networks based on Graph theory are used to represent the structure and/or dynamics of different large biological systems such as protein-protein interaction networks. In addition, Randic, Liao, Nandy, Basak, and many others developed some special types of graph-based representations. This special type of graph includes geometrical constraints on node positioning in space and adopts final geometrical shapes that resemble lattice-like patterns. Lattice networks have been used to visually depict DNA and protein sequences but they are very flexible. However, despite the proven efficacy of new lattice-like graphs/networks to represent diverse systems, most works focus on only one specific type of biological data. This work proposes a generalized type of lattice and illustrates how to use it in order to represent and compare biological data from different sources. We exemplify the following cases: protein sequence; mass spectra (MS) of protein peptide mass fingerprints (PMF); molecular dynamics trajectories (MDTs) from structural studies; mRNA microarray data; single nucleotide polymorphisms (SNPs); 1D or 2D-Electrophoresis study of protein polymorphisms and protein-research patent and/or copyright information. We used data available from public sources for some examples, but for others we used experimental results reported herein for the first time. This work may break new ground for the application of Graph theory in theoretical biology and other areas of biomedical sciences.
Santiago Vilar, Humberto González-Díaz, Lourdes Santana, Eugenio Uriarte (2009)  A network-QSAR model for prediction of genetic-component biomarkers in human colorectal cancer.   J Theor Biol 261: 3. 449-458 Dec  
Abstract: The combination of network theory and the calculation of topological indices (TIs) allows relationships to be established between the molecular structure of large molecules such as genes and proteins and their properties at a biological level. Models of this type can be considered quantitative structure-activity relationships (QSAR) for biopolymers. In the present work a QSAR model is reported for proteins related to human colorectal cancer (HCC) and encoded by different genes that have been identified experimentally by Sjöblom et al. [2006. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268-274] among more than 10000 human genes. The 69 proteins related to human colorectal cancer (HCCp) and a control group of 200 proteins not related to HCC (no-HCCp) were represented through an HP Lattice type Network. Starting from the generated graphs we calculate a set of descriptors of electrostatic potential type (xi(k)) that allow us to establish, through a linear discriminant analysis (LDA), a QSAR model with a relatively high percentage of good classification (higher than 80%) to differentiate between HCCp and no-HCCp proteins. The purpose of this study is to help predict the possible implication of a certain gene and/or protein (biomarker) in colorectal cancer. Different validation procedures have been carried out on the obtained model in order to corroborate its stability, including cross-validation series (CV) and evaluation of an additional series of 200 no-HCCp. This biostatistical methodology could be applied to predict human colorectal cancer biomarkers and to better understand the biological aspects of this disease.
Francisco J Prado-Prado, Eugenio Uriarte, Fernanda Borges, Humberto González-Díaz (2009)  Multi-target spectral moments for QSAR and Complex Networks study of antibacterial drugs.   Eur J Med Chem 44: 11. 4516-4521 Nov  
Abstract: There are many pathogenic bacterial species with very different susceptibility profiles to different antibacterial drugs. There are many drugs described with very different affinity to a large number of receptors. In this work, we selected Drug-Bacteria Pairs (DBPs) of affinity/non-affinity drugs with similar/dissimilar bacteria and represented them as a large network, which may be used to identify drugs that can act on bacteria. Computational chemistry prediction of the biological activity based on one-target Quantitative Structure-Activity Relationship (ot-QSAR) studies substantially increases the potential of this kind of network by avoiding time- and resource-consuming experiments. Unfortunately, almost all ot-QSAR models predict the biological activity of drugs against only one bacterial species. Consequently, multi-tasking learning to predict a drug's activity against different species with a single model (mt-QSAR) is a goal of major importance. These mt-QSARs offer a good opportunity to construct drug-drug similarity Complex Networks. Unfortunately, almost all QSAR models are unspecific or predict activity against only one receptor. To solve this problem, we developed here a multi-bacteria QSAR classification model. The model correctly classifies 202 out of 241 active compounds (83.8%) and 169 out of 200 non-active cases (84.5%). Overall training predictability was 84.13% (371 out of 441 cases). The validation of the model was carried out by means of external predicting series, in which the model correctly classified 197 out of 221 (89.4%) cases. In order to show how the model functions in practice, a virtual screening was carried out in which the model recognized as active 86.7% (520 out of 600) of cases not used in the training or predicting series. Outputs of this QSAR model were used as inputs to construct a network. The observed network has 1242 nodes (DBPs), 772,736 edges or DBPs with similar activity (sDBPs). The network predicted has 1031 nodes, 641,377 sDBPs. After edge-to-edge comparison, we have demonstrated that the predicted network is significantly similar to the observed one and both have a distribution closer to exponential than to normal.
Riccardo Concu, Maria A Dea-Ayuela, Lazaro G Perez-Montoto, Francisco Bolas-Fernández, Francisco J Prado-Prado, Gianni Podda, Eugenio Uriarte, Florencio M Ubeira, Humberto González-Díaz (2009)  Prediction of enzyme classes from 3D structure: a general model and examples of experimental-theoretic scoring of peptide mass fingerprints of Leishmania proteins.   J Proteome Res 8: 9. 4372-4382 Sep  
Abstract: The number of protein and peptide structures included in the Protein Data Bank (PDB) and GenBank without functional annotation has increased. Consequently, there is a high demand for theoretical models to predict these functions. Here, we trained and validated, with an external set, a Markov Chain Model (MCM) that classifies proteins by their possible mechanism of action according to Enzyme Classification (EC) number. The methodology proposed is essentially new, and enables prediction of all EC classes with a single equation without the need for an equation for each class or nonlinear models with multiple outputs. In addition, the model may be used to predict whether one peptide presents a positive or negative contribution to the activity of the same EC class. The model predicts the first EC number for 106 out of 151 (70.2%) oxidoreductases, 178/178 (100%) transferases, 223/223 (100%) hydrolases, 64/85 (75.3%) lyases, 74/74 (100%) isomerases, and 100/100 (100%) ligases, as well as 745/811 (91.9%) nonenzymes. It is important to underline that this method may help us predict new enzyme proteins or select peptide candidates that improve enzyme activity, which may be of interest for the prediction of new drugs or drug targets. To illustrate the model's application, we report the 2D-Electrophoresis (2DE) isolation from Leishmania infantum as well as MALDI-TOF Mass Spectra characterization and theoretical study of the Peptide Mass Fingerprints (PMFs) of a new protein sequence. The theoretical study focused on MASCOT, BLAST alignment, and alignment-free QSAR prediction of the contribution of 29 peptides found in the PMF of the new protein to specific enzyme action. This combined strategy may be used to identify and predict peptides of prokaryote and eukaryote parasites and their hosts as well as other superior organisms, which may be of interest in drug development or target identification.
Riccardo Concu, Maria Auxiliadora Dea-Ayuela, Lázaro G Perez-Montoto, Francisco J Prado-Prado, Eugenio Uriarte, Francisco Bolás-Fernández, Gianni Podda, Alejandro Pazos, Cristian R Munteanu, Florencio M Ubeira, Humberto González-Díaz (2009)  3D entropy and moments prediction of enzyme classes and experimental-theoretic study of peptide fingerprints in Leishmania parasites.   Biochim Biophys Acta 1794: 12. 1784-1794 Dec  
Abstract: The number of protein 3D structures without function annotation in the Protein Data Bank (PDB) has steadily increased. This fact has led in turn to an increased demand for theoretical models to give a quick characterization of these proteins. In this work, we present a new and fast Markov chain model (MCM) to predict the enzyme classification (EC) number. We used both linear discriminant analysis (LDA) and/or artificial neural networks (ANN) in order to compare linear vs. non-linear classifiers. The LDA model found is very simple (three variables) and at the same time is able to predict the first EC number with an overall accuracy of 79% for a data set of 4755 proteins (859 enzymes and 3896 non-enzymes) divided into both training and external validation series. In addition, the best non-linear ANN model is notably more complex but has an overall accuracy of 98.85%. It is important to emphasize that this method may help us not only to predict new enzyme proteins but also to select peptide candidates found on the peptide mass fingerprints (PMFs) of new proteins that may improve enzyme activity. In order to illustrate the use of the model in this regard, we first report the 2D electrophoresis (2DE) and MALDI-TOF mass spectra characterization of the PMF of a new possible malate dehydrogenase sequence from Leishmania infantum. Next, we used the models to predict the contribution to a specific enzyme action of 30 peptides found in the PMF of the new protein. We implemented the present model in a server at portal Bio-AIMS (http://miaja.tic.udc.es/Bio-AIMS/EnzClassPred.php). This free on-line tool is based on PHP/HTML/Python and MARCH-INSIDE routines. This combined strategy may be used to identify and predict peptides of prokaryote and eukaryote parasites and their hosts as well as other superior organisms, which may be of interest in drug development or target identification.
Alcides Perez-Bello, Cristian R Munteanu, Florencio Ubeira, Alexandre Lopes De Magalhães, Eugenio Uriarte, Humberto González-Díaz (2009)  Alignment-free prediction of mycobacterial DNA promoters based on pseudo-folding lattice network or star-graph topological indices.   J Theor Biol 256: 458–466  
Abstract: The importance of the promoter sequences in the function regulation of several important mycobacterial pathogens creates the necessity to design simple and fast theoretical models that can predict them. This work proposes two DNA promoter QSAR models based on pseudo-folding lattice network (LN) and star-graphs (SG) topological indices. In addition, a comparative study with the previous RNA electrostatic parameters of thermodynamically-driven secondary structure folding representations has been carried out. The best model of this work was obtained with only two LN stochastic electrostatic potentials and it is characterized by accuracy, selectivity and specificity of 90.87%, 82.96% and 92.95%, respectively. In addition, we pointed out the SG result dependence on the DNA sequence codification and we proposed a QSAR model based on codons and only three SG spectral moments.
Isela García, Cristian R Munteanu, Yagamare Fall, Generosa Gómez, Eugenio Uriarte, Humberto González-Díaz (2009)  QSAR and complex network study of the chiral HMGR inhibitor structural diversity.   Bioorg Med Chem 17: 165–175  
Abstract: Efficient drugs such as statins or mevinic acids are inhibitors of the rate-limiting enzyme of cholesterol biosynthesis, 3-hydroxy-3-methyl-glutaryl coenzyme A reductase (HMGR), an enzyme responsible for the double reduction of 3-hydroxy-3-methyl-glutaryl coenzyme A into mevalonic acid. These compounds promoted the synthesis and evaluation of new inhibitors for HMGR, named HMGRIs. The high number of possible candidates creates the necessity of Quantitative Structure-Activity Relationship models in order to guide the HMGRI synthesis. There are two main problems of the reported QSAR models: the homogeneous series of the compounds and the chirality of many candidates. In this work, we propose for the first time a QSAR model for a very large and heterogeneous series of HMGRIs. The model is based on the Topological Indices (TIs) of molecular structures. Using the predictions of this model as input, we construct the first complex network that describes the drug-drug similarity relationships for more than 1600 experimentally non-explored chiral HMGRIs isomers. We also presented a reduced version of this network (Giant Component) that contains the most representative set of chiral HMGRI candidates. The work suggests a new mixed application in the QSAR study of relevant aspects of structural diversity by using chiral/non-chiral TIs, combined with complex networks.
Francisco J Prado-Prado, Fernanda Borges, Eugenio Uriarte, Lazaro G Pérez-Montoto, Humberto González-Díaz (2009)  Multi-target spectral moment: QSAR for antiviral drugs vs. different viral species.   Anal Chim Acta 651: 2. 159-164 Oct  
Abstract: Antiviral QSAR models have an important limitation today: they predict the biological activity of drugs against only one viral species. This is because most of the currently reported molecular descriptors encode only information about the molecular structure. As a result, predicting the probability with which a drug is active against different viral species with a single unifying model is a goal of major importance. In this work, we use Markov Chain theory to calculate new multi-target spectral moments to fit a QSAR model for drugs active against 40 viral species. The model is based on 500 drugs (including active and non-active compounds) tested as antiviral agents in the recent literature; not all drugs were predicted against all viruses, but only those with experimental values. The database also contains 207 well-known compounds (not as recent as the previous ones) reported in the Merck Index with other activities that do not include antiviral action against any virus species. We used Linear Discriminant Analysis (LDA) to classify all these drugs into two classes as active or non-active against the different viral species tested, whose data we processed. The model correctly classifies 5129 out of 5594 non-active compounds (91.69%) and 412 out of 422 active compounds (97.63%). Overall training predictability was 92.34%. The validation of the model was carried out by means of external predicting series, in which the model correctly classified 2568 out of 2779 non-active compounds and 224 out of 229 active compounds. Overall predictability in the validation series was 92.82%. The present work reports the first attempts to calculate within a unified framework the probabilities of antiviral drugs against different virus species based on a spectral moment analysis.
Cristian R Munteanu, José M Vázquez, Julián Dorado, Alejandro Pazos Sierra, Angeles Sánchez-González, Francisco J Prado-Prado, Humberto González-Díaz (2009)  Complex network spectral moments for ATCUN motif DNA cleavage: first predictive study on proteins of human pathogen parasites.   J Proteome Res 8: 11. 5219-5228 Nov  
Abstract: The development of methods that can predict the metal-mediated biological activity based only on the 3D structure of metal-unbound proteins has become a goal of major importance. This work is dedicated to the amino terminal Cu(II)- and Ni(II)-binding (ATCUN) motifs that participate in the DNA cleavage and have antitumor activity. We have calculated herein, for the first time, the 3D electrostatic spectral moments for 415 different proteins, including 133 potential ATCUN antitumor proteins. Using these parameters as input for Linear Discriminant Analysis, we have found a model that discriminates between ATCUN-DNA cleavage proteins and nonactive proteins with 91.32% Accuracy (379 out of 415 of proteins including both training and external validation series). Finally, the model has predicted for the first time the DNA cleavage function of proteins from the pathogen parasites. We have predicted possible ATCUN-like proteins with a probability higher than 99% in nine parasite families such as Trypanosoma, Plasmodium, Leishmania, or Toxoplasma. The distribution by biological function of the ATCUN proteins predicted has been the following: oxidoreductases 70.5%, signaling proteins 62.5%, lyases 58.2%, membrane proteins 45.5%, ligases 44.4%, hydrolases 41.3%, transferases 39.2%, cell adhesion proteins 34.5%, metal binders 33.5%, translation proteins 25.0%, transporters 16.7%, structural proteins 9.1%, and isomerases 8.2%. The model is implemented at http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php.
Dolores Viña, Eugenio Uriarte, Francisco Orallo, Humberto González-Díaz (2009)  Alignment-free prediction of a drug-target complex network based on parameters of drug connectivity and protein sequence of receptors.   Mol Pharm 6: 3. 825-835 May/Jun  
Abstract: There are many drugs described with very different affinity to a large number of receptors. In this work, we selected drug-receptor pairs (DRPs) of affinity/nonaffinity drugs to similar/dissimilar receptors and we represented them as a large network, which may be used to identify drugs that can act on a receptor. Computational chemistry prediction of the biological activity based on quantitative structure-activity relationships (QSAR) substantially increases the potentialities of this kind of networks avoiding time- and resource-consuming experiments. Unfortunately, most QSAR models are unspecific or predict activity against only one receptor. To solve this problem, we developed here a multitarget QSAR (mt-QSAR) classification model. Overall model classification accuracy was 72.25% (1390/1924 compounds) in training, 72.28% (459/635) in cross-validation. Outputs of this mt-QSAR model were used as inputs to construct a network. The observed network has 1735 nodes (DRPs), 1754 edges or pairs of DRPs with similar drug-target affinity (sPDRPs), and low coverage density d = 0.12%. The predicted network has 1735 DRPs, 1857 sPDRPs, and also low coverage density d = 0.12%. After an edge-to-edge comparison (chi-square = 9420.3; p < 0.005), we have demonstrated that the predicted network is significantly similar to the one observed and both have a distribution closer to exponential than to normal.
Cristian Robert Munteanu, Alexandre L Magalhães, Eugenio Uriarte, Humberto González-Díaz (2009)  Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices.   J Theor Biol 257: 2. 303-311 Mar  
Abstract: Cancer diagnosis is a complex process and, sometimes, the specific markers can interfere or produce negative results. Thus, new simple and fast theoretical models are required. One option is complex network graph theory, which permits us to describe any real system, from small molecules to complex genetic, neural or social networks, by transforming real properties into topological indices. This work converts protein primary structure data into specific Randic's star network topological indices using the new sequence to star networks (S2SNet) application. A set of 1054 proteins was selected from previous works; it contains proteins related or not related to two types of cancer, human breast cancer (HBC) and human colon cancer (HCC). The general discriminant analysis method generates an input-coded multi-target classification model with training/predicting set accuracies of 90.0% for the forward stepwise model type. In addition, a protein subset was modified by single amino acid mutations with higher log-odds PAM250 values and tested with the new classification model to determine whether the mutants can be related to HBC or HCC. In conclusion, we have shown that, using simple input data such as the primary protein sequence and the simplest linear analysis, it is possible to obtain accurate classification models that can predict whether a new protein is related to the two types of cancer. These results promote the use of S2SNet in clinical proteomics.
Francisco J Prado-Prado, Octavio Martinez de la Vega, Eugenio Uriarte, Florencio M Ubeira, Kuo-Chen Chou, Humberto González-Díaz (2009)  Unified QSAR approach to antimicrobials. 4. Multi-target QSAR modeling and comparative multi-distance study of the giant components of antiviral drug-drug complex networks.   Bioorg Med Chem 17: 2. 569-575 Jan  
Abstract: One limitation of almost all antiviral Quantitative Structure-Activity Relationships (QSAR) models is that they predict the biological activity of drugs against only one species of virus. Consequently, the development of multi-tasking QSAR models (mt-QSAR) to predict drug activity against different species of virus is vitally important. These mt-QSARs also offer a good opportunity to construct drug-drug Complex Networks (CNs) that can be used to explore large and complex drug-viral species databases. It is known that in very large CNs we can use the Giant Component (GC) as a representative sub-set of nodes (drugs), but the drug-drug similarity function selected may strongly determine the final network obtained. In the three previous works of the present series we reported mt-QSAR models to predict the antimicrobial activity against different fungi [Gonzalez-Diaz, H.; Prado-Prado, F. J.; Santana, L.; Uriarte, E. Bioorg.Med.Chem.2006, 14, 5973], bacteria [Prado-Prado, F. J.; Gonzalez-Diaz, H.; Santana, L.; Uriarte E. Bioorg.Med.Chem.2007, 15, 897] or parasite species [Prado-Prado, F.J.; González-Díaz, H.; Martinez de la Vega, O.; Ubeira, F.M.; Chou K.C. Bioorg.Med.Chem.2008, 16, 5871]. However, including these works, we did not find any report of mt-QSAR models for antiviral drugs, or a comparative study of the different GCs extracted from drug-drug CNs based on different similarity functions. In this work, we used Linear Discriminant Analysis (LDA) to fit a mt-QSAR model that classifies 600 drugs as active or non-active against the 41 different tested species of virus. The model correctly classifies 143 of 169 active compounds (specificity=84.62%) and 119 of 139 non-active compounds (sensitivity=85.61%) and presents overall training accuracy of 85.1% (262 of 308 cases). Validation of the model was carried out by means of external predicting series, in which the model correctly classified 466 of 514 compounds (90.7%). In order to illustrate the performance of the model in practice, we developed a virtual screening in which the model recognized as active 92.7% (102 of 110) of antiviral compounds. These compounds were never used in the training or predicting series. Next, we obtained and compared the topology of the CNs and their respective GCs based on Euclidean, Manhattan, Chebychev, Pearson and other similarity measures. The GC of the Manhattan network showed the most interesting features for drug-drug similarity search. We also give the procedure for the construction of Back-Projection Maps for the contribution of each drug sub-structure to the antiviral activity against different species.
Francisco J Prado-Prado, Fernanda Borges, Lazaro G Perez-Montoto, Humberto González-Díaz (2009)  Multi-target spectral moment: QSAR for antifungal drugs vs. different fungi species.   Eur J Med Chem 44: 10. 4051-4056 Oct  
Abstract: The most important limitation of antifungal QSAR models is that they predict the biological activity of drugs against only one fungal species. This is due to the fact that most of the molecular descriptors reported to date encode only information about the molecular structure. Consequently, predicting the probability with which a drug is active against different fungal species with a single unifying model is a goal of major importance. Herein, we use Markov Chain theory to calculate new multi-target spectral moments to fit a QSAR model that predicts the antifungal activity of more than 280 drugs against 90 fungi species. Linear discriminant analysis (LDA) was used to classify drugs into two classes as active or non-active against the different tested fungal species whose data we processed. The model correctly classifies 12 434 out of 12 566 non-active compounds (98.95%) and 421 out of 468 active compounds (89.96%). Overall training predictability was 98.63%. Validation of the model was carried out by means of external predicting series, in which the model correctly classified 6216 out of 6277 non-active compounds and 215 out of 239 active compounds. Overall predictability in the validation series was 98.7%. The present work is the first attempt to calculate, within a unifying framework, the probabilities of antifungal action of drugs against many different species based on spectral moment analysis.
Lázaro G Pérez-Montoto, Lourdes Santana, Humberto González-Díaz (2009)  Scoring function for DNA-drug docking of anticancer and antiparasitic compounds based on spectral moments of 2D lattice graphs for molecular dynamics trajectories.   Eur J Med Chem 44: 11. 4461-4469 Nov  
Abstract: We introduce here a new class of invariants for MD trajectories based on the spectral moments pi(k)(L) of the Markov matrix associated with lattice network-like (LN) graph representations of Molecular Dynamics (MD) trajectories. The procedure embeds the MD energy profiles in a 2D Cartesian coordinate system using simple heuristic rules. At the same time, we associate the LN with a Markov matrix that describes the probabilities of passing from one state to another in the new 2D space. We construct this type of LN for 422 MD trajectories obtained in DNA-drug docking experiments of 57 furocoumarins. The combined use of psoralens+ultraviolet light (UVA) radiation is known as PUVA therapy. PUVA is effective in the treatment of skin diseases such as psoriasis and mycosis fungoides. PUVA is also useful to treat human platelet (PTL) concentrates in order to eliminate Leishmania spp. and Trypanosoma cruzi. Both are parasites that cause Leishmaniosis (a dangerous skin and visceral disease) and Chagas disease, respectively; and may circulate in blood products collected from infected donors. We included in this study both linear (psoralens) and angular (angelicins) furocoumarins. In the study, we grouped the LNs into two sets; set1: DNA-drug complex MD trajectories for active compounds and set2: MD trajectories of non-active compounds or non-optimal MD trajectories of active compounds. We calculated the respective pi(k)(L) values for all these LNs and used them as inputs to train a new classifier that discriminates set1 from set2 cases. In training series the model correctly classifies 79 out of 80 (specificity=98.75%) set1 and 226 out of 238 (sensitivity=94.96%) set2 trajectories. In independent validation series the model correctly classifies 26 out of 26 (specificity=100%) set1 and 75 out of 78 (sensitivity=96.15%) set2 trajectories. We propose this new model as a scoring function to guide DNA-docking studies in the drug design of new coumarins for anticancer or antiparasitic PUVA therapy.
Prado-Prado, Ubeira, Borges, González-Díaz (2009)  Unified QSAR & network-based computational chemistry approach to antimicrobials. II. Multiple distance and triadic census analysis of antiparasitic drugs complex networks.   J Comput Chem May  
Abstract: In the previous work, we reported a multitarget Quantitative Structure-Activity Relationship (mt-QSAR) model to predict drug activity against different fungal species. This mt-QSAR allowed us to construct a drug-drug multispecies Complex Network (msCN) to investigate drug-drug similarity (González-Díaz and Prado-Prado, J Comput Chem 2008, 29, 656). However, important methodological points remained unclear, such as the following: (1) the accuracy of the methods when applied to other problems; (2) the effect of the distance type used to construct the msCN; (3) how to perform the inverse procedure to study species-species similarity with multidrug resistance CNs (mdrCN); and (4) the implications and necessary steps to perform a substructural Triadic Census Analysis (TCA) of the msCN. To continue the present series with another important problem, we developed here a mt-QSAR model for more than 700 drugs tested in the literature against different parasites (predicting antiparasitic drugs). The data were processed by Linear Discriminant Analysis (LDA) and the model correctly classifies 93.62% (1160 out of 1239 cases) in training. The model validation was carried out by means of external predicting series; the model classified 573 out of 607, that is, 94.4% of cases. Next, we carried out the first comparative study of the topology of six different drug-drug msCNs based on six different distances such as Euclidean, Chebychev, Manhattan, etc. Furthermore, we compared the selected drug-drug msCN and species-species mdsCN with random networks. We also introduced here the inverse methodology to construct species-species msCN based on a mt-QSAR model. Last, we reported the first substructural analysis of drug-drug msCN using the Triadic Census Analysis (TCA) algorithm.
Guillermín Agüero-Chapin, Javier Varona-Santos, Gustavo A de la Riva, Agostinho Antunes, Tomás González-Villa, Eugenio Uriarte, Humberto González-Díaz (2009)  Alignment-free prediction of polygalacturonases with pseudofolding topological indices: experimental isolation from Coffea arabica and prediction of a new sequence.   J Proteome Res 8: 4. 2122-2128 Apr  
Abstract: Polygalacturonases (PGs) have attracted the attention of microbiology scientists and the biotechnology and pharmaceutical industries because they are protein enzymes relevant to phytopathogen invasion, fruit ripening, and potential antimicrobial drug targets. Numeric Topological Indices (TIs) of protein pseudofolding lattices can be used as input for classification algorithms in Quantitative Structure-Activity Relationship (QSAR) studies. However, a comparative study of different QSAR models for PGs has not been reported. In this study, we calculated for the first time two classes of TIs (Spectral moments (pik) and Entropy (thetak) values) for the Markov matrices associated with pseudofolding lattices of 108 PGs and 100 non-PG heterogeneous proteins. Afterward, we developed different linear classifiers based on Linear Discriminant Analysis (LDA) and four types of nonlinear Artificial Neural Networks (ANN). The pik-LDA model correctly classified 98.8% of PGs and 100% of non-PGs used to train the model, as well as 98.1% of all sequences used as external validation series. The rk-LDA model was the more accurate and/or simpler found. In addition, we report for the first time the experimental isolation and successful prediction of a new PG sequence from Coffea arabica. This sequence was deposited in GenBank by our group with accession number GDQ336394. Models of this type are an interesting alignment-free complement to alignment-based procedures.
Notes:
2008
Humberto González-Díaz, Francisco Prado-Prado, Florencio M Ubeira (2008)  Predicting Antimicrobial Drugs and Targets with the MARCH-INSIDE Approach   Curr Top Med Chem 8: 18. 1676-1690 Nov  
Abstract: The method MARCH-INSIDE (MARkovian CHemicals IN SIlico DEsign) is a simple but efficient computational approach to the study of Quantitative Structure-Activity Relationships (QSAR) in Medicinal Chemistry. The method uses the theory of Markov Chains to generate parameters that numerically describe the chemical structure of drugs and drug targets. This approach generates two principal types of parameters: Stochastic Topological Indices (sto-TIs) and stochastic 3D-Topographic Indices (sto-TPGIs). The use of these parameters allows the rapid collection, annotation, retrieval, comparison and mining of molecular and macromolecular chemical structures within large databases. In the work described here, we review and comment on several applications of MARCH-INSIDE to the Medicinal Chemistry of Antimicrobial agents as well as their molecular targets. First, we review the use of classic sto-TIs to predict antiparasite compounds for the treatment of Fascioliasis. Next, we review the use of chiral sto-TIs (sto-CTIs) to predict new antibacterial, antiviral and anti-coccidial compounds. After that, we review multi-target sto-TIs (mt-sto-TIs), which unify QSAR models predicting antifungal, antibacterial, or anti-parasite drugs with multiple targets (microbial species). We also discuss the uses of mt-sto-TIs to assemble drug-drug similarity Complex Networks of antimicrobial compounds based on molecular structure. Last, we review the use of MARCH-INSIDE to generate macromolecular TIs and TPGIs for proteins or RNA targets for antimicrobial drugs.
Notes:
Santiago Vilar, Humberto González-Díaz, Lourdes Santana, Eugenio Uriarte (2008)  QSAR model for alignment-free prediction of human breast cancer biomarkers based on electrostatic potentials of protein pseudofolding HP-lattice networks.   J Comput Chem 29: 16. 2613-2622 May  
Abstract: Network theory allows relationships to be established between numerical parameters that describe the molecular structure of genes and proteins and their biological properties. These models can be considered as quantitative structure-activity relationships (QSAR) for biopolymers. The work described here concerns the first QSAR model for 122 proteins that are associated with human breast cancer (HBC), as identified experimentally by Sjöblom et al. (Science 2006, 314, 268) from over 10,000 human proteins. In this study, the 122 proteins related to HBC (HBCp) and a control group of 200 proteins that are not related to HBC (non-HBCp) were forced to fold in an HP lattice network. From these networks a series of electrostatic potential parameters (xi(k)) was calculated to describe each protein numerically. The use of xi(k) as an entry point to linear discriminant analysis led to a QSAR model to discriminate between HBCp and non-HBCp, and this model could help to predict the involvement of a certain gene and/or protein in HBC. In addition, validation procedures were carried out on the model and these included an external prediction series and evaluation of an additional series of 1000 non-HBCp. In all cases good levels of classification were obtained with values above 80%. This study represents the first example of a QSAR model for the computational chemistry inspired search of potential HBC protein biomarkers. (c) 2008 Wiley Periodicals, Inc. J Comput Chem 2008.
Notes:
Francisco J Prado-Prado, Humberto González-Díaz, Octavio Martinez de la Vega, Florencio M Ubeira, Kuo-Chen Chou (2008)  Unified QSAR approach to antimicrobials. Part 3: first multi-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds.   Bioorg Med Chem 16: 11. 5871-5880 Jun  
Abstract: Several pathogen parasite species show different susceptibilities to different antiparasite drugs. Unfortunately, almost all structure-based methods are one-task or one-target Quantitative Structure-Activity Relationships (ot-QSAR) that predict the biological activity of drugs against only one parasite species. Consequently, multi-tasking learning to predict drug activity against different species by a single model (mt-QSAR) is vitally important. In the two previous works of the present series we reported two single mt-QSAR models in order to predict the antimicrobial activity against different fungal (Bioorg. Med. Chem. 2006, 14, 5973-5980) or bacterial species (Bioorg. Med. Chem. 2007, 15, 897-902). These mt-QSARs offer a good opportunity (impractical with ot-QSAR) to construct drug-drug similarity Complex Networks and to map the contribution of sub-structures to function for multiple species. These possibilities were not addressed in our previous works. In the present work, we continue this series toward another important direction of chemotherapy (antiparasite drugs) with the development of an mt-QSAR for more than 500 drugs tested in the literature against different parasites. The data were processed by Linear Discriminant Analysis (LDA) classifying drugs as active or non-active against the different tested parasite species. The model correctly classifies 212 out of 244 (87.0%) cases in training series and 207 out of 243 compounds (85.4%) in external validation series. In order to illustrate the performance of the QSAR for the selection of active drugs we carried out an additional virtual screening of antiparasite compounds not used in training or predicting series; the model recognized 97 out of 114 (85.1%) of them. We also give the procedures to construct back-projection maps and to calculate sub-structures contribution to the biological activity. 
Finally, we used the outputs of the QSAR to construct, for the first time, a multi-species Complex Network of antiparasite drugs. The network predicted has 380 nodes (compounds) and 634 edges (pairs of compounds with similar activity). This network allows us to cluster different compounds and identify on average three known compounds similar to a new query compound according to their profile of biological activity. This is the first attempt to calculate probabilities of antiparasitic action of drugs against different parasites.
Notes:
Cristian Robert Munteanu, Humberto González-Díaz, Alexandre L Magalhães (2008)  Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices.   J Theor Biol 254: 2. 476-482 Sep  
Abstract: The huge number of new proteins that need a fast enzymatic activity characterization creates a demand for theoretical protein QSAR models. The protein parameters that can be used for an enzyme/non-enzyme classification include the simpler indices such as composition, sequence and connectivity, also called topological indices (TIs), and the computationally expensive 3D descriptors. A comparison of the 3D versus lower dimension indices has not been reported with respect to the power of discrimination of proteins according to enzyme action. A set of 966 proteins (enzymes and non-enzymes) whose structural characteristics are provided by PDB/DSSP files was analyzed with Python/Biopython scripts, STATISTICA and Weka. The list of indices includes, but is not restricted to, pure composition indices (residue fractions), DSSP secondary structure protein composition and 3D indices (surface and access). We also used mixed indices such as composition-sequence indices (Chou's pseudo-amino acid compositions or coupling numbers), 3D-composition (surface fractions) and DSSP secondary structure amino acid composition/propensities (obtained with our Prot-2S Web tool). In addition, we extend and test for the first time several classic TIs for Randic's protein sequence Star graphs using our Sequence to Star Graph (S2SG) Python application. All the indices were processed with general discriminant analysis models (GDA), neural networks (NN) and machine learning (ML) methods and the results are presented versus complexity, average of Shannon's information entropy (Sh) and data/method type. This study compares for the first time all these classes of indices to assess the ratios between model accuracy and indices/model complexity in enzyme/non-enzyme discrimination. The use of different methods and complexity of data shows that one cannot establish a direct relation between the complexity and the accuracy of the model.
Notes:
Cristian Robert Munteanu, Humberto González-Díaz, Fernanda Borges, Alexandre Lopes de Magalhães (2008)  Natural/random protein classification models based on star network topological indices.   J Theor Biol 254: 4. 775-783 Oct  
Abstract: The development of complex network graphs permits us to describe any real system such as social, neural, computer or genetic networks by transforming real properties into topological indices (TIs). This work uses Randic's star networks in order to convert the protein primary structure data into specific topological indices that are used to construct a natural/random protein classification model. The set of natural proteins contains 1046 protein chains selected from the pre-compiled CulledPDB list from PISCES Dunbrack's Web Lab. This set is characterized by a protein homology of 20%, a structure resolution of 1.6 Å and R-factor lower than 25%. The set of random amino acid chains contains 1046 sequences which were generated by a Python script according to the same type of residues and average chain length found in the natural set. A new Sequence to Star Networks (S2SNet) wxPython GUI application (with a Graphviz graphics back-end) was designed by our group in order to transform any character sequence into the following star network topological indices: Shannon entropy of Markov matrices, trace of connectivity matrices, Harary number, Wiener index, Gutman index, Schultz index, Moreau-Broto indices, Balaban distance connectivity index, Kier-Hall connectivity indices and Randic connectivity index. The model was constructed with the General Discriminant Analysis methods from the STATISTICA package and gave training/predicting set accuracies of 90.77% for the forward stepwise model type. In conclusion, this study extends for the first time the classical TIs to protein star network TIs by proposing a model that can predict if a protein/fragment of protein is natural or random using only the amino acid sequence data. This classification can be used in the studies of the protein functions by changing some fragments with random amino acid sequences or to detect the fake amino acid sequences or the errors in proteins. 
These results promote the use of the S2SNet application not only for protein structure analysis but also for mass spectroscopy, clinical proteomics and imaging, or DNA/RNA structure analysis.
Notes:
María Auxiliadora Dea-Ayuela, Yunierkis Pérez-Castillo, Alfredo Meneses-Marcel, Florencio M Ubeira, Francisco Bolas-Fernández, Kuo-Chen Chou, Humberto González-Díaz (2008)  HP-Lattice QSAR for dynein proteins: experimental proteomics (2D-electrophoresis, mass spectrometry) and theoretic study of a Leishmania infantum sequence.   Bioorg Med Chem 16: 16. 7770-7776 Aug  
Abstract: The toxicity and inefficacy of current organic drugs against Leishmaniosis justify research projects to find new molecular targets in Leishmania species including Leishmania infantum (L. infantum) and Leishmania major (L. major), both important pathogens. In this sense, quantitative structure-activity relationship (QSAR) methods, which are very useful in Bioorganic and Medicinal Chemistry to discover small-sized drugs, may help to identify not only new drugs but also new drug targets, if we apply them to proteins. Dyneins are important proteins of these parasites governing fundamental processes such as cilia and flagella motion, nuclear migration, organization of the mitotic spindle, and chromosome separation during mitosis. However, despite the interest in them as potential drug targets, so far there has been no report whatsoever on dyneins with QSAR techniques. To the best of our knowledge, we report here the first QSAR for dynein proteins. We used as input the Spectral Moments of a Markov matrix associated to the HP-Lattice Network of the protein sequence. The data contain 411 protein sequences of different species selected by ClustalX to develop a QSAR that correctly discriminates on average between 92.75% and 92.51% of dyneins and other proteins in four different train and cross-validation datasets. We also report a combined experimental and theoretic study of a new dynein sequence in order to illustrate the utility of the model to search for potential drug targets with a practical example. First, we carried out a 2D-electrophoresis analysis of L. infantum biological samples. Next, we excised from 2D-E gels one spot of interest belonging to an unknown protein or protein fragment in the region M<20,200 and pI<4. We used the MASCOT search engine to find proteins in the L. major database with the highest similarity score to the MS of the protein isolated from L. infantum. 
We used the QSAR model to predict the new sequence as dynein with probability of 99.99% without relying upon alignment. In order to confirm the previous function annotation we predicted the sequences as dynein with BLAST and the omniBLAST tools (96% alignment similarity to dyneins of other species). Using this combined strategy, we have successfully identified an L. infantum protein containing dynein heavy chain, and illustrated the potential use of the QSAR model as a complement to alignment tools.
Notes:
Maykel Cruz-Monteagudo, Cristian R Munteanu, Fernanda Borges, M Natália D S Cordeiro, Eugenio Uriarte, Kuo-Chen Chou, Humberto González-Díaz (2008)  Stochastic molecular descriptors for polymers. 4. Study of complex mixtures with topological indices of mass spectra spiral and star networks: The blood proteome case   Polymer 49: 25. 5575–5587 Oct  
Abstract: The Quantitative Structure–Property Relationships (QSPRs) based on Graph or Network Theory are important for predicting the properties of polymeric systems. In the three previous papers of this series (Polymer 45 (2004) 3845–3853; Polymer 46 (2005) 2791–2798; and Polymer 46 (2005) 6461–6473) we focused on the uses of molecular graph parameters called topological indices (TIs) to link the structure of polymers with their biological properties. However, there has been little effort to extend these TIs to the study of complex mixtures of artificial polymers or biopolymers such as nucleic acids and proteins. In this sense, Blood Proteome (BP) is one of the most important and complex mixtures containing protein polymers. For instance, outcomes obtained by Mass Spectrometry (MS) analysis of BP are very useful for the early detection of diseases and drug-induced toxicities. Here, we use two Spiral and Star Network representations of the MS outcomes and define a new type of TIs. The new TIs introduced here are the spectral moments (πk) of the stochastic matrix associated to the Spiral graph and describe non-linear relationships between the different regions of the MS characteristic of BP. We used the MARCH-INSIDE approach to calculate the πk(SN) of different BP samples and S2SNet to determine several Star graph TIs. In the second step, we develop the corresponding Quantitative Proteome–Property Relationship (QPPR) models using the Linear Discriminant Analysis (LDA). QPPRs are the analogues of QSPRs in the case of complex biopolymer mixtures. Specifically, the new QPPRs derived here may be used to detect drug-induced cardiac toxicities from BP samples. Different Machine Learning classification algorithms were used to fit the QPPRs based on πk(SN), showing J48 decision tree classifier to have the best performance. 
These results suggest that the present approach captures important features of the complex biopolymers mixtures and opens new opportunities to the application of the idea supporting classic QSPRs in polymer sciences.
Notes:
Giulio Ferino, Humberto González-Díaz, Giovanna Delogu, Gianni Podda, Eugenio Uriarte (2008)  Using spectral moments of spiral networks based on PSA/mass spectra outcomes to derive quantitative proteome-disease relationships (QPDRs) and predicting prostate cancer.   Biochem Biophys Res Commun 372: 2. 320-325 Jul  
Abstract: In prostate cancer (PCa), prognostic (predictive) factors are particularly important given the marked heterogeneity of this disease at clinical, morphologic, and biomolecular levels. Blood contains a treasure of previously unstudied biomarkers that could reflect the ongoing physiological state of all tissue. The serum prostate-specific antigen (PSA) measurement is a very good biomarker for PCa, but the misclassification rate is somewhat high. The blood proteome mass spectra (MS) represent a potential tool for detection of diseases; however, the identification of a single biomarker from the complex output from MS is often difficult. In this paper, we propose a general strategy, based on computational chemistry techniques, which should improve the predictive power of PSA. Our group adapted the square-spiral graph to represent human serum-plasma-proteome MS for healthy and PCa patients. These graphs were previously applied to DNA and/or protein sequences. In this work, we calculated different classes of connectivity indices (CIs), and created various models based on the spectral moments. The best QPDR model found showed accuracy values ranging from 71.7% to 97.2%, and 70.4% to 99.2% of specificity. This methodology might be useful for several applications in computational chemistry.
Notes:
Maykel Cruz-Monteagudo, Cristian Robert Munteanu, Fernanda Borges, M Natália D S Cordeiro, Eugenio Uriarte, Humberto González-Díaz (2008)  Quantitative Proteome-Property Relationships (QPPRs). Part 1: finding biomarkers of organic drugs with mean Markov connectivity indices of spiral networks of blood mass spectra.   Bioorg Med Chem 16: 22. 9684-9693 Nov  
Abstract: Numerical parameters of the molecular networks, also referred to as Topological Indices or Connectivity Indices (CIs), have been used in Bioorganic and Medicinal Chemistry to find Quantitative Structure-Activity, Property or Toxicity Relationship (QSAR, QSPR and QSTR) models. QSPR models generally use CIs as inputs to predict the biological activity of compounds. However, the literature does not evidence a great effort to find QSAR-like models for other biologically and chemically relevant systems. For instance, blood proteome constitutes a protein-rich information reservoir, since the serum proteome Mass Spectra (MS) represents a potential information source for the early detection of Biomarkers for diseases and/or drug-induced toxicities. The concept of mass spectrum network (MS network) for a single protein is already well-known. However, there are no reported results on the use of CIs for a MS network of a whole proteome to explore MS patterns. In this work, we introduced for the first time a novel network representation and the CIs for the MS of blood proteome samples. The new network is based on Randic's Spiral network, which had previously been introduced for protein sequences. The new MS CIs, called here Spiral Markov Connectivity (SMC(k)) of the MS Spiral graph, can be calculated with the software MARCH-INSIDE, combining network and Markov model theory. The SMC(k) values could be used to seek QSAR-like models, called in this work Quantitative Proteome-Property Relationships (QPPRs). We calculate the SMC(k) values for 62 blood samples and fit a QPPR model that discriminates proteome MS typical of individuals susceptible to drug-induced cardiotoxicity from control samples. The accuracy, sensitivity, and specificity values of the QPPR model were between 73.08% and 87.5% in training and validation series. This work points to QPPR models as a powerful tool for MS detection of biomarkers in proteomics.
Notes:
Humberto González-Díaz, Francisco J Prado-Prado (2008)  Unified QSAR and network-based computational chemistry approach to antimicrobials, part 1: multispecies activity models for antifungals.   J Comput Chem 29: 4. 656-667 Mar  
Abstract: There are many pathogen microbial species with very different antimicrobial drug susceptibility. In this work, we selected pairs of antifungal drugs with similar/dissimilar species predicted-activity profile and represented them as a large network, which may be used to identify drugs with similar mechanism of action. Computational chemistry prediction of the biological activity based on quantitative structure-activity relationships (QSAR) substantially increases the potential of this kind of network, avoiding time and resource-consuming experiments. Unfortunately, most QSAR models are unspecific or predict activity against only one species. To solve this problem we developed a multispecies QSAR classification model, in which the outputs were the inputs of the aforementioned network. Overall model classification accuracy was 87.0% (161/185 compounds) in training, 83.4% (50/61) in validation, and 83.7% for 288 additional antifungal compounds used to extend model validation for network construction. The network predicted has 59 nodes (compounds), 648 edges (pairs of compounds with similar activity), low coverage density d = 37.8%, and a distribution closer to normal than to exponential. These results are more characteristic of a not-overestimated random network, clustering different drug mechanisms of action, than of a less useful power law network with few mechanisms (network hubs).
Notes:
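The coverage density quoted in the abstract above can be checked from the node and edge counts alone. The sketch below assumes the network is a simple undirected graph (no self-loops, at most one edge per pair), which the abstract does not state explicitly; the function name `network_density` is hypothetical.

```python
def network_density(n_nodes, n_edges):
    """Fraction of possible undirected node pairs that are linked.

    Assumes a simple undirected graph, so the maximum number of
    edges is n * (n - 1) / 2.
    """
    max_edges = n_nodes * (n_nodes - 1) // 2
    return n_edges / max_edges

# Counts reported in the abstract: 59 compounds, 648 similarity pairs.
density = network_density(59, 648)
print(f"maximum possible edges: {59 * 58 // 2}")
print(f"density: {100 * density:.2f}%")
```

Under that assumption there are 1711 possible pairs, and 648 of them give a density just under 37.9%, in line with the reported d = 37.8%.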
Guillermín Agüero-Chapín, Humberto Gonzalez-Díaz, Gustavo de la Riva, Edrey Rodríguez, Aminael Sanchez-Rodríguez, Gianni Podda, Roberto I Vazquez-Padrón (2008)  MMM-QSAR recognition of ribonucleases without alignment: comparison with an HMM model and isolation from Schizosaccharomyces pombe, prediction, and experimental assay of a new sequence.   J Chem Inf Model 48: 2. 434-448 Feb  
Abstract: The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues) or a similar sequence is not recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These sequences of QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, previous 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. The MMM-QSAR showed a discrimination power of RNases from other proteins of 97.35% without using alignment, which is a result as good as for the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation avoiding sequence alignment with the same accuracy as classic HMM models.
Notes:
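Several abstracts in this list use spectral moments of a Markov matrix as sequence descriptors. The following is a minimal sketch of the general idea only: it assumes the Markov matrix is obtained by plain row-normalization of a graph adjacency matrix and takes the k-th moment as the trace of the k-th matrix power. The weighting actually used by the authors (for example, electrostatic or HP-lattice node weights) is not reproduced here, and the toy three-node graph and all function names are hypothetical.

```python
def markov_matrix(adjacency):
    """Row-normalize an adjacency matrix into transition probabilities."""
    rows = []
    for row in adjacency:
        total = sum(row)
        rows.append([x / total if total else 0.0 for x in row])
    return rows

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_moments(p, max_order):
    """Return [Tr(P^0), Tr(P^1), ..., Tr(P^max_order)]."""
    n = len(p)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(max_order + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, p)
    return moments

# Toy path graph with three nodes: 0 - 1 - 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
moments = spectral_moments(markov_matrix(adjacency), max_order=3)
print(moments)
```

For this toy graph the odd-order moments vanish because a path has no closed walks of odd length; in a QSAR setting the vector of moments would serve as the numerical input to a classifier such as LDA.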
Maykel Cruz-Monteagudo, Humberto González-Díaz, Fernanda Borges, Elena Rosa Dominguez, M Natália D S Cordeiro (2008)  3D-MEDNEs: an alternative "in silico" technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy.   Chem Res Toxicol 21: 3. 619-632 Mar  
Abstract: Low range mass spectra (MS) characterization of serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, due to the thousands of proteins involved, finding the single disease-related protein could be a hard task. The search for a model based on general MS patterns becomes a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318-1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In this previous work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs to numerically encode biologically relevant information present in MS of the serum proteome for the first time. Using the same idea behind QSTR techniques, we can seek now by analogy a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randic's spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370 100 intensity (I_i) signals with m/z bandwidth above 700-12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR proposed showed accuracy values ranging from 83.8% to 87.1% and leave-one-out (LOO) predictive ability of 77.4-85.5%. This work demonstrated that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
Notes:
Humberto González-Díaz, Yenny González-Díaz, Lourdes Santana, Florencio M Ubeira, Eugenio Uriarte (2008)  Proteomics, networks and connectivity indices.   Proteomics 8: 4. 750-778 Feb  
Abstract: Describing the connectivity of chemical and/or biological systems using networks is a direct gateway for the introduction of mathematical tools in proteomics. Networks, in some cases even very large ones, are simple objects that are composed of at least nodes and edges. The nodes represent the parts of the system and the edges geometric and/or functional relationships between parts. In proteomics, amino acids, proteins, electrophoresis spots, polypeptidic fragments, or more complex objects can play the role of nodes. All of these networks can be numerically described using the so-called Connectivity Indices (CIs). The transformation of graphs (a picture) into CIs (numbers) facilitates the manipulation of information and the search for structure-function relationships in Proteomics. In this work, we review and comment on the challenges and new trends in the definition and applications of CIs in Proteomics. Emphasis is placed on 1-D-CIs for DNA and protein sequences, 2-D-CIs for RNA secondary structures, 3-D-topographic indices (TPGIs) for protein function annotation without alignment, 2-D-CIs and 3-D-TPGIs for the study of drug-protein or drug-RNA quantitative structure-binding relationships, and pseudo 3-D-CIs for protein surface molecular recognition. We also focus on CIs to describe Protein Interaction Networks or RNA co-expression networks. 2-D-CIs for patient blood proteome 2-DE maps or mass spectra are also covered.
Notes:
Guillermin Agüero-Chapín, Agostinho Antunes, Florencio M Ubeira, Kuo-Chen Chou, Humberto González-Díaz (2008)  Comparative Study of Topological Indices of Macro/Supramolecular RNA Complex Networks.   J Chem Inf Model Oct  
Abstract: RNA function annotation is often based on alignment to a previously studied template. In contrast to the study of proteins, there are not many alignment-free methods to predict RNA functions if alignment fails. The use of topological indices (TIs) of RNA complex networks (CNs) to find quantitative structure-activity relationships (QSAR) may be an alternative to incorporate secondary structure or sequence-to-sequence similarity. Here, we introduce new QSAR-like techniques using RNA macromolecular CNs (mmCNs), where nodes are nucleotides, or RNA supramolecular CNs (smCNs), where nodes are RNA sequences. We studied a data set of 198 sequences including 18S-rRNAs (important phylogenetic molecular biomarkers). We constructed three types of RNA mmCNs: sequence-linear (SL), Cartesian-lattice (CL), and sequence-folding CNs (SF-CNs) and two smCNs: sequence-sequence disagreement CN (SSD) and sequence-sequence similarity (SSS-smCN). We reported the first comparative QSAR study with all these CIs and CNs, which includes: (i) spectral moments ((i)mu d(w)) of SL-mmCNs (accuracy = 75.3%), (ii) electrostatic CIs (xi d) of CL-mmCNs (>90%), (iii) thermodynamic parameters (Delta G, Delta H, Delta S, and Tm) of SF-mmCNs (64.7%), (iv) disagreement-distribution moments (Mk) of the SSD-smCN (79.3%), and (v) node centralities of the SSD-smCN (78.0%). Furthermore, we reported the experimental isolation of a new RNA sequence from Psidium guajava leaf tissue and its QSAR and BLAST prediction to illustrate the practical use of these methods. We also investigated the use of these CNs to explore rRNA diversity on bacteria, plants, and parasites from the Dactylogyrus genus. The HPL-mmCNs model was the best of all found. All the CNs and TIs, except SF-mmCNs, were introduced here for the first time for the QSAR study of RNA, which allowed a comparative study for RNA classification.
Notes:
Lourdes Santana, Humberto González-Díaz, Elías Quezada, Eugenio Uriarte, Matilde Yáñez, Dolores Viña, Francisco Orallo (2008)  Quantitative structure-activity relationship and complex network approach to monoamine oxidase A and B inhibitors.   J Med Chem 51: 21. 6740-6751 Nov  
Abstract: The work provides a new model for the prediction of the MAO-A and -B inhibitor activity by the use of combined complex networks and QSAR methodologies. On the basis of the obtained model, we prepared and assayed 33 coumarin derivatives, and the theoretical prediction was compared with the experimental activity data. The model correctly predicted 27 compounds, and most of the active derivatives showed IC50 values in the µM-nM range against both the MAO-A and MAO-B isoforms. Compound 14 shows the same MAO-A inhibitory activity (IC50 = 7.2 nM) as clorgyline, used as a reference inhibitor, and has the highest MAO-A specificity (1000-fold higher compared to MAO-B). On the other hand, compounds 24 (IC50 = 1.2 nM) and 28 (IC50 = 1.5 nM) show higher activity than selegiline (IC50 = 19.6 nM) and high MAO-B selectivity with 100-fold and 1600-fold inhibition levels, with respect to the MAO-A isoform.
Notes:
2007
Humberto González-Díaz, Ervelio Olazábal, Lourdes Santana, Eugenio Uriarte, Yenny González-Díaz, Nilo Castañedo (2007)  QSAR study of anticoccidial activity for diverse chemical compounds: prediction and experimental assay of trans-2-(2-nitrovinyl)furan.   Bioorg Med Chem 15: 2. 962-968 Jan  
Abstract: In this work we report a QSAR model that discriminates between chemically heterogeneous classes of anticoccidial and non-anticoccidial compounds. For this purpose we used the Markovian Chemicals in silico Design (MARCH-INSIDE) approach [J. Mol. Mod. 2002, 8, 237-245; J. Mol. Mod. 2003, 9, 395-407]. Linear discriminant analysis allowed us to fit the discriminant function. This function correctly classifies 86.67% of anticoccidial compounds and 96.23% of inactive compounds in the training series. Overall classification is 94.12%. We validated the model by means of an external predicting series, with 86.96% of global predictability. Remarkably, the present model is based on topological as well as configuration-dependent molecular descriptors. Therefore, the model performs timely calculations and allows discrimination between Z/E and chiral isomers. Finally, to exemplify the use of the model in practice we report the prediction and experimental assay of trans-2-(2-nitrovinyl)furan. It is notable that lesion control was 72.86% at mg/kg of body weight with respect to 60% at 125 mg/kg for amprolium (control drug). The back-projection map for this compound predicts a high level of importance for the double bond and for the nitro group in the trans position. We conclude that the MARCH-INSIDE approach enables the accurate fast track identification of anticoccidial hits. Moreover, trans-2-(2-nitrovinyl)furan seems to be a promising drug for the treatment of coccidiosis.
Notes:
Maykel Cruz-Monteagudo, Humberto González-Díaz, Guillermín Agüero-Chapín, Lourdes Santana, Fernanda Borges, Elena Rosa Domínguez, Gianni Podda, Eugenio Uriarte (2007)  Computational chemistry development of a unified free energy Markov model for the distribution of 1300 chemicals to 38 different environmental or biological systems.   J Comput Chem 28: 11. 1909-1923 Aug  
Abstract: Predicting tissue and environmental distribution of chemicals is of major importance for environmental and life sciences. Most of the molecular descriptors used in computational prediction of chemical partition behavior consider molecular structure but ignore the nature of the partition system. Consequently, computational models derived to date are restricted to the specific system under study. Here, a free energy-based descriptor (ΔG(k)) is introduced, which circumvents this problem. Based on ΔG(k), we developed for the first time a single linear classification model to predict the partition behavior of a large number of structurally diverse drugs and other chemicals (1300) for 38 different partition systems of biological and environmental significance. The model presented training/predicting set accuracies of 91.79/88.92%. Parametrical assumptions were checked. Desirability analysis was used to explore the levels of the predictors that produce the most desirable partition properties. Finally, inversion of the partition direction for each one of the 38 partition systems shows that our models correctly classified 89.08% of compounds with an uncertainty of only ±0.17%, independently of the direction of the partition process used to seek the model. Ten other classification models (linear, neural networks, and genetic algorithms) were also tested for the same purposes. None of these computational models compares favorably with the linear model, indicating that our approach captures the main aspects that govern chemical partition in different systems.
Notes:
Humberto González-Díaz, Yunierkis Pérez-Castillo, Gianni Podda, Eugenio Uriarte (2007)  Computational chemistry comparison of stable/nonstable protein mutants classification models based on 3D and topological indices.   J Comput Chem 28: 12. 1990-1995 Sep  
Abstract: In principle, there are different protein structural parameters that can be used in computational chemistry studies to classify protein mutants according to thermal stability, including sequence, connectivity, and 3D descriptors. Connectivity parameters (called topological indices, TIs) are simpler than 3D parameters and therefore less computationally expensive. However, TIs ignore important aspects of protein structure and hence are expected to be inaccurate. In any case, a comparison of 3D parameters and TIs has not been reported with respect to their power to discriminate proteins according to stability. In this study, we compare both classes of indices in this sense for the first time. The best model found, based on 3D spectral moments, correctly classified 507 out of 525 (96.6%) proteins, while the TIs model correctly classified 404 out of 525 (77.0%) proteins. We have shown that 3D descriptor models do give more accurate results than TIs but, interestingly, TIs give acceptable results in a timely way in spite of their simplicity.
Notes:
Humberto González-Díaz, Santiago Vilar, Lourdes Santana, Eugenio Uriarte (2007)  Medicinal chemistry and bioinformatics-current trends in drugs discovery with networks topological indices.   Curr Top Med Chem 7: 10. 1015-1029  
Abstract: The numerical encoding of chemical structure with Topological Indices (TIs) is currently growing in importance in Medicinal Chemistry and Bioinformatics. This approach allows the rapid collection, annotation, retrieval, comparison and mining of chemical structures within large databases. TIs can subsequently be used to seek quantitative structure-activity relationships (QSAR), which are models connecting chemical structure with biological activity. In the early 1990's, there was an explosion in the introduction and definition of new TIs. The Handbook of Molecular Descriptors by Todeschini and Consonni lists more than 1500 of these indices. At the end of the last century, researchers produced a large number of TIs with essentially the same advantages and/or disadvantages. Consequently, many researchers abandoned the definition of TIs for a time. In our opinion, one of the problems associated with TIs is that researchers aimed their efforts only at the codification of chemical connectivity for small-sized drugs. As a consequence, recently it seems that we have arrived at "Fukuyama's End of History in TIs definition". In the work described here, we review and comment on the "quo vadis" and challenges in the definition of TIs as we enter the new century. Emphasis is placed on new chiral TIs (CTIs), flexible TIs for unifying QSAR models with multiple targets, topographic indices (TPGIs), TIs for DNA and protein sequences, TIs for 2D RNA structures, TPGIs and drug-protein or drug-RNA quantitative structure-binding relationship (QSBR) studies, TIs to encode protein surface information and TIs for protein interaction networks (PINs).
Notes:
Humberto González-Díaz, Santiago Vilar, Lourdes Santana, Gianni Podda, Eugenio Uriarte (2007)  On the applicability of QSAR for recognition of miRNA bioorganic structures at early stages of organism and cell development: embryo and stem cells.   Bioorg Med Chem 15: 7. 2544-2550 Apr  
Abstract: Quantitative structure-activity-relationship (QSAR) models are applied in bioorganic chemistry mainly to the study of small molecules, while applications to biopolymers remain underdeveloped. MicroRNAs (miRNAs), which are non-coding small RNAs, regulate a variety of biological processes and constitute good candidates to scale up the application of QSAR to biopolymers. The propensity of a small RNA sequence to act as miRNA depends on its secondary structure, which one can explain in terms of folding thermodynamic parameters. Thermodynamic QSAR can then be used, for instance, for fast identification of miRNAs at early stages of development such as embryos and stem cells (called here esmiRNAs), and to gain insight into cellular differentiation processes and diseases such as cancer. First, we calculated folding free energies (ΔG), enthalpies (ΔH), and entropies (ΔS) as well as melting temperatures (T(m)) for 2623 small RNA sequences (including 623 esmiRNAs and 2000 negative control sequences). Next, we sought a QSAR classification model: esmiRNA = 0.035 × T(m) - 0.078 × ΔS - 8.748. The model correctly recognized 543 (87.2%) of esmiRNAs and 935 (93.5%) of non-esmiRNAs divided into both training and validation series. The model also recognized 908 out of 1000 additional negative control sequences. ROC curve analysis (area = 0.93) demonstrated that the present model differs significantly from a random classifier. In addition, we map the influence of thermodynamic parameters over esmiRNA activity. Last, a double ordinate Cartesian plot of cross-validated residuals (first ordinate), standard residuals (second ordinate), and leverages (abscissa) defined the domain of applicability of the model as a squared area within a ±2 band for residuals and a leverage threshold of h = 0.0074. The present model is the first QSAR model for the quick and accurate selection of new esmiRNAs with potential use in bioorganic and medicinal chemistry.
Notes:
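As a rough illustration of how the linear classification function quoted in the abstract above (esmiRNA = 0.035 × T(m) - 0.078 × ΔS - 8.748) might be evaluated, here is a minimal Python sketch. The function names, the units of the inputs, and the decision cutoff of zero are assumptions made for illustration; the abstract gives only the coefficients.

```python
def esmirna_score(tm, delta_s):
    """Evaluate the linear function quoted in the abstract.

    tm      -- melting temperature T(m); units are not stated in the abstract
    delta_s -- folding entropy (Delta S); units are not stated in the abstract
    """
    return 0.035 * tm - 0.078 * delta_s - 8.748


def is_esmirna(tm, delta_s, cutoff=0.0):
    # cutoff=0.0 is an assumed decision threshold, not taken from the abstract.
    return esmirna_score(tm, delta_s) > cutoff
```

With this convention a sequence is flagged as a candidate esmiRNA whenever its score is positive; a different cutoff can be passed if the published model used another threshold.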
Humberto González-Díaz, Guillermín Agüero-Chapin, Javier Varona, Reinaldo Molina, Giovanna Delogu, Lourdes Santana, Eugenio Uriarte, Gianni Podda (2007)  2D-RNA-coupling numbers: a new computational chemistry approach to link secondary structure topology with biological function.   J Comput Chem 28: 6. 1049-1056 Apr  
Abstract: Methods for predicting protein, DNA, or RNA function and mapping it onto sequence often rely on bioinformatics alignment approaches instead of chemical structure. Consequently, it is interesting to develop computational chemistry approaches based on molecular descriptors. In this sense, many researchers have used sequence-coupling numbers and our group extended them to 2D protein representations. However, no coupling numbers have been reported for 2D-RNA topology graphs, which are highly branched and contain useful information. Here, we use a computational chemistry scheme: (a) transforming sequences into RNA secondary structures, (b) defining and calculating new 2D-RNA-coupling numbers, (c) seeking a structure-function model, and (d) mapping biological function onto the folded RNA. We studied as an example 1-aminocyclopropane-1-carboxylic acid (ACC) oxidases, known as ACO, which control fruit ripening and are important for the biotechnology industry. First, we calculated τ(k)(2D-RNA) values for a set of 90 folded RNAs, including 28 transcripts of ACO and control sequences. Afterwards, we compared the classification performance of 10 different classifiers implemented in the software WEKA. In particular, the logistic equation ACO = 23.8 · τ(1)(2D-RNA) + 41.4 predicts ACOs with 98.9%, 98.0%, and 97.8% accuracy in training, leave-one-out and 10-fold cross-validation, respectively. Afterwards, with this equation we predicted ACO function for a sequence isolated in this work from Coffea arabica (GenBank accession DQ218452). The τ(1)(2D-RNA) descriptor also compares favorably with other descriptors. This equation allows us to map the codification of ACO activity on different mRNA topology features. The present computational-chemistry approach is general and could be extended to connect RNA secondary structure topology to other functions.
Notes:
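The abstract above describes its model as a logistic equation, ACO = 23.8 · τ(1)(2D-RNA) + 41.4. A minimal Python sketch of how such an equation could be turned into a probability follows. It assumes the quoted expression is the linear predictor (log-odds) of a standard logistic model; the abstract does not spell this out, and the function names are hypothetical.

```python
import math


def aco_linear_predictor(tau1):
    """Linear expression quoted in the abstract, with tau1 = tau(1)(2D-RNA)."""
    return 23.8 * tau1 + 41.4


def aco_probability(tau1):
    """Map the linear predictor to (0, 1) with the logistic function.

    Assumption: the quoted equation is the log-odds of a standard logistic model.
    The two branches avoid overflow in math.exp for large |z|.
    """
    z = aco_linear_predictor(tau1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Under this reading, the probability crosses 0.5 where the linear predictor is zero, that is, at τ(1) = -41.4 / 23.8.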
Humberto González-Díaz, Isis Bonet, Carmen Terán, Erik De Clercq, Rafael Bello, Maria M García, Lourdes Santana, Eugenio Uriarte (2007)  ANN-QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds.   Eur J Med Chem 42: 5. 580-585 May  
Abstract: Developing a model for predicting the anticancer activity of any class of organic compounds based on molecular structure is a very important goal for medicinal chemists. Different molecular descriptors can be used to solve this problem. Stochastic molecular descriptors, the so-called MARCH-INSIDE approach, have been shown to be very successful in drug design. Nevertheless, the structural diversity of compounds is so vast that we may need non-linear models such as artificial neural networks (ANN) instead of linear ones. SmartMLP-ANN analysis used to model the anticancer activity of organic compounds has shown a high average accuracy of 93.79% (train performance) and predictability of 90.88% (validation performance) for the 8:3-MLP topology with different training and predicting series. This ANN model compares favourably with a previous linear discriminant analysis (LDA) model [H. González-Díaz et al., J. Mol. Model 9 (2003) 395] that showed only 80.49% accuracy and 79.34% predictability. The present SmartMLP approach employed shorter training times of only 10 h, while previous models give accuracies of 70-89% only after 25-46 h of training. In order to illustrate the practical use of the model in bioorganic medicinal chemistry, we report the in silico prediction and in vitro evaluation of six new synthetic tegafur analogues having IC(50) values in a broad range between 37.1 and 138 µg mL(-1) for leukemia (L1210/0) and human T-lymphocyte (Molt4/C8, CEM/0) cells. Theoretical predictions coincide very well with experimental results.
Notes:
Humberto González-Díaz, Liane Saíz-Urra, Reinaldo Molina, Yenny González-Díaz, Angeles Sánchez-González (2007)  Computational chemistry approach to protein kinase recognition using 3D stochastic van der Waals spectral moments.   J Comput Chem 28: 6. 1042-1048 Apr  
Abstract: Three-dimensional (3D) protein structures now frequently lack functional annotations because of the increase in the rate at which chemical structures are solved with respect to experimental knowledge of biological activity. As a result, predicting structure-function relationships for proteins is an active research field in computational chemistry and has implications in medicinal chemistry, biochemistry and proteomics. In previous studies stochastic spectral moments were used to predict protein stability or function (González-Díaz, H. et al. Bioorg Med Chem 2005, 13, 323; Biopolymers 2005, 77, 296). Nevertheless, these moments take into consideration only electrostatic interactions and ignore other important factors such as van der Waals interactions. The present study introduces a new class of 3D structure molecular descriptors for folded proteins named the stochastic van der Waals spectral moments (°β(k)). Among many possible applications, recognition of kinases was selected due to the fact that previous computational chemistry studies in this area have not been reported, despite the widespread distribution of kinases. The best linear model found was Kact = -9.44 · °β(0)(c) + 10.94 · °β(5)(c) - 2.40 · °β(0)(i) + 2.45 · °β(5)(m) + 0.73, where core (c), inner (i) and middle (m) refer to specific spatial protein regions. The model with a high Matthew's regression coefficient (0.79) correctly classified 206 out of 230 proteins (89.6%) including both training and predicting series. An area under the ROC curve of 0.94 differentiates our model from a random classifier. A subsequent principal components analysis of 152 heterogeneous proteins demonstrated that °β(k) codifies information different to other descriptors used in protein computational chemistry studies. Finally, the model recognizes 110 out of 125 kinases (88.0%) in a virtual screening experiment and this can be considered as an additional validation study (these proteins were not used in training or predicting series).
Notes:
Humberto González-Díaz, Liane Saíz-Urra, Reinaldo Molina, Lourdes Santana, Eugenio Uriarte (2007)  A model for the recognition of protein kinases based on the entropy of 3D van der Waals interactions.   J Proteome Res 6: 2. 904-908 Feb  
Abstract: The study and prediction of kinase function (kinomics) is of major importance for proteome research due to the widespread distribution of kinases. However, the prediction of protein function based on the similarity between a functionally annotated 3D template and a query structure may fail, for instance, if a similar protein structure cannot be identified. Alternatively, function can be assigned using 3D-structural empirical parameters. In previous studies, we introduced parameters based on electrostatic entropy (Proteins 2004, 56, 715) and molecular vibration entropy (Bioinformatics 2003, 19, 2079) but ignored other important factors such as van der Waals (vdw) interactions. In the work described here, we define 3D-vdw entropies (°θ(k)) and use them for the first time to derive a classifier for protein kinases. The model classifies correctly 88.0% of proteins in training and more than 85.0% of proteins in validation studies. Principal components analysis of heterogeneous proteins demonstrated that °θ(k) codify information that is different to that described by other bulk or folding parameters. In additional validation experiments, the model recognized 129 out of 142 kinases (90.8%) and 592 out of 677 non-kinases (87.4%) not used above. This study provides a basis for further consideration of °θ(k) as parameters for the empirical search for structure-function relationships.
Notes:
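The percentages in the abstract above follow directly from the reported counts (129 of 142 kinases and 592 of 677 non-kinases recognized). The following Python sketch shows that arithmetic; the helper name is hypothetical, and the overall accuracy and Matthews correlation coefficient it also returns are derived here for illustration only and are not values reported in the abstract.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Basic classification metrics from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of positives recognized
    specificity = tn / (tn + fp)          # fraction of negatives recognized
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Counts taken from the abstract: kinases are treated as the positive class.
metrics = confusion_metrics(tp=129, fn=142 - 129, tn=592, fp=677 - 592)
print(round(100 * metrics["sensitivity"], 1))  # 90.8
print(round(100 * metrics["specificity"], 1))  # 87.4
```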
Francisco J Prado-Prado, Humberto González-Díaz, Lourdes Santana, Eugenio Uriarte (2007)  Unified QSAR approach to antimicrobials. Part 2: predicting activity against more than 90 different species in order to halt antibacterial resistance.   Bioorg Med Chem 15: 2. 897-902 Jan  
Abstract: There are many different kinds of pathogenic bacteria species with very different susceptibility profiles to different antibacterial drugs. One limitation of QSAR models is that they consider the biological activity of drugs against only one species of bacteria. In a previous paper, we developed a unified Markov model to describe the biological activity of different drugs tested in the literature against some antimicrobial species. Consequently, predicting the probability with which a drug is active against different species of bacteria with a single unified model is a goal of major importance. The work described here develops the unified Markov model to describe the biological activity of more than 70 drugs from the literature tested against 96 species of bacteria. We applied linear discriminant analysis (LDA) to classify drugs as active or inactive against the different tested bacterial species. The model correctly classified 199 out of 237 active compounds (83.9%) and 168 out of 200 inactive compounds (84%). Overall training predictability was 84% (367 out of 437 cases). Validation of the model was carried out using an external predicting series, with the model classifying 202 out of 243 (i.e., 83.13%) of the compounds. In order to show how the model functions in practice, a virtual screening was carried out and the model recognized as active 84.5% (480 out of 568) antibacterial compounds not used in the training or predicting series. The current study is an attempt to calculate within a unified framework the probabilities of antibacterial action of drugs against many different species.
Notes:
2006
Lourdes Santana, Eugenio Uriarte, Humberto González-Díaz, Giuseppe Zagotto, Ramón Soto-Otero, Estefanía Méndez-Alvarez (2006)  A QSAR model for in silico screening of MAO-A inhibitors. Prediction, synthesis, and biological assay of novel coumarins.   J Med Chem 49: 3. 1149-1156 Feb  
Abstract: This work explores the potential of the MARCH-INSIDE methodology to seek a QSAR for MAO-A inhibitors from a heterogeneous series of compounds. A Markov model was used to quickly calculate the molecular electron delocalization, polarizability, refractivity, and n-octanol/water partition coefficients for a series of 1406 active/nonactive compounds. LDA was subsequently used to fit a classification function. The model showed 92.8% and 91.8% global accuracy and predictability in training and validation studies. This QSAR model was validated through a virtual screening of a series of coumarin derivatives. The 15 selected compounds were prepared and evaluated as in vitro MAO-A inhibitors. The theoretical prediction was compared with the experimental results and the model correctly predicted 13 compounds with only two mistakes on compounds with activities very close to the cutoff point established for the model. Consequently, this method represents a useful tool for the "in silico" screening of MAO-A inhibitors.
Notes:
Humberto González-Díaz, Angeles Sánchez-González, Yenny González-Díaz (2006)  3D-QSAR study for DNA cleavage proteins with a potential anti-tumor ATCUN-like motif.   J Inorg Biochem 100: 7. 1290-1297 Jul  
Abstract: Genomics projects have elucidated several genes that encode protein sequences. Subsequently, the advent of the proteomics age has enabled the synthesis and 3D structure determination for these protein sequences. Some of these proteins incorporate metal atoms but it is often not known whether they are metal-binding proteins and the nature of the biological activity is not understood. Consequently, the development of methods to predict metal-mediated biological activity of proteins from the 3D structure of metal-unbound proteins is a goal of major importance. More specifically, the amino terminal Cu(II)- and Ni(II)-binding (ATCUN) motif is a small metal-binding site found in the N-terminus of many naturally occurring proteins. The ATCUN motif participates in DNA cleavage and has anti-tumor activity. In this study, we calculated average 3D electrostatic potentials (ξ(k)) for 265 different proteins including 133 potential ATCUN anti-tumor proteins. We also calculated ξ(k) values for the total protein or for the following specific protein regions: the core, inner, middle, and outer orbits. A linear discriminant analysis model was subsequently developed to assign proteins into two groups called ATCUN DNA-cleavage proteins and non-active proteins. The best model found was: ATCUN = 1.15 · ξ(1)(inner) + 2.18 · ξ(5)(middle) + 27.57 · ξ(0)(outer) - 27.57 · ξ(0)(total) + 0.09. The model correctly classified 182 out of 197 (91.4%) and 61 out of 66 (92.4%) proteins in training and external predicting series, respectively. Finally, desirability analysis was used to predict the values for the electrostatic potential in one single region and the combined values in two regions that are desirable for ATCUN-like proteins. To the best of our knowledge, the present work is the first study in which desirability analysis has been used in protein quantitative-structure-activity-relationship (QSAR).
Notes:
Humberto González-Díaz, Francisco J Prado-Prado, Lourdes Santana, Eugenio Uriarte (2006)  Unify QSAR approach to antimicrobials. Part 1: predicting antifungal activity against different species.   Bioorg Med Chem 14: 17. 5973-5980 Sep  
Abstract: Most molecular descriptors reported to date encode only information about the molecular structure. In previous papers, we have extended stochastic descriptors to encode additional information such as target site, partition system, or biological species [Bioorg. Med. Chem. Lett. 2005, 15, 551; Bioorg. Med. Chem. 2005, 13, 1119]. This work develops a unified Markov model to describe with a single linear equation the biological activity of 74 drugs tested in the literature against some of the fungal species selected from a list of 87 species (491 cases in total). The data were processed by linear discriminant analysis (LDA), classifying drugs as active or non-active against the different tested fungal species. The model correctly classifies 338 out of 368 active compounds (91.85%) and 89 out of 123 non-active compounds (72.36%). Overall training predictability was 86.97% (427 out of 491 compounds). Validation of the model was carried out by means of a leave-species-out (LSO) procedure. After step-by-step elimination of all drugs tested against one specific species, we recorded the percentage of good classification of the left-out compounds (LSO-predictability). In addition, robustness of the model to the elimination of the compounds (LSO-robustness) was considered. This aspect was measured as the variation of the percentage of good classification of the modified model (Delta) in LSO with respect to the original one. Average LSO-predictability was 86.41 ± 0.95% (average ± SD) and Delta = -0.55%, with an average of 6 drugs tested against each fungal species. Results for some of the 87 studied species were Candida albicans: 43 tested compounds, 100% of LSO-predictability, Delta = -3.49%; Candida parapsilosis 23, 100%, Delta = -0.86%; Aspergillus fumigatus 21, 95.20%, Delta = 0.05%; Microsporum canis 12, 91.60%, Delta = -2.84%; Trichophyton mentagrophytes 11, 100%, Delta = -0.51%; Cryptococcus neoformans 10, 90%, Delta = -0.90%. The present model is the first reported unified model that allows the prediction of antifungal activity of any organic compound against a very large diversity of fungal pathogens.
Notes:
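The leave-species-out (LSO) procedure described in the abstract above removes, in turn, every case tested against one species and evaluates the model on the removed cases. A minimal Python sketch of how such splits could be generated is given below. The data layout (tuples of drug, species, label), the function name, and the toy records are assumptions made for illustration; the actual model fitting used in the paper is not reproduced here.

```python
def leave_species_out(cases):
    """Yield (species, training_cases, held_out_cases) for each species.

    cases -- iterable of (drug, species, label) tuples (hypothetical layout).
    """
    cases = list(cases)
    for species in sorted({sp for _, sp, _ in cases}):
        held_out = [c for c in cases if c[1] == species]
        training = [c for c in cases if c[1] != species]
        yield species, training, held_out


# Toy data, invented for this example only.
toy_cases = [
    ("drug_a", "Candida albicans", 1),
    ("drug_b", "Candida albicans", 0),
    ("drug_a", "Aspergillus fumigatus", 1),
]
for species, training, held_out in leave_species_out(toy_cases):
    print(species, len(training), len(held_out))
```

Each split would then be used to refit the classifier on the training cases and score it on the held-out species; averaging those scores gives an LSO-predictability figure of the kind quoted in the abstract.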
Humberto González-Díaz, Dolores Viña, Lourdes Santana, Erik de Clercq, Eugenio Uriarte (2006)  Stochastic entropy QSAR for the in silico discovery of anticancer compounds: prediction, synthesis, and in vitro assay of new purine carbanucleosides.   Bioorg Med Chem 14: 4. 1095-1107 Feb  
Abstract: A Markov model based QSAR is introduced for the rational selection of anticancer compounds. The model discriminates 90.3% of 226 structurally heterogeneous anticancer/non-anticancer compounds in the training series. An external validation series was used to validate the model; 91.8% of its 85 compounds, which were not used to fit the model, were correctly classified. The model developed was afterwards used in a simulation of a virtual search for anticancer compounds never considered either in the training or in the predicting series. Of the 213 anticancer compounds used in this simulated search, 87.7% were correctly classified. The model also shows high values for specificity (0.89), sensitivity (0.91), and Matthews correlation coefficient (0.79). In addition, the present model performs similarly to or better than four other models reported elsewhere when 26 comparison parameters are taken into consideration. Finally, we exemplify the use of the model in practice with the design of a new series of carbanucleosides. The compounds evaluated with the model were synthesized and experimentally assayed for their antitumor effects on the proliferation of murine leukemia cells (L1210/0) and human T-lymphocyte cells (CEM/0 and Molt4/C8). The most interesting activity was detected for compound 5a, with a predicted probability of 80.2% and IC(50) = 27.0, 27.2, and 29.4 µM, respectively, against the above-mentioned cell lines. These values are comparable to those for the control compound Ara-A.
Notes:
Maykel Cruz-Monteagudo, Humberto González-Díaz, Fernanda Borges, Yenny González-Díaz (2006)  Simple stochastic fingerprints towards mathematical modeling in biology and medicine. 3. Ocular irritability classification model.   Bull Math Biol 68: 7. 1555-1572 Oct  
Abstract: The MARCH-INSIDE methodology combined with a statistical classification method, linear discriminant analysis (LDA), is proposed as an alternative to the Draize eye irritation test. This methodology has been successfully applied to a set of 46 neutral organic chemicals, which have been defined as ocular irritant or nonirritant. The model correctly categorizes 37 out of 46 compounds, showing an accuracy of 80.46%. Specifically, this model shows good average categorization of 91.67 and 76.47% for irritant and nonirritant compounds, respectively. Validation of the model was carried out using two cross-validation tools, leave-one-out (LOO) and leave-group-out (LGO), showing a global predictability of the model of 71.7 and 70%, respectively. The average coincidence of the predictions between the leave-one-out/leave-group-out studies and the training set was 91.3% (42 out of 46 cases)/89.1% (41 out of 46 cases), demonstrating the robustness of the model obtained. An ocular irritancy distribution diagram was constructed in order to determine the intervals of the property where the probability of finding an irritant compound is maximal relative to the chance of finding a false nonirritant one. To date, the present model appears to be the first predictive linear discriminant equation able to discriminate between eye irritant and nonirritant chemicals.
Notes:
Humberto González-Díaz, Alcides Pérez-Bello, Eugenio Uriarte, Yenny González-Díaz (2006)  QSAR study for mycobacterial promoters with low sequence homology.   Bioorg Med Chem Lett 16: 3. 547-553 Feb  
Abstract: The general belief is that quantitative structure-activity relationship (QSAR) techniques work only for small molecules, protein sequences or, more recently, DNA sequences. However, with the non-branched graphs of protein and DNA sequences, QSAR models often have to be based on powerful non-linear techniques such as support vector machines. In our opinion, linear QSAR models based on RNA could be useful to assign biological activity when alignment techniques fail due to low sequence homology. The idea is based on the high level of branching of the RNA graph. This work introduces the so-called Markov electrostatic potentials (k)ξ(M) as a new class of RNA 2D-structure descriptors. Subsequently, we validate these molecular descriptors by solving a QSAR classification problem for mycobacterial promoter sequences (mps), which constitute a very low sequence homology problem. The model developed (mps = -4.664 · (0)ξ(M) + 0.991 · (1)ξ(M) - 2.432) was intended to predict whether a naturally occurring sequence is an mps or not on the basis of the calculated (k)ξ(M) value for the corresponding RNA secondary structure. The RNA-QSAR approach recognises 115/135 mps (85.2%) and 100% of control sequences. Average predictability and robustness were greater than 95%. A previous non-linear model predicts mps with a slightly higher accuracy (97%) but uses a very large parameter space for DNA sequences. Conversely, the (k)ξ(M)-based RNA-QSAR encodes more structural information and needs only two variables.
Notes:
Guillermín Agüero-Chapin, Humberto González-Díaz, Reinaldo Molina, Javier Varona-Santos, Eugenio Uriarte, Yenny González-Díaz (2006)  Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.   FEBS Lett 580: 3. 723-730 Feb  
Abstract: The development of 2D graph-theoretic representations for DNA sequences was very important for qualitative and quantitative comparison of sequences. Calculation of numeric features for these representations is useful for DNA-QSAR studies. Most graph-theoretic representations identify each one of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This fact has limited the introduction of useful 2D Cartesian representations and the corresponding sequence descriptors to encode protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acid, basic, polar and non-polar amino acids. The identification of each group with one of the four axis directions determines a novel 2D representation and numeric descriptors for protein sequences. Afterwards, a Markov model was used to calculate new numeric descriptors of the protein sequence. These descriptors are called herein the sequence 2D coupling numbers (ζ(k)). In this work, we calculated the ζ(k) values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36 · ζ1 - 3.98 · ζ3 - 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of a subset of 81 PGs and 75 non-PG protein sequences used to train the model. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. The use of different amino acid groupings and/or axis orientations gives different results, so these options should be explored for other databases. Finally, to illustrate the use of the model we report the isolation and prediction of the PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L. This prediction coincides very well with sequence alignment results found by the BLAST methodology. These findings illustrate the possibilities of the sequence descriptors derived for this novel 2D sequence representation in protein sequence QSAR studies.
Notes:
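The 2D representation described in the abstract above assigns each of four amino acid groups (acid, basic, polar, non-polar) to one axis direction and traces the sequence as a walk on a lattice. The Python sketch below illustrates the idea. The membership of each group and the direction given to each group are assumptions chosen for this example, since the abstract does not list them, and the Markov-based coupling numbers ζ(k) are not computed here.

```python
# Assumed grouping of the 20 standard amino acids (one-letter codes).
GROUPS = {
    "acid": "DE",
    "basic": "KRH",
    "polar": "STNQCYG",
    "nonpolar": "AVLIMFWP",
}
# Assumed axis direction for each group.
STEPS = {"acid": (1, 0), "basic": (-1, 0), "polar": (0, 1), "nonpolar": (0, -1)}

GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def lattice_walk(sequence):
    """Return the list of (x, y) lattice points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for aa in sequence.upper():
        dx, dy = STEPS[GROUP_OF[aa]]  # raises KeyError for non-standard residues
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


print(lattice_walk("DKSA"))  # [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
```

As the abstract notes, a different grouping or axis orientation yields a different map, so the two dictionaries are the natural place to experiment.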
Maykel Cruz-Monteagudo, Humberto González-Díaz, Eugenio Uriarte (2006)  Simple stochastic fingerprints towards mathematical modeling in biology and medicine 2. Unifying Markov model for drugs side effects.   Bull Math Biol 68: 7. 1527-1554 Oct  
Abstract: Most current mathematical models for biological activity consider just the molecular structure. In the present article we aim to extend the use of Markov chain models to define novel molecular descriptors, which consider in addition other parameters like target site or biological effect. Specifically, this mathematical model takes into consideration not only the molecular structure but also the specific biological system that the drug affects. Herein, a general Markov model is developed that describes 19 different drug side effects grouped in eight affected biological systems for 178 drugs, for a total of 270 cases. The data were processed by linear discriminant analysis (LDA), classifying drugs according to their specific side effects; a forward stepwise strategy was used for variable selection. The average percentage of good classification and number of compounds used in the training/predicting sets were 100/95.8% for endocrine manifestations, (18 out of 18)/(13 out of 14); 90.5/92.3% for gastrointestinal manifestations, (38 out of 42)/(30 out of 32); 88.5/86.5% for systemic phenomena, (23 out of 26)/(17 out of 20); 81.8/77.3% for neurological manifestations, (27 out of 33)/(19 out of 25); 81.6/86.2% for dermal manifestations, (31 out of 38)/(25 out of 29); 78.4/85.1% for cardiovascular manifestation, (29 out of 37)/(24 out of 28); 77.1/75.7% for breathing manifestations, (27 out of 35)/(20 out of 26) and 75.6/75% for psychiatric manifestations, (31 out of 41)/(23 out of 31). Additionally, a back-projection analysis (BPA) was carried out for two ulcerogenic drugs to demonstrate in structural terms the physical interpretation of the models obtained. This article develops, for the first time, a mathematical model that encompasses a large number of drug side effects grouped in specific biological systems using stochastic absolute probabilities of interaction ((A)π(k)(j)).
2005
Humberto González-Díaz, Alcides Pérez-Bello, Eugenio Uriarte (2005)  Stochastic molecular descriptors for polymers. 3. Markov electrostatic moments as polymer 2D-folding descriptors: RNA–QSAR for mycobacterial promoters.   Polymer 46: 17. 6461-6473 Aug  
Abstract: The Quantitative Structure–Property Relationships (QSPRs) based on Graph or Network Theory are important for predicting the properties of polymeric systems. In the three previous papers of this series (Polymer 45 (2004) 3845–3853; Polymer 46 (2005) 2791–2798; and Polymer 46 (2005) 6461–6473) we focused on the uses of molecular graph parameters called topological indices (TIs) to link the structure of polymers with their biological properties. However, there has been little effort to extend these TIs to the study of complex mixtures of artificial polymers or biopolymers such as nucleic acids and proteins. In this sense, Blood Proteome (BP) is one of the most important and complex mixtures containing protein polymers. For instance, outcomes obtained by Mass Spectrometry (MS) analysis of BP are very useful for the early detection of diseases and drug-induced toxicities. Here, we use two Spiral and Star Network representations of the MS outcomes and defined a new type of TIs. The new TIs introduced here are the spectral moments (πk) of the stochastic matrix associated to the Spiral graph and describe non-linear relationships between the different regions of the MS characteristic of BP. We used the MARCH-INSIDE approach to calculate the πk(SN) of different BP samples and S2SNet to determine several Star graph TIs. In the second step, we develop the corresponding Quantitative Proteome–Property Relationship (QPPR) models using the Linear Discriminant Analysis (LDA). QPPRs are the analogues of QSPRs in the case of complex biopolymer mixtures. Specifically, the new QPPRs derived here may be used to detect drug-induced cardiac toxicities from BP samples. Different Machine Learning classification algorithms were used to fit the QPPRs based on πk(SN), showing J48 decision tree classifier to have the best performance.
These results suggest that the present approach captures important features of the complex biopolymers mixtures and opens new opportunities to the application of the idea supporting classic QSPRs in polymer sciences.
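As a rough illustration of the spectral moments mentioned in the abstract above, the following Python sketch row-normalizes a small weight matrix into a stochastic matrix and takes the trace of its successive powers. It assumes the moments are defined as pi_k = Tr(P^k); the three-node graph, its weights, and the function names are hypothetical and are not taken from the paper or from the MARCH-INSIDE software.

```python
def row_normalize(weights):
    """Turn a non-negative square weight matrix into a row-stochastic matrix."""
    stochastic = []
    for row in weights:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs a positive total weight")
        stochastic.append([w / total for w in row])
    return stochastic


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(stochastic, max_order):
    """Return [pi_0, ..., pi_max_order], assuming pi_k = trace(P^k)."""
    n = len(stochastic)
    # Start from the identity matrix, i.e. P^0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(max_order + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, stochastic)
    return moments


# Hypothetical 3-node path graph with self-loops (illustrative weights only).
weights = [[1, 1, 0],
           [1, 1, 1],
           [0, 1, 1]]
P = row_normalize(weights)
moments = spectral_moments(P, 3)
print(moments)
```

Higher-order moments summarize longer return walks on the graph, which is one way such descriptors can reflect relationships between distant regions of a spectrum or sequence.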
Humberto González-Díaz, Liane Saíz-Urra, Reinaldo Molina, Eugenio Uriarte (2005)  Stochastic molecular descriptors for polymers. 2. Spherical truncation of electrostatic interactions on entropy based polymers 3D-QSAR.   Polymer 46: 8. 2791-2798 Mar  
Abstract: The spherical truncation of the electrostatic field with different functions breaks down long-range interactions at a given cutoff distance (roff), resulting in short-range ones. Consequently, a Markov chain model may approximate the entropies of the spatial distribution of charges within the polymer backbone. These entropies can be used to predict polymer properties [González-Díaz H, Molina RR, Uriarte E. Polymer 2004; 45: 3845–3853]. Herein, we explore the effect of abrupt, shifting, force shifting, and switching truncation functions on QSAR models classifying 26 proteins with different functions: lysozymes, dihydrofolate reductases, and alcohol dehydrogenases. Almost all methods have shown overall accuracies higher than 85%, compared with 80.8% for models based on physicochemical parameters. The present result points to an acceptable robustness of the Markov model for different truncation schemes and roff values. The best accuracy (92.3%), obtained with abrupt truncation, coincides with our recent communication [Bioorg Med Chem Lett 2004; 14: 4691–4695]. Nonetheless, the simpler model with three variables and high accuracy (88%) was derived with a shifting function and roff = 10 Å.
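The four truncation schemes named in the abstract above can be sketched in Python as scaling factors applied to a pairwise electrostatic term. The abstract does not give the functional forms, so the ones below are the forms commonly used in molecular dynamics packages and are an assumption here, as are the inner switching radius `r_on` and all function names.

```python
def abrupt(r, r_off):
    """Keep the interaction unchanged up to the cutoff, then drop it."""
    return 1.0 if r <= r_off else 0.0


def shifting(r, r_off):
    """Assumed shift form: (1 - (r/r_off)^2)^2 inside the cutoff."""
    return (1.0 - (r / r_off) ** 2) ** 2 if r <= r_off else 0.0


def force_shifting(r, r_off):
    """Assumed force-shift form: (1 - r/r_off)^2 inside the cutoff."""
    return (1.0 - r / r_off) ** 2 if r <= r_off else 0.0


def switching(r, r_on, r_off):
    """Assumed switch form: 1 below r_on, smooth decay to 0 at r_off."""
    if r <= r_on:
        return 1.0
    if r > r_off:
        return 0.0
    num = (r_off ** 2 - r ** 2) ** 2 * (r_off ** 2 + 2 * r ** 2 - 3 * r_on ** 2)
    return num / (r_off ** 2 - r_on ** 2) ** 3


def truncated_coulomb(q_i, q_j, r, scale):
    """Coulomb-like pair term q_i*q_j/r multiplied by a truncation factor."""
    return scale * q_i * q_j / r


# Compare the schemes at a few distances with r_off = 10 (angstroms assumed).
r_off, r_on = 10.0, 8.0
for r in (2.0, 5.0, 9.0, 11.0):
    print(r, abrupt(r, r_off), shifting(r, r_off),
          force_shifting(r, r_off), switching(r, r_on, r_off))
```

All four factors vanish beyond `r_off`, which is what turns the long-range interaction into a short-range one; they differ only in how smoothly the term decays before the cutoff.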
Humberto González-Díaz, Eugenio Uriarte, Ronal Ramos de Armas (2005)  Predicting stability of Arc repressor mutants with protein stochastic moments.   Bioorg Med Chem 13: 2. 323-331 Jan  
Abstract: As more and more protein structures are determined and applied to drug manufacture, there is increasing interest in studying their stability. In this study, the stochastic moments ((SR)pi(k)) of 53 Arc repressor mutants were introduced as molecular descriptors modeling protein stability. The Linear Discriminant Analysis model developed correctly classified 43 out of 53 (81.13%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near wild-type stability and 23/25 (92%) proteins with reduced stability. Moreover, validation of the model was carried out by re-substitution procedures (81.0%). In addition, the stochastic-moments-based model compared favorably with others based on physicochemical and geometric parameters such as D-Fire potential, surface area, volume, partition coefficient, and molar refractivity, which showed accuracies below 77%. This result illustrates the possibilities of the stochastic moments method for the study of proteins relevant to bioorganic and medicinal chemistry.
Humberto González-Díaz, Esvieta Tenorio, Nilo Castañedo, Lourdes Santana, Eugenio Uriarte (2005)  3D QSAR Markov model for drug-induced eosinophilia--theoretical prediction and preliminary experimental assay of the antimicrobial drug G1.   Bioorg Med Chem 13: 5. 1523-1530 Mar  
Abstract: The application of 3D-MEDNEs as a novel alternative technique to reduce the use of animal experimentation in toxicology in the early stages of medicinal chemistry research has been extended from agranulocytosis to chemically induced eosinophilia. Firstly, a heterogeneous series of organic compounds, which are classified either as eosinophilia inductors or noninductors, was collected. A linear discriminant analysis was subsequently used to obtain a QSTR that gave rise to a very good classification of 91.82% (110 chemicals within training series). Eosinophilia inductors (88.89%) composed the first group while the other one contained only harmless compounds (97.37%). The total predictability (88.1%) was tested by means of an external validation series (42 compounds). The model correctly classifies 88.89% of harmless compounds and 87.5% of toxic ones. Finally, comparison of predicted versus experimental results for G1 [2-bromo-5-(2-bromo-2-nitroethenyl)furan, which is a promising antibacterial-antifungal compound] illustrates the practical application of the method. A dose-dependent study of G1 (9.8-185.6 mg/kg) at 48, 72 and 96 h after oral administration in rats is reported here for the first time. The study has shown that G1 does not affect the murine eosinophil count under these conditions, a situation in total agreement with the model prediction.
Humberto González-Díaz, Luis A Torres-Gómez, Yaima Guevara, Manuel S Almeida, Reinaldo Molina, Nilo Castañedo, Lourdes Santana, Eugenio Uriarte (2005)  Markovian chemicals "in silico" design (MARCH-INSIDE), a promising approach for computer-aided molecular design III: 2.5D indices for the discovery of antibacterials.   J Mol Model 11: 2. 116-123 Mar  
Abstract: The present work continues our series on the use of MARCH-INSIDE molecular descriptors (parts I and II: J Mol Mod 8:237-245, [2002] and 9:395-407, [2003]). These descriptors encode information pertaining to the distribution of electrons in the molecule based on a simple stochastic approach to the idea of electronegativity equalization (Sanderson's principle). Here, 3D-MARCH-INSIDE molecular descriptors for 667 organic compounds are used as input for a linear discriminant analysis. This 2.5D-QSAR model discriminates between antibacterial compounds and non-antibacterial ones with 92.9% accuracy in training sets. On the other hand, the model classifies 94.0% of the compounds in the test set correctly. Additionally, the present QSAR performs similarly to or better than other methods reported elsewhere. Finally, the discovery of a novel compound illustrates the use of the method. This compound, 2-bromo-3-(furan-2-yl)-3-oxo-propionamide, has MIC50 values of 6.25 and 12.50 µg/mL against Pseudomonas aeruginosa ATCC 27853 and Escherichia coli ATCC 27853, respectively, while ampicillin, amoxicillin, clindamycin, and metronidazole have, for instance, MIC50 values higher than 250 µg/mL against E. coli. Consequently, the present method may become a useful tool for the in silico discovery of antibacterials.
Humberto González-Díaz, Maykel Cruz-Monteagudo, Reinaldo Molina, Esvieta Tenorio, Eugenio Uriarte (2005)  Predicting multiple drugs side effects with a general drug-target interaction thermodynamic Markov model.   Bioorg Med Chem 13: 4. 1119-1129 Feb  
Abstract: Most current molecular descriptors consider only the molecular structure. In the present article we aim to extend the use of Markov chain models to define novel molecular descriptors, which consider, in addition to molecular structure, other parameters such as target site or toxic effect. Specifically, this molecular descriptor takes into consideration not only the molecular structure but also the specific system the drug affects. Herein, a general Markov model is developed that describes 39 different drug side effects grouped in 11 affected systems for 301 drugs, giving 686 cases in total. The data were processed by linear discriminant analysis (LDA), classifying drugs according to their specific side effects; forward stepwise was fixed as the strategy for variable selection. The average percentage of good classification and number of compounds used in the training/predicting sets were 100/100% for systemic phenomena (47 out of 47)/(12 out of 12) and metabolic (18 out of 18)/(5 out of 5), muscular-skeletal (23 out of 23)/(6 out of 6) and neurological manifestations (33 out of 33)/(8 out of 8); 97.6/96.7% for cardiovascular manifestation (122 out of 125)/(30 out of 31); 97.1/97.5% for breathing manifestations (34 out of 35)/(8 out of 9); 97/99.4% for gastrointestinal manifestations (159 out of 164)/(40 out of 41); 97/95% for endocrine manifestations (32 out of 33)/(7 out of 8); 96.4/94.6% for psychiatric manifestations (53 out of 55)/(13 out of 14); 95.1/99.1% for hematological manifestations (98 out of 103)/(25 out of 26) and 88/92.3% for dermal manifestations (44 out of 50)/(12 out of 13). In addition, we report a preliminary experimental reversible decrease of the lymphocyte differential count after administration of the antibacterial drug G-1 in mice, which coincides with a posterior probability (P%=74.91) predicted by the model.
This article develops a model that encompasses a large number of side effects grouped in specific organ systems in a single stochastic framework for the first time.
Humberto González-Díaz, Guillermin Agüero, Miguel A Cabrera, Reinaldo Molina, Lourdes Santana, Eugenio Uriarte, Giovanna Delogu, Nilo Castañedo (2005)  Unified Markov thermodynamics based on stochastic forms to classify drugs considering molecular structure, partition system, and biological species: distribution of the antimicrobial G1 on rat tissues.   Bioorg Med Chem Lett 15: 3. 551-557 Feb  
Abstract: To date, molecular descriptors do not commonly account for important information beyond chemical structure. The present work attempts to extend, in this sense, the stochastic molecular descriptors, incorporating information about the specific biphasic partition system, the biological species, and chemical structure inside the molecular descriptors. Consequently, MARCH-INSIDE molecular descriptors may be identified with time-dependent thermodynamic parameters (entropy and mean free energy) of the partition process. A classification function was developed to classify data of 423 drugs and up to 14 different partition systems at the same time. The model has shown a high overall accuracy of 92.1% (293 out of 318 cases) in training series and 90% (36 out of 40 cases) in predicting series. Finally, we illustrate the use of the model by predicting a high probability (%) for G1 (a novel antibacterial drug) to undergo partition on different biotic systems (rat organs): liver (97.7), spleen (97.5), lung (97.4), and adipose tissue (97.6). These theoretical results coincide with herein reported steady state plasma concentrations (c) and partition coefficients (P) in liver (c=42.25±7.86/P=4.75), spleen (11.47±4.43/P=1.29), lung (17.04±3.58/P=1.91), and adipose tissue (28.19±11.82/P=3.17). All values were relative to (14)C-labeled-radioactive-G1 in plasma (c=8.9±3.05) after 3 h of oral administration. In closing, the present stochastic forms derive average thermodynamic parameters fitting on a more clearly physicochemical framework with respect to classic vector-matrix-vector forms, which include, as particular cases, quadratic forms such as Wiener index, Randic invariants, Zagreb descriptors, Harary index, Balaban index, and Marrero-Ponce quadratic molecular indices.
Humberto González-Díaz, Eugenio Uriarte (2005)  Biopolymer stochastic moments. I. Modeling human rhinovirus cellular recognition with protein surface electrostatic moments.   Biopolymers 77: 5. 296-303 Apr  
Abstract: Stochastic moments may be applied as molecular descriptors in quantitative structure-activity relationship (QSAR) studies for small molecules (H. González-Díaz et al., Journal of Molecular Modeling, 2002, Vol. 8, pp. 237-245; 2003, Vol. 9, pp. 395-407). However, applications in the field of biopolymers are less known. Recently, the MARCH-INSIDE approach has been generalized to encode structural features of proteins and other biopolymers (H. González-Díaz et al., Bioinformatics, 2003, Vol. 19, pp. 2079-2087; Bioorganic & Medicinal Chemistry Letters, 2004, Vol. 14, pp. 4691-4695; Polymer, 2004, Vol. 45, pp. 3845-3853; Bioorganic & Medicinal Chemistry, 2005, Vol. 13, pp. 323-331). The present article attempts to extend this research by introducing for the first time stochastic moments for a surface road map of viral proteins. These moments are afterward used to seek a model that predicts the cellular receptor for human rhinoviruses. The model correctly classified 100% of 10 viruses binding to low-density lipoprotein receptor (LDLR) and 88.9% of 9 viruses binding to the intercellular adhesion molecule (ICAM) receptors in training. The same results have been obtained in four cross-validation experiments using a resubstitution technique. The present model compares favorably, in terms of complexity, with others previously reported based on entropy considerations, and offers a quantitative basis for the visual rule previously reported by Vlasak et al.
Ornella Gia, Sebastiano Marciani Magno, Humberto González-Díaz, Elias Quezada, Lourdes Santana, Eugenio Uriarte, Lisa Dalla Via (2005)  Design, synthesis and photobiological properties of 3,4-cyclopentenepsoralens.   Bioorg Med Chem 13: 3. 809-817 Feb  
Abstract: The QSAR directed synthesis of tetracyclic psoralen derivatives (3-5) characterised by the condensation of a cyclopentane ring at the level of the 3,4 double bond of the tricyclic psoralen moiety is reported. The new compounds present a methoxy (3), a hydroxy (4) or a dimethylaminopropoxy (5) side chain inserted in position 8 of the lead chromophore. The evaluation of photoantiproliferative activity on human tumour cell lines reveals for 5 an ability to inhibit cell growth significantly higher with respect to that of the reference drug, 8-MOP. Interestingly, the enhancement in antiproliferative activity is accompanied by the disappearance of skin phototoxicity. On the other hand, no significant photobiological activity was scored for 3 and 4. The ability to photoreact with DNA, evaluated by isolating the 4',5' monoadduct and by estimating the ability to form interstrand cross-links, appeared to be significant for 5, practically negligible for 3 and 4. Furthermore, a back-projection of the more active compound identifies structural features suitable for further synthetic modifications.
Ronal Ramos de Armas, Humberto González Díaz, Reinaldo Molina, Eugenio Uriarte (2005)  Stochastic-based descriptors studying biopolymers biological properties: extended MARCH-INSIDE methodology describing antibacterial activity of lactoferricin derivatives.   Biopolymers 77: 5. 247-256 Apr  
Abstract: Lactoferricins are a number of related peptides derived from the enzymatic cleavage of lactoferrin, an iron-binding protein. These peptides, and other peptides derived from them by simple amino acid substitutions, have shown interesting antibacterial activity. In this paper we applied the MARCH-INSIDE methodology, extended to peptides and proteins, to a QSAR study of the antibacterial activity of 31 derivatives of lactoferricin against E. coli and S. aureus by means of Linear Discriminant Analysis (LDA) and Multiple Linear Regression Analysis (MLR). In the case of LDA we obtained models that correctly classify more than 80% of all cases (85.7% for E. coli antibacterial activity and 83.9% for S. aureus). With the application of a Leave-One-Out Cross Validation Procedure, the percentage of good classification of both classification models remained near the above reported values (87.1% for E. coli antibacterial activity and 83.9% for S. aureus). We obtained several linear regression models taking into account total and local descriptors. The inclusion of those local descriptors improved the correlation parameters, the statistical quality, and the predictive power of the former model obtained only with total descriptors. The best models explained more than 80% of the experimental variance in the antimicrobial activity of those compounds. These results are comparable with those reported previously by Strom (Strom, M. B.; Rekdal, O.; Svendsen, J. S. J Peptide Res 2001, 57, 127-139.) and Tore-Lejon (Lejon, T.; Strom, M.; Svendsen, S. J Protein Sci 2001, 7, 74-78.; Lejon, T.; Svendsen J. S.; Haug, B. E. J Peptide Sci 2002, 8, 302-306.) in a smaller dataset applying Z-scales and volume-based descriptors and PLS as statistical techniques.
Humberto González-Díaz, Maykel Cruz-Monteagudo, Dolores Viña, Lourdes Santana, Eugenio Uriarte, Erik De Clercq (2005)  QSAR for anti-RNA-virus activity, synthesis, and assay of anti-RSV carbonucleosides given a unified representation of spectral moments, quadratic, and topologic indices.   Bioorg Med Chem Lett 15: 6. 1651-1657 Mar  
Abstract: The unified representation of spectral moments, classic topologic indices, quadratic indices, and stochastic molecular descriptors shows that all these molecular descriptors lie within the same family. Consequently, the same prior probability for a successful quantitative-structure-activity-relationship (QSAR) may be expected irrespective of which indices are selected. Herein, we used stochastic spectral moments as molecular descriptors to seek a QSAR using a database of 221 bioactive compounds previously tested against diverse RNA-viruses and 402 nonactive ones. The QSAR model thus obtained correctly classifies 90.9% of compounds in training. The model also correctly classifies a total of 87.9% of 207 compounds on additional external predicting series, 73 of them having anti-RNA-virus activity and 134 nonactive ones. In addition, all compounds were regrouped into five different subsets for leave-group-out studies: (1) anti-influenza, (2) anti-picornavirus, (3) anti-paramyxovirus, (4) anti-RSV/anti-influenza, and (5) broad range anti-RNA-virus activity. The model has retained overall accuracies of about 90% on these studies, validating model robustness. Finally, we exemplify the practical use of the model with the discovery of compounds 124 and 128. These compounds presented MIC50 values of 3.2 and 8 µg/mL against respiratory syncytial virus (RSV), respectively. Both compounds also have low cytotoxicity, expressed by their Minimal Cytotoxic Concentrations >400 µg/mL for HeLa cells. The present approach represents an effort toward a formalization and application of molecular indices in bioorganic and medicinal chemistry.
Humberto González-Díaz, Reinaldo Molina, Eugenio Uriarte (2005)  Recognition of stable protein mutants with 3D stochastic average electrostatic potentials.   FEBS Lett 579: 20. 4297-4301 Aug  
Abstract: As more and more proteins are applied to biochemical research there is increasing interest in studying their stability. In this study, a Markov model has been used to calculate molecular descriptors of the protein structure and these are called the average electrostatic potentials (xi(k)). These descriptors were intended to encode indirect electrostatic pair-wise interactions between amino acids located at Euclidean distance k within a given 3D protein backbone. The different xi(k) values could be calculated for the protein as a whole or for specific protein regions (orbits), which include amino acids that lie within a given range of distances from the center of charge of the protein. In this work we calculated the xi(k) values for 657 mutants of different proteins. A Linear Discriminant Analysis model correctly classified a subset of 435 out of 493 proteins according to their thermal stability - a level of predictability of 88.2%. This experiment was repeated with three additional subsets of proteins selected at random from the initial series of 657. More specifically, the model predicted 314/356 (88.2%) of mutants with higher stability than the corresponding wild-type protein and 264/301 (86.7%) of proteins with near wild-type stability. These results illustrate the possibilities for the average stochastic potentials xi(k) in the study of 3D-structure/property relationships for biochemically relevant proteins.
Humberto González-Díaz, Eugenio Uriarte (2005)  Proteins QSAR with Markov average electrostatic potentials.   Bioorg Med Chem Lett 15: 22. 5088-5094 Nov  
Abstract: Classic physicochemical and topological indices have been widely used in small-molecule QSAR but less so in protein QSAR. In this study, a Markov model is used to calculate, for the first time, average electrostatic potentials xik for an indirect interaction between amino acids placed at topologic distances k within a given protein backbone. The short-term average stochastic potential xi1 for 53 Arc repressor mutants was used to model the effect of Alanine scanning on thermal stability. The Arc repressor is a model protein of relevance for biochemical studies on bioorganics and medicinal chemistry. A linear discriminant analysis model developed correctly classified 43 out of 53 (81.1%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near wild-type stability and 23/25 (92.0%) proteins with reduced stability. Moreover, predictability in cross-validation procedures was 81.0%. Expansion of the electrostatic potential in the series xi0, xi1, xi2, and xi3 justified the use of the abrupt truncation approach, the overall accuracy being >70.0% for xi0 but equal for xi1, xi2, and xi3. The xi1 model compared favorably with others based on D-Fire potential, surface area, volume, partition coefficient, and molar refractivity, which had less than 77.0% accuracy [Ramos de Armas, R.; González-Díaz, H.; Molina, R.; Uriarte, E. Protein Struct. Func. Bioinf. 2004, 56, 715]. The xi1 model also has a more tractable interpretation than others based on Markovian negentropies and stochastic moments. Finally, the model is notably simpler than the two models based on quadratic and linear indices. Both models, reported by Marrero-Ponce et al., use four to five times more descriptors. Introduction of average stochastic potentials may be useful for QSAR applications, since xik has an amenable physical interpretation and is very effective.
Maykel Cruz-Monteagudo, Humberto González-Díaz (2005)  Unified drug-target interaction thermodynamic Markov model using stochastic entropies to predict multiple drugs side effects.   Eur J Med Chem 40: 10. 1030-1041 Oct  
Abstract: Most current molecular descriptors consider only the molecular structure. In the present article we aim to extend the use of Markov chain (MC) models to define novel molecular descriptors, which additionally consider other parameters such as target site or toxic effect. Specifically, this molecular descriptor takes into consideration not only the molecular structure but also the specific system the drug affects. Herein, a general Markov model is developed that describes 21 different drug side effects grouped in 10 affected biological systems for 193 drugs, giving 311 cases in total. The data were processed by linear discriminant analysis (LDA), classifying drugs according to their specific side effects; forward stepwise was fixed as the strategy for variable selection. The average percentage of good classification and number of compounds used in the training/predicting sets were 92.6/91.7% for cardiovascular manifestation (25 out of 27)/(18 out of 20); 89.3/83.9% for dermal manifestations (25 out of 18)/(18 out of 21); 88.9/88.9% for endocrine manifestations (16 out of 18)/(12 out of 14); 88.9/88.2% for psychiatric manifestations (32 out of 36)/(24 out of 27); 88.5/85.6% for systemic phenomena (23 out of 26)/(17 out of 20); 85.7/91.7% for gastrointestinal manifestations (36 out of 42)/(29 out of 32); 83.3/79.2% for metabolic manifestations (15 out of 18)/(11 out of 14); 81.8/78.0% for neurological manifestations (27 out of 33)/(20 out of 25); 75.0/74.0% for hematological manifestations (36 out of 48)/(27 out of 36) and 74.3/72.8% for breathing manifestations (26 out of 35)/(19 out of 26). Finally, application of back-projection analysis (BPA) provides a physical interpretation in structural terms, through molecular graphics, of the toxic effects predicted with these QSTR models.
This article develops, for the first time, a mathematical model that encompasses a large number of drug side effects grouped in specific systems using stochastic entropies of interaction (Thetak (j)).
Humberto González-Díaz, Guillermín Agüero-Chapin, Javier Varona-Santos, Reinaldo Molina, Gustavo de la Riva, Eugenio Uriarte (2005)  2D RNA-QSAR: assigning ACC oxidase family membership with stochastic molecular descriptors; isolation and prediction of a sequence from Psidium guajava L.   Bioorg Med Chem Lett 15: 11. 2932-2937 Jun  
Abstract: Quantitative structure-activity relationship (QSAR) techniques for small molecules could be applied to nucleic acids. Unfortunately, almost all molecular descriptors are more successful at encoding branching information than sequences and/or cannot be back-projected. A solution for scaling the QSAR problem up to RNA may be to transform sequences into secondary structures first. Our group has used Markovian negentropies as molecular descriptors for drug design with preliminary results in bioinformatics [Bioinformatics 2003, 19, 2079]. However, RNA-QSAR studies on RNA molecules have not been described to date. Novel Markovian negentropies have been introduced here as molecular descriptors for 2D-RNA structures. An RNA-QSAR study of the ACC proteins from different plants has been carried out. The QSAR recognizes 19/20 sequences (95.0%) within the ACC family and 12/17 (70.6%) of the control group sequences. The model has a high Matthews correlation coefficient (C = 0.68). Overall cross-validation average accuracies were 14 out of 15 for ACC sequences (93.3%) and 10 out of 13 for control sequences (76.9%). Finally, ACC oxidase family membership was assigned to a new sequence isolated for the first time in this work from Psidium guajava L. A backprojection map for this sequence identifies the left stem (40%) and the main stem (45%) as highly important substructures. Results of an nBLAST experiment are consistent with this finding and indicate a high conservation score (>70) for left stem and main stem, whereas major loop, right stem, cap and major loop right half were hardly conserved.
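The Matthews coefficient quoted in the abstract above (C = 0.68) can be recovered from the reported classification counts. The Python sketch below assumes the standard four-cell confusion matrix, with the ACC family taken as the positive class (19 of 20 recognized) and the control group as the negative class (12 of 17 recognized); the function name is illustrative.

```python
import math


def matthews_corrcoef(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a margin is empty and the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Counts taken from the abstract: 19/20 ACC sequences, 12/17 controls.
tp, fn = 19, 1   # ACC family: correctly / incorrectly classified
tn, fp = 12, 5   # control group: correctly / incorrectly classified

mcc = matthews_corrcoef(tp, fn, tn, fp)
print(round(mcc, 2))  # 0.68
```

Unlike plain accuracy, this coefficient penalizes the weaker recognition of the control group, which is why it sits well below the 95.0% figure for the ACC family.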
Aliuska Morales Helguera, Miguel Angel Cabrera Pérez, Maykel Pérez González, Reinaldo Molina Ruiz, Humberto González-Díaz (2005)  A topological substructural approach applied to the computational prediction of rodent carcinogenicity.   Bioorg Med Chem 13: 7. 2477-2488 Apr  
Abstract: Carcinogenic activity has been investigated by using a topological substructural molecular design approach (TOPS-MODE). A discriminant model was developed to predict the carcinogenic and noncarcinogenic activity on a data set of 189 compounds. The percentage of correct classification was 76.32%. The predictive power of the model was validated by three tests: an external test set (compounds not used in the development of the model, with 72.97% good classification), a leave-group-out cross-validation procedure (4-fold full cross-validation, removing 20% of compounds in each cycle, with a good prediction of 76.31%) and two external prediction sets (the first and second exercises of the National Toxicology Program). This methodology showed that hydrophobicity increases the carcinogenic activity and the dipole moment of the molecule decreases it, suggesting the capacity of the TOPS-MODE descriptors to estimate this property for new drug candidates. Finally, the positive and negative fragment contributions to the carcinogenic activity were identified (structural alerts) and their potential in the lead generation process and in the design of 'safer' chemicals was evaluated.
Liane Saíz-Urra, Humberto González-Díaz, Eugenio Uriarte (2005)  Proteins Markovian 3D-QSAR with spherically-truncated average electrostatic potentials.   Bioorg Med Chem 13: 11. 3641-3647 Jun  
Abstract: Protein 3D-QSAR is an emerging field of bioorganic chemistry. However, the large dimensions of the structures to be handled may become a bottleneck to scaling up classic QSAR problems for proteins. In this sense, a truncation approach could be used, as in molecular dynamics, to perform timely calculations. The spherical truncation of the electrostatic field with different functions breaks down long-range interactions at a given cutoff distance (r(off)), resulting in short-range ones. Consequently, a Markov chain model may approximate the average electrostatic potentials of the spatial distribution of charges within the protein backbone. These average electrostatic potentials can be used to predict protein properties. Herein, we explore the effect of abrupt, shifting, force shifting, and switching truncation functions on 3D-QSAR models classifying 26 proteins with different functions: lysozymes, dihydrofolate reductases, and alcohol dehydrogenases. Almost all methods have shown overall accuracies higher than 73%. The present result points to an acceptable robustness of the MC for different truncation schemes and r(off) values. The best accuracy (92%), obtained with abrupt truncation, coincides with our recent communication. We also developed models with the same accuracy value for other truncation functions; however, they are more complex functions. PCA analysis for 152 non-homologous proteins has shown that there are five main eigenvalues, which explain more than 87% of the variance of the studied properties. The present molecular descriptors may encode structural information not totally accounted for by the previous ones, so success with these descriptors could be expected when classic ones fail. The present result confirms the utility of our Markov models combined with the truncation approach to generate protein molecular descriptors for QSAR.
2004
Yovani Marrero Ponce, Miguel A Cabrera Pérez, Vicente Romero Zaldivar, Humberto González-Díaz, Francisco Torrens (2004)  A new topological descriptors based model for predicting intestinal epithelial transport of drugs in Caco-2 cell culture.   J Pharm Pharm Sci 7: 2. 186-199 Jun  
Abstract: PURPOSE: Quantitative Structure-Permeability Relationships (QSPerR) of the intestinal permeability across the Caco-2 cell monolayer could be obtained by the application of new molecular descriptors. METHOD: A novel topologic-molecular approach to computer molecular design (TOMOCOMD-CARDD) has been used to estimate the intestinal-epithelial transport of drugs in Caco-2 cell culture. RESULTS: The Permeability Coefficients in Caco-2 cells (P) for 33 structurally diverse drugs were well described using quadratic indices of the molecular pseudograph's atom adjacency matrix as molecular descriptors. A quantitative model that discriminates the high-absorption compounds from those with moderate-poor absorption was obtained for the training data set, showing a global classification of 87.87%. In addition, two QSPerR models, through a multiple linear regression, were obtained to predict the P [apical to basolateral (AP→BL) and basolateral to apical (BL→AP)]. Leave-n-out and leave-one-out cross-validation procedures revealed that the discriminant and regression models, respectively, had good predictability. Furthermore, another 18 drugs were selected as a test set in order to assess the predictive power of the models, and the accuracy of the final prediction was similar to that achieved for the data set. Moreover, using both regression models in combination makes it possible to predict the Permeability Directional Ratio (PDR, BL→AP/AP→BL) value. The models found were used in virtual screening of drug intestinal permeability, and a relationship between calculated P and percentage of human intestinal absorption for several compounds was established. Furthermore, this approximation permits us to obtain a good explanation of the experiment based on the molecular structural features.
CONCLUSIONS: These results suggest that the proposed method is able to predict the P values and that it is a good tool for studying the oral absorption of drug candidates during the drug development process.
Ronal Ramos de Armas, Humberto González-Díaz, Reinaldo Molina, Maykel Pérez González, Eugenio Uriarte (2004)  Stochastic-based descriptors studying peptides biological properties: modeling the bitter tasting threshold of dipeptides.   Bioorg Med Chem 12: 18. 4815-4822 Sep  
Abstract: MARCH-INSIDE methodology was applied to the prediction of the bitter tasting threshold of 48 dipeptides by means of pattern recognition techniques, in this case linear discriminant analysis (LDA), and regression methods. The LDA models yielded a percentage of good classification higher than 80% with the two main families of descriptors generated by this methodology (95.8% for self-return probability and 83.3% using electronic delocalization entropy). The regression models can explain more than 80% of the experimental variance of the dependent variable. Two regression models were obtained with R(2) values of 0.82 and 0.88 for the whole data set and for the data without two outliers, respectively, with standard deviations of 0.27 and 0.23. The predictive power of the obtained equations was assessed by leave-one-out cross-validation, giving the same percentages of good classification as in the training set for the LDA models and yielding q(2) values of 0.78 and 0.86 for the regression models, respectively. The methodology was also validated by comparison with previous reports modeling these data with other well-known methodologies, even 3-D molecular descriptors.
Ronal Ramos de Armas, Humberto González-Díaz, Reinaldo Molina, Eugenio Uriarte (2004)  Markovian Backbone Negentropies: Molecular descriptors for protein research. I. Predicting protein stability in Arc repressor mutants.   Proteins 56: 4. 715-723 Sep  
Abstract: As more and more protein structures are determined and applied to drug manufacture, there is increasing interest in studying their stability. In this sense, developing novel computational methods to predict and study protein stability in relation to their amino acid sequences has become a significant goal in applied Proteomics. In the study described here, Markovian Backbone Negentropies (MBN) have been introduced in order to model the effect on protein stability of a complete set of alanine substitutions in the Arc repressor. A total of 53 proteins were studied by means of Linear Discriminant Analysis using MBN as molecular descriptors. MBN are molecular descriptors based on a Markov chain model of electron delocalization throughout the protein backbone. The model correctly classified 43 out of 53 (81.13%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near wild-type stability and 23/25 (92%) proteins with reduced stability. Moreover, the model presented a good Matthews correlation coefficient of 0.643. Validation of the model was carried out by several Jackknife procedures. The method compares favorably with surface-dependent and thermodynamic parameter stability scoring functions. For instance, the D-FIRE potential classification function shows a level of good classification of 76.9%. On the other hand, surface, volume, logP, and molar refractivity show accuracies of 70.7, 62.3, 59.0, and 60.0%, respectively.
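The classification figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' code: it assumes the near wild-type class is treated as the "positive" class (the coefficient is unchanged if the classes are swapped), and the helper name `matthews_corrcoef` is chosen here for convenience.

```python
import math


def matthews_corrcoef(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Counts taken from the abstract; which class is "positive" is an assumption.
tp, fn = 20, 8   # near wild-type stability: 20 of 28 correctly classified
tn, fp = 23, 2   # reduced stability: 23 of 25 correctly classified

mcc = matthews_corrcoef(tp, fn, tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)

print(f"accuracy = {accuracy:.2%}, MCC = {mcc:.3f}")
```

Under these assumptions the computed values agree with the 81.13% overall classification and the coefficient of 0.643 reported above.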
Maykel Pérez González, Humberto González-Díaz, Miguel Angel Cabrera, Reinaldo Molina Ruiz (2004)  A novel approach to predict a toxicological property of aromatic compounds in the Tetrahymena pyriformis.   Bioorg Med Chem 12: 4. 735-744 Feb  
Abstract: The TOPological Substructural MOlecular DEsign (TOPS-MODE) approach has been successfully used to explain toxicity in Tetrahymena pyriformis on a large data set. The models obtained for the training set had good statistical parameters (R(2)=0.72-0.81, p<0.05), and the predictive power of the models found was also adequate (Q(2)=0.70-0.80). A detailed study of the influence of the number of variables in the equation and of the statistical outliers was carried out, leading to a good final model with a better physicochemical interpretation than the rest of the published models. Only two molecular descriptors codifying dipolar and hydrophobic features were introduced. Finally, the fragment contributions to the toxicity prediction evidenced the power of this topological approach.
Miguel Angel Cabrera Pérez, Marival Bermejo Sanz, Liliana Ramos Torres, Ricardo Grau Avalos, Maykel Pérez González, Humberto González Díaz (2004)  A topological sub-structural approach for predicting human intestinal absorption of drugs.   Eur J Med Chem 39: 11. 905-916 Nov  
Abstract: The human intestinal absorption (HIA) of drugs was studied using a topological sub-structural approach (TOPS-MODE). The drugs were divided into three classes according to reported cutoff values for HIA. "Poor" absorption was defined as HIA ≤ 30%, "high" absorption as HIA ≥ 80%, whereas "moderate" absorption was defined between these two values (30% < HIA < 79%). Two linear discriminant analyses were carried out on a training set of 82 compounds. The percentage of correct classification was 89.02% for both models. The predictive power of the models was validated by three tests: a leave-one-out cross-validation procedure (88.9% and 87.9%), an external prediction set of 127 drugs (92.9% and 80.31%) and a test set of 109 oral drugs with reported bioavailability values (93.58% and 91.84%). Finally, positive and negative sub-structural contributions to the HIA were identified and their possibilities in the lead generation and optimization process were evaluated.
Enrique Molina, Humberto González-Díaz, Maykel Pérez González, Elismary Rodríguez, Eugenio Uriarte (2004)  Designing antibacterial compounds through a topological substructural approach.   J Chem Inf Comput Sci 44: 2. 515-521 Mar/Apr  
Abstract: A novel application of TOPological Substructural MOlecular DEsign (TOPS-MODE) was carried out in antibacterial drugs using computer-aided molecular design. Two series of compounds, one containing antibacterial and the other containing non-antibacterial compounds, were processed by a k-means cluster analysis in order to design training and predicting series. All clusters had a p-level < 0.005. Afterward, a linear classification function was derived to discriminate between antibacterial and non-antibacterial compounds. The model correctly classifies 94% of active and 86% of inactive compounds in the training series. More specifically, the model showed a global good classification of 91%, i.e., 263 cases out of 289. In the predicting series, the model has shown overall predictabilities of 91 and 83% for active and inactive compounds, respectively. Thereby, the model has a global percentage of good classification of 89%. The TOPS-MODE approach also compares similarly with one of the most useful models for antimicrobial selection reported to date.
Maykel Pérez González, Luiz Carlos Dias, Aliuska Morales Helguera, Yanisleidy Morales Rodríguez, Luciana Gonzaga de Oliveira, Luis Torres Gomez, Humberto González-Díaz (2004)  TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new anti-inflammatory compounds.   Bioorg Med Chem 12: 16. 4467-4475 Aug  
Abstract: A new application of TOPological Sub-structural MOlecular DEsign (TOPS-MODE) was carried out in anti-inflammatory compounds using computer-aided molecular design. Two series of compounds, one containing anti-inflammatory and the other containing non-anti-inflammatory compounds, were processed by a k-means cluster analysis in order to design the training and prediction sets. A linear classification function to discriminate the anti-inflammatory from the inactive compounds was developed. The model correctly and clearly classified 88% of active and 91% of inactive compounds in the training set. More specifically, the model showed a good global classification of 90%, that is, 399 cases out of 441. In the prediction set, the model showed an overall predictability of 88% and 84% for active and inactive compounds, respectively, with a global percentage of good classification of 85%. Furthermore, this paper describes a fragment analysis carried out to determine the contribution of several fragments towards the anti-inflammatory property; the presence of halogens in the selected fragments was also analyzed. It seems that the present TOPS-MODE based QSAR is the first alternate general 'in silico' technique to experimentation in anti-inflammatory discovery.
Humberto González-Díaz, Iyusmila Bastida, Nilo Castañedo, Oslay Nasco, Ervelio Olazabal, Alcidez Morales, Hector S Serrano, Ronal Ramos de Armas (2004)  Simple stochastic fingerprints towards mathematical modelling in biology and medicine. 1. The treatment of coccidiosis.   Bull Math Biol 66: 5. 1285-1311 Sep  
Abstract: We have developed a classification function that is capable of discriminating between anticoccidial and nonanticoccidial compounds with different structural patterns. For this purpose, we calculated the Markovian electron delocalization negentropies of several compounds. These molecular descriptors, which act as molecular fingerprints, are derived from an electronegativity-weighted stochastic matrix (1Pi). The method attempts to describe the delocalization of electrons with time during the process of molecule formation by considering the 3D environment of the atoms. Accordingly, the entropies of this random process are used as molecular descriptors. The present study involves a stochastic generalization of the original idea described by Kier, which concerned the use of molecular negentropies in QSAR. Linear discriminant analysis allowed us to fit the discriminant function. This function has given rise to a good classification of 82.35% (28 anticoccidials out of 34) and 91.8% of inactive compounds (56/61) in training series. An overall classification of 88.42% (84/95) was achieved. Validation of the model was carried out by means of an external predicting series and this gave a global predictability of 93.1%. Finally, we report the experimental assay (more than 95% of lesion control) of two compounds selected from a large data set through virtual screening. We conclude that the approach described here seems to be a promising 3D-QSAR tool based on the mathematical theory of stochastic processes.
Humberto González-Díaz, Reinaldo Molina, Eugenio Uriarte (2004)  Markov entropy backbone electrostatic descriptors for predicting proteins biological activity.   Bioorg Med Chem Lett 14: 18. 4691-4695 Sep  
Abstract: The spherical truncation of electrostatic interactions between amino acids makes it possible to break down long-range spatial electrostatic interactions, resulting in short-range interactions. As a result, a Markov Chain model may be used to calculate the probabilities with which the effect of a given interaction reaches amino acids at different distances within the backbone. The entropies of a Markov Chain model of this type may then be used to codify information about the spatial distribution of charges in the protein used in this study exploring the structure-activity relationship. In this paper, a linear discriminant analysis is reported, which correctly classified 92.3% of the 26 proteins under investigation in training and leave-one-out cross validation, purely for illustrative purposes. Classification was carried out for three possible activities: lysozymes, dihydrofolate reductases, and alcohol dehydrogenases. The discriminant analysis equations were contracted into two canonical roots. These simple canonical roots have high regression coefficients (R(c1)=0.903 and R(c2)=0.70). Root1 explains the biological activity of alcohol dehydrogenases while Root2 discriminates between lysozymes and dihydrofolate reductases. It was possible to profile the effect of core, middle, and surface amino acids on biological activity. In contrast, a model considering classic physicochemical parameters such as polarizability, refractivity, and partition coefficient correctly classifies only 80.8% of the proteins.
Yovani Marrero Ponce, Ricardo Medina Marrero, Eduardo A Castro, Ronal Ramos de Armas, Humberto González-Díaz, Vicente Romero Zaldivar, Francisco Torrens (2004)  Protein quadratic indices of the "macromolecular pseudograph's alpha-carbon atom adjacency matrix". 1. Prediction of Arc repressor alanine-mutant's stability.   Molecules 9: 12. 1124-1147 Dec  
Abstract: This report describes a new set of macromolecular descriptors of relevance to protein QSAR/QSPR studies, protein's quadratic indices. These descriptors are calculated from the macromolecular pseudograph's alpha-carbon atom adjacency matrix. A study of the protein stability effects for a complete set of alanine substitutions in Arc repressor illustrates this approach. Quantitative Structure-Stability Relationship (QSSR) models allow discriminating between near wild-type stability and reduced-stability A-mutants. A linear discriminant function gives rise to excellent discrimination of 85.4% (35/41) and 91.67% (11/12) of near wild-type stability/reduced-stability mutants in training and test series, respectively. The model's overall predictability ranges from 80.49% to 82.93% when n varies from 2 to 10 in leave-n-out cross-validation procedures. This value stabilizes around 80.49% when n > 6. Additionally, canonical regression analysis corroborates the statistical quality of the classification model (Rcanc = 0.72, p-level < 0.0001). This analysis was also used to compute biological stability canonical scores for each Arc A-mutant. On the other hand, a nonlinear piecewise regression model compares favorably with the linear regression one in predicting the melting temperature (tm) of the Arc A-mutants. The linear model explains almost 72% of the variance of the experimental tm (R = 0.85 and s = 5.64) and LOO press statistics evidenced its predictive ability (q2 = 0.55 and scv = 6.24). However, this linear regression model fails to resolve tm predictions of Arc A-mutants in the external prediction series. Therefore, the use of nonlinear piecewise models was required. The tm values of A-mutants in training (R = 0.94) and test (R = 0.91) sets are calculated by the piecewise model with a high degree of precision. A break-point value of 51.32 degrees C characterizes two clusters of mutants and coincides perfectly with the experimental scale.
For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutants' Arc homodimers. These models also permit the interpretation of the driving forces of such a folding process. The models include protein's quadratic indices accounting for hydrophobic (z1), bulk-steric (z2), and electronic (z3) features of the studied molecules. The preponderance of z1 and z3 over z2 indicates the higher importance of the hydrophobic and electronic side-chain terms in the folding of the Arc dimer. In this sense, the developed equations involve short-reaching (k ≤ 3), middle-reaching (3 < k ≤ 7) and far-reaching (k = 8 or greater) z1, z2, z3 protein's quadratic indices. This situation points to control of the stability profile of wild-type Arc and its A-mutants by topologic/topographic interactions of the protein backbone. Consequently, the present approach represents a novel and very promising way to conduct mathematical research in the biological sciences.
Maykel Pérez González, Aliuska Morales Helguera, Humberto González-Díaz (2004)  A TOPS-MODE approach to predict permeability coefficients.   Polymer 45: 6. 2073-2079 Mar  
Abstract: The TOPological Sub-Structural Molecular Design (TOPS-MODE) approach has been applied to the study of the permeability coefficient of various compounds through low-density polyethylene at 21.1 °C. A model able to describe close to 90% of the variance in the experimental permeability of 63 organic compounds was developed with this approach. In contrast, none of nine different approaches, including the use of constitutional, topological, BCUT, 2D autocorrelation, geometrical, RDF, 3D Morse, WHIM and GETAWAY descriptors, was able to explain more than 73% of the variance in this property with the same number of descriptors. In addition, genetic algorithms were used in feature selection experiments considering all molecular descriptors in order to obtain mixed models. Although statistically significant models containing descriptors other than spectral moments were derived, the best-fitting model was still found with these variables. Finally, the TOPS-MODE approach made it possible to find the contribution of different fragments to the permeability coefficients, giving the model a straightforward structural interpretability.
Yovani Marrero Ponce, Humberto González-Díaz, Vicente Romero Zaldivar, Francisco Torrens, Eduardo A Castro (2004)  3D-chiral quadratic indices of the 'molecular pseudograph's atom adjacency matrix' and their application to central chirality codification: classification of ACE inhibitors and prediction of sigma-receptor antagonist activities.   Bioorg Med Chem 12: 20. 5331-5342 Oct  
Abstract: Quadratic indices of the 'molecular pseudograph's atom adjacency matrix' have been generalized to codify chemical structure information for chiral drugs. These 3D-chiral quadratic indices make use of a trigonometric 3D-chirality correction factor. These indices are nonsymmetric and reduce to classical (2D) descriptors when symmetry is not codified. For this reason, it is expected that they will be useful to predict symmetry-dependent properties. 3D-chirality quadratic indices are real numbers and thus can be easily calculated in TOMOCOMD-CARDD software. These descriptors circumvent the inability of conventional 2D quadratic indices (Molecules 2003, 8, 687-726. http://www.mdpi.org) and other (chirality-insensitive) topological indices to distinguish sigma-stereoisomers. In this paper, we extend our earlier work by applying 3D-chirality quadratic indices to two data sets containing chiral compounds. Consequently, in order to test the potential of this novel approach in drug design, we have modelled the angiotensin-converting enzyme inhibitory activity of perindoprilate's sigma-stereoisomers combinatorial library. Two linear discriminant analysis (LDA) models were obtained. The first model was developed using the whole data set as the training series and correctly classifies 88.89% of active compounds and 100.00% of inactive ones, for a global good classification of 96.87%. The second LDA-QSAR model correctly classified 83.33% of the active and 100.00% of the inactive compounds in a training set, a result that represents a total accuracy of 95.65% in classification. On the other hand, the model classifies 100.00% of these compounds in the test set. Similar predictive behaviour was observed in a leave-one-out cross-validation procedure for both equations. Canonical regression analysis corroborated the statistical quality of these models (R(can) of 0.82 and of 0.76, respectively) and was also used to compute biological activity canonical scores for each compound.
Finally, prediction of the biological activities of chiral 3-(3-hydroxyphenyl)piperidines, which are sigma-receptor antagonists, by linear multiple regression analysis was carried out. Two statistically significant QSAR models were obtained (R2=0.940, s=0.270 and R2=0.977, s=0.175). These models showed high stability to data variation in the leave-one-out cross-validation procedure (q2=0.912, scv=0.289 and q2=0.957, scv=0.211). The results of this study compare favourably with those obtained with other chirality descriptors applied to the same data set. The 3D-chiral TOMOCOMD-CARDD approach provides a powerful alternative to 3D-QSAR.
2003
Maykel Pérez González, Humberto González-Díaz, Reinaldo Molina Ruiz, Miguel A Cabrera, Ronal Ramos de Armas (2003)  TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new herbicides.   J Chem Inf Comput Sci 43: 4. 1192-1199 Jul/Aug  
Abstract: A new application of TOPological Sub-structural MOlecular DEsign (TOPS-MODE) was carried out in herbicides using computer-aided molecular design. Two series of compounds, one containing herbicide and the other containing nonherbicide compounds, were processed by a k-means cluster analysis in order to design the training and prediction sets. A linear classification function to discriminate the herbicides from the nonherbicide compounds was developed. The model correctly and clearly classified 88% of active and 94% of inactive compounds in the training set. More specifically, the model showed a good global classification of 91%, i.e., 168 cases out of 185. In the prediction set, the model showed an overall predictability of 91% and 92% for active and inactive compounds, respectively, with a global percentage of good classification of 92%. To assess the range of model applicability, a virtual screening of a structurally heterogeneous series of herbicidal compounds was carried out. Two hundred eighty-four out of 332 were correctly classified (86%). Furthermore, this paper describes a fragment analysis carried out to determine the contribution of several fragments toward the herbicidal property; the presence of halogens in the selected fragments was also analyzed. It seems that the present TOPS-MODE based QSAR is the first alternate general "in silico" technique to experimentation in herbicide discovery.
Ernesto Estrada, Humberto González-Díaz (2003)  What are the limits of applicability for graph theoretic descriptors in QSPR/QSAR? Modeling dipole moments of aromatic compounds with TOPS-MODE descriptors.   J Chem Inf Comput Sci 43: 1. 75-84 Jan/Feb  
Abstract: The numerous possibilities of using graph-theoretic descriptors in QSPR/QSAR are analyzed, and some misunderstandings on the role of this theoretical approach in chemistry are clarified. Principal component analysis is used to obtain a property space for several physicochemical properties of aromatic compounds. It is proved that most of the QSPR applications of the graph-theoretic structure descriptors are concentrated on the description of properties in a very limited region of this property space. Here, we show that graph-theoretic approaches are also applicable to the modeling of physicochemical properties that are far away from this traditionally studied region. The molecular dipole moments of benzene derivatives, mono-, ortho-, meta-, and para-substituted, are modeled by using the Topological Sub-Structural Molecular Design (TOPS-MODE) approach. The TOPS-MODE approach permits the calculation of group dipole moments, which are given for several substituents. The differences between these group dipoles and those obtained by simple difference between experimental values are analyzed. Some difficulties arising from this traditional way of deriving substituent constants are identified and analyzed.
E Estrada, E Uriarte, Y Gutierrez, Humberto González-Díaz (2003)  Quantitative structure-toxicity relationships using TOPS-MODE. 3. Structural factors influencing the permeability of commercial solvents through living human skin.   SAR QSAR Environ Res 14: 2. 145-163 Apr  
Abstract: The permeability of a series of 12 commercial solvents through living human skin was studied by using a topological sub-structural approach (TOPS-MODE). We first analyzed the influence of several physicochemical parameters used in describing the skin permeability of the solvents. No single significant relationship was found between any of these physicochemical properties and the permeability of the solvents. A QSAR model using TOPS-MODE descriptors was obtained and validated. This model accounted for more than 95% of the variance in the experimental permeability of these solvents. Using the derived model, the structural factors responsible for the permeability of this series of solvents through living human skin were identified. Methyl groups bonded to heteroatoms or to CH2 groups resulted in the greatest contributions to skin permeability and these groups were considered to be "permeability enhancers". In contrast, groups of the type X = O (X = S, C) were found to be "permeability inhibitors" because they made negative contributions to the logarithm of permeability in all of the studied solvents. Drawing on the idea of permeability "enhancers" and "inhibitors", we hypothesized that the solvents needed to orient themselves in front of the stratum corneum layer before penetrating through the skin.
2002
Humberto González-Díaz, Ervelio Olazabal, Nilo Castañedo, Ivan Hernádez Sánchez, Alcidez Morales, Hector S Serrano, Julio González, Ronal Ramos de Armas (2002)  Markovian chemicals "in silico" design (MARCH-INSIDE), a promising approach for computer aided molecular design II: experimental and theoretical assessment of a novel method for virtual screening of fasciolicides.   J Mol Model 8: 8. 237-245 Aug  
Abstract: A novel method for in silico selection of flukicidal drugs is introduced. Two QSARs are developed: the first permits us to discriminate between fasciolicide and non-fasciolicide drugs, and the second to outline some conclusions about the possible mechanism of action of a chemical. The first model correctly classified 93.85% of compounds in the training series and 89.5% of the compounds in the predicting one. This model correctly classified 87.7, 93.8, 92.2 and 93.9% of compounds in leave-n-out cross-validation procedures when n takes values from 2 to 6. The model seems to be stable at around 92% of good classification in leave-n-out cross-validation analysis when n>6. The second model correctly classified 70% of non-fasciolicide compounds, 85.71% of beta-tubulin inhibitors and 100% of proton ionophores in the training set. This model recognizes as proton ionophores 100% of the nitrosalicylanilides in the predicting series. Both models have a low p-level <0.05. Finally, the experimental assay of six organic chemicals by an in vivo test permits us to carry out an assessment of the model, with a fairly good 100% agreement between experiment and theoretical prediction.
Miguel Angel Cabrera Pérez, Humberto González-Díaz, Carlos Fernández Teruel, José Ma Plá-Delfina, Marival Bermejo Sanz (2002)  A novel approach to determining physicochemical and absorption properties of 6-fluoroquinolone derivatives: experimental assessment.   Eur J Pharm Biopharm 53: 3. 317-325 May  
Abstract: The ToSS MoDe approach is used to estimate the n-octanol/buffer partition coefficient, the apparent intestinal absorption rate constant and intestinal permeability from a 6-fluoroquinolone data set. Improved in silico methods for predicting a drug's ability to be transported across biological membranes and other biopharmaceutical properties are highly desirable to optimize new drug development. The physicochemical property (Log P) of 26 6-fluoroquinolone derivatives and the absorption properties (Log K(a) and Log P(eff)) of 21 derivatives were well described by the present approach. The models obtained confirm the important role of lipophilicity in the absorption process and its relation with the piperazinyl ring spectral moment and general local spectral moment. The normalized group contributions to each property, at the R4 and R5 positions of a 6-fluoroquinolone framework, were calculated. Principal factor analysis between these contributions and the Hammett and Hansch constants, molar refractivity and sterimol parameters was also carried out. Three principal factors explained 78% of the total variance and the correlation coefficients were higher than 0.98. The isocontribution zone analysis for the Log P and Log K(a) of Sarafloxacin and Sparfloxacin, used as external corroboration compounds, was carried out. The absorption rate constants (in situ rat gut technique) for these drugs were also evaluated, and the results were compared with the values predicted by theoretical models for evaluating predictive performance. The present approach proved to be a good method for studying the oral absorption of drug candidates in drug development studies.