José L. Oliver
Dpto.de Genética
Facultad de Ciencias
Universidad de Granada
Granada, Spain
oliver@ugr.es

Journal articles

2008
 
J L Oliver, P Bernaola-Galván, M Hackenberg, P Carpena (2008)  Phylogenetic distribution of large-scale genome patchiness.   BMC Evol Biol 8: 1. Apr  
Abstract: BACKGROUND: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSIONS: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
2007
 
P Carpena, P Bernaola-Galván, A V Coronado, M Hackenberg, J L Oliver (2007)  Identifying characteristic scales in the human genome.   Phys Rev E Stat Nonlin Soft Matter Phys 75: 3 Pt 1. Mar  
Abstract: The scale-free, long-range correlations detected in DNA sequences contrast with characteristic lengths of genomic elements, being particularly incompatible with the isochores (long, homogeneous DNA segments). By computing the local behavior of the scaling exponent alpha of detrended fluctuation analysis (DFA), we discriminate between sequences with and without true scaling, and we find that no single scaling exists in the human genome. Instead, human chromosomes show a common compositional structure with two characteristic scales, the large one corresponding to the isochores and the other to small and medium scale genomic elements.
2006
 
Michael Hackenberg, Christopher Previti, Pedro Luis Luque-Escamilla, Pedro Carpena, José Martínez-Aroza, José L Oliver (2006)  CpGcluster: a distance-based algorithm for CpG-island detection.   BMC Bioinformatics 7: 10  
Abstract: BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. 
Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
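The distance-based idea in the CpGcluster abstract above can be illustrated with a minimal sketch. It is not the published implementation: the function names, the `max_distance` and `min_cpgs` parameters, and the definition of distance as the difference between the start positions of consecutive CpGs are assumptions made here for illustration, and the p-value assignment described in the abstract is omitted.

```python
def cpg_positions(seq):
    """Return the 0-based start positions of every CpG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cluster_cpgs(seq, max_distance, min_cpgs=2):
    """Group consecutive CpGs whose start positions are at most max_distance apart.

    Returns half-open (start, end) intervals; each interval starts and ends
    with a CpG dinucleotide. max_distance and min_cpgs are hypothetical
    parameters for this sketch.
    """
    clusters = []
    current = []
    for pos in cpg_positions(seq):
        # A gap larger than the threshold closes the current cluster.
        if current and pos - current[-1] > max_distance:
            if len(current) >= min_cpgs:
                clusters.append((current[0], current[-1] + 2))
            current = []
        current.append(pos)
    # Close the last open cluster, if it is large enough.
    if len(current) >= min_cpgs:
        clusters.append((current[0], current[-1] + 2))
    return clusters
```

For example, `cluster_cpgs("CGACG" + "T" * 10 + "CGCG", max_distance=5)` groups the four CpGs into two clusters separated by the T-rich stretch.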
2005
 
Michael Hackenberg, Pedro Bernaola-Galván, Pedro Carpena, José L Oliver (2005)  The biased distribution of Alus in human isochores might be driven by recombination.   J Mol Evol 60: 3. 365-377 Mar  
Abstract: Alu retrotransposons do not show a homogeneous distribution over the human genome but have a higher density in GC-rich (H) than in AT-rich (L) isochores. However, since they preferentially insert into the L isochores, the question arises: What is the evolutionary mechanism that shifts the Alu density maximum from L to H isochores? To disclose the role played by each of the potential mechanisms involved in such biased distribution, we carried out a genome-wide analysis of the density of the Alus as a function of their evolutionary age, isochore membership, and intron vs. intergene location. Since Alus depend on the retrotransposase encoded by the LINE1 elements, we also studied the distribution of LINE1 to provide a complete evolutionary scenario. We consecutively check, and discard, the contributions of the Alu/LINE1 competition for retrotransposase, compositional matching pressure, and Alu overrepresentation in introns. In analyzing the role played by unequal recombination, we scan the genome for Alu trimers, a direct product of Alu-Alu recombination. Through computer simulations, we show that such trimers are much more frequent than expected, the observed/expected ratio being higher in L than in H isochores. This result, together with the known higher selective disadvantage of recombination products in H isochores, points to Alu-Alu recombination as the main agent provoking the density shift of Alus toward the GC-rich parts of the genome. Two independent pieces of evidence (the lower evolutionary divergence shown by recently inserted Alu subfamilies and the higher frequency of old stand-alone Alus in L isochores) support such a conclusion. Other evolutionary factors, such as population bottlenecks during primate speciation, may have accelerated the fast accumulation of Alus in GC-rich isochores.
Pedro Luis Luque-Escamilla, José Martínez-Aroza, José L Oliver, Juan Francisco Gómez-Lopera, Ramón Román-Roldán (2005)  Compositional searching of CpG islands in the human genome.   Phys Rev E Stat Nonlin Soft Matter Phys 71: 6 Pt 1. Jun  
Abstract: We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence with application to the search for CpG islands. CpG islands are pieces of the genome related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Some algorithms have been proposed in the literature, based on moving statistics in a sliding window, but its size may greatly influence the results. The local use of Jensen-Shannon divergence is a completely different strategy: the nucleotide composition inside the islands is different from that in their environment, so a statistical distance--the Jensen-Shannon divergence--between the composition of two adjacent windows may be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. The fusion of those segments into greater ones that satisfy certain identification criteria must be achieved in order to obtain the definitive results. We find that the local use of Jensen-Shannon divergence is very suitable in processing DNA sequences for searching for compositionally different structures such as CpG islands, as compared to other algorithms in literature.
2004
J L Oliver, P Carpena, M Hackenberg, P Bernaola-Galván (2004)  IsoFinder : computational prediction of isochores in genome sequences   Nucleic Acids Research 32: W287-W292  
Abstract: Isochores are long genome segments homogeneous in G+C. Here, we describe an algorithm (IsoFinder) running on the web (http://bioinfo2.ugr.es/IsoF/isofinder.html) able to predict isochores at the sequence level. We move a sliding pointer from left to right along the DNA sequence. At each position of the pointer, we compute the mean G+C values to the left and to the right of the pointer. We then determine the position of the pointer for which the difference between left and right mean values (as measured by the t-statistic) reaches its maximum. Next, we determine the statistical significance of this potential cutting point, after filtering out short-scale heterogeneities below 3 kb by applying a coarse-graining technique. Finally, the program checks whether this significance exceeds a probability threshold. If so, the sequence is cut at this point into two subsequences; otherwise, the sequence remains undivided. The procedure continues recursively for each of the two resulting subsequences created by each cut. This leads to the decomposition of a chromosome sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C contents, each significantly different from the G+C contents of the adjacent LHGRs. Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological features such as gene density, SINE and LINE (short, long interspersed repetitive elements) densities, recombination rate or single nucleotide polymorphism variability. The resulting isochore maps are available at our web site (http://bioinfo2.ugr.es/isochores/), and also at the UCSC Genome Browser (http://genome.cse.ucsc.edu/).
Notes: Suppl. 2; 832NB; Times Cited: 8; Cited References Count: 32
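The sliding-pointer step described in the IsoFinder abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the IsoFinder code: the input is taken to be a list of G+C values for consecutive windows, the function name `best_cut` and the `min_side` parameter are hypothetical, and a pooled-variance Student t-statistic is used. The significance test, the coarse-graining filter and the recursion over subsequences are left out.

```python
import math


def best_cut(gc_values, min_side=2):
    """Find the split that maximizes |t| between left and right mean G+C.

    gc_values: G+C fractions of consecutive windows (assumed input format).
    min_side:  minimum number of windows on each side (hypothetical parameter).
    Returns (index, t), where gc_values[:index] is the left part.
    """
    n = len(gc_values)
    if n < max(3, 2 * min_side):
        raise ValueError("sequence too short to split")
    best_index, best_t = None, -1.0
    for i in range(min_side, n - min_side + 1):
        left, right = gc_values[:i], gc_values[i:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # Pooled within-segment variance with n - 2 degrees of freedom.
        ss = sum((x - mean_l) ** 2 for x in left)
        ss += sum((x - mean_r) ** 2 for x in right)
        pooled_var = ss / (n - 2)
        if pooled_var == 0.0:
            t = math.inf if mean_l != mean_r else 0.0
        else:
            t = abs(mean_l - mean_r) / math.sqrt(
                pooled_var * (1 / len(left) + 1 / len(right))
            )
        if t > best_t:
            best_index, best_t = i, t
    return best_index, best_t
```

In a full segmentation, the returned cut would be accepted only if its significance exceeded the chosen threshold, and the same search would then be repeated on each of the two resulting subsequences.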
P Bernaola-Galván, J L Oliver, P Carpena, O Clay, G Bernardi (2004)  Quantifying intrachromosomal GC heterogeneity in prokaryotic genomes   Gene 333: 121-133  
Abstract: The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok. (C) 2004 Elsevier B.V. All rights reserved.
Notes: 829UF; Times Cited: 4; Cited References Count: 59
2003
W Li, P Bernaola-Galván, P Carpena, J L Oliver (2003)  Isochores merit the prefix 'iso'   Computational Biology and Chemistry 27: 1. 5-10  
Abstract: The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC's analysis concerning the existence of isochores is misleading, because the homogeneity was not examined at a large enough length scale and consequently an inappropriate statistical test was applied. A test of the existence of isochores should be equivalent to a test of homogeneity or equality of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is a test of whether individual bases are independent and identically-distributed (iid). For testing the existence of isochores, or homogeneity in windowed GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test. (C) 2002 Elsevier Science Ltd. All rights reserved.
Notes: 686NL; Times Cited: 10; Cited References Count: 22
A Marín, J L Oliver (2003)  GC-biased mutation pressure and ORF lengthening   Journal of Molecular Evolution 56: 3. 371-372  
Notes: 651PR; Times Cited: 0; Cited References Count: 6
2002
P Bernaola-Galván, P Carpena, R Román-Roldán, J L Oliver (2002)  Study of statistical correlations in DNA sequences   Gene 300: 1-2. 105-115  
Abstract: Here we present a study of statistical correlations among different positions in DNA sequences and their implications by directly using the autocorrelation function. Such an analysis is possible now because of the availability of large sequences or even complete genomes of many organisms. After describing the way in which the autocorrelation function can be applied to DNA-sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source for such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed of GC-rich isochores displaying long-range correlations and scale invariance. (C) 2002 Elsevier Science B.V. All rights reserved.
Notes: 625YB; Times Cited: 19; Cited References Count: 56
I Grosse, P Bernaola-Galván, P Carpena, R Román-Roldán, J Oliver, H E Stanley (2002)  Analysis of symbolic sequences using the Jensen-Shannon divergence   Physical Review E 65: 4.  
Abstract: We study statistical properties of the Jensen-Shannon divergence D, which quantifies the difference between probability distributions, and which has been widely applied to analyses of symbolic sequences. We present three interpretations of D in the framework of statistical physics, information theory, and mathematical statistics, and obtain approximations of the mean, the variance, and the probability distribution of D in random, uncorrelated sequences. We present a segmentation method based on D that is able to segment a nonstationary symbolic sequence into stationary subsequences, and apply this method to DNA sequences, which are known to be nonstationary on a wide range of different length scales.
Notes: Part 1; 544EZ; Times Cited: 21; Cited References Count: 51
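As a small illustration of the quantity D discussed in the abstract above, the following sketch computes the Jensen-Shannon divergence between two adjacent symbolic subsequences, weighting each by its relative length. The function names are assumptions made here, and the statistical approximations and the segmentation procedure from the paper are not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences.

    D = H(whole) - (n_l / n) * H(left) - (n_r / n) * H(right),
    where H(whole) is the entropy of the concatenated sequence, which equals
    the entropy of the length-weighted mixture of the two compositions.
    """
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    return (
        shannon_entropy(left + right)
        - (n_l / n) * shannon_entropy(left)
        - (n_r / n) * shannon_entropy(right)
    )
```

Two subsequences with identical composition give D = 0, while two subsequences over disjoint symbols of equal length give D = 1 bit.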
P Carpena, P Bernaola-Galván, R Román-Roldán, J L Oliver (2002)  A simple and species-independent coding measure   Gene 300: 1-2. 97-104  
Abstract: We present a coding measure which is based on the statistical properties of the stop codons, and that is able to estimate accurately the variation of coding content along an anonymous sequence. As the stop codons play the same role in all the genomes (with very few exceptions) the measure turns out to be species-independent. We show results both for prokaryotic and for eukaryotic genomes, indicating, first, the accuracy of the measure, and, second, that better prediction is achieved if the measure is applied on homogeneous, isochore-like sequences than if it is applied following the standard moving window approach. Finally, we discuss some of the possible applications of the measure. (C) 2002 Elsevier Science B.V. All rights reserved.
Notes: 625YB; Times Cited: 4; Cited References Count: 20
J L Oliver, P Carpena, R Román-Roldán, T Mata-Balaguer, A Mejías-Romero, M Hackenberg, P Bernaola-Galván (2002)  Isochore chromosome maps of the human genome   Gene 300: 1-2. 117-127  
Abstract: The human genome is a mosaic of isochores, which are long DNA segments (much greater than 300 kbp) relatively homogeneous in G + C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G + C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores. (C) 2002 Elsevier Science B.V. All rights reserved.
Notes: 625YB; Times Cited: 16; Cited References Count: 56
2001
J L Oliver, P Bernaola-Galván, P Carpena, R Román-Roldán (2001)  Isochore chromosome maps of eukaryotic genomes   Gene 276: 1-2. 47-56  
Abstract: Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>> 300 kb on average) relatively homogeneous in G + C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G + C plots. On the contrary, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G + C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available. (C) 2001 Elsevier Science B.V. All rights reserved.
Notes: Sp. Iss. SI; 485GH; Times Cited: 35; Cited References Count: 46
2000
P Bernaola-Galván, I Grosse, P Carpena, J L Oliver, R Román-Roldán, H E Stanley (2000)  Finding borders between coding and noncoding DNA regions by an entropic segmentation method   Physical Review Letters 85: 6. 1342-1345  
Abstract: We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
Notes: 341VN; Times Cited: 31; Cited References Count: 19
1999
P Bernaola-Galván, J L Oliver, R Román-Roldán (1999)  Decomposition of DNA sequence complexity   Physical Review Letters 83: 16. 3336-3339  
Abstract: Profiles of sequence compositional complexity provide a view of the spatial heterogeneity of symbolic sequences at different levels of detail. Sequence compositional complexity profiles are here decomposed into partial profiles using the branching property of the Shannon entropy. This decomposition shows the complexity contributed by each individual symbol or group of symbols. In particular, we apply this method to the mapping rules (symbol groupings) commonly used in DNA sequence analysis. We find that strong-weak bindings are remarkably homogeneously distributed as compared to purine-pyrimidine, and that A and T are the most heterogeneously distributed bases.
Notes: 245XH; Times Cited: 7; Cited References Count: 19
A Marín, G Gutiérrez, J L Oliver (1999)  Compositional correlation between open reading frames with opposite transcriptional orientations in Escherichia coli   Journal of Molecular Evolution 48: 6. 712-716  
Abstract: This paper analyzes correlations in base composition between pairs of neighboring genes in Escherichia coli. The G + C contents of nearby, but convergently or divergently transcribed, genes show weak but significant correlations, and this is attributed to compositional variation among genomic regions. The finding that the base composition varies among intergenic regions, depending upon whether the adjacent genes are transcribed convergently, divergently, or in the same orientation, seems to indicate that transcription affects the patterns of mutation and, therefore, the overall base composition of the region.
Notes: 195GC; Times Cited: 0; Cited References Count: 34
P Bernaola-Galván, P Carpena, R Román-Roldán, J L Oliver (1999)  Compositional complexity of DNA sequence models   Computer Physics Communications 122: 136-138  
Abstract: Recently, we proposed a new measure of complexity for symbolic sequences (Sequence Compositional Complexity, SCC) based on the entropic segmentation of a sequence into compositionally homogeneous domains. Such segmentation is carried out by means of a conceptually simple, computationally efficient heuristic algorithm. SCC is now applied to the sequences generated by several stochastic models which describe the statistical properties of DNA, in particular the observed long-range fractal correlations. This approach allows us to test the capability of the different models in describing the complex compositional heterogeneity found in DNA sequences. Moreover, SCC detects clear differences where conventional standard methods fail. (C) 1999 Elsevier Science B.V. All rights reserved.
Notes: Sp. Iss. SI; 263LP; Times Cited: 2; Cited References Count: 10
J L Oliver, R Román-Roldán, J Pérez, P Bernaola-Galván (1999)  SEGMENT : identifying compositional domains in DNA sequences   Bioinformatics 15: 12. 974-979  
Abstract: Motivation: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed. Results: Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented as a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
Notes: 293DJ; Times Cited: 18; Cited References Count: 13
1998
W T Li, G Stolovitzky, P Bernaola-Galván, J L Oliver (1998)  Compositional heterogeneity within, and uniformity between, DNA sequences of yeast chromosomes   Genome Research 8: 9. 916-928  
Abstract: The heterogeneity within, and similarities between, yeast chromosomes are studied. For the former, we show by the size distribution of domains, coding density, size distribution of open reading frames, spatial power spectra, and deviation from binomial distribution for C + G% in large moving windows that there is a strong deviation of the yeast sequences from random sequences. For the latter, not only do we graphically illustrate the similarity for the above mentioned statistics, but we also carry out a rigorous analysis of variance (ANOVA) test. The hypothesis that all yeast chromosomes are similar cannot be rejected by this test. We examine the two possible explanations of this interchromosomal uniformity: a common origin, such as genome-wide duplication (polyploidization), and a concerted evolutionary process.
Notes: 125AK; Times Cited: 32; Cited References Count: 69
E García, M Jamilena, J I Alvarez, T Arnedo, J L Oliver, R Lozano (1998)  Genetic relationships among melon breeding lines revealed by RAPD markers and agronomic traits   Theoretical and Applied Genetics 96: 6-7. 878-885  
Abstract: RAPD markers and agronomic traits were used to determine the genetic relationships among 32 breeding lines of melon belonging to seven varietal types. Most of the breeding lines were Galia and Piel de Sapo genotypes, which are currently being used in breeding programmes to develop new hybrid combinations. A total of 115 polymorphic reliable bands from 43 primers and 24 agronomic traits were scored for genetic distance calculations and cluster analysis. A high concordance between RAPDs and agronomic traits was observed when genetic relationships among lines were assessed. In addition, RAPD data were highly correlated with the pedigree information already known for the lines and revealed the existence of two clusters for each varietal type that comprised the lines sharing similar agronomic features. These groupings were consistent with the development of breeding programmes trying to generate two separate sets of parental lines for hybrid production. Nevertheless, the performance of certain hybrids indicated that RAPDs were more suitable markers than agronomic traits in predicting genetic distance among the breeding lines analysed. The employment of RAPDs as molecular markers both in germplasm management and improvement, as well as in the selection of parental lines for the development of new hybrid combinations, is discussed.
Notes: Zt574; Times Cited: 25; Cited References Count: 31
R Román-Roldán, P Bernaola-Galván, J L Oliver (1998)  Sequence compositional complexity of DNA through an entropic segmentation method   Physical Review Letters 80: 6. 1344-1347  
Abstract: A new complexity measure, based on the entropic segmentation of DNA sequences into compositionally homogeneous domains, is proposed. Sequence compositional complexity (SCC) deals directly with the complex heterogeneity in nonstationary DNA sequences. The plot of SCC as a function of significance level provides a profile of sequence structure at different length scales. SCC is found to be higher in sequences with long-range correlation than those without, and higher in noncoding sequences than coding sequences. Furthermore, a general agreement is found between the SCC of the DNA sequence, on one hand, and the biological complexity of the organism, on the other, attributable to an increasingly complex organization of noncoding DNA over the course of evolution.
Notes: Yw274; Times Cited: 35; Cited References Count: 35
A Marín, F González, G Gutiérrez, J L Oliver (1998)  Scientific correspondence   Nucleic Acids Research 26: 19. 4540-4540  
Notes: 126JF; Times Cited: 3; Cited References Count: 6
1996
P Bernaola-Galván, R Román-Roldán, J L Oliver (1996)  Compositional segmentation and long-range fractal correlations in DNA sequences   Physical Review E 53: 5. 5181-5189  
Abstract: A segmentation algorithm based on the Jensen-Shannon entropic divergence is used to decompose long-range correlated DNA sequences into statistically significant, compositionally homogeneous patches. By adequately setting the significance level for segmenting the sequence, the underlying power-law distribution of patch lengths can be revealed. Some of the identified DNA domains were uncorrelated, but most of them continued to display long-range correlations even after several steps of recursive segmentation, thus indicating a complex multi-length-scaled structure for the sequence. On the other hand, by separately shuffling each segment, or by randomly rearranging the order in which the different segments occur in the sequence, shuffled sequences preserving the original statistical distribution of patch lengths were generated. Both types of random sequences displayed the same correlation scaling exponents as the original DNA sequence, thus demonstrating that neither the internal structure of patches nor the order in which these are arranged in the sequence is critical; therefore, long-range correlations in nucleotide sequences seem to rely only on the power-law distribution of patch lengths.
Notes: Part B; Um619; Times Cited: 61; Cited References Count: 31
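The core quantity named in the abstract above, the Jensen-Shannon divergence between the two halves of a candidate cut, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`shannon_entropy`, `js_divergence`, `best_cut`) are hypothetical, and the significance test and recursive segmentation described in the abstract are deliberately left out.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (shannon_entropy(seq)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))


def best_cut(seq):
    """Return (position, divergence) for the cut that maximizes the divergence."""
    return max(((i, js_divergence(seq, i)) for i in range(1, len(seq))),
               key=lambda pair: pair[1])
```

For a sequence made of two compositionally distinct halves, such as `"A" * 10 + "G" * 10`, `best_cut` places the cut at the boundary between them. A full segmentation would accept the cut only if it passes a chosen significance level and would then repeat the search inside each resulting segment.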
J L Oliver, A Marín (1996)  A relationship between GC content and coding-sequence length   Journal of Molecular Evolution 43: 3. 216-223  
Abstract: Since base composition of translational stop codons (TAG, TAA, and TGA) is biased toward a low G+C content, a differential density for these termination signals is expected in random DNA sequences of different base compositions. The expected length of reading frames (DNA segments of sense codons flanked by in-phase stop codons) in random sequences is thus a function of GC content. The analysis of DNA sequences from several genome databases stratified according to GC content reveals that the longest coding sequences (exons in vertebrates and genes in prokaryotes) are GC-rich, while the shortest ones are GC-poor. Exon lengthening in GC-rich vertebrate regions does not result, however, in longer vertebrate proteins, perhaps because of the lower number of exons in the genes located in these regions. The effects on coding-sequence lengths constitute a new evolutionary meaning for compositional variations in DNA GC content.
Notes: Vf808; Times Cited: 23; Cited References Count: 36
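The dependence of expected reading-frame length on GC content stated in the abstract above can be illustrated with a simple calculation. The sketch below assumes a random sequence with independent positions in which P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2. That model and the function names are assumptions made for illustration and are not taken from the paper.

```python
def stop_codon_probability(gc):
    """Probability that a random codon is TAA, TAG, or TGA.

    Assumes independent positions with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (an illustrative model only).
    """
    if not 0.0 <= gc < 1.0:
        raise ValueError("gc must be in [0, 1)")
    at = (1.0 - gc) / 2.0   # probability of A, and also of T
    g = gc / 2.0            # probability of G
    # TAA = T*A*A, TAG = T*A*G, TGA = T*G*A
    return at * at * at + 2.0 * at * at * g


def expected_orf_codons(gc):
    """Mean number of sense codons before the first in-phase stop codon."""
    p = stop_codon_probability(gc)
    return (1.0 - p) / p
```

At GC = 0.5 every codon is equally likely, so the stop probability is 3/64 and the mean run of sense codons is 61/3, roughly 20. Under this model the expected run grows as GC content rises, which is the direction of the effect the abstract describes.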
R Román-Roldán, P Bernaola-Galván, J L Oliver (1996)  Application of information theory to DNA sequence analysis : A review   Pattern Recognition 29: 7. 1187-1194  
Abstract: The analysis of DNA sequences through information theory methods is reviewed from the beginning in the 70s. The subject is addressed within a broad context, describing in some detail the cornerstone contributions in the field. The emerging interest concerning long-range correlations and the mosaic structure of DNA sequences is considered from our own point of view. A recent procedure developed by the authors is also outlined. Copyright (C) 1996 Pattern Recognition Society.
Notes: Uu388; Times Cited: 15; Cited References Count: 28
1994
G Gutiérrez, J Casadesus, J L Oliver, A Marín (1994)  Compositional Heterogeneity of the Escherichia coli Genome - a Role for Vsp Repair   Journal of Molecular Evolution 39: 4. 340-346  
Abstract: E. coli genes that contain a high frequency of the tetranucleotide CTAG are also rich in the tetramers CTTG, CCTA, CCAA, TTGG, TAGG, and CAAG (group-I tetramers). Conversely, E. coli genes lacking CTAG are rich in the tetranucleotides CCTG, CCAG, CTGG, and CAGG (group-II tetramers). These two gene samples differ also in codon usage, amino acid composition, frequency of Dcm sites, and contrast vocabularies. Group-I tetramers have in common that they are depleted by very-short-patch repair (VSP), while group-II tetramers are favored by VSP activity. The VSP system repairs G:T mismatches to G:C, thereby increasing the overall G+C content of the genome; for this reason the CTAG-rich sample has a lower G+C content than the CTAG-poor sample. This compositional heterogeneity can be tentatively explained by a low level of VSP activity on the CTAG-rich sample. A negative correlation is found between the frequency of group-I tetramers and the level of gene expression, as measured by the Codon Adaptation Index (CAI). A possible link between the rate of VSP activity and the level of gene expression is considered.
Notes: Ph800; Times Cited: 16; Cited References Count: 33
R Román-Roldán, P Bernaola-Galván, J L Oliver (1994)  Entropic Feature for Sequence Pattern through Iterated Function Systems   Pattern Recognition Letters 15: 6. 567-573  
Abstract: Entropy and relative entropy are proposed as features extracted from symbol sequences. Firstly, a proper Iterated Function System is driven by the sequence, producing a fractal-like representation (CSR) with a low computational cost. Then, two entropic measures are applied to the CSR history of the CSR and theoretically justified. Examples are included.
Notes: Nr418; Times Cited: 9; Cited References Count: 9
1993
G Gutiérrez, J L Oliver, A Marín (1993)  Dinucleotides and G+C Content in Human Genes - Opposite Behavior of GpG, GpC, and TpC at II-III Codon Positions and in Introns   Journal of Molecular Evolution 37: 2. 131-136  
Abstract: We have studied the behavior of the dinucleotide preferences under G + C content variation in human genes. The doublet preferences for each dinucleotide were compared between two functionally distinct zones in genes, the II-III codon positions, and the introns. The 16 dinucleotides have been tentatively classified in three groups: (1) AA, AC, CC, CT, and GA, doublets showing no difference between introns and II-III codon positions in the full range of G + C variation; (2) TG and TA, which differ in the full range of G + C variation; (3) AT, AG, GT, TC, TT, GG, GC, CG, and CA, which show differences in regions over 50% G + C. A remarkable pattern observed concerns the behavior of GG, GC, and TC, which showed opposite trends in II-III codon positions and in introns. If codon positions and introns are under the same structural requirements and the same mutational bias, our results indicate that the differences observed could be related to post-transcriptional constraints acting on mRNA.
Notes: Ln866; Times Cited: 5; Cited References Count: 16
J M Martínez-Zapater, A Marín, J L Oliver (1993)  Evolution of Base Composition in T-DNA Genes from Agrobacterium   Molecular Biology and Evolution 10: 2. 437-448  
Abstract: T-DNA genes on Ti and Ri plasmids from Agrobacterium are replicated and repaired in bacteria but expressed in plant cells. Therefore, they can be useful tools to disclose the relative roles played by the two main mechanisms involved in the evolution of DNA base composition: (1) mutational bias along DNA replication/repair processes and (2) selective gene expression constraints. We compare the base-compositional features of 15 T-DNA genes with those of (1) other genes located on Ti or Ri plasmids but outside the T-DNA region (non-T-DNA genes) and (2) a sample of nuclear genes from a natural host plant species (tobacco). The similarity in G+C content found between T-DNA and plant genes at replacement sites, as well as the similar stronger avoidance of CpG at II-III codon positions, support an ancestral plant origin for T-DNA genes. When G+C content and codon usage are considered, T-DNA genes are more similar to non-T-DNA genes than to those of plants, indicating that the mutational bias along replication and repair processes in bacteria is the major factor driving the global compositional properties of T-DNA genes. However, when the reduction in the available CpG methylation targets and the distribution of these avoidances on the different codon positions are considered, T-DNA genes are more similar to those of plants than they are to the other plasmid genes. The requirements for expression of T-DNA genes in the plant cells would have modulated the compositional features of their sequences, mainly CpG avoidance.
Notes: Kt911; Times Cited: 0; Cited References Count: 44
1990
 
F Rodríguez, J L Oliver, A Marín, J R Medina (1990)  The general stochastic model of nucleotide substitution.   J Theor Biol 142: 4. 485-501 Feb  
Abstract: DNA sequence evolution through nucleotide substitution may be assimilated to a stationary Markov process. The fundamental equations of the general model, with 12 independent substitution parameters, are used to obtain a formula which corrects the effect of multiple and parallel substitutions on the measure of evolutionary divergence between two homologous sequences. We show that only reversible models, with six independent parameters, allow the calculation of the substitution rates. Simulation experiments on DNA sequence evolution through nucleotide substitution call into question the effectiveness of the general model (and of any other more detailed description); nevertheless, the general model results are slightly superior to any of its particular cases.
Notes:
 
J L Oliver, A Marín, J M Martínez-Zapater (1990)  Chloroplast genes transferred to the nuclear plant genome have adjusted to nuclear base composition and codon usage.   Nucleic Acids Res 18: 1. 65-73 Jan  
Abstract: During plant evolution, some plastid genes have been moved to the nuclear genome. These transferred genes are now correctly expressed in the nucleus, their products being transported into the chloroplast. We compared the base compositions, the distributions of some dinucleotides and codon usages of transferred, nuclear and chloroplast genes in two dicot and two monocot plant species. Our results indicate that transferred genes have adjusted to nuclear base composition and codon usage, being now more similar to the nuclear genes than to the chloroplast ones in every species analyzed.
Notes:
1989
 
J L Oliver, A Marín, J R Medina (1989)  SDSE: a software package to simulate the evolution of a pair of DNA sequences.   Comput Appl Biosci 5: 1. 47-50 Feb  
Abstract: An algorithm to simulate DNA sequence evolution under a general stochastic model, including as particular cases all the previously used schemes of nucleotide substitution, is described. The simulation is carried out on finite, variable length, DNA sequences through a strict stochastic process, according to the particular substitution rates imposed by each scheme. Five FORTRAN programs, running on an IBM PC and compatibles, carry out all the tasks needed for the simulation. They are menu driven and interfaced to the system through a principal menu. All sequence data files used and generated by the SDSE package conform to the standard GenBank database format, thus allowing the use of any sequence retrieved from this databank, as well as the application of other packages to analyse, manipulate or retrieve simulated sequences.
Notes:
 
A Marín, J Bertranpetit, J L Oliver, J R Medina (1989)  Variation in G + C-content and codon choice: differences among synonymous codon groups in vertebrate genes.   Nucleic Acids Res 17: 15. 6181-6189 Aug  
Abstract: The relationship between G + C-content and codon usage in genes of human, mus, rat, bovine and chicken nuclear genomes was investigated. Correlation and linear regression analyses were carried out on plots that related the frequency of each codon within each synonymous codon group to the G + C-content of the coding sequence as a whole. Under GC pressure, in most of the quartet codon groups there is a preferential choice of the C-ending codon, except in leucine and valine codon groups where the choice of the G-ending codon is preferred. Among duets, the choice of codons specifying phenylalanine and glutamate shows the strongest dependence on G + C-content. The relationship found between G + C-content and codon usage in these genomes correlates with taxonomic distance.
Notes:
1988
1987
1985
1984
 
J M Martínez-Zapater, J L Oliver (1984)  Genetic Analysis of Isozyme Loci in Tetraploid Potatoes (Solanum tuberosum L.).   Genetics 108: 3. 669-679 Nov  
Abstract: The genetic control of eight isozyme loci revealed by starch gel electrophoresis was studied through the analysis of three progenies derived from four tetraploid cultivars of Solanum tuberosum (groups Andigena and Tuberosum). Duplicate gene expression was found in seven (Got-A, Got-B, Pgd-C, Pgi-B, Pgm-A, Pgm-B and Pox-C) isozyme loci. In another isozyme gene (Adh-A), the parental genotypes were not adequate to distinguish between a monogenic or a digenic model of genetic control. Tetrasomic inheritance was demonstrated in four (Got-A, Got-B, Pgd-C and Pgi-B) isozyme loci. In the remaining duplicate genes, the parental genotypes precluded discrimination between disomic or tetrasomic models. Tetrasomic segregations of the chromosomal type were generally found; however, the isozyme phenotypes shown by three descendants from selfing cv. Katahdin indicate the occurrence of chromatid segregations, although aneuploidy cannot be ruled out. Either autoploidy or amphidiploidy with lack of chromosome differentiation between the two diploid ancestors can account for the existence of tetrasomic inheritance in the common potato.
Notes:
1983
1982
1981
1980

Book chapters

2004
1994
1990
1989
1986
1983

Conference papers

2004
2003
1998