Sergei L Kosakovsky Pond
University of California, San Diego
Antiviral Research Center
150 W. Washington St., Suite 100
San Diego, CA 92103
USA
spond@ucsd.edu

Journal articles

2008
Sergei L Kosakovsky Pond, Art F Y Poon, Selene Zárate, Davey M Smith, Susan J Little, Satish K Pillai, Ronald J Ellis, Joseph K Wong, Andrew J Leigh Brown, Douglas D Richman, Simon D W Frost (2008)  Estimating selection pressures on HIV-1 using phylogenetic likelihood models.   Stat Med Apr  
Abstract: Human immunodeficiency virus (HIV-1) can rapidly evolve in response to selection pressures exerted by HIV-specific immune responses and antiviral agents, as well as pressures to establish infection in different compartments of the body. Statistical models applied to HIV-1 sequence data can help to elucidate the nature of these selection pressures through comparisons of non-synonymous (or amino acid changing) and synonymous (or amino acid preserving) substitution rates. These models also need to take into account the non-independence of sequences due to their shared evolutionary history. We review how we have developed these methods and have applied them to characterize the evolution of HIV-1 in vivo. To illustrate our methods, we present an analysis of compartment-specific evolution of HIV-1 env in blood and cerebrospinal fluid and of site-to-site variation in the gag gene of subtype C HIV-1. Copyright (c) 2008 John Wiley & Sons, Ltd.
Susan J Little, Simon D W Frost, Joseph K Wong, Davey M Smith, Sergei L Kosakovsky Pond, Caroline C Ignacio, Neil T Parkin, Christos J Petropoulos, Douglas D Richman (2008)  Persistence of transmitted drug resistance among subjects with primary human immunodeficiency virus infection.   J Virol 82: 11. 5510-5518 Jun  
Abstract: Following interruption of antiretroviral therapy among individuals with acquired drug resistance, preexisting drug-sensitive virus emerges relatively rapidly. In contrast, wild-type virus is not archived in individuals infected with drug-resistant human immunodeficiency virus (HIV) and thus cannot emerge rapidly in the absence of selective drug pressure. Fourteen recently HIV-infected patients with transmitted drug-resistant virus were followed for a median of 2.1 years after the estimated date of infection (EDI) without receiving antiretroviral therapy. HIV drug resistance and pol replication capacity (RC) in longitudinal plasma samples were assayed. Resistance mutations were characterized as pure populations or mixtures. The mean time to first detection of a mixture of wild-type and drug-resistant viruses was 96 weeks (1.8 years) (95% confidence interval, 48 to 192 weeks) after the EDI. The median time to loss of detectable drug resistance using population-based assays ranged from 4.1 years (conservative estimate) to longer than the lifetime of the individual (less conservative estimate). The transmission of drug-resistant virus was not associated with virus with reduced RC. Sexual transmission of HIV selects for highly fit drug-resistant variants that persist for years. The prolonged persistence of transmitted drug resistance strongly supports the routine use of HIV resistance genotyping for all newly diagnosed individuals.
Pamina M Gorbach, Lydia N Drumright, Marjan Javanbakht, Sergei L Pond, Christopher H Woelk, Eric S Daar, Susan J Little (2008)  Antiretroviral drug resistance and risk behavior among recently HIV-infected men who have sex with men.   J Acquir Immune Defic Syndr 47: 5. 639-643 Apr  
Abstract: OBJECTIVES: Examine associations among behaviors including substance use during sexual encounters, and transmitted HIV drug resistance in recently HIV-infected men who have sex with men (MSM). METHODS: Between 2002 and 2006, 117 recently HIV-infected MSM completed questionnaires regarding their 3 most recent sexual partners. Serum samples were tested for the presence of genotypic and phenotypic HIV drug resistance. Logistic regression analysis was used to assess the association of substance use, behaviors, and resistance to at least 1 class of HIV drugs. RESULTS: The mean age of participants was 35 years; 71% identified as white and 19% as Hispanic. Sixty (51%) reported substance use during sexual activity in the past 12 months. A total of 12.5% of 112 had genotypic drug resistance to at least 1 class of antiretroviral medications, and 14% of 117 had phenotypic drug resistance. Substances used during sexual activity associated with phenotypic drug resistance in multivariate models included any substance use (adjusted odds ratio [aOR] = 4.21, 95% confidence interval [CI]: 1.13 to 15.68), polysubstance use (aOR = 5.64, 95% CI: 1.62 to 19.60), methamphetamine (aOR = 4.00, 95% CI: 1.19 to 13.38), 3,4-methylenedioxy-N-methylamphetamine (MDMA)/Ecstasy (aOR = 7.16, 95% CI: 1.40 to 36.59), and gamma-hydroxybutyrate (GHB) (aOR = 6.98, 95% CI: 1.82 to 26.80). The genotype analysis was similar. CONCLUSIONS: Among these recently HIV-infected MSM, methamphetamine use during sexual activity and use of other substances, such as MDMA and GHB, were associated with acquired drug-resistant virus. No other behaviors were associated with acquisition of drug-resistant HIV.
2007
Wen-Yu Chung, Samir Wadhawan, Radek Szklarczyk, Sergei Kosakovsky Pond, Anton Nekrutenko (2007)  A first look at ARFome: dual-coding genes in mammalian genomes.   PLoS Comput Biol 3: 5. May  
Abstract: Coding of multiple proteins by overlapping reading frames is not a feature one would associate with eukaryotic genes. Indeed, codependency between codons of overlapping protein-coding regions imposes a unique set of evolutionary constraints, making it a costly arrangement. Yet in cases of tightly coexpressed interacting proteins, dual coding may be advantageous. Here we show that although dual coding is nearly impossible by chance, a number of human transcripts contain overlapping coding regions. Using newly developed statistical techniques, we identified 40 candidate genes with evolutionarily conserved overlapping coding regions. Because our approach is conservative, we expect mammals to possess more dual-coding genes. Our results emphasize that the skepticism surrounding eukaryotic dual coding is unwarranted: rather than being artifacts, overlapping reading frames are often hallmarks of fascinating biology.
Selene Zárate, Sergei L Kosakovsky Pond, Paul Shapshak, Simon D W Frost (2007)  Comparative study of methods for detecting sequence compartmentalization in human immunodeficiency virus type 1.   J Virol 81: 12. 6643-6651 Jun  
Abstract: Human immunodeficiency virus (HIV) infects different organs and tissues. During these infection events, subpopulations of HIV type 1 (HIV-1) develop and, if viral trafficking is restricted between subpopulations, the viruses can follow independent evolutionary histories, i.e., become compartmentalized. This phenomenon is usually detected via comparative sequence analysis and has been reported for viruses isolated from the central nervous system (CNS) and the genital tract. Several approaches have been proposed to study the compartmentalization of HIV sequences, but to date, no rigorous comparison of the most commonly employed methods has been made. In this study, we systematically compared inferences made by six different methods for detecting compartmentalization based on three data sets: (i) a sample of 45 patients with sequences gathered from the CNS, (ii) sequences from the female genital tract of 18 patients, and (iii) a set of simulated sequences. We found that different methods often reached contradictory conclusions. Methods based on the topology of a phylogenetic tree derived from clonal sequences were generally more sensitive in detecting compartmentalization than those that relied solely upon pairwise genetic distances between sequences. However, as the branching structure in a phylogenetic tree is often uncertain, especially for short, low-diversity, or recombinant sequences, tree-based approaches may need to be modified to take phylogenetic uncertainty into account. Given the frequently discordant predictions of different methods and the strengths and weaknesses of each particular methodology, we recommend that a suite of several approaches be used for reliable inference of compartmentalized population structure.
Radek Szklarczyk, Jaap Heringa, Sergei Kosakovsky Pond, Anton Nekrutenko (2007)  Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function.   Proc Natl Acad Sci U S A 104: 31. 12807-12812 Jul  
Abstract: INK4a/ARF tumor suppressor locus encodes two protein products, INK4a and ARF, essential for controlling tumorigenesis and mutated in more than half of human cancers. There is no resemblance between the two proteins: their coding regions are assembled by alternative splicing of two mutually exclusive 5' exons into a constitutive one containing overlapping out-of-phase reading frames. We show that the dual-coding arrangement conflicts with the high cost of mutations within INK4a/ARF. Unexpectedly, the locus evolves rapidly and asymmetrically, with ARF accumulating the majority of amino acid replacements. Rapid evolution drives both INK4a and ARF proteins out of sync with other members of the RB and p53 tumor suppressor pathways, both of which are controlled by the locus. Yet, the asymmetric behavior may be an intrinsic property of dual-coding exons: INK4a/ARF closely mimics the evolution of 90 newly identified genes with similar dual-coding structure. Thus, the strong link between mutations in INK4a/ARF and cancer may be a direct consequence of the architecture of the locus.
Art F Y Poon, Fraser I Lewis, Sergei L Kosakovsky Pond, Simon D W Frost (2007)  Evolutionary interactions between N-linked glycosylation sites in the HIV-1 envelope.   PLoS Comput Biol 3: 1. Jan  
Abstract: The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a "glycan shield." As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan-glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a "covarion"-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. 
This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
C M Noviello, S L Kosakovsky Pond, M J Lewis, D D Richman, S K Pillai, O O Yang, S J Little, D M Smith, J C Guatelli (2007)  Maintenance of Nef-mediated modulation of major histocompatibility complex class I and CD4 after sexual transmission of human immunodeficiency virus type 1.   J Virol 81: 9. 4776-4786 May  
Abstract: Viruses encounter changing selective pressures during transmission between hosts, including host-specific immune responses and potentially varying functional demands on specific proteins. The human immunodeficiency virus type 1 Nef protein performs several functions potentially important for successful infection, including immune escape via down-regulation of class I major histocompatibility complex (MHC-I) and direct enhancement of viral infectivity and replication. Nef is also a major target of the host cytotoxic T-lymphocyte (CTL) response. To examine the impact of changing selective pressures on Nef functions following sexual transmission, we analyzed genetic and functional changes in nef clones from six transmission events. Phylogenetic analyses indicated that the diversity of nef was similar in both sources and acutely infected recipients, the patterns of selection across transmission were variable, and regions of Nef associated with distinct functions evolved similarly in sources and recipients. These results weighed against the selection of specific Nef functions by transmission or during acute infection. Measurement of Nef function provided no evidence that the down-regulation of either CD4 or MHC-I was optimized by transmission or during acute infection, although rare nef clones from sources that were impaired in these activities were not detected in recipients. Nef-specific CTL activity was detected as early as 3 weeks after infection and appeared to be an evolutionary force driving the diversification of nef. Despite the change in selective pressure between the source and recipient immune systems and concomitant genetic diversity, the majority of Nef proteins maintained robust abilities to down-regulate MHC-I and CD4. These data suggest that both functions are important for the successful establishment of infection in a new host.
Art F Y Poon, Fraser I Lewis, Sergei L Kosakovsky Pond, Simon D W Frost (2007)  An evolutionary-network model reveals stratified interactions in the V3 loop of the HIV-1 envelope.   PLoS Comput Biol 3: 11. Nov  
Abstract: The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored, and (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%-40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters. 
We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.
David C Nickle, Morgane Rolland, Mark A Jensen, Sergei L Kosakovsky Pond, Wenjie Deng, Mark Seligman, David Heckerman, James I Mullins, Nebojsa Jojic (2007)  Coping with viral diversity in HIV vaccine design.   PLoS Comput Biol 3: 4. Apr  
Abstract: The ability of human immunodeficiency virus type 1 (HIV-1) to develop high levels of genetic diversity, and thereby acquire mutations to escape immune pressures, contributes to the difficulties in producing a vaccine. Possibly no single HIV-1 sequence can induce sufficiently broad immunity to protect against a wide variety of infectious strains, or block mutational escape pathways available to the virus after infection. The authors describe the generation of HIV-1 immunogens that minimize the phylogenetic distance of viral strains throughout the known viral population (the center of tree [COT]) and then extend the COT immunogen by addition of a composite sequence that includes high-frequency variable sites preserved in their native contexts. The resulting COT(+) antigens compress the variation found in many independent HIV-1 isolates into lengths suitable for vaccine immunogens. It is possible to capture 62% of the variation found in the Nef protein and 82% of the variation in the Gag protein into immunogens of three gene lengths. The authors put forward immunogen designs that maximize representation of the diverse antigenic features present in a spectrum of HIV-1 strains. These immunogens should elicit immune responses against high-frequency viral strains as well as against most mutant forms of the virus.
David C Nickle, Laura Heath, Mark A Jensen, Peter B Gilbert, James I Mullins, Sergei L Kosakovsky Pond (2007)  HIV-specific probabilistic models of protein evolution.   PLoS ONE 2: 6. Jun  
Abstract: Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model fitting procedure to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV, and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. 
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
Christopher H Woelk, Simon D W Frost, Douglas D Richman, Prentice E Higley, Sergei L Kosakovsky Pond (2007)  Evolution of the interferon alpha gene family in eutherian mammals.   Gene 397: 1-2. 38-50 Aug  
Abstract: Interferon alpha (IFNA) genes code for proteins with important signaling roles during the innate immune response. Phylogenetically, IFNA family members in eutherians (placental mammals) cluster together in a species-specific manner except for closely related species (i.e. Homo sapiens and Pan troglodytes) where gene-specific clustering is evident. Previous research has been unable to clarify whether gene conversion or recent gene duplication accounts for gene-specific clustering, partly because the similarity of members of the IFNA family within species has made it historically difficult to identify the exact composition of IFNA gene families. IFNA gene families were fully characterized in recently available genomes from Canis familiaris, Macaca mulatta, P. troglodytes and Rattus norvegicus, and combined with previously characterized IFNA gene families from H. sapiens and Mus musculus, for the analysis of both whole and partial gene conversion events using a variety of statistical methods. Gene conversion was inferred in every eutherian species analyzed and comparison of the IFNA gene family locus between primate species revealed independent gene duplication in M. mulatta. Thus, both gene conversion and gene duplication have shaped the evolution of the IFNA gene family in eutherian species. Scenarios may be envisaged whereby the increased production of a specific IFN-alpha protein would be beneficial against a particular pathogenic infection. Gene conversion, similar to duplication, provides a mechanism by which the protein product of a specific IFNA gene can be increased.
Art F Y Poon, Sergei L Kosakovsky Pond, Douglas D Richman, Simon D W Frost (2007)  Mapping protease inhibitor resistance to human immunodeficiency virus type 1 sequence polymorphisms within patients.   J Virol 81: 24. 13598-13607 Dec  
Abstract: Resistance genotyping provides an important resource for the clinical management of patients infected with human immunodeficiency virus type 1 (HIV-1). However, resistance to protease (PR) inhibitors (PIs) is a complex phenotype shaped by interactions among nearly half of the residues in HIV-1 PR. Previous studies of the genetic basis of PI resistance focused on fixed substitutions among populations of HIV-1, i.e., host-specific adaptations. Consequently, they are susceptible to a high false discovery rate due to founder effects. Here, we employ sequencing "mixtures" (i.e., ambiguous base calls) as a site-specific marker of genetic variation within patients that is independent of the phylogeny. We demonstrate that the transient response to selection by PIs is manifested as an excess of nonsynonymous mixtures. Using a sample of 5,651 PR sequences isolated from both PI-naive and -treated patients, we analyze the joint distribution of mixtures and eight PIs as a Bayesian network, which distinguishes residue-residue interactions from direct associations with PIs. We find that selection for resistance is associated with the emergence of nonsynonymous mixtures in two distinct groups of codon sites clustered along the substrate cleft and distal regions of PR, respectively. Within-patient evolution at several positions is independent of PIs, including those formerly postulated to be involved in resistance. These positions are under strong positive selection in the PI-naive patient population, implying that other factors can produce spurious associations with resistance, e.g., mutational escape from the immune response.
Art F Y Poon, Sergei L Kosakovsky Pond, Phil Bennett, Douglas D Richman, Andrew J Leigh Brown, Simon D W Frost (2007)  Adaptation to human populations is revealed by within-host polymorphisms in HIV-1 and hepatitis C virus.   PLoS Pathog 3: 3. Mar  
Abstract: CD8(+) cytotoxic T-lymphocytes (CTLs) perform a critical role in the immune control of viral infections, including those caused by human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV). As a result, genetic variation at CTL epitopes is strongly influenced by host-specific selection for either escape from the immune response, or reversion due to the replicative costs of escape mutations in the absence of CTL recognition. Under strong CTL-mediated selection, codon positions within epitopes may immediately "toggle" in response to each host, such that genetic variation in the circulating virus population is shaped by rapid adaptation to immune variation in the host population. However, this hypothesis neglects the substantial genetic variation that accumulates in virus populations within hosts. Here, we evaluate this quantity for a large number of HIV-1- (n ≥ 3,000) and HCV-infected patients (n ≥ 2,600) by screening bulk RT-PCR sequences for sequencing "mixtures" (i.e., ambiguous nucleotides), which act as site-specific markers of genetic variation within each host. We find that nonsynonymous mixtures are abundant and significantly associated with codon positions under host-specific CTL selection, which should deplete within-host variation by driving the fixation of the favored variant. Using a simple model, we demonstrate that this apparently contradictory outcome can be explained by the transmission of unfavorable variants to new hosts before they are removed by selection, which occurs more frequently when selection and transmission occur on similar time scales. Consequently, the circulating virus population is shaped by the transmission rate and the disparity in selection intensities for escape or reversion as much as it is shaped by the immune diversity of the host population, with potentially serious implications for vaccine design.
Webb Miller, Kate Rosenbloom, Ross C Hardison, Minmei Hou, James Taylor, Brian Raney, Richard Burhans, David C King, Robert Baertsch, Daniel Blankenberg, Sergei L Kosakovsky Pond, Anton Nekrutenko, Belinda Giardine, Robert S Harris, Svitlana Tyekucheva, Mark Diekhans, Thomas H Pringle, William J Murphy, Arthur Lesk, George M Weinstock, Kerstin Lindblad-Toh, Richard A Gibbs, Eric S Lander, Adam Siepel, David Haussler, W James Kent (2007)  28-way vertebrate alignment and conservation track in the UCSC Genome Browser.   Genome Res 17: 12. 1797-1808 Dec  
Abstract: This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
Philippe Lemey, Sergei L Kosakovsky Pond, Alexei J Drummond, Oliver G Pybus, Beth Shapiro, Helena Barroso, Nuno Taveira, Andrew Rambaut (2007)  Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics.   PLoS Comput Biol 3: 2. Feb  
Abstract: Upon HIV transmission, some patients develop AIDS in only a few months, while others remain disease free for 20 or more years. This variation in the rate of disease progression is poorly understood and has been attributed to host genetics, host immune responses, co-infection, viral genetics, and adaptation. Here, we develop a new "relaxed-clock" phylogenetic method to estimate absolute rates of synonymous and nonsynonymous substitution through time. We identify an unexpected association between the synonymous substitution rate of HIV and disease progression parameters. Since immune activation is the major determinant of HIV disease progression, we propose that this process can also determine viral generation times, by creating favourable conditions for HIV replication. These conclusions may apply more generally to HIV evolution, since we also observed an overall low synonymous substitution rate for HIV-2, which is known to be less pathogenic than HIV-1 and capable of tempering the detrimental effects of immune activation. Humoral immune responses, on the other hand, are the major determinant of nonsynonymous rate changes through time in the envelope gene, and our relaxed-clock estimates support a decrease in selective pressure as a consequence of immune system collapse.
Raja Mazumder, Zhang-Zhi Hu, C R Vinayaka, Jose-Luis Sagripanti, Simon D W Frost, Sergei L Kosakovsky Pond, Cathy H Wu (2007)  Computational analysis and identification of amino acid sites in dengue E proteins relevant to development of diagnostics and vaccines.   Virus Genes 35: 2. 175-186 Oct  
Abstract: We have identified 72 completely conserved amino acid residues in the E protein of major groups of the Flavivirus genus by computational analyses. In the dengue species we have identified 12 highly conserved sequence regions, 186 negatively selected sites, and many dengue serotype-specific negatively selected sites. The flavivirus-conserved sites included residues involved in forming six disulfide bonds crucial for the structural integrity of the protein, the fusion motif involved in viral infectivity, and the interface residues of the oligomers. The structural analysis of the E protein showed 19 surface-exposed non-conserved residues, 128 dimer or trimer interface residues, and regions, which undergo major conformational change during trimerization. Eleven consensus Th-cell epitopes common to all four dengue serotypes were predicted. Most of these corresponded to dengue-conserved regions or negatively selected sites. Of special interest are six singular sites (N37, Q211, D215, P217, H244, K246) in dengue E protein that are conserved, are part of the predicted consensus Th-cell epitopes and are exposed in the dimer or trimer. We propose these sites and corresponding epitopic regions as potential candidates for prioritization by experimental biologists for development of diagnostics and vaccines that may be difficult to circumvent by natural or man-made alteration of dengue virus.
2006
Satish K Pillai, Sergei L Kosakovsky Pond, Yang Liu, Benjamin M Good, Matthew C Strain, Ronald J Ellis, Scott Letendre, Davey M Smith, Huldrych F Günthard, Igor Grant, Thomas D Marcotte, J Allen McCutchan, Douglas D Richman, Joseph K Wong (2006)  Genetic attributes of cerebrospinal fluid-derived HIV-1 env.   Brain 129: Pt 7. 1872-1883 Jul  
Abstract: HIV-1 often invades the CNS during primary infection, eventually resulting in neurological disorders in up to 50% of untreated patients. The CNS is a distinct viral reservoir, differing from peripheral tissues in immunological surveillance, target cell characteristics and antiretroviral penetration. Neurotropic HIV-1 likely develops distinct genotypic characteristics in response to this unique selective environment. We sought to catalogue the genetic features of CNS-derived HIV-1 by analysing 456 clonal RNA sequences of the C2-V3 env subregion generated from CSF and plasma of 18 chronically infected individuals. Neuropsychological performance of all subjects was evaluated and summarized as a global deficit score. A battery of phylogenetic, statistical and machine learning tools was applied to these data to identify genetic features associated with HIV-1 neurotropism and neurovirulence. Eleven of 18 individuals exhibited significant viral compartmentalization between blood and CSF (P < 0.01, Slatkin-Maddison test). A CSF-specific genetic signature was identified, comprising positions 9, 13 and 19 of the V3 loop. The residue at position 5 of the V3 loop was highly correlated with neurocognitive deficit (P < 0.0025, Fisher's exact test). Antibody-mediated HIV-1 neutralizing activity was significantly reduced in CSF with respect to autologous blood plasma (P < 0.042, Student's t-test). Accordingly, CSF-derived sequences exhibited constrained diversity and contained fewer glycosylated and positively selected sites. Our results suggest that there are several genetic features that distinguish CSF- and plasma-derived HIV-1 populations, probably reflecting altered cellular entry requirements and decreased immune pressure in the CNS. Furthermore, neurological impairment may be influenced by mutations within the viral V3 loop sequence.
Notes:
 
Ulf Sorhannus, Sergei L Kosakovsky Pond (2006)  Evidence for positive selection on a sexual reproduction gene in the diatom genus Thalassiosira (Bacillariophyta).   J Mol Evol 63: 2. 231-239 Aug  
Abstract: Single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), and several random effects likelihood (REL) methods were utilized to identify positively and negatively selected sites in sexually induced gene 1 (Sig1) of four different Thalassiosira species. The SLAC analysis did not find any sites affected by positive selection but suggested 13 sites influenced by negative selection. The SLAC approach may be too conservative because of low sequence divergence. The FEL and REL analyses revealed over 60 negatively selected sites and two positively selected sites that were unique to each method. The REL method may not be able to reliably identify individual sites under selection when applied to short sequences with low divergence. Instead, we proposed a new alignment-wide test for adaptive evolution based on codon models with variation in synonymous and nonsynonymous substitution rates among sites and found evidence for diversifying evolution without relying on site-by-site testing. The performance of the FEL and REL approaches was evaluated by subjecting the tests to a type I error rate simulation analysis, using the specific characteristics of the Sig1 data set. Simulation results indicated that the FEL test had reasonable type I error rates, while REL might have been too liberal, suggesting that the two positively selected sites identified by FEL (codons 94 and 174) are not likely to be false positives. The evolution of these codon sites, one of which is located in functional domain II, appears to be associated with divergence among the three major Thalassiosira lineages.
Notes:
 
John P Huelsenbeck, Sonia Jain, Simon W D Frost, Sergei L Kosakovsky Pond (2006)  A Dirichlet process model for detecting positive selection in protein-coding DNA sequences.   Proc Natl Acad Sci U S A 103: 16. 6263-6268 Apr  
Abstract: Most methods for detecting Darwinian natural selection at the molecular level rely on estimating the rates or numbers of nonsynonymous and synonymous changes in an alignment of protein-coding DNA sequences. In some of these methods, the nonsynonymous rate of substitution is allowed to vary across the sequence, permitting the identification of single amino acid positions that are under positive natural selection. However, it is unclear which probability distribution should be used to describe how the nonsynonymous rate of substitution varies across the sequence. One widely used solution is to model variation in the nonsynonymous rate across the sequence as a mixture of several discrete or continuous probability distributions. Unfortunately, there is little population genetics theory to inform us of the appropriate probability distribution for among-site variation in the nonsynonymous rate of substitution. Here, we describe an approach to modeling variation in the nonsynonymous rate of substitution by using a Dirichlet process mixture model. The Dirichlet process allows there to be a countably infinite number of nonsynonymous rate classes and is very flexible in accommodating different potential distributions for the nonsynonymous rate of substitution. We implemented the model in a fully Bayesian approach, with all parameters of the model considered as random variables.
Notes:
 
Sergei L Kosakovsky Pond, David Posada, Michael B Gravenor, Christopher H Woelk, Simon D W Frost (2006)  Automated phylogenetic detection of recombination using a genetic algorithm.   Mol Biol Evol 23: 10. 1891-1901 Oct  
Abstract: The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost, Zehava Grossman, Michael B Gravenor, Douglas D Richman, Andrew J Leigh Brown (2006)  Adaptation to different human populations by HIV-1 revealed by codon-based analyses.   PLoS Comput Biol 2: 6. Jun  
Abstract: Several codon-based methods are available for detecting adaptive evolution in protein-coding sequences, but to date none specifically identify sites that are selected differentially in two populations, although such comparisons between populations have been historically useful in identifying the action of natural selection. We have developed two fixed effects maximum likelihood methods: one for identifying codon positions showing selection patterns that persist in a population and another for detecting whether selection is operating differentially on individual codons of a gene sampled from two different populations. Applying these methods to two HIV populations infecting genetically distinct human hosts, we have found that few of the positively selected amino acid sites persist in the population; the other changes are detected only at the tips of the phylogenetic tree and appear deleterious in the long term. Additionally, we have identified seven amino acid sites in protease and reverse transcriptase that are selected differentially in the two samples, demonstrating specific population-level adaptation of HIV to human populations.
Notes:
2005
 
Sergei Kosakovsky Pond, Spencer V Muse (2005)  Site-to-site variation of synonymous substitution rates.   Mol Biol Evol 22: 12. 2375-2385 Dec  
Abstract: We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failing to model synonymous rate variation can lead to misidentification of sites as positively selected.
Notes:
 
Satish K Pillai, Benjamin Good, Sergei Kosakovsky Pond, Joseph K Wong, Matt C Strain, Douglas D Richman, Davey M Smith (2005)  Semen-specific genetic characteristics of human immunodeficiency virus type 1 env.   J Virol 79: 3. 1734-1742 Feb  
Abstract: Human immunodeficiency virus type 1 (HIV-1) in the male genital tract may comprise virus produced locally in addition to virus transported from the circulation. Virus produced in the male genital tract may be genetically distinct, due to tissue-specific cellular characteristics and immunological pressures. HIV-1 env sequences derived from paired blood and semen samples from the Los Alamos HIV Sequence Database were analyzed to ascertain a male genital tract-specific viral signature. Machine learning algorithms could predict seminal tropism based on env sequences with accuracies exceeding 90%, suggesting that a strong genetic signature does exist for virus replicating in the male genital tract. Additionally, semen-derived viral populations exhibited constrained diversity (P < 0.05), decreased levels of positive selection (P < 0.025), decreased CXCR4 coreceptor utilization, and altered glycosylation patterns. Our analysis suggests that the male genital tract represents a distinct selective environment that contributes to the apparent genetic bottlenecks associated with the sexual transmission of HIV-1.
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost (2005)  A genetic algorithm approach to detecting lineage-specific variation in selection pressure.   Mol Biol Evol 22: 3. 478-485 Mar  
Abstract: The ratio of nonsynonymous (dN) to synonymous (dS) substitution rates, omega, provides a measure of selection at the protein level. Models have been developed that allow omega to vary among lineages. However, these models require the lineages in which differential selection has acted to be specified a priori. We propose a genetic algorithm approach to assign lineages in a phylogeny to a fixed number of different classes of omega, thus allowing variable selection pressure without a priori specification of particular lineages. This approach can identify models with a better fit than a single-ratio model, and with fits that are better than (in an information theoretic sense) a fully local model, in which all lineages are assumed to evolve under different values of omega, but with far fewer parameters. By averaging over models which explain the data reasonably well, we can assess the robustness of our conclusions to uncertainty in model estimation. Our approach can also be used to compare results from models in which branch classes are specified a priori with a wide range of credible models. We illustrate our methods on primate lysozyme sequences and compare them with previous methods applied to the same data sets.
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost (2005)  Not so different after all: a comparison of methods for detecting amino acid sites under selection.   Mol Biol Evol 22: 5. 1208-1222 May  
Abstract: We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
Notes:
 
Satish K Pillai, Sergei L Kosakovsky Pond, Christopher H Woelk, Douglas D Richman, Davey M Smith (2005)  Codon volatility does not reflect selective pressure on the HIV-1 genome.   Virology 336: 2. 137-143 Jun  
Abstract: Codon volatility is defined as the proportion of a codon's point-mutation neighbors that encode different amino acids. The cumulative volatility of a gene in relation to its associated genome was recently reported to be an indicator of selection pressure. We used this approach to measure selection on all available full-length HIV-1 subtype B genomes in the Los Alamos HIV Sequence Database, and compared these estimates against those obtained via established likelihood- and distance-based comparative methods. Volatility failed to correlate with the results of any of the comparative methods, demonstrating that it is not a reliable indicator of selection pressure.
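To make the volatility definition concrete, here is a minimal Python sketch under the standard genetic code. It is an illustration only, not the code used in this study; the names `volatility` and `mean_volatility` are hypothetical, and skipping neighbors that are stop codons is an assumption (a common convention) rather than something stated in this abstract.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def volatility(codon: str, exclude_stops: bool = True) -> float:
    """Fraction of a codon's single-nucleotide neighbors encoding a different amino acid.

    Assumption: when exclude_stops is True, neighbors that are stop codons are
    left out of both the numerator and the denominator.
    """
    codon = codon.upper()
    aa = CODE[codon]
    changed = total = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # same base: not a point mutation
            neighbor_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if exclude_stops and neighbor_aa == "*":
                continue
            total += 1
            changed += neighbor_aa != aa  # True counts as 1
    return changed / total


def mean_volatility(seq: str) -> float:
    """Hypothetical gene-level summary: plain average over in-frame sense codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    sense = [c for c in codons if CODE[c.upper()] != "*"]
    return sum(volatility(c) for c in sense) / len(sense)


print(volatility("ATG"))  # every neighbor of the single Met codon changes the amino acid
print(volatility("CTG"))  # four of nine neighbors still encode Leu
```

The plain average in `mean_volatility` is a simplification and not the gene-versus-genome statistic discussed in the abstract; the study's point is that volatility-based summaries did not track likelihood- or distance-based estimates of selection on HIV-1.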
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost (2005)  A simple hierarchical approach to modeling distributions of substitution rates.   Mol Biol Evol 22: 2. 223-234 Feb  
Abstract: Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-gamma" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-gamma model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model that fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.
Notes:
 
Simon D W Frost, Yang Liu, Sergei L Kosakovsky Pond, Colombe Chappey, Terri Wrin, Christos J Petropoulos, Susan J Little, Douglas D Richman (2005)  Characterization of human immunodeficiency virus type 1 (HIV-1) envelope variation and neutralizing antibody responses during transmission of HIV-1 subtype B.   J Virol 79: 10. 6523-6527 May  
Abstract: We analyzed neutralization sensitivity and genetic variation of transmitted subtype B human immunodeficiency virus type 1 (HIV-1) in eight recently infected men who have sex with men and the virus from the six subjects who infected them. In contrast to reports of heterosexual transmission of subtype C HIV-1, in which the transmitted virus appears to be more neutralization sensitive, we demonstrate that in our study population, relatively few phenotypic changes in neutralization sensitivity or genotypic changes in envelope occurred during transmission of subtype B HIV-1. We suggest that limited genetic variation within the infecting host reduces the likelihood of selective transmission of neutralization-sensitive HIV.
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost (2005)  Datamonkey: rapid detection of selective pressure on individual sites of codon alignments.   Bioinformatics 21: 10. 2531-2533 May  
Abstract: Datamonkey is a web interface to a suite of cutting-edge maximum likelihood-based tools for identification of sites subject to positive or negative selection. The methods range from very fast data exploration to some of the most complex models available in public domain software, and are implemented to run in parallel on a cluster of computers. AVAILABILITY: http://www.datamonkey.org. In the future, we plan to expand the collection of available analytic tools, and provide a package for installation on other systems.
Notes:
 
Simon D W Frost, Terri Wrin, Davey M Smith, Sergei L Kosakovsky Pond, Yang Liu, Ellen Paxinos, Colombe Chappey, Justin Galovich, Jeff Beauchaine, Christos J Petropoulos, Susan J Little, Douglas D Richman (2005)  Neutralizing antibody responses drive the evolution of human immunodeficiency virus type 1 envelope during recent HIV infection.   Proc Natl Acad Sci U S A 102: 51. 18514-18519 Dec  
Abstract: HIV type 1 (HIV-1) can rapidly escape from neutralizing antibody responses. The genetic basis of this escape in vivo is poorly understood. We compared the pattern of evolution of the HIV-1 env gene between individuals with recent HIV infection whose virus exhibited either a low or a high rate of escape from neutralizing antibody responses. We demonstrate that the rate of viral escape at a phenotypic level is highly variable among individuals, and is strongly correlated with the rate of amino acid substitutions. We show that dramatic escape from neutralizing antibodies can occur in the relative absence of changes in glycosylation or insertions and deletions ("indels") in the envelope; conversely, changes in glycosylation and indels occur even in the absence of neutralizing antibody responses. Comparison of our data with the predictions of a mathematical model supports a mechanism in which escape from neutralizing antibodies occurs via many amino acid substitutions, with low cross-neutralization between closely related viral strains. Our results suggest that autologous neutralizing antibody responses may play a pivotal role in the diversification of HIV-1 envelope during the early stages of infection.
Notes:
 
Sergei L Kosakovsky Pond, Simon D W Frost, Spencer V Muse (2005)  HyPhy: hypothesis testing using phylogenies.   Bioinformatics 21: 5. 676-679 Mar  
Abstract: SUMMARY: The HyPhy package is designed to provide a flexible and unified platform for carrying out likelihood-based analyses on multiple alignments of molecular sequence data, with the emphasis on studies of rates and patterns of sequence evolution. AVAILABILITY: http://www.hyphy.org CONTACT: muse@stat.ncsu.edu SUPPLEMENTARY INFORMATION: HyPhy documentation and tutorials are available at http://www.hyphy.org.
Notes:
2004
 
Sergei L Kosakovsky Pond, Spencer V Muse (2004)  Column sorting: rapid calculation of the phylogenetic likelihood function.   Syst Biol 53: 5. 685-692 Oct  
Abstract: Likelihood applications have become a central approach for molecular evolutionary analyses since the first computationally tractable treatment two decades ago. Although Felsenstein's original pruning algorithm makes likelihood calculations feasible, it is usually possible to take advantage of repetitive structure present in the data to arrive at even greater computational reductions. In particular, alignment columns with certain similarities have components of the likelihood calculation that are identical and need not be recomputed if columns are evaluated in an optimal order. We develop an algorithm for exploiting this speed improvement via an application of graph theory. The reductions provided by the method depend on both the tree and the data, but typical savings range between 15% and 50%. Real-data examples with time reductions of 80% have been identified. The overhead costs associated with implementing the algorithm are minimal, and they are recovered in all but the smallest data sets. The modifications will provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.
Notes:
2002
 
Rachel L Israel, Sergei L Kosakovsky Pond, Spencer V Muse, Laura A Katz (2002)  Evolution of duplicated alpha-tubulin genes in ciliates.   Evolution Int J Org Evolution 56: 6. 1110-1122 Jun  
Abstract: Ciliates provide a powerful system to analyze the evolution of duplicated alpha-tubulin genes in the context of single-celled organisms. Genealogical analyses of ciliate alpha-tubulin sequences reveal five apparently recent gene duplications. Comparisons of paralogs in different ciliates implicate differing patterns of substitutions (e.g., ratios of replacement/synonymous nucleotides and radical/conservative amino acids) following duplication. Most substitutions between paralogs in Euplotes crassus, Halteria grandinella and Paramecium tetraurelia are synonymous. In contrast, alpha-tubulin paralogs within Stylonychia lemnae and Chilodonella uncinata are evolving at significantly different rates and have higher ratios of both replacement substitutions to synonymous substitutions and radical amino acid changes to conservative amino acid changes. Moreover, the amino acid substitutions in C. uncinata and S. lemnae paralogs are limited to short stretches that correspond to functionally important regions of the alpha-tubulin protein. The topologies of ciliate alpha-tubulin genealogies are inconsistent with taxonomy based on morphology and other molecular markers, which may be due to taxonomic sampling, gene conversion, unequal rates of evolution, or asymmetric patterns of gene duplication and loss.
Notes:
2001
 
L Zhang, S K Pond, B S Gaut (2001)  A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa.   J Mol Evol 52: 2. 144-156 Feb  
Abstract: We surveyed the molecular evolutionary characteristics of 25 plant gene families, with the goal of better understanding general processes in plant gene family evolution. The survey was based on 247 GenBank sequences representing four grass species (maize, rice, wheat, and barley). For each gene family, orthology and paralogy relationships were uncertain. Recognizing this uncertainty, we characterized the molecular evolution of each gene family in four ways. First, we calculated the ratio of nonsynonymous to synonymous substitutions (dN/dS) both on branches of gene phylogenies and across codons. Our results indicated that the dN/dS ratio was statistically heterogeneous across branches in 17 of 25 (68%) gene families. The vast majority of dN/dS estimates were <<1.0, suggestive of selective constraint on amino acid replacements, and no estimates were >1.0, either across phylogenetic lineages or across codons. Second, we tested separately for nonsynonymous and synonymous molecular clocks. Sixty-eight percent of gene families rejected a nonsynonymous molecular clock, and 52% of gene families rejected a synonymous molecular clock. Thus, most gene families in this study deviated from clock-like evolution at either synonymous or nonsynonymous sites. Third, we calculated the effective number of codons and the proportion of G+C synonymous sites for each sequence in each gene family. One or both quantities vary significantly within 18 of 25 gene families. Finally, we tested for gene conversion, and only six gene families provided evidence of gene conversion events. Altogether, evolution for these 25 gene families is marked by selective constraint that varies among gene family members, a lack of molecular clock at both synonymous and nonsynonymous sites, and substantial variation in codon usage.
Notes: