Institut de Biologia Evolutiva
Facility · Barcelona, Catalonia, Spain
Research output, citation impact, and the most-cited recent papers from Institut de Biologia Evolutiva (Spain). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Institut de Biologia Evolutiva
Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.
Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature [1]. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [2] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses [3–15], enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Schizophrenia has a heritability of 60–80%, much of which is attributable to common risk alleles. Here, in a two-stage genome-wide association study of up to 76,755 individuals with schizophrenia and 243,649 control individuals, we report common variant associations at 287 distinct genomic loci. Associations were concentrated in genes that are expressed in excitatory and inhibitory neurons of the central nervous system, but not in other tissues or cell types. Using fine-mapping and functional genomic data, we identify 120 genes (106 protein-coding) that are likely to underpin associations at some of these loci, including 16 genes with credible causal non-synonymous or untranslated region variation. We also implicate fundamental processes related to neuronal function, including synaptic organization, differentiation and transmission. Fine-mapped candidates were enriched for genes associated with rare disruptive coding variants in people with schizophrenia, including the glutamate receptor subunit GRIN2A and transcription factor SP4, and were also enriched for genes implicated by such variants in neurodevelopmental disorders. We identify biological processes relevant to schizophrenia pathophysiology; show convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders; and provide a resource of prioritized genes and variants to advance mechanistic studies.
Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
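The regression idea in the abstract above can be illustrated with a minimal Python sketch. It is not TempEst's implementation: it assumes the root-to-tip divergences have already been read off a rooted tree, fits an ordinary least-squares line of divergence against sampling date, and reports the slope (a rough rate estimate), the x-intercept (a rough root date) and the residuals. The function names `root_to_tip_regression` and `most_incongruent` and all example values are hypothetical.

```python
from statistics import mean


def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    dates       -- sampling dates as numbers (e.g. decimal years)
    divergences -- root-to-tip distances, assumed precomputed from a rooted tree
    """
    if len(dates) != len(divergences) or len(dates) < 3:
        raise ValueError("need at least three (date, divergence) pairs")
    mx, my = mean(dates), mean(divergences)
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, divergences))
    syy = sum((y - my) ** 2 for y in divergences)
    if sxx == 0:
        raise ValueError("all sampling dates are identical; no temporal spread")
    slope = sxy / sxx                      # substitutions per site per time unit
    intercept = my - slope * mx
    # R^2 is a simple indicator of how clock-like the data look.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    residuals = [y - (slope * x + intercept) for x, y in zip(dates, divergences)]
    # The x-intercept estimates the date of the root; undefined without a positive slope.
    root_date = -intercept / slope if slope > 0 else None
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "root_date": root_date,
        "residuals": residuals,
    }


def most_incongruent(labels, residuals):
    """Return the label whose divergence departs most from the fitted line."""
    return max(zip(labels, residuals), key=lambda pair: abs(pair[1]))[0]


if __name__ == "__main__":
    # Hypothetical sampling dates and divergences, for illustration only.
    fit = root_to_tip_regression([2000, 2005, 2010, 2015],
                                 [0.010, 0.015, 0.020, 0.025])
    print(fit["slope"], fit["root_date"], fit["r_squared"])
```

A slope near zero or a low R² would suggest too little temporal signal for a molecular clock analysis, and a sequence with a large absolute residual is a candidate for the data-quality checks the abstract lists.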
Plant traits (the morphological, anatomical, physiological, biochemical and phenological characteristics of plants) determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait-based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits, with almost complete coverage for 'plant growth form'. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait-environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives.
To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
As a key factor in endemic and epidemic dynamics, the geographical distribution of viruses has been frequently interpreted in the light of their genetic histories. Unfortunately, inference of historical dispersal or migration patterns of viruses has mainly been restricted to model-free heuristic approaches that provide little insight into the temporal setting of the spatial dynamics. The introduction of probabilistic models of evolution, however, offers unique opportunities to engage in this statistical endeavor. Here we introduce a Bayesian framework for inference, visualization and hypothesis testing of phylogeographic history. By implementing character mapping in a Bayesian software that samples time-scaled phylogenies, we enable the reconstruction of timed viral dispersal patterns while accommodating phylogenetic uncertainty. Standard Markov model inference is extended with a stochastic search variable selection procedure that identifies the parsimonious descriptions of the diffusion process. In addition, we propose priors that can incorporate geographical sampling distributions or characterize alternative hypotheses about the spatial dynamics. To visualize the spatial and temporal information, we summarize inferences using virtual globe software. We describe how Bayesian phylogeography compares with previous parsimony analysis in the investigation of the influenza A H5N1 origin and H5N1 epidemiological linkage among sampling localities. Analysis of rabies in West African dog populations reveals how virus diffusion may enable endemic maintenance through continuous epidemic cycles. From these analyses, we conclude that our phylogeographic framework will make an important asset in molecular epidemiology that can be easily generalized to infer biogeography from genetic data for many organisms.
From an evolutionary perspective, social behaviours are those which have fitness consequences for both the individual that performs the behaviour, and another individual. Over the last 43 years, a huge theoretical and empirical literature has developed on this topic. However, progress is often hindered by poor communication between scientists, with different people using the same term to mean different things, or different terms to mean the same thing. This can obscure what is biologically important, and what is not. The potential for such semantic confusion is greatest with interdisciplinary research. Our aim here is to address issues of semantic confusion that have arisen with research on the problem of cooperation. In particular, we: (i) discuss confusion over the terms kin selection, mutualism, mutual benefit, cooperation, altruism, reciprocal altruism, weak altruism, altruistic punishment, strong reciprocity, group selection and direct fitness; (ii) emphasize the need to distinguish between proximate (mechanism) and ultimate (survival value) explanations of behaviours. We draw examples from all areas, but especially recent work on humans and microbes.
Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans. Deep whole-genome sequencing of 300 individuals from 142 diverse populations provides insights into key population genetic parameters, shows that all modern human ancestry outside of Africa including in Australasians is consistent with descending from a single founding population, and suggests a higher rate of accumulation of mutations in non-Africans compared to Africans since divergence. Three international collaborations reporting in this issue of Nature describe 787 high-quality genomes from individuals from geographically diverse populations. David Reich and colleagues analysed whole-genome sequences of 300 individuals from 142 populations. Their findings include an accelerated estimated rate of accumulation of mutations in non-Africans compared to Africans since divergence, and that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans but from the same source as that of other non-Africans. 
Eske Willerslev and colleagues obtained whole-genome data for 83 Aboriginal Australians and 25 Papuans from the New Guinea Highlands. They estimate that Aboriginal Australians and Papuans diverged from Eurasian populations 51,000–72,000 years ago, following a single out-of-Africa dispersal. Luca Pagani et al. report on a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations. Their analyses support the model by which all non-African populations derive most of their genetic ancestry from a single recent migration out of Africa, although a Papuan contribution suggests a trace of an earlier human expansion.
This revision of the classification of eukaryotes follows that of Adl et al., 2012 [J. Euk. Microbiol. 59(5)] and retains an emphasis on protists. Changes since have improved the resolution of many nodes in phylogenetic analyses. For some clades even families are being clearly resolved. As we had predicted, environmental sampling in the intervening years has massively increased the genetic information at hand. Consequently, we have discovered novel clades, exciting new genera and uncovered a massive species level diversity beyond the morphological species descriptions. Several clades known from environmental samples only have now found their home. Sampling soils, deeper marine waters and the deep sea will continue to fill us with surprises. The main changes in this revision are the confirmation that eukaryotes form at least two domains, the loss of monophyly in the Excavata, robust support for the Haptista and Cryptista. We provide suggested primer sets for DNA sequences from environmental samples that are effective for each clade. We have provided a guide to trophic functional guilds in an appendix, to facilitate the interpretation of environmental samples, and a standardized taxonomic guide for East Asian users.
Rapid climate change has been implicated as a cause of evolution in poorly adapted populations. However, phenotypic plasticity provides the potential for organisms to respond rapidly and effectively to environmental change. Using a 47-year population study of the great tit (Parus major) in the United Kingdom, we show that individual adjustment of behavior in response to the environment has enabled the population to track a rapidly changing environment very closely. Individuals were markedly invariant in their response to environmental variation, suggesting that the current response may be fixed in this population. Phenotypic plasticity can thus play a central role in tracking environmental change; understanding the limits of plasticity is an important goal for future research.
Individual humans, and members of diverse other species, show consistent differences in aggressiveness, shyness, sociability and activity. Such intraspecific differences in behaviour have been widely assumed to be non-adaptive variation surrounding (possibly) adaptive population-average behaviour. Nevertheless, in keeping with recent calls to apply Darwinian reasoning to ever-finer scales of biological variation, we sketch the fundamentals of an adaptive theory of consistent individual differences in behaviour. Our thesis is based on the notion that such ‘personality differences’ can be selected for if fitness payoffs are dependent on both the frequencies with which competing strategies are played and an individual's behavioural history. To this end, we review existing models that illustrate this and propose a game theoretic approach to analyzing personality differences that is both dynamic and state-dependent. Our motivation is to provide insights into the evolution and maintenance of an apparently common animal trait: personality, which has far reaching ecological and evolutionary implications.
Sequencing of the genome of the butterfly Heliconius melpomene shows that closely related Heliconius species exchange protective colour-pattern genes promiscuously. Heliconius butterflies are an excellent system in which to study ecology, behaviour, mimicry and speciation. The genome of the postman butterfly Heliconius melpomene has now been sequenced. Using genomic resequencing of individuals from distinct lineages, the authors document heterogeneous patterns of genomic diversity associated with adaptively divergent wing-colour patterns. As the second lepidopteran genome to be sequenced, Heliconius offers novel opportunities for comparative genomics within this economically significant insect order, which includes many pest species, as well as the only domesticated insect, the silkmoth Bombyx mori. The evolutionary importance of hybridization and introgression has long been debated [1]. Hybrids are usually rare and unfit, but even infrequent hybridization can aid adaptation by transferring beneficial traits between species. Here we use genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation [2,3,4,5]. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,669 predicted genes, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organization has remained broadly conserved since the Cretaceous period, when butterflies split from the Bombyx (silkmoth) lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, Heliconius melpomene, Heliconius timareta and Heliconius elevatus, especially at two genomic regions that control mimicry pattern. 
We infer that closely related Heliconius species exchange protective colour-pattern genes promiscuously, implying that hybridization has an important role in adaptive radiation.
BACKGROUND As global climate change accelerates, one of the most urgent tasks for the coming decades is to develop accurate predictions about biological responses to guide the effective protection of biodiversity. Predictive models in biology provide a means for scientists to project changes to species and ecosystems in response to disturbances such as climate change. Most current predictive models, however, exclude important biological mechanisms such as demography, dispersal, evolution, and species interactions. These biological mechanisms have been shown to be important in mediating past and present responses to climate change. Thus, current modeling efforts do not provide sufficiently accurate predictions. Despite the many complexities involved, biologists are rapidly developing tools that include the key biological processes needed to improve predictive accuracy. The biggest obstacle to applying these more realistic models is that the data needed to inform them are almost always missing. We suggest ways to fill this growing gap between model sophistication and information to predict and prevent the most damaging aspects of climate change for life on Earth. ADVANCES On the basis of empirical and theoretical evidence, we identify six biological mechanisms that commonly shape responses to climate change yet are too often missing from current predictive models: physiology; demography, life history, and phenology; species interactions; evolutionary potential and population differentiation; dispersal, colonization, and range dynamics; and responses to environmental variation. We prioritize the types of information needed to inform each of these mechanisms and suggest proxies for data that are missing or difficult to collect. We show that even for well-studied species, we often lack critical information that would be necessary to apply more realistic, mechanistic models. 
Consequently, data limitations likely override the potential gains in accuracy of more realistic models. Given the enormous challenge of collecting this detailed information on millions of species around the world, we highlight practical methods that promote the greatest gains in predictive accuracy. Trait-based approaches leverage sparse data to make more general inferences about unstudied species. Targeting species with high climate sensitivity and disproportionate ecological impact can yield important insights about future ecosystem change. Adaptive modeling schemes provide a means to target the most important data while simultaneously improving predictive accuracy. OUTLOOK Strategic collections of essential biological information will allow us to build generalizable insights that inform our broader ability to anticipate species’ responses to climate change and other human-caused disturbances. By increasing accuracy and making uncertainties explicit, scientists can deliver improved projections for biodiversity under climate change together with characterizations of uncertainty to support more informed decisions by policymakers and land managers. Toward this end, a globally coordinated effort to fill data gaps in advance of the growing climate-fueled biodiversity crisis offers substantial advantages in efficiency, coverage, and accuracy. Biologists can take advantage of the lessons learned from the Intergovernmental Panel on Climate Change’s development, coordination, and integration of climate change projections. Climate and weather projections were greatly improved by incorporating important mechanisms and testing predictions against global weather station data. Biology can do the same. We need to adopt this meteorological approach to predicting biological responses to climate change to enhance our ability to mitigate future changes to global biodiversity and the services it provides to humans. 
Emerging models are beginning to incorporate six key biological mechanisms that can improve predictions of biological responses to climate change. Models that include biological mechanisms have been used to project (clockwise from top) the evolution of disease-harboring mosquitoes, future environments and land use, physiological responses of invasive species such as cane toads, demographic responses of penguins to future climates, climate-dependent dispersal behavior in butterflies, and mismatched interactions between butterflies and their host plants. Despite these modeling advances, we seldom have the detailed data needed to build these models, necessitating new efforts to collect the relevant data to parameterize more biologically realistic predictive models.
We study the evolution of inversions that capture locally adapted alleles when two populations are exchanging migrants or hybridizing. By suppressing recombination between the loci, a new inversion can spread. Neither drift nor coadaptation between the alleles (epistasis) is needed, so this local adaptation mechanism may apply to a broader range of genetic and demographic situations than alternative hypotheses that have been widely discussed. The mechanism can explain many features observed in inversion systems. It will drive an inversion to high frequency if there is no countervailing force, which could explain fixed differences observed between populations and species. An inversion can be stabilized at an intermediate frequency if it also happens to capture one or more deleterious recessive mutations, which could explain polymorphisms that are common in some species. This polymorphism can cycle in frequency with the changing selective advantage of the locally favored alleles. The mechanism can establish underdominant inversions that decrease heterokaryotype fitness by several percent if the cause of fitness loss is structural, while if the cause is genic there is no limit to the strength of underdominance that can result. The mechanism is expected to cause loci responsible for adaptive species-specific differences to map to inversions, as seen in recent QTL studies. We discuss data that support the hypothesis, review other mechanisms for inversion evolution, and suggest possible tests.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets.
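To make the Metropolis-coupling idea concrete, here is a small serial Python sketch. It is not the paper's parallel algorithm and it does not sample trees: as a stated simplification, the target is a hypothetical one-dimensional density with two well-separated peaks standing in for a multi-peaked posterior. Several chains run at different "temperatures" (powers of the target), heated chains cross between peaks more easily, and proposed state swaps between adjacent chains let the cold chain inherit those moves. The names `log_target` and `mc3`, the temperature ladder and the step size are all illustrative assumptions.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of a toy bimodal target with peaks at -4 and +4."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def accept(rng, log_ratio):
    """Metropolis acceptance test for a given log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_iter=20000, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=1):
    """Serial Metropolis-coupled MCMC; returns the samples of the cold chain (beta = 1)."""
    rng = random.Random(seed)
    states = [-4.0] * len(betas)          # every chain starts in the left-hand peak
    cold_samples = []
    for _ in range(n_iter):
        # 1. Ordinary random-walk Metropolis update within each chain,
        #    targeting the tempered density pi(x) ** beta.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(rng, beta * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # 2. Propose swapping the states of one randomly chosen adjacent pair.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_ratio = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(rng, log_ratio):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    samples = mc3()
    left = sum(1 for s in samples if s < 0) / len(samples)
    print(f"fraction of cold-chain samples in the left peak: {left:.2f}")
```

In this sketch the within-chain updates in step 1 are independent of one another, which is the part that a parallel implementation could distribute across processes or threads, with communication needed only when a swap is proposed.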
Aridity, which is increasing worldwide because of climate change, affects the structure and functioning of dryland ecosystems. Whether aridification leads to gradual (versus abrupt) and systemic (versus specific) ecosystem changes is largely unknown. We investigated how 20 structural and functional ecosystem attributes respond to aridity in global drylands. Aridification led to systemic and abrupt changes in multiple ecosystem attributes. These changes occurred sequentially in three phases characterized by abrupt decays in plant productivity, soil fertility, and plant cover and richness at aridity values of 0.54, 0.7, and 0.8, respectively. More than 20% of the terrestrial surface will cross one or several of these thresholds by 2100, which calls for immediate actions to minimize the negative impacts of aridification on essential ecosystem services for the more than 2 billion people living in drylands.
Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.