NobleBlocks

European Molecular Biology Laboratory

government · Heidelberg, Baden-Württemberg, Germany

Research output, citation impact, and the most-cited recent papers from European Molecular Biology Laboratory (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works
9.9K
Citations
4.2M
h-index
687
i10-index
16.0K
Also known as
EMBL Heidelberg, European Molecular Biology Laboratory, Europäisches Laboratorium für Molekularbiologie
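The two index metrics reported above have conventional definitions: the h-index is the largest h such that h works have at least h citations each, and the i10-index is the number of works with at least 10 citations. The sketch below applies those definitions to a made-up list of citation counts. How NobleBlocks computes its figures is not stated on this page, so treat this as an illustration of the definitions only.

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, not taken from the index.
example_counts = [25, 12, 10, 8, 3, 0]
print(h_index(example_counts), i10_index(example_counts))
```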

Top-cited papers from European Molecular Biology Laboratory

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Michael I. Love, Wolfgang Huber, Simon Anders
2014 · Genome Biology · 99.2K citations · doi:10.1186/s13059-014-0550-8

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
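The principle behind shrinking fold-change estimates can be illustrated with a deliberately simplified sketch. It assumes a zero-centred normal prior on the log2 fold change with a known standard deviation and a normal likelihood for each gene's estimate. DESeq2 itself works within a negative binomial generalized linear model and estimates its prior from the data, so this is not the package's algorithm; the function name and all numbers are hypothetical.

```python
def shrink_lfc(lfc, se, prior_sd):
    """Posterior mean of a log2 fold change under a N(0, prior_sd^2) prior.

    lfc: unshrunken estimate; se: its standard error (both hypothetical inputs).
    Noisy estimates (large se) are pulled strongly toward zero,
    precise estimates (small se) are left almost unchanged.
    """
    prior_var = prior_sd ** 2
    weight = prior_var / (prior_var + se ** 2)
    return weight * lfc


# Two hypothetical genes with the same raw estimate but different precision.
print(shrink_lfc(2.0, se=0.2, prior_sd=1.0))  # well-measured gene
print(shrink_lfc(2.0, se=2.0, prior_sd=1.0))  # poorly measured gene
```

The poorly measured gene ends up with a much smaller shrunken estimate, which is the behaviour the abstract describes as focusing on the strength of differential expression.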

CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Julie Thompson, Desmond G. Higgins, Toby J. Gibson
1994 · Nucleic Acids Research · 64.8K citations · doi:10.1093/nar/22.22.4673

The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
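The first modification, sequence weighting, can be sketched as follows. The sketch assumes weights are derived from a guide tree by sharing each branch length equally among the sequences below it, so that near-duplicates split their common branch and a divergent sequence keeps its long branch to itself. The tree, its branch lengths and the function name are hypothetical, and details of the actual program such as normalization are omitted.

```python
def leaf_weights(node):
    """Return {sequence_name: weight} for a tree of nested tuples.

    A leaf is (branch_length, "name"); an internal node is
    (branch_length, [child, child, ...]). Each branch length is divided
    equally among the leaves underneath that branch.
    """
    length, payload = node
    if isinstance(payload, str):
        return {payload: length}
    weights = {}
    for child in payload:
        weights.update(leaf_weights(child))
    share = length / len(weights)
    return {name: weight + share for name, weight in weights.items()}


# Hypothetical guide tree: A and B are near-duplicates, C is divergent.
tree = (0.0, [(0.1, [(0.05, "A"), (0.05, "B")]), (0.4, "C")])
print(leaf_weights(tree))
```

In this toy tree, A and B each receive a lower weight than C, matching the stated goal of down-weighting near-duplicate sequences.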

Clustal W and Clustal X version 2.0
Mark Larkin, Gordon Blackshields, Nigel P. Brown, R. Chenna +4 more
2007 · Bioinformatics · 29.0K citations · doi:10.1093/bioinformatics/btm404

SUMMARY: The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++. This will facilitate the further development of the alignment algorithms in the future and has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems. AVAILABILITY: The programs can be run on-line from the EBI web server: http://www.ebi.ac.uk/tools/clustalw2. The source code and executables for Windows, Linux and Macintosh computers are available from the EBI ftp site ftp://ftp.ebi.ac.uk/pub/software/clustalw2/

HTSeq—a Python framework to work with high-throughput sequencing data
Simon Anders, Paul Theodor Pyl, Wolfgang Huber
2014 · Bioinformatics · 22.6K citations · doi:10.1093/bioinformatics/btu638

MOTIVATION: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. RESULTS: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. AVAILABILITY AND IMPLEMENTATION: HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
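The counting step that the abstract attributes to htseq-count can be illustrated without the library. The sketch below is plain Python and does not use the HTSeq API: it assigns each read to a gene when it overlaps exactly one gene, and otherwise files it under one of two special counters modelled on htseq-count's output categories. The data structures, coordinates and function name are hypothetical, and the linear scan over genes is for clarity rather than speed.

```python
def count_reads_per_gene(genes, reads):
    """Count reads overlapping each gene.

    genes: {name: (chrom, start, end)} with half-open intervals.
    reads: list of (chrom, start, end) with half-open intervals.
    Reads hitting no gene or more than one gene are tallied separately.
    """
    counts = {name: 0 for name in genes}
    counts["__no_feature"] = 0
    counts["__ambiguous"] = 0
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        ]
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) == 1:
            counts[hits[0]] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical annotation with two overlapping genes, and four reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [("chr1", 110, 160), ("chr1", 190, 240), ("chr1", 250, 290), ("chr2", 10, 60)]
print(count_reads_per_gene(genes, reads))
```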

A global reference for human genetic variation
Adam Auton, Gonçalo R. Abecasis, David M. Altshuler +4 more
2015 · Nature · 19.8K citations · doi:10.1038/nature15393

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics.
The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
Damian Szklarczyk, Annika L. Gable, David Lyon, Alexander Junge +4 more
2018 · Nucleic Acids Research · 19.1K citations · doi:10.1093/nar/gky1131

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.
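The abstract does not say which statistical test underlies the enrichment analysis. A common choice for asking whether a functional category is over-represented in a gene list is the one-sided hypergeometric test, sketched below purely as an assumption about the general technique, not as STRING's implementation. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: number of background genes
    annotated:  background genes carrying the category (e.g. a GO term)
    sample:     size of the input gene list
    hits:       input genes carrying the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 annotated, list of 4 with 2 hits.
print(hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2))
```

In practice, such p-values would also need correction for the many categories tested at once, which this sketch leaves out.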

Differential expression analysis for sequence count data
Simon Anders, Wolfgang Huber
2010 · Genome Biology · 16.4K citations · doi:10.1186/gb-2010-11-10-r106

High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
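The reason a negative binomial model suits count data is that its variance can exceed the mean, unlike the Poisson case. The sketch below uses the simple quadratic parameterization, variance = mean + dispersion × mean², with one dispersion value estimated by the method of moments. This is a swapped-in simplification: the paper links variance and mean by local regression, which is not reproduced here. Function names and replicate counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance of a negative binomial with mean mu and the given dispersion."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts.

    Returns 0.0 when the data show no more spread than a Poisson model.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / m ** 2, 0.0)


# Hypothetical read counts for one gene across four replicates.
replicates = [90, 110, 130, 70]
alpha = moment_dispersion(replicates)
print(alpha, nb_variance(mean(replicates), alpha))
```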

A human gut microbial gene catalogue established by metagenomic sequencing
Junjie Qin, Ruiqiang Li, Jeroen Raes, Manimozhiyan Arumugam +4 more
2010 · Nature · 11.6K citations · doi:10.1038/nature08821

To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ∼150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.
The human body plays host to an estimated 100 trillion microbial cells, most of them in the gut where they have a profound influence on human physiology and nutrition — and are now regarded as crucial for human life. Gut microbes contribute to the energy harvest from food, and changes of gut microbiome may be associated with bowel diseases or obesity. Now the international MetaHIT (Metagenomics of the Human Intestinal Tract) project has published a gene catalogue of the human gut microbiome derived from 124 healthy, overweight and obese human adults, as well as inflammatory disease patients, from Denmark and Spain. The resulting data provide the first insights into this gene set — which is over 150 times larger than the human gene complement — and show that the genes are largely shared among individuals. Based on the variety of functions encoded by the gene set, it is possible to define both a minimal gut metagenome and a minimal gut bacterial genome.
Deep metagenomic sequencing and characterization of the human gut microbiome from healthy and obese individuals, as well as those suffering from inflammatory bowel disease, provide the first insights into this gene set and how much of it is shared among individuals. The minimal gut metagenome as well as the minimal gut bacterial genome is also described.

STRING v10: protein–protein interaction networks, integrated over the tree of life
Damian Szklarczyk, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund +4 more
2014 · Nucleic Acids Research · 11.1K citations · doi:10.1093/nar/gku1003

The many functional partnerships and interactions that occur between proteins are at the core of cellular processing and their systematic characterization helps to provide context in molecular systems biology. However, known and predicted interactions are scattered over multiple resources, and the available data exhibit notable differences in terms of quality and completeness. The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations. The new version 10.0 of STRING covers more than 2000 organisms, which has necessitated novel, scalable algorithms for transferring interaction information between organisms. For this purpose, we have introduced hierarchical and self-consistent orthology annotations for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution. Further improvements in version 10.0 include a completely redesigned prediction pipeline for inferring protein-protein associations from co-expression data, an API interface for the R computing environment and improved statistical analysis for enrichment tests in user-provided networks.

Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis
José Castresana
2000 · Molecular Biology and Evolution · 10.8K citations · doi:10.1093/oxfordjournals.molbev.a026334

The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
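The block-selection idea can be sketched in a much reduced form. The code below marks an alignment column as conserved when it has no gaps and its most common residue reaches a minimum frequency, then keeps only runs of conserved columns of a minimum length. The published method applies further requirements (for example on flanking positions) that are not modelled here, and the thresholds, function names and toy alignment are hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.75):
    """Flag each column: True if gap-free and dominated by one residue."""
    flags = []
    for column in zip(*alignment):
        if "-" in column:
            flags.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / len(column) >= min_identity)
    return flags


def select_blocks(alignment, min_identity=0.75, min_block_length=3):
    """Return half-open (start, end) column ranges of retained blocks."""
    flags = conserved_columns(alignment, min_identity)
    blocks, start = [], None
    # A trailing False closes any block that runs to the end of the alignment.
    for i, keep in enumerate(flags + [False]):
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            if i - start >= min_block_length:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical four-sequence alignment with a divergent, gapped middle region.
alignment = ["MKVLA-TGHEW", "MKVLAQTGHDW", "MKILG-SGHEW", "MRVLC-AGHEW"]
print(select_blocks(alignment))
```

Because the selection is rule-based, rerunning it with the same thresholds reproduces the same trimmed alignment, which is the reproducibility benefit the abstract points to.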

Mass Spectrometric Sequencing of Proteins from Silver-Stained Polyacrylamide Gels
Andrej Shevchenko, Matthias Wilm, Ole Vorm, Matthias Mann
1996 · Analytical Chemistry · 9.1K citations · doi:10.1021/ac950914h

Proteins from silver-stained gels can be digested enzymatically and the resulting peptides analyzed and sequenced by mass spectrometry. Standard proteins yield the same peptide maps when extracted from Coomassie- and silver-stained gels, as judged by electrospray and MALDI mass spectrometry. The low nanogram range can be reached by the protocols described here, and the method is robust. A silver-stained one-dimensional gel of a fraction from yeast proteins was analyzed by nano-electrospray tandem mass spectrometry. In the sequencing, more than 1000 amino acids were covered, resulting in no evidence of chemical modifications due to the silver staining procedure. Silver staining allows a substantial shortening of sample preparation time and may, therefore, be preferable over Coomassie staining. This work removes a major obstacle to the low-level sequence analysis of proteins separated on polyacrylamide gels.

The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets
Damian Szklarczyk, Annika L Gable, Katerina Nastou, David Lyon +4 more
2020 · Nucleic Acids Research · 8.5K citations · doi:10.1093/nar/gkaa1074

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.
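The abstract says evidence from several sources is collected and scored but does not give the rule for merging per-source scores. One common way to combine scores that are treated as independent probabilities is to multiply the complements, so that the combined score is one minus the chance that every source is wrong. The sketch below shows that rule as an assumption only; the actual STRING scoring may apply further corrections that are omitted here. The function name and channel values are hypothetical.

```python
def combine_channel_scores(scores):
    """Combine independent evidence scores, each between 0 and 1.

    combined = 1 - product(1 - s) over all channels.
    """
    remaining_doubt = 1.0
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt


# Hypothetical scores for one protein pair from three evidence channels.
print(combine_channel_scores([0.5, 0.4, 0.2]))
```

Under this rule the combined score is never lower than the strongest single channel, and several weak channels can add up to moderate support.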

An integrated map of genetic variation from 1,092 human genomes
Zamin Iqbal, Andy Rimmer, Anjali Gupta-Hinch +4 more
2012 · Nature · 8.2K citations · doi:10.1038/nature11632

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
This report from the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations; hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites, can be found in each individual.
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.

A map of human genome variation from population-scale sequencing
Min Hu, Yuan Chen, James Stalker, Richard M. Durbin +4 more
2010 · Nature · 8.1K citations · doi:10.1038/nature09534

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
This issue of Nature contains the first publication from The 1000 Genomes Project, an international collaboration that will produce an extensive public catalogue of human genetic variation. The plan, in fact, is to sequence about 2,000 unidentified individuals from 20 populations around the world. This first paper presents the results from the project's pilot phase, testing three different strategies for genome-wide sequencing with high-throughput platforms: low-coverage whole-genome sequencing of 179 individuals in three population groups, high-coverage sequencing of two mother–father–child trios, and exon-targeted sequencing of 697 individuals from seven populations.
The goal of the 1000 Genomes Project is to provide in-depth information on variation in human genome sequences. In the pilot phase reported here, different strategies for genome-wide sequencing, using high-throughput sequencing platforms, were developed and compared. The resulting data set includes more than 95% of the currently accessible variants found in any individual, and can be used to inform association and functional studies.

The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible
Damian Szklarczyk, John H. Morris, Helen Cook, Michael Kuhn +4 more
2016 · Nucleic Acids Research · 7.4K citations · doi:10.1093/nar/gkw937

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

The Genome Sequence of Drosophila melanogaster
Mark D. Adams, S Celniker, Robert A. Holt, Cheryl Evans +4 more
2000 · Science · 6.0K citations · doi:10.1126/science.287.5461.2185

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)
Daniel J. Klionsky, Kotb Abdelmohsen, Akihisa Abe, Md. Joynal Abedin +4 more
2016 · Autophagy · 6.0K citations · doi:10.1080/15548627.2015.1100356

Mauro De Santi2002,

New tools for automated high-resolution cryo-EM structure determination in RELION-3
Jasenko Zivanov, Takanori Nakane, Björn Forsberg, Dari Kimanius +3 more
2018 · eLife · 5.4K · doi:10.7554/elife.42166

Here, we describe the third major release of RELION. CPU-based vector acceleration has been added in addition to GPU support, which provides flexibility in use of resources and avoids memory limitations. Reference-free autopicking with Laplacian-of-Gaussian filtering and execution of jobs from python allows non-interactive processing during acquisition, including 2D-classification, de novo model generation and 3D-classification. Per-particle refinement of CTF parameters and correction of estimated beam tilt provides higher resolution reconstructions when particles are at different heights in the ice, and/or coma-free alignment has not been optimal. Ewald sphere curvature correction improves resolution for large particles. We illustrate these developments with publicly available data sets: together with a Bayesian approach to beam-induced motion correction it leads to resolution improvements of 0.2–0.7 Å compared to previous RELION versions.

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses
Jaime Huerta‐Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza +4 more
2018 · Nucleic Acids Research · 5.2K · doi:10.1093/nar/gky1085

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the amount of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.

Software for Computing and Annotating Genomic Ranges
Michael Lawrence, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun +4 more
2013 · PLoS Computational Biology · 4.9K · doi:10.1371/journal.pcbi.1003118

We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.