Max Delbrück Center

facility · Berlin, Germany

Research output, citation impact, and the most-cited recent papers from Max Delbrück Center (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

18.1K

Citations

3.7M

h-index

688

i10-index

28.3K
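
For reference, the h-index and i10-index above are usually defined as follows: the h-index is the largest h such that h works each have at least h citations, and the i10-index is the number of works with at least 10 citations. The sketch below applies those standard definitions to a made-up list of citation counts. How NobleBlocks computes its own figures is not stated on this page, and the function names and sample data are illustrative assumptions.

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for six works (not real data).
example_counts = [25, 12, 10, 8, 3, 0]
print("h-index:", h_index(example_counts))
print("i10-index:", i10_index(example_counts))
```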

Also known as

Max Delbrück Center; Max Delbrück Center for Molecular Medicine; Max Delbrück Center for Molecular Medicine in the Helmholtz Association; Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft

Top-cited papers from Max Delbrück Center

Initial sequencing and analysis of the human genome

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +4 more

2001 · Nature · 24.7K citations · doi:10.1038/35057062

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets

Damian Szklarczyk, Annika L. Gable, David Lyon, Alexander Junge +4 more

2018 · Nucleic Acids Research · 19.3K citations · doi:10.1093/nar/gky1131

Proteins and their functional interactions form the backbone of the cellular machinery. Their connectivity network needs to be considered for the full understanding of biological phenomena, but the available information on protein-protein associations is incomplete and exhibits varying levels of annotation granularity and reliability. The STRING database aims to collect, score and integrate all publicly available sources of protein-protein interaction information, and to complement these with computational predictions. Its goal is to achieve a comprehensive and objective global network, including direct (physical) as well as indirect (functional) interactions. The latest version of STRING (11.0) more than doubles the number of organisms it covers, to 5090. The most important new feature is an option to upload entire, genome-wide datasets as input, allowing users to visualize subsets as interaction networks and to perform gene-set enrichment analysis on the entire input. For the enrichment analysis, STRING implements well-known classification systems such as Gene Ontology and KEGG, but also offers additional, new classification systems based on high-throughput text-mining as well as on a hierarchical clustering of the association network itself. The STRING resource is available online at https://string-db.org/.

The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest

Damian Szklarczyk, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou +4 more

2022 · Nucleic Acids Research · 9.1K citations · doi:10.1093/nar/gkac1000

Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets

Damian Szklarczyk, Annika L Gable, Katerina Nastou, David Lyon +4 more

2020 · Nucleic Acids Research · 8.7K citations · doi:10.1093/nar/gkaa1074

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.

The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible

Damian Szklarczyk, John H. Morris, Helen Cook, Michael Kuhn +4 more

2016 · Nucleic Acids Research · 7.5K citations · doi:10.1093/nar/gkw937

A system-wide understanding of cellular function requires knowledge of all functional interactions between the expressed proteins. The STRING database aims to collect and integrate this information, by consolidating known and predicted protein-protein association data for a large number of organisms. The associations in STRING include direct (physical) interactions, as well as indirect (functional) interactions, as long as both are specific and biologically meaningful. Apart from collecting and reassessing available experimental data on protein-protein interactions, and importing known pathways and protein complexes from curated databases, interaction predictions are derived from the following sources: (i) systematic co-expression analysis, (ii) detection of shared selective signals across genomes, (iii) automated text-mining of the scientific literature and (iv) computational transfer of interaction knowledge between organisms based on gene orthology. In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework. Further improvements include automated background analysis of user inputs for functional enrichments, and streamlined download options. The STRING resource is available online, at http://string-db.org/.

Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees

Ivica Letunić, Peer Bork

2016 · Nucleic Acids Research · 5.4K citations · doi:10.1093/nar/gkw290

Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. The current version was completely redesigned and rewritten, utilizing current web technologies for speedy and streamlined processing. Numerous new features were introduced and several new data types are now supported. Trees with up to 100,000 leaves can now be efficiently displayed. Full interactive control over precise positioning of various annotation features and an unlimited number of datasets allow the easy creation of complex tree visualizations. iTOL 3 is the first tool which supports direct visualization of the recently proposed phylogenetic placements format. Finally, iTOL's account system has been redesigned to simplify the management of trees in user-defined workspaces and projects, as it is heavily used and currently handles already more than 500,000 trees from more than 10,000 individual users.

eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses

Jaime Huerta‐Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza +4 more

2018 · Nucleic Acids Research · 5.3K citations · doi:10.1093/nar/gky1085

eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes that were selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed together with their associated sequence alignments, phylogenies, HMM models and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite doubling the amount of genomes, the quality of orthology assignments and functional annotations (80% coverage) has persisted without significant changes across this update. Finally, we improved eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for downloading or via API queries at http://eggnog.embl.de.

STRING v9.1: protein-protein interaction networks, with increased coverage and integration

Andrea Franceschini, Damian Szklarczyk, Sune Pletscher-Frankild, Michael Kuhn +4 more

2012 · Nucleic Acids Research · 4.6K citations · doi:10.1093/nar/gks1094

Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.

The repertoire of mutational signatures in human cancer

Ludmil B. Alexandrov, Jaegil Kim, Nicholas J. Haradhvala, Mi Ni Huang +4 more

2020 · Nature · 3.8K citations · doi:10.1038/s41586-020-1943-3

Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature [1]. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [2] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses [3–15], enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated, but distinct, DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.

PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments

Mikita Suyama, David Torrents, Peer Bork

2006 · Nucleic Acids Research · 3.6K citations · doi:10.1093/nar/gkl315

PAL2NAL is a web server that constructs a multiple codon alignment from the corresponding aligned protein sequences. Such codon alignments can be used to evaluate the type and rate of nucleotide substitutions in coding DNA for a wide range of evolutionary analyses, such as the identification of levels of selective constraint acting on genes, or to perform DNA-based phylogenetic studies. The server takes a protein sequence alignment and the corresponding DNA sequences as input. In contrast to other existing applications, this server is able to construct codon alignments even if the input DNA sequence has mismatches with the input protein sequence, or contains untranslated regions and polyA tails. The server can also deal with frame shifts and inframe stop codons in the input models, and is thus suitable for the analysis of pseudogenes. Another distinct feature is that the user can specify a subregion of the input alignment in order to specifically analyze functional domains or exons of interest. The PAL2NAL server is available at http://www.bork.embl.de/pal2nal.
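
The core idea in the abstract above, threading coding DNA back onto an existing protein alignment, can be illustrated with a minimal sketch. This is not PAL2NAL's implementation. It assumes each DNA sequence is an exact, in-frame coding sequence for its protein, and it handles none of the mismatches, untranslated regions, polyA tails, frame shifts or in-frame stop codons that the server is described as tolerating. The function name and toy sequences are hypothetical.

```python
def protein_to_codon_alignment(aligned_proteins, coding_dna):
    """Map each aligned residue to its codon and each gap to '---'.

    aligned_proteins: dict of name -> aligned protein string ('-' marks a gap)
    coding_dna:       dict of name -> unaligned, in-frame coding DNA
    """
    codon_alignment = {}
    for name, aligned in aligned_proteins.items():
        dna = coding_dna[name]
        residue_count = sum(1 for ch in aligned if ch != "-")
        if len(dna) < 3 * residue_count:
            raise ValueError(f"DNA for {name} is too short for its protein")
        position = 0  # index of the next unused nucleotide
        codons = []
        for ch in aligned:
            if ch == "-":
                codons.append("---")  # a protein gap spans three nucleotides
            else:
                codons.append(dna[position:position + 3])
                position += 3
        codon_alignment[name] = "".join(codons)
    return codon_alignment


# Toy input: seq1 lacks the alanine (A) that seq2 has in column 3.
proteins = {"seq1": "MK-L", "seq2": "MKAL"}
dna = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}
for name, row in protein_to_codon_alignment(proteins, dna).items():
    print(name, row)
```

Because every protein column expands to exactly three nucleotide columns, the output rows keep the same relative alignment as the input, which is what lets downstream tools compare substitutions codon by codon.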

Physiology of Microglia

Helmut Kettenmann, Uwe‐Karsten Hanisch, Mami Noda, Alexei Verkhratsky

2011 · Physiological Reviews · 3.5K citations · doi:10.1152/physrev.00011.2010

Microglial cells are the resident macrophages in the central nervous system. These cells of mesodermal/mesenchymal origin migrate into all regions of the central nervous system, disseminate through the brain parenchyma, and acquire a specific ramified morphological phenotype termed "resting microglia." Recent studies indicate that even in the normal brain, microglia have highly motile processes by which they scan their territorial domains. By a large number of signaling pathways they can communicate with macroglial cells and neurons and with cells of the immune system. Likewise, microglial cells express receptors classically described for brain-specific communication such as neurotransmitter receptors and those first discovered as immune cell-specific such as for cytokines. Microglial cells are considered the most susceptible sensors of brain pathology. Upon any detection of signs for brain lesions or nervous system dysfunction, microglial cells undergo a complex, multistage activation process that converts them into the "activated microglial cell." This cell form has the capacity to release a large number of substances that can act detrimental or beneficial for the surrounding cells. Activated microglial cells can migrate to the site of injury, proliferate, and phagocytose cells and cellular compartments.

miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades

Marc R. Friedländer, Sebastian D. Mackowiak, Na Li, Wei Chen +1 more

2011 · Nucleic Acids Research · 3.4K citations · doi:10.1093/nar/gkr688

microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.

Pan-cancer analysis of whole genomes

Lauri A. Aaltonen, Federico Abascal, Adam Abeshouse, Hiroyuki Aburatani +4 more

2020 · Nature · 3.3K citations · doi:10.1038/s41586-020-1969-6

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].

Structure and function of the global ocean microbiome

Shinichi Sunagawa, Luís Pedro Coelho, Samuel Chaffron, Jens Roat Kultima +4 more

2015 · Science · 3.2K citations · doi:10.1126/science.1261359

Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of functional diversity, microbial community structure, and their ecological determinants remains a grand challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68 locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference gene catalog with >40 million nonredundant, mostly novel sequences from viruses, prokaryotes, and picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical stratification with epipelagic community composition mostly driven by temperature rather than other environmental factors or geography. We identify ocean microbial core functionality and reveal that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems.

Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper

Jaime Huerta‐Cepas, Kristoffer Forslund, Luís Pedro Coelho, Damian Szklarczyk +3 more

2017 · Molecular Biology and Evolution · 3.0K citations · doi:10.1093/molbev/msx148

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. EggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.

Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)

Daniel J. Klionsky, Amal Kamal Abdel‐Aziz, Sara Abdelfatah, Mahmoud Abdellatif +4 more

2021 · Autophagy · 2.7K citations · doi:10.1080/15548627.2020.1797280

autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.

ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data

Jaime Huerta‐Cepas, François Serra, Peer Bork

2016 · Molecular Biology and Evolution · 2.5K citations · doi:10.1093/molbev/msw046

The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.

Tracking the Evolution of Non–Small-Cell Lung Cancer

Mariam Jamal‐Hanjani, Gareth A. Wilson, Nicholas McGranahan, Nicolai J. Birkbak +4 more

2017 · New England Journal of Medicine · 2.5K citations · doi:10.1056/nejmoa1616288

BACKGROUND: Among patients with non-small-cell lung cancer (NSCLC), data on intratumor heterogeneity and cancer genome evolution have been limited to small retrospective cohorts. We wanted to prospectively investigate intratumor heterogeneity in relation to clinical outcome and to determine the clonal nature of driver events and evolutionary processes in early-stage NSCLC. METHODS: In this prospective cohort study, we performed multiregion whole-exome sequencing on 100 early-stage NSCLC tumors that had been resected before systemic therapy. We sequenced and analyzed 327 tumor regions to define evolutionary histories, obtain a census of clonal and subclonal events, and assess the relationship between intratumor heterogeneity and recurrence-free survival. RESULTS: ), which remained significant in multivariate analysis. CONCLUSIONS: Intratumor heterogeneity mediated through chromosome instability was associated with an increased risk of recurrence or death, a finding that supports the potential value of chromosome instability as a prognostic predictor. (Funded by Cancer Research UK and others; TRACERx ClinicalTrials.gov number, NCT01888601 .).

Circ-ZNF609 Is a Circular RNA that Can Be Translated and Functions in Myogenesis

Ivano Legnini, Gaia Di Timoteo, Francesca Rossi, Mariangela Morlando +4 more

2017 · Molecular Cell · 2.4K citations · doi:10.1016/j.molcel.2017.02.017

Circular RNAs (circRNAs) constitute a family of transcripts with unique structures and still largely unknown functions. Their biogenesis, which proceeds via a back-splicing reaction, is fairly well characterized, whereas their role in the modulation of physiologically relevant processes is still unclear. Here we performed expression profiling of circRNAs during in vitro differentiation of murine and human myoblasts, and we identified conserved species regulated in myogenesis and altered in Duchenne muscular dystrophy. A high-content functional genomic screen allowed the study of their functional role in muscle differentiation. One of them, circ-ZNF609, resulted in specifically controlling myoblast proliferation. Circ-ZNF609 contains an open reading frame spanning from the start codon, in common with the linear transcript, and terminating at an in-frame STOP codon, created upon circularization. Circ-ZNF609 is associated with heavy polysomes, and it is translated into a protein in a splicing-dependent and cap-independent manner, providing an example of a protein-coding circRNA in eukaryotes.

SMART: recent updates, new developments and status in 2020

Ivica Letunić, Supriya Khedkar, Peer Bork

2020 · Nucleic Acids Research · 2.4K citations · doi:10.1093/nar/gkaa937

SMART (Simple Modular Architecture Research Tool) is a web resource (https://smart.embl.de) for the identification and annotation of protein domains and the analysis of protein domain architectures. SMART version 9 contains manually curated models for more than 1300 protein domains, with a topical set of 68 new models added since our last update article (1). All the new models are for diverse recombinase families and subfamilies and as a set they provide a comprehensive overview of mobile element recombinases namely transposase, integrase, relaxase, resolvase, cas1 casposase and Xer like cellular recombinase. Further updates include the synchronization of the underlying protein databases with UniProt (2), Ensembl (3) and STRING (4), greatly increasing the total number of annotated domains and other protein features available in architecture analysis mode. Furthermore, SMART's vector-based protein display engine has been extended and updated to use the latest web technologies and the domain architecture analysis components have been optimized to handle the increased number of protein features available.
