NobleBlocks

Quantitative BioSciences

Nonprofit · Solana Beach, California, United States

Research output, citation impact, and the most-cited recent papers from Quantitative BioSciences (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 12.5K
Citations: 1.1M
h-index: 455
i10-index: 6.4K
Also known as: Quantitative BioSciences

Top-cited papers from Quantitative BioSciences

School readiness and later achievement.
Greg J. Duncan, Chantelle Dowsett, Amy Claessens, Katherine Magnuson +4 more
2007 · Developmental Psychology · 5.3K citations · doi:10.1037/0012-1649.43.6.1428

Using 6 longitudinal data sets, the authors estimate links between three key elements of school readiness--school-entry academic, attention, and socioemotional skills--and later school reading and math achievement. In an effort to isolate the effects of these school-entry skills, the authors ensured that most of their regression models control for cognitive, attention, and socioemotional skills measured prior to school entry, as well as a host of family background measures. Across all 6 studies, the strongest predictors of later achievement are school-entry math, reading, and attention skills. A meta-analysis of the results shows that early math skills have the greatest predictive power, followed by reading and then attention skills. By contrast, measures of socioemotional behaviors, including internalizing and externalizing problems and social skills, were generally insignificant predictors of later academic performance, even among children with relatively high levels of problem behavior. Patterns of association were similar for boys and girls and for children from high and low socioeconomic backgrounds.

The repertoire of mutational signatures in human cancer
Ludmil B. Alexandrov, Jaegil Kim, Nicholas J. Haradhvala, Mi Ni Huang +4 more
2020 · Nature · 3.7K citations · doi:10.1038/s41586-020-1943-3

Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature [1]. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [2] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses [3–15], enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated—but distinct—DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.

Causal Inference without Balance Checking: Coarsened Exact Matching
Stefano M. Iacus, Gary King, Giuseppe Porro
2011 · Political Analysis · 3.5K citations · doi:10.1093/pan/mpr013

We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R, Stata, and SPSS that implement all our suggestions.
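The core idea behind CEM can be sketched briefly: coarsen each covariate into bins, match exactly on the binned values, and keep only the strata that contain both treated and control units. The Python below is a minimal illustration under stated assumptions; the function name `cem_match`, the record layout, and the bin edges are hypothetical and are not taken from the authors' R, Stata, or SPSS software. It also omits the stratum weights that a full analysis would normally apply.

```python
from bisect import bisect_right
from collections import defaultdict


def cem_match(units, cutpoints):
    """Return the units that fall in strata with both treated and control members.

    units: list of dicts with a boolean "treated" key plus covariate values.
    cutpoints: dict mapping covariate name -> sorted list of bin edges
               (hypothetical, user-chosen coarsening).
    """
    strata = defaultdict(list)
    for unit in units:
        # Coarsen: replace each covariate value by the index of its bin.
        key = tuple(
            bisect_right(edges, unit[name])
            for name, edges in sorted(cutpoints.items())
        )
        strata[key].append(unit)

    matched = []
    for members in strata.values():
        groups = {u["treated"] for u in members}
        # Exact match on coarsened values: prune strata missing either group.
        if groups == {True, False}:
            matched.extend(members)
    return matched


# Illustrative data only (not from the paper).
units = [
    {"id": 1, "treated": True, "age": 23, "income": 31_000},
    {"id": 2, "treated": False, "age": 27, "income": 35_000},
    {"id": 3, "treated": True, "age": 52, "income": 90_000},
    {"id": 4, "treated": False, "age": 41, "income": 33_000},
]
cutpoints = {"age": [30, 50], "income": [40_000, 80_000]}
matched_ids = [u["id"] for u in cem_match(units, cutpoints)]
print(matched_ids)
```

Here units 1 and 2 share the same coarsened age and income bins and are retained, while units 3 and 4 sit alone in their strata and are pruned.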

Pan-cancer analysis of whole genomes
Lauri A. Aaltonen, Federico Abascal, Adam Abeshouse, Hiroyuki Aburatani +4 more
2020 · Nature · 3.3K citations · doi:10.1038/s41586-020-1969-6

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1–3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10–18].

Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
Justin Grimmer, Brandon Stewart
2013 · Political Analysis · 3.1K citations · doi:10.1093/pan/mps028

Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

Quantitative Analysis of Culture Using Millions of Digitized Books
Jean-Baptiste Michel, Yuan Shen, Aviva Presser Aiden, Adrian Veres +4 more
2010 · Science · 3.1K citations · doi:10.1126/science.1199644

We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of 'culturomics,' focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)
Daniel J. Klionsky, Amal Kamal Abdel‐Aziz, Sara Abdelfatah, Mahmoud Abdellatif +4 more
2021 · Autophagy · 2.6K citations · doi:10.1080/15548627.2020.1797280

… autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.

The Parable of Google Flu: Traps in Big Data Analysis
David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani
2014 · Science · 2.5K citations · doi:10.1126/science.1248506

Large errors in flu prediction were largely avoidable, which offers lessons for the use of big data.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
Daniel Taliun, Daniel Harris, Michael D. Kessler, Jedidiah Carlson +4 more
2021 · Nature · 2.3K citations · doi:10.1038/s41586-021-03205-y

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Fake news on Twitter during the 2016 U.S. presidential election
Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire‐Thompson +1 more
2019 · Science · 1.8K citations · doi:10.1126/science.aau2706

The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.

AmberTools
David A. Case, Hasan Metin Aktulga, Kellon Belfon, David S. Cerutti +4 more
2023 · Journal of Chemical Information and Modeling · 1.8K citations · doi:10.1021/acs.jcim.3c01153

AmberTools is a free and open-source collection of programs used to set up, run, and analyze molecular simulations. The newer features contained within AmberTools23 are briefly described in this Application note.

Sustainable data analysis with Snakemake
Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall +4 more
2021 · F1000Research · 1.7K citations · doi:10.12688/f1000research.29032.2

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

Why Propensity Scores Should Not Be Used for Matching
Gary King, Richard A. Nielsen
2019 · Political Analysis · 1.6K citations · doi:10.1017/pan.2019.11

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal—thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.

Integrated Modeling Program, Applied Chemical Theory (IMPACT)
Jay L. Banks, Hege S. Beard, Yixiang Cao, Art E. Cho +4 more
2005 · Journal of Computational Chemistry · 1.5K citations · doi:10.1002/jcc.20292

We provide an overview of the IMPACT molecular mechanics program with an emphasis on recent developments and a description of its current functionality. With respect to core molecular mechanics technologies we include a status report for the fixed charge and polarizable force fields that can be used with the program and illustrate how the force fields, when used together with new atom typing and parameter assignment modules, have greatly expanded the coverage of organic compounds and medicinally relevant ligands. As we discuss in this review, explicit solvent simulations have been used to guide our design of implicit solvent models based on the generalized Born framework and a novel nonpolar estimator that have recently been incorporated into the program. With IMPACT it is possible to use several different advanced conformational sampling algorithms based on combining features of molecular dynamics and Monte Carlo simulations. The program includes two specialized molecular mechanics modules: Glide, a high-throughput docking program, and QSite, a mixed quantum mechanics/molecular mechanics module. These modules employ the IMPACT infrastructure as a starting point for the construction of the protein model and assignment of molecular mechanics parameters, but have then been developed to meet specialized objectives with respect to sampling and the energy function.

On the promotion of human flourishing
Tyler J. VanderWeele
2017 · Proceedings of the National Academy of Sciences · 1.3K citations · doi:10.1073/pnas.1702996114

Many empirical studies throughout the social and biomedical sciences focus only on very narrow outcomes such as income, or a single specific disease state, or a measure of positive affect. Human well-being or flourishing, however, consists in a much broader range of states and outcomes, certainly including mental and physical health, but also encompassing happiness and life satisfaction, meaning and purpose, character and virtue, and close social relationships. The empirical literature from longitudinal, experimental, and quasiexperimental studies is reviewed in an attempt to identify major determinants of human flourishing, broadly conceived. Measures of human flourishing are proposed. Discussion is given to the implications of a broader conception of human flourishing, and of the research reviewed, for policy, and for future research in the biomedical and social sciences.

Public Health and Online Misinformation: Challenges and Recommendations
Briony Swire‐Thompson, David Lazer
2019 · Annual Review of Public Health · 1.1K citations · doi:10.1146/annurev-publhealth-040119-094127

The internet has become a popular resource to learn about health and to investigate one's own health condition. However, given the large amount of inaccurate information online, people can easily become misinformed. Individuals have always obtained information from outside the formal health care system, so how has the internet changed people's engagement with health information? This review explores how individuals interact with health misinformation online, whether it be through search, user-generated content, or mobile apps. We discuss whether personal access to information is helping or hindering health outcomes and how the perceived trustworthiness of the institutions communicating health has changed over time. To conclude, we propose several constructive strategies for improving the online information ecosystem. Misinformation concerning health has particularly severe consequences with regard to people's quality of life and even their risk of mortality; therefore, understanding it within today's modern context is an extremely important task.

The Liver Tumor Segmentation Benchmark (LiTS)
Patrick Bilic, Patrick Ferdinand Christ, Hongwei Li, Eugene Vorontsov +4 more
2022 · Medical Image Analysis · 1.1K citations · doi:10.1016/j.media.2022.102680

In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dice scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in http://medicaldecathlon.com/. In addition, both data and online evaluation are accessible via https://competitions.codalab.org/competitions/17094.
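For readers unfamiliar with the metric, the Dice score quoted in this abstract measures the overlap between a predicted and a reference segmentation mask as 2|A∩B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The Python below is a minimal sketch, assuming binary masks flattened into equal-length sequences; the function name `dice_score` and the convention that two empty masks score 1.0 are illustrative assumptions, not the LiTS evaluation code.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count positions labeled foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground size of both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Illustrative masks only: 3 foreground voxels each, 2 in common.
pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
score = dice_score(pred, truth)
print(round(score, 3))
```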

The evolutionary history of 2,658 cancers
Moritz Gerstung, Clemency Jolly, Ignaty Leshchiner, Stefan C. Dentro +4 more
2020 · Nature · 1.1K citations · doi:10.1038/s41586-019-1907-7

Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.

Phenomapping for Novel Classification of Heart Failure With Preserved Ejection Fraction
Sanjiv J. Shah, Daniel H. Katz, Senthil Selvaraj, Michael A. Burke +4 more
2014 · Circulation · 1.1K citations · doi:10.1161/circulationaha.114.010637

BACKGROUND: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous clinical syndrome in need of improved phenotypic classification. We sought to evaluate whether unbiased clustering analysis using dense phenotypic data (phenomapping) could identify phenotypically distinct HFpEF categories. METHODS AND RESULTS: We prospectively studied 397 patients with HFpEF and performed detailed clinical, laboratory, ECG, and echocardiographic phenotyping of the study participants. We used several statistical learning algorithms, including unbiased hierarchical cluster analysis of phenotypic data (67 continuous variables) and penalized model-based clustering, to define and characterize mutually exclusive groups making up a novel classification of HFpEF. All phenomapping analyses were performed by investigators blinded to clinical outcomes, and Cox regression was used to demonstrate the clinical validity of phenomapping. The mean age was 65±12 years; 62% were female; 39% were black; and comorbidities were common. Although all patients met published criteria for the diagnosis of HFpEF, phenomapping analysis classified study participants into 3 distinct groups that differed markedly in clinical characteristics, cardiac structure/function, invasive hemodynamics, and outcomes (eg, phenogroup 3 had an increased risk of HF hospitalization [hazard ratio, 4.2; 95% confidence interval, 2.0-9.1] even after adjustment for traditional risk factors [P<0.001]). The HFpEF phenogroup classification, including its ability to stratify risk, was successfully replicated in a prospective validation cohort (n=107). CONCLUSIONS: Phenomapping results in a novel classification of HFpEF. Statistical learning algorithms applied to dense phenotypic data may allow improved classification of heterogeneous clinical syndromes, with the ultimate goal of defining therapeutically homogeneous patient subclasses.

Knee osteoarthritis has doubled in prevalence since the mid-20th century
Ian J. Wallace, Steven Worthington, David T. Felson, Robert Jurmain +4 more
2017 · Proceedings of the National Academy of Sciences · 1.1K citations · doi:10.1073/pnas.1703856114

… = 176). OA was diagnosed based on the presence of eburnation (polish from bone-on-bone contact). Overall, knee OA prevalence was found to be 16% among the postindustrial sample but only 6% and 8% among the early industrial and prehistoric samples, respectively. After controlling for age, BMI, and other variables, knee OA prevalence was 2.1-fold higher (95% confidence interval, 1.5-3.1) in the postindustrial sample than in the early industrial sample. Our results indicate that increases in longevity and BMI are insufficient to explain the approximate doubling of knee OA prevalence that has occurred in the United States since the mid-20th century. Knee OA is thus more preventable than is commonly assumed, but prevention will require research on additional independent risk factors that either arose or have become amplified in the postindustrial era.