Computational Improvements Reveal Great Bacterial Diversity and High Metal Toxicity in Soil
The complexity of soil bacterial communities has thus far confounded effective measurement. However, with improved analytical methods, we show that the abundance distribution and total diversity can be deciphered. Reanalysis of reassociation kinetics for bacterial community DNA from pristine and metal-polluted soils showed that a power law best described the abundance distributions. More than one million distinct genomes occurred in the pristine soil, exceeding previous estimates by two orders of magnitude. Metal pollution reduced diversity more than 99.9%, revealing the highly toxic effect of metal contamination, especially for rare taxa.
Qmol: a program for molecular visualization on Windows-based PCs
Jason Gans, David Shalloway|Journal of Molecular Graphics and Modelling|2001
Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing
Gary Xie, Patrick Chain, Chien‐Chi Lo et al.|Molecular Oral Microbiology|2010
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance.
A cross-study analysis of drug response prediction in cancer cell lines
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data set typically provides an overly optimistic estimate of the prediction performance on independent test sets. To provide a more rigorous assessment of model generalizability between different studies, we use machine learning to analyze five publicly available cell line-based data sets: National Cancer Institute 60, Cancer Therapeutics Response Portal (CTRP), Genomics of Drug Sensitivity in Cancer, Cancer Cell Line Encyclopedia and Genentech Cell Line Screening Initiative (gCSI). Based on observed experimental variability across studies, we explore estimates of prediction upper bounds. We report performance results of a variety of machine learning models, with a multitasking deep neural network achieving the best cross-study generalizability. By multiple measures, models trained on CTRP yield the most accurate predictions on the remaining testing data, and gCSI is the most predictable among the cell line data sets included in this study. With these experiments and further simulations on partial data, two lessons emerge: (1) differences in viability assays can limit model generalizability across studies and (2) drug diversity, more than tumor diversity, is crucial for raising model generalizability in preclinical screening.
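The cross-study protocol in this abstract (fit on one study, score on every study) can be sketched roughly as below. Everything in the sketch is an assumption made for illustration: the data are synthetic, the study names "A", "B" and "C" are placeholders, the per-study offset is a crude stand-in for assay differences, and a one-feature least-squares line replaces the models the paper actually evaluates. Diagonal entries are in-sample fits, not cross-validated scores.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(model, xs, ys):
    """Coefficient of determination of a fitted line on (xs, ys)."""
    slope, intercept = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def make_toy_study(rng, n, offset, noise):
    """Synthetic 'study': the same underlying trend, shifted by a
    study-specific offset (hypothetical proxy for an assay difference)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + offset + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def cross_study_matrix(studies):
    """Train on each study, score on every study.

    Returns {(train_name, test_name): R^2}. Entries with
    train_name == test_name are in-sample scores.
    """
    scores = {}
    for train_name, (x_train, y_train) in studies.items():
        model = fit_line(x_train, y_train)
        for test_name, (x_test, y_test) in studies.items():
            scores[(train_name, test_name)] = r_squared(model, x_test, y_test)
    return scores

if __name__ == "__main__":
    rng = random.Random(0)
    studies = {
        "A": make_toy_study(rng, 50, offset=0.0, noise=0.1),
        "B": make_toy_study(rng, 50, offset=0.5, noise=0.1),
        "C": make_toy_study(rng, 50, offset=0.0, noise=0.3),
    }
    for (train_name, test_name), score in cross_study_matrix(studies).items():
        print(f"train={train_name} test={test_name} R2={score:.3f}")
```

Reading the matrix by column (fixed test study) shows how much accuracy is lost when the training data come from a different study. In this toy setup the within-study entry is never lower than the cross-study entries in its column, because least squares minimizes the residual on its own training data.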
Shadow mass and the relationship between velocity and momentum in symplectic numerical integration
Jason Gans, David Shalloway|Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics|2000
It is often assumed, when interpreting the discrete trajectory computed by a symplectic numerical integrator of Hamilton's equations in Cartesian coordinates, that velocity is equal to the momentum divided by the physical mass. However, the "shadow Hamiltonian" which is almost exactly solved by the symplectic integrator will, in general, induce a nonlinear relationship between velocity and momentum. For the (symplectic) momentum- and midpoint-momentum-Verlet algorithms, the "shadow mass" that relates velocity and momentum is momentum independent only for a quadratic potential and, even in this case, differs from the physical mass. Thus, naively assuming the standard velocity-momentum relationship leads to inconsistencies and unnecessarily inaccurate estimates of velocity-dependent quantities. As examples, we calculate the shadow Hamiltonians for the momentum- and midpoint-momentum-Verlet solutions of the multidimensional harmonic oscillator, and show how their velocity-momentum relationships depend on the time step. Of practical importance is the conclusion that, to gain the full advantage of symplecticity, velocities derived from interpolated positions, rather than conventional velocity-Verlet velocities, should be used to compute physical properties.
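As a small illustration of the idea that a symplectic integrator follows a nearby modified problem, not the physical one, the sketch below runs the common velocity-Verlet scheme on a one-dimensional harmonic oscillator. It compares the physical energy with a modified quadratic energy that this particular discrete map conserves up to rounding error. This is not the paper's calculation: it uses velocity Verlet, not the momentum- or midpoint-momentum-Verlet variants, it does not compute the shadow Hamiltonian or shadow mass, and the function names, unit mass and step size are assumptions chosen for the example.

```python
def velocity_verlet(x, p, omega, dt, steps, mass=1.0):
    """Integrate H = p^2/(2m) + (m omega^2 / 2) x^2 with velocity Verlet.

    Returns the list of (x, p) pairs at integer time steps,
    including the initial condition.
    """
    k = mass * omega ** 2
    trajectory = [(x, p)]
    for _ in range(steps):
        p_half = p - 0.5 * dt * k * x      # half kick
        x = x + dt * p_half / mass         # full drift
        p = p_half - 0.5 * dt * k * x      # half kick with the new force
        trajectory.append((x, p))
    return trajectory

def physical_energy(x, p, omega, mass=1.0):
    """Energy of the physical Hamiltonian."""
    return p * p / (2.0 * mass) + 0.5 * mass * omega ** 2 * x * x

def modified_energy(x, p, omega, dt, mass=1.0):
    """Quadratic form left invariant by one velocity-Verlet step for the
    harmonic oscillator: the stiffness is scaled by (1 - (omega*dt)^2 / 4)."""
    scale = 1.0 - (omega * dt) ** 2 / 4.0
    return p * p / (2.0 * mass) + 0.5 * mass * omega ** 2 * scale * x * x

def spread(values):
    """Peak-to-peak variation of a sequence."""
    return max(values) - min(values)

if __name__ == "__main__":
    omega, dt = 1.0, 0.1
    traj = velocity_verlet(1.0, 0.0, omega, dt, steps=1000)
    phys = [physical_energy(x, p, omega) for x, p in traj]
    mod = [modified_energy(x, p, omega, dt) for x, p in traj]
    print("physical energy spread:", spread(phys))
    print("modified energy spread:", spread(mod))
```

The physical energy oscillates with an amplitude of order (omega*dt)^2, while the modified energy stays constant up to floating-point rounding. The discrete trajectory is therefore consistent with a slightly different Hamiltonian, which is the setting in which the abstract's question about the velocity-momentum relationship arises.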