University of California, Los Angeles
ORCID: 0000-0002-2373-3691
Publishes on Genetic Associations and Epidemiology, Genetic Mapping and Diversity in Plants and Animals, Epigenetics and DNA Methylation. 277 papers and 39k citations.
Anonymity Compromised. The balance between maintaining individual privacy and sharing genomic information for research purposes has been a topic of considerable controversy. Gymrek et al. (p. 321; see the Policy Forum by Rodriguez et al.) demonstrate that the anonymity of participants (and their families) can be compromised by analyzing Y-chromosome sequences from public genetic genealogy Web sites that contain (sometimes distant) relatives with the same surname. Short tandem repeats (STRs) on the Y chromosome of a target individual (whose sequence was freely available and identified in GenBank) were compared with information in public genealogy Web sites to determine the shortest time to the most recent common ancestor and find the most likely surname, which, when combined with age and state of residency, identified the individual. When STRs from 911 individuals were used as the starting points, the analysis projected a success rate of 12% within the U.S. male population with Caucasian ancestry. Further analysis of detailed pedigrees from one collection revealed that families of individuals whose genomes are in public repositories could be identified with high probability.
Individual differences in DNA sequence are the genetic basis of human variability. We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry. Our results indicate that these SNPs capture most common genetic variation as a result of linkage disequilibrium, the correlation among common SNP alleles. We observe a strong correlation between extended regions of linkage disequilibrium and functional genomic elements. Our data provide a tool for exploring many questions that remain regarding the causal role of common human DNA variation in complex human traits and for investigating the nature of genetic variation within and between human populations.
Reachability and distance queries in graphs are fundamental to numerous applications, ranging from geographic navigation systems to Internet routing. Some of these applications involve huge graphs and yet require fast query answering. We propose a new data structure for representing all distances in a graph. The data structure is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v. Our labels are based on 2-hop covers of the shortest paths, or of all paths, in a graph. For shortest paths, such a cover is a collection S of shortest paths such that, for every two vertices u and v, there is a shortest path from u to v that is a concatenation of two paths from S. We describe an efficient algorithm for finding an almost optimal 2-hop cover of a given collection of paths. Our approach is general and can be applied to directed or undirected graphs, exact or approximate shortest paths, or to reachability queries. We study the proposed data structure using a combination of theoretical and experimental means. We implemented our algorithm and checked the size of the resulting data structure on several real-life networks from different application areas. Our experiments show that the total size of the labels is typically not much larger than the network itself, and is usually considerably smaller than an explicit representation of the transitive closure of the network.
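The label-based query described in the abstract above can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph given as an adjacency dictionary, and the names `build_labels` and `query` are hypothetical. The labels are built with a simple pruned breadth-first search from each vertex in descending degree order, a construction swapped in here for brevity; it is not the almost optimal 2-hop cover algorithm the abstract refers to. What the sketch does share with the abstract is the query rule: the distance between u and v is computed from the labels of u and v alone.

```python
from collections import deque

INF = float("inf")


def query(labels, u, v):
    """Distance between u and v using only their two labels.

    Each label maps a hub vertex to the distance from the labelled vertex
    to that hub. The answer is the best two-hop route through a shared hub.
    """
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    best = INF
    for hub, d in lu.items():
        if hub in lv:
            best = min(best, d + lv[hub])
    return best


def build_labels(adj):
    """Build 2-hop distance labels by pruned BFS (an illustrative choice).

    adj: dict mapping each vertex to an iterable of its neighbours
    (undirected, unweighted; this input format is an assumption).
    """
    labels = {v: {} for v in adj}
    # Visit high-degree vertices first so they become hubs for many pairs.
    for hub in sorted(adj, key=lambda v: -len(adj[v])):
        dist = {hub: 0}
        queue = deque([hub])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier hubs already certify a distance <= d.
            if query(labels, hub, u) <= d:
                continue
            labels[u][hub] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels
```

After `labels = build_labels(adj)`, a call such as `query(labels, u, v)` touches only the two labels involved and returns infinity when the vertices share no hub, that is, when they lie in different components. The total label size, `sum(len(l) for l in labels.values())`, is the quantity the abstract compares against the size of the network and of its transitive closure.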