Yun Fu

Statistical tests of neutrality of mutations.

Yun Fu, Wen‐Hsiung Li|Genetics|1993

Cited by 4.1k|Open Access

Mutations in the genealogy of the sequences in a random sample from a population can be classified as external and internal. External mutations are mutations that occurred in the external branches and internal mutations are mutations that occurred in the internal branches of the genealogy. Under the assumption of selective neutrality, the expected number of external mutations is equal to theta = 4Ne mu, where Ne is the effective population size and mu is the rate of mutation per gene per generation. Interestingly, this expectation is independent of the sample size. The number of external mutations is likely to deviate from its neutral expectation when there is selection while the number of internal mutations is less affected by the presence of selection. Statistical properties of the numbers of external mutations and of internal mutations are studied and their relationships to two commonly used estimates of theta are derived. From these properties, several new statistical tests based on a random sample of DNA sequences from the population are developed for testing the hypothesis that all mutations at a locus are neutral.
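
The external/internal split can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes an outgroup has already polarised each site and that every mutation hit a distinct site, so a derived-allele count of 1 marks an external mutation. The function names and the example counts are hypothetical, and the normalisation that would turn the contrast into a test statistic is omitted.

```python
from typing import Dict, Sequence


def harmonic(n: int) -> float:
    """a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def external_internal(derived_counts: Sequence[int], n: int) -> Dict[str, float]:
    """Summarise a sample from per-site derived-allele counts.

    Assumption: sites are polarised by an outgroup and each mutation
    occurred at its own site, so count == 1 means an external mutation.
    """
    seg = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    eta = len(seg)                                  # total number of mutations
    eta_e = sum(1 for k in seg if k == 1)           # external mutations
    a_n = harmonic(n)
    return {
        "eta": eta,
        "eta_e": eta_e,
        "eta_i": eta - eta_e,             # internal mutations
        "theta_external": float(eta_e),   # E[eta_e] = theta under neutrality
        "theta_watterson": eta / a_n,     # E[eta] = a_n * theta under neutrality
        "contrast": eta - a_n * eta_e,    # has mean zero under neutrality
    }


# Hypothetical sample of n = 5 sequences with six segregating sites.
summary = external_internal([1, 1, 2, 3, 1, 4], n=5)
print(summary)
```

An excess of external mutations pushes the contrast below zero, which is the kind of deviation the tests described above are designed to detect.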

Maximum likelihood estimation of population parameters.

Yun Fu, Wen‐Hsiung Li|Genetics|1993

Cited by 154|Open Access

One of the most important parameters in population genetics is theta = 4Ne mu where Ne is the effective population size and mu is the rate of mutation per gene per generation. We study two related problems, using the maximum likelihood method and the theory of coalescence. One problem is the potential improvement of accuracy in estimating the parameter theta over existing methods and the other is the estimation of parameter lambda which is the ratio of two theta's. The minimum variances of estimates of the parameter theta are derived under two idealized situations. These minimum variances serve as the lower bounds of the variances of all possible estimates of theta in practice. We then show that Watterson's estimate of theta based on the number of segregating sites is asymptotically an optimal estimate of theta. However, for a finite sample of sequences, substantial improvement over Watterson's estimate is possible when theta is large. The maximum likelihood estimate of lambda = theta_1/theta_2 is obtained and the properties of the estimate are discussed.
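
To give a feel for why large theta leaves room for improvement, the sketch below evaluates the standard textbook variance of Watterson's estimate for a neutral, non-recombining locus, Var = theta/a_n + b_n theta^2/a_n^2. This formula is brought in here as background and is not derived in the abstract; the lower bounds the paper derives are not reproduced. The function name and the chosen theta are hypothetical.

```python
def watterson_variance(theta: float, n: int) -> float:
    """Variance of S / a_n under the neutral model without recombination.

    a_n = sum of 1/i and b_n = sum of 1/i^2 for i = 1 .. n-1.
    """
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    return theta / a_n + b_n * theta ** 2 / a_n ** 2


# The theta^2 term shrinks only like 1 / (log n)^2, so the variance falls
# slowly with sample size when theta is large.
for n in (10, 100, 1000):
    print(n, round(watterson_variance(5.0, n), 3))
```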

Global Patterns of Human DNA Sequence Variation in a 10-kb Region on Chromosome 1

Yu Ning, Zhongming Zhao, Yun Fu et al.|Molecular Biology and Evolution|2001

Cited by 151|Open Access

Human DNA variation is currently a subject of intense research because of its importance for studying human origins, evolution, and demographic history and for association studies of complex diseases. An approximately 10-kb region on chromosome 1, which contains only four small exons (each <155 bp), was sequenced for 61 humans (20 Africans, 20 Asians, and 21 Europeans) and for 1 chimpanzee, 1 gorilla, and 1 orangutan. We found 52 polymorphic sites among the 122 human sequences and 382 variant sites among the human, chimpanzee, gorilla, and orangutan sequences. For the introns sequenced (8,991 bp), the nucleotide diversity (pi) was 0.058% among all sequences, 0.076% among the African sequences, 0.047% among the Asian sequences, and 0.045% among the European sequences. A compilation of data revealed that autosomal regions have, on average, the highest pi value (0.091%), X-linked regions have a somewhat lower pi value (0.079%), and Y-linked regions have a very low pi value (0.008%). The lower polymorphism in the present region may be due to a lower mutation rate and/or selection in the gene containing these introns or in genes linked to this region. The present region and two other 10-kb noncoding regions all show a strong excess of low-frequency variants, indicating a relatively recent population expansion. This region has a low mutation rate, which was estimated to be 0.74 × 10^-9 per nucleotide per year. An average estimate of approximately 12,600 for the long-term effective population size was obtained using various methods; the estimate was not far from the commonly used value of 10,000. Fu and Li's tests rejected the assumption of an equilibrium neutral Wright-Fisher population, largely owing to the high proportion of low-frequency variants. The age of the most recent common ancestor of the sequences in our sample was estimated to be more than 1 Myr.
Allowing for some unrealistic assumptions in the model, this estimate would still suggest an age of more than 500,000 years, providing further evidence for a genetic history of humans much more ancient than the emergence of modern humans. The fact that many unique variants exist in Europe and Asia also suggests a fairly long genetic history outside of Africa and argues against a complete replacement of all indigenous populations in Europe and Asia by a small African stock. Moreover, the ancient genetic history of humans indicates no severe bottleneck during the evolution of humans in the last half million years; otherwise, much of the ancient genetic history would have been lost during a severe bottleneck. We suggest that both the "Out of Africa" and the multiregional models are too simple to explain the evolution of modern humans.
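
As a back-of-envelope check on the effective population size, the relation theta = 4Ne mu can be inverted with the intron diversity quoted above standing in for theta per site. This is only a sketch and not one of the authors' methods: it reads the mutation rate as 0.74 × 10^-9 per nucleotide per year, and the 20-year generation time is an assumption introduced here. The function name is hypothetical.

```python
def effective_size(theta_per_site: float,
                   mu_per_site_per_year: float,
                   generation_years: float) -> float:
    """Solve theta = 4 * Ne * mu for Ne, with mu converted to per generation."""
    mu_per_generation = mu_per_site_per_year * generation_years
    return theta_per_site / (4.0 * mu_per_generation)


# pi = 0.058% among all sequences; rate read as 0.74e-9 per site per year;
# generation time of 20 years is an assumption, not a value from the abstract.
ne = effective_size(0.00058, 0.74e-9, 20.0)
print(round(ne))
```

Under these assumptions the result is roughly 9,800, the same order of magnitude as the values discussed in the abstract; a different generation time rescales it proportionally.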

Estimating effective population size or mutation rate using the frequencies of mutations of various classes in a sample of DNA sequences.

Yun Fu|Genetics|1994

Cited by 118|Open Access

Mutations resulting in segregating sites of a sample of DNA sequences can be classified by size and type and the frequencies of mutations of different sizes and types can be inferred from the sample. A framework for estimating the essential parameter theta = 4N mu utilizing the frequencies of mutations of various sizes and types is developed in this paper, where N is the effective size of a population and mu is the mutation rate per sequence per generation. The framework is a combination of coalescent theory, general linear model and Monte-Carlo integration, which leads to two new estimators theta xi and theta eta as well as a general Watterson's estimator theta K and a general Tajima's estimator theta tau. The greatest strength of the framework is that it can be used under a variety of population models. The properties of the framework and the four estimators theta K, theta tau, theta xi and theta eta are investigated under three important population models: the neutral Wright-Fisher model, the neutral model with recombination and the neutral Wright's finite-islands model. Under all these models, it is shown that theta xi is the best estimator among the four even when recombination rate or migration rate has to be estimated. Under the neutral Wright-Fisher model, it is shown that the new estimator theta xi has a variance close to a lower bound of variances of all unbiased estimators of theta which suggests that theta xi is a very efficient estimator.
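
The idea of building estimators from mutation classes can be illustrated with the frequency spectrum. Under the neutral Wright-Fisher model the expected number of mutations of size i (carried by i of the n sequences) is theta/i, so any weighted average of i * xi_i with weights summing to one is unbiased for theta. The sketch below shows that Watterson's and Tajima's estimators are two such weightings. It is a simplified illustration that assumes polarised sites; it does not reproduce the paper's theta xi or theta eta, whose weights come from the general linear model, and all names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Sequence


def frequency_spectrum(derived_counts: Sequence[int], n: int) -> Dict[int, int]:
    """xi[i] = number of segregating sites whose mutant is carried by i sequences."""
    counts = Counter(k for k in derived_counts if 0 < k < n)
    return {i: counts.get(i, 0) for i in range(1, n)}


def linear_theta(xi: Dict[int, int], n: int, weights: Dict[int, float]) -> float:
    """Weighted average of the per-class estimates i * xi_i."""
    return sum(weights[i] * i * xi[i] for i in range(1, n))


n = 5
xi = frequency_spectrum([1, 1, 2, 3, 1, 4], n)  # hypothetical sample

a_n = sum(1.0 / i for i in range(1, n))
watterson_w = {i: (1.0 / i) / a_n for i in range(1, n)}   # recovers S / a_n
tajima_w = {i: (n - i) / comb(n, 2) for i in range(1, n)}  # recovers mean pairwise differences

theta_w = linear_theta(xi, n, watterson_w)
theta_pi = linear_theta(xi, n, tajima_w)
print(xi, theta_w, theta_pi)
```

Writing both classical estimators as weightings of the same spectrum makes it clear that other weight choices are possible, which is the room the framework exploits.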

A phylogenetic estimator of effective population size or mutation rate.

Yun Fu|Genetics|1994

Cited by 101|Open Access

A new estimator of the essential parameter theta = 4Ne mu from DNA polymorphism data is developed under the neutral Wright-Fisher model without recombination and population subdivision, where Ne is the effective population size and mu is the mutation rate per locus per generation. The variance of the new estimator is only slightly larger than the minimum variance of all possible unbiased estimators of the parameter and substantially smaller than that of any existing estimator. The high efficiency of the new estimator is achieved by making full use of phylogenetic information in a sample of DNA sequences from a population. An example of estimating theta by the new method is presented using the mitochondrial sequences from an American Indian population.
