Markov Chain Monte Carlo in Practice
In a family study of breast cancer, epidemiologists in Southern California increase the power for detecting a gene-environment interaction. In Gambia, a study helps a vaccination program reduce the incidence of Hepatitis B carriage. Archaeologists in Austria place a Bronze Age site in its true temporal location on the calendar scale. And in France,
Weak convergence and optimal scaling of random walk Metropolis algorithms
This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the efficiency of the algorithm. The main result is a weak convergence result as the dimension of a sequence of target densities, n, converges to $\infty$. When the proposal variance is appropriately scaled according to n, the sequence of stochastic processes formed by the first component of each Markov chain converges to the appropriate limiting Langevin diffusion process. The limiting diffusion approximation admits a straightforward efficiency maximization problem, and the resulting asymptotically optimal policy is related to the asymptotic acceptance rate of proposed moves for the algorithm. The asymptotically optimal acceptance rate is 0.234 under quite general conditions. The main result is proved in the case where the target density has a symmetric product form. Extensions of the result are discussed.
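The link between proposal scaling and acceptance rate described in this abstract can be checked numerically. The following is a minimal sketch, not the paper's own experiment: it assumes a product of standard normal densities as the target, a proposal standard deviation of ell / sqrt(n), and the constant ell = 2.38 commonly quoted for this target in the optimal-scaling literature. The function name `rwm_acceptance_rate` and all parameter names are hypothetical.

```python
import numpy as np

def rwm_acceptance_rate(n_dim, ell, n_steps, rng):
    """Run random walk Metropolis on an n_dim-dimensional standard normal
    target and return the fraction of proposals that were accepted.

    Assumption: proposal is N(x, sigma^2 I) with sigma = ell / sqrt(n_dim).
    """
    x = rng.standard_normal(n_dim)      # start from a draw of the target itself
    log_p = -0.5 * x @ x                # unnormalized log density at x
    sigma = ell / np.sqrt(n_dim)
    accepted = 0
    for _ in range(n_steps):
        y = x + sigma * rng.standard_normal(n_dim)
        log_p_y = -0.5 * y @ y
        # Metropolis rule for a symmetric proposal: accept with
        # probability min(1, p(y) / p(x)).
        if np.log(rng.uniform()) < log_p_y - log_p:
            x, log_p = y, log_p_y
            accepted += 1
    return accepted / n_steps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for ell in (0.5, 2.38, 6.0):
        rate = rwm_acceptance_rate(n_dim=50, ell=ell, n_steps=5000, rng=rng)
        print(f"ell={ell}: acceptance rate {rate:.3f}")
```

With ell near 2.38 the empirical acceptance rate should land in the neighborhood of the 0.234 limit, while much smaller or larger values of ell push it toward 1 or 0 respectively. The dimension (50) and chain length are arbitrary choices for illustration.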
Adaptive Rejection Sampling for Gibbs Sampling
Walter R. Gilks, Pascal Wild | Journal of the Royal Statistical Society Series C (Applied Statistics) | 1992
We propose a method for rejection sampling from any univariate log‐concave probability density function. The method is adaptive: As sampling proceeds, the rejection envelope and the squeezing function converge to the density function. The rejection envelope and squeezing function are piece‐wise exponential functions, the rejection envelope touching the density at previously sampled points, and the squeezing function forming arcs between those points of contact. The technique is intended for situations where evaluation of the density is computationally expensive, in particular for applications of Gibbs sampling to Bayesian models with non‐conjugacy. We apply the technique to a Gibbs sampling analysis of monoclonal antibody reactivity.
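The envelope-and-squeeze construction in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the support is taken to be the whole real line, the caller supplies both the log density and its derivative, the initial points must have a positive slope at the leftmost point and a negative slope at the rightmost, and points where the slope is exactly zero are not handled. The name `ars_sample` and its parameters are hypothetical.

```python
import numpy as np

def ars_sample(logf, dlogf, init_points, n_samples, rng):
    """Tangent-based adaptive rejection sampling for a log-concave density
    on the real line. logf may be unnormalized; dlogf is its derivative."""
    xs = sorted(float(p) for p in init_points)
    out = []
    while len(out) < n_samples:
        x = np.array(xs)
        h, dh = logf(x), dlogf(x)
        # Breakpoints where tangents at neighboring points intersect.
        z_mid = (h[1:] - h[:-1] - x[1:] * dh[1:] + x[:-1] * dh[:-1]) / (dh[:-1] - dh[1:])
        z = np.concatenate(([-np.inf], z_mid, [np.inf]))
        # Envelope piece j is exp(c[j] + dh[j] * t) on [z[j], z[j+1]].
        c = h - x * dh
        e_lo, e_hi = np.exp(dh * z[:-1]), np.exp(dh * z[1:])
        mass = np.exp(c) * (e_hi - e_lo) / dh
        # Pick a piece in proportion to its area, then invert its CDF.
        j = rng.choice(len(xs), p=mass / mass.sum())
        t = float(np.log(e_lo[j] + rng.uniform() * (e_hi[j] - e_lo[j])) / dh[j])
        upper = c[j] + dh[j] * t
        log_w = np.log(rng.uniform())
        # Squeeze test: chord between the two points bracketing t.
        i = int(np.searchsorted(x, t))
        if 0 < i < len(xs):
            slope = (h[i] - h[i - 1]) / (x[i] - x[i - 1])
            lower = h[i - 1] + (t - x[i - 1]) * slope
            if log_w <= lower - upper:
                out.append(t)       # accepted without evaluating logf
                continue
        # Full test: evaluate the log density, then refine the envelope.
        if log_w <= logf(t) - upper:
            out.append(t)
        xs.insert(i, t)             # keeps xs sorted
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = ars_sample(lambda v: -0.5 * v**2, lambda v: -v, [-1.0, 1.0], 2000, rng)
    print(draws.mean(), draws.std())
```

Each evaluation of the log density adds a contact point, so the envelope tightens and later draws are increasingly accepted by the squeeze test alone, which is the property that makes the approach attractive when the density is expensive to evaluate.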
Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development
In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates.
They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
Efficient Metropolis Jumping Rules
The algorithm of Metropolis et al. (1953) and its generalizations have been increasingly popular in computational physics and, more recently, statistics, for sampling from intractable multivariate distributions. Much recent research has been devoted to increasing the efficiency of simulation algorithms by altering the jumping rules for Metropolis-like algorithms. We study a very specific question: What are the most efficient symmetric jumping kernels for simulating a normal target distribution using the Metropolis algorithm? We provide a general theoretical result as the dimension of a class of canonical problems goes to ∞ and numerical approximations and simulations for low-dimensional Gaussian target distributions that show that the limiting results provide extremely accurate approximations in six and higher dimensions.