Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations
Mei‐Ling Ting Lee, Frank C. Kuo, G. A. Whitmore et al.|Proceedings of the National Academy of Sciences|2000
We present statistical methods for analyzing replicated cDNA microarray expression data and report the results of a controlled experiment. The study was conducted to investigate inherent variability in gene expression data and the extent to which replication in an experiment produces more consistent and reliable findings. We introduce a statistical model to describe the probability that mRNA is contained in the target sample tissue, converted to probe, and ultimately detected on the slide. We also introduce a method to analyze the combined data from all replicates. Of the 288 genes considered in this controlled experiment, 32 would be expected to produce strong hybridization signals because of the known presence of repetitive sequences within them. Results based on individual replicates, however, show that there are 55, 36, and 58 highly expressed genes in replicates 1, 2, and 3, respectively. On the other hand, an analysis by using the combined data from all 3 replicates reveals that only 2 of the 288 genes are incorrectly classified as expressed. Our experiment shows that any single microarray output is subject to substantial variability. By pooling data from replicates, we can provide a more reliable analysis of gene expression data. Therefore, we conclude that designing experiments with replications will greatly reduce misclassification rates. We recommend that at least three replicates be used in designing experiments by using cDNA microarrays, particularly when gene expression data from single specimens are being analyzed.
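The effect of pooling replicates on misclassification can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not implement the authors' statistical model, and it swaps in simple averaging of Gaussian-noise signals. The signal levels, the cutoff, and the function name `simulate_misclassification` are hypothetical; only the counts of 288 genes, 32 expressed genes, and 3 replicates come from the abstract.

```python
import random


def simulate_misclassification(n_genes=288, n_expressed=32, n_reps=3, seed=0):
    """Compare misclassification counts for single replicates and pooled data.

    Assumed toy model (not the paper's): expressed genes have mean signal 2.0,
    unexpressed genes have mean signal 0.0, and every reading carries
    independent Gaussian noise with unit standard deviation.
    """
    rng = random.Random(seed)
    truth = [i < n_expressed for i in range(n_genes)]

    # One list of readings per replicate.
    replicates = [
        [(2.0 if expressed else 0.0) + rng.gauss(0.0, 1.0) for expressed in truth]
        for _ in range(n_reps)
    ]

    cutoff = 1.0  # hypothetical threshold: call a gene "expressed" above this

    def count_errors(values):
        # A gene is misclassified when its call disagrees with the truth.
        return sum((v > cutoff) != t for v, t in zip(values, truth))

    single_errors = [count_errors(rep) for rep in replicates]

    # Pool by averaging each gene's readings across replicates.
    pooled_values = [sum(readings) / n_reps for readings in zip(*replicates)]
    pooled_errors = count_errors(pooled_values)
    return single_errors, pooled_errors


single_errors, pooled_errors = simulate_misclassification()
print("errors per single replicate:", single_errors)
print("errors after pooling:", pooled_errors)
```

Averaging three readings shrinks the noise standard deviation by a factor of the square root of 3, which is why the pooled calls in this toy setting disagree with the truth far less often than any single replicate does.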
Third-Degree Stochastic Dominance
G. A. Whitmore|American Economic Review|2016
Here $F(x)$ and $G(x)$ are less-than cumulative probability distributions, where $x$ is a continuous or discrete random variable representing the outcome of a prospect. The closed interval $[a, b]$ is the sample space of both prospects. The integral shown in Rule 2 and those shown throughout the paper are Stieltjes integrals. Recall that the Stieltjes integral $\int_a^b f(x)\,dg(x)$ exists if one of the functions $f$ and $g$ is continuous and the other has finite variation in $[a, b]$. Let $D_1$, $D_2$, and $D_3$ be three sets of utility functions $\phi(x)$. $D_1$ is the set containing all utility functions with $\phi(x)$ and $\phi_1(x)$ continuous, and $\phi_1(x) > 0$ for all $x \in [a, b]$. $D_2$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$ continuous, and $\phi_1(x) > 0$, $\phi_2(x) \le 0$ for all $x \in [a, b]$. $D_3$ is the set with $\phi(x)$, $\phi_1(x)$, $\phi_2(x)$, $\phi_3(x)$ continuous, and $\phi_1(x) > 0$, $\phi_2(x) \le 0$, $\phi_3(x) \ge 0$ for all $x \in [a, b]$. Here $\phi_i(x)$ denotes the $i$th derivative of $\phi(x)$. Hadar and Russell proved that Rule 1 is valid for all $\phi \in D_1$ and Rule 2 is valid for all $\phi \in D_2$. The authors point out that the set of probability distributions that can be ordered by means of second-degree stochastic dominance is, in general, larger than that which can be ordered by means of first-degree stochastic dominance. Note that in Rule 2, they assume that $\phi(x)$ is not only an increasing function of $x$ but also exhibits weak global risk aversion, a condition guaranteed by requiring the second derivative of $\phi(x)$ to be nonpositive. In this paper, a condition which will be called third-degree stochastic dominance is considered. It is based on the following assumption about the form of the utility function $\phi(x)$. From a normative point of view, one expects the risk premium associated with an uncertain prospect to become smaller the greater is the individual's wealth. The plausibility and implications of this assumption have been explored by John Pratt, as well as others.
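The three utility classes defined above are nested: each one adds a sign condition on the next derivative. A minimal sketch of a numerical membership check follows, assuming the derivatives are supplied analytically and are continuous. The function name `highest_class`, the interval [1, 10], and the three example utilities are illustrative assumptions, not taken from the paper. Checking signs on a finite grid is a heuristic, not a proof.

```python
def highest_class(d1, d2, d3, a, b, n=1000):
    """Return the largest k in {1, 2, 3} whose sign conditions hold on a grid.

    d1, d2, d3 are callables giving the first, second, and third derivatives
    of a utility function (assumed continuous). Returns 0 if even the
    first-derivative condition fails.
    """
    grid = [a + (b - a) * i / n for i in range(n + 1)]
    if not all(d1(x) > 0 for x in grid):
        return 0  # not increasing everywhere: outside every class
    if not all(d2(x) <= 0 for x in grid):
        return 1  # increasing but not weakly risk averse
    if not all(d3(x) >= 0 for x in grid):
        return 2  # risk averse, but third derivative goes negative
    return 3


# Hypothetical example utilities on the assumed interval [1, 10].
# phi(x) = ln x: derivatives 1/x, -1/x^2, 2/x^3
log_class = highest_class(lambda x: 1 / x, lambda x: -1 / x**2,
                          lambda x: 2 / x**3, 1.0, 10.0)

# phi(x) = x^2: increasing on [1, 10] but convex, so risk seeking
square_class = highest_class(lambda x: 2 * x, lambda x: 2.0,
                             lambda x: 0.0, 1.0, 10.0)

# phi(x) = x - 0.001 x^3: increasing and concave, third derivative negative
cubic_class = highest_class(lambda x: 1 - 0.003 * x**2, lambda x: -0.006 * x,
                            lambda x: -0.006, 1.0, 10.0)

print(log_class, square_class, cubic_class)
```

Under these assumptions the logarithmic utility satisfies all three sign conditions, the quadratic satisfies only the first, and the cubic satisfies the first two but fails the third-derivative condition.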
The risk premium of an uncertain prospect is that amount by which the certainty equivalent of the prospect differs from its expected value. In mathematical terms, given the prospect $F(x)$ with expected value $\mu$, the corresponding risk premium $\pi$ is obtained by solving the following equation. $\int_a^b$ …
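The excerpt breaks off before the equation is shown. As a sketch only, writing $\mu$ for the expected value and $\pi$ for the risk premium, and assuming the standard definition associated with Pratt's work, the equation would take the form below. The worked example, including the symbols $w$ and $h$, is an illustrative assumption and does not come from the source.

```latex
% Assumed standard form: the utility of the certainty equivalent (mu - pi)
% equals the expected utility of the prospect.
\[
  \phi(\mu - \pi) = \int_a^b \phi(x)\, dF(x),
  \qquad
  \mu = \int_a^b x\, dF(x).
\]

% Illustrative example: \phi(x) = \ln x, and a prospect paying w - h or w + h
% with probability 1/2 each, where 0 < h < w, so that \mu = w.
\[
  \ln(w - \pi) = \tfrac{1}{2}\ln(w - h) + \tfrac{1}{2}\ln(w + h)
  \;\Longrightarrow\;
  \pi = w - \sqrt{w^2 - h^2}.
\]

% The premium falls as wealth w rises:
\[
  \frac{d\pi}{dw} = 1 - \frac{w}{\sqrt{w^2 - h^2}} < 0 .
\]
```

In this example the premium shrinks as $w$ grows, which matches the normative expectation that the risk premium becomes smaller the greater the individual's wealth.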
Threshold Regression for Survival Analysis: Modeling Event Times by a Stochastic Process Reaching a Boundary
Many researchers have investigated first hitting times as models for survival data. First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains. In a survival context, the state of the underlying process represents the strength of an item or the health of an individual. The item fails or the individual experiences a clinical endpoint when the process reaches an adverse threshold state for the first time. The time scale can be calendar time or some other operational measure of degradation or disease progression. In many applications, the process is latent (i.e., unobservable). Threshold regression refers to first-hitting-time models with regression structures that accommodate covariate data. The parameters of the process, threshold state and time scale may depend on the covariates. This paper reviews aspects of this topic and discusses fruitful avenues for future research.
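One concrete case of the idea above is a Wiener process that starts at a positive level and fails when it first reaches zero. The sketch below is an assumption-laden illustration and not the paper's specification: the log link for the starting level, the identity link for the drift, the unit variance, the coefficient values, and the function names are all hypothetical. It uses the known closed-form survival function for the first hitting time of a Wiener process with drift.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fht_survival(t, y0, mu, sigma=1.0):
    """P(T > t) for a Wiener process started at y0 > 0 with drift mu and
    volatility sigma, where T is the first time the process reaches 0."""
    if t <= 0:
        return 1.0
    s = sigma * math.sqrt(t)
    return (normal_cdf((y0 + mu * t) / s)
            - math.exp(-2.0 * y0 * mu / sigma**2) * normal_cdf((mu * t - y0) / s))


def threshold_regression_survival(t, x, gamma, beta):
    """Survival probability with covariate-dependent process parameters.

    Assumed links (hypothetical): ln(y0) = gamma . x and mu = beta . x.
    """
    y0 = math.exp(sum(g * xi for g, xi in zip(gamma, x)))
    mu = sum(b * xi for b, xi in zip(beta, x))
    return fht_survival(t, y0, mu)


# Hypothetical coefficients: intercept plus one binary covariate.
gamma = (1.0, 0.5)   # covariate raises the starting health level
beta = (-0.5, 0.0)   # negative drift: the process tends toward the threshold

for t in (1.0, 5.0, 20.0):
    s0 = threshold_regression_survival(t, (1.0, 0.0), gamma, beta)
    s1 = threshold_regression_survival(t, (1.0, 1.0), gamma, beta)
    print(f"t={t:5.1f}  S(t | x=0)={s0:.3f}  S(t | x=1)={s1:.3f}")
```

In this toy setting the covariate acts only through the starting level, so subjects with the covariate begin farther from the threshold and have higher survival at every time shown.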