
Yuedong Wang

Wenzhou Medical University

ORCID: 0000-0001-9202-6723

Publishes on Dialysis and Renal Disease Management, Statistical Methods and Inference, Statistical Methods and Bayesian Inference. 186 papers and 3.7k citations.



Top publications by citations

Smoothing Splines
Yuedong Wang|Unknown|2011
Cited by 260

A general class of powerful and flexible modeling techniques, spline smoothing has attracted a great deal of research attention in recent years and has been widely used in many application areas, from medicine to economics. Smoothing Splines: Methods and Applications covers basic smoothing spline models, including polynomial, periodic, spherical, t

Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture
Grace Wahba, Yuedong Wang, Chong Gu et al.|The Annals of Statistics|1995
Cited by 237|Open Access

Let $y_i, i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where b and c are given functions and b is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \in \mathsf{T}^{(1)} \otimes \dots \otimes \mathsf{T}^{(d)} = \mathsf{T}$, the $\mathsf{T}^{(\alpha)}$ are measurable spaces of rather general form and f is an unknown function on $\mathsf{T}$ with some assumed "smoothness" properties. Given $\{y_i, t(i), i = 1, \dots, n\}$, it is desired to estimate $f(t)$ for t in some region of interest contained in $\mathsf{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha \beta} (t_{\alpha}, t_{\beta}) + \dots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of f is obtained as the minimizer, in an appropriate function space, of $\mathsf{L}(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha < \beta} \lambda_{\alpha \beta} J_{\alpha \beta}(f_{\alpha \beta}) + \dots$, where $\mathsf{L}(y, f)$ is the negative log likelihood of $y = (y_1, \dots, y_n)'$ given f, the $J_{\alpha}, J_{\alpha \beta}, \dots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha \beta}, \dots$, (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software.
In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiological studies.
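
As an illustration of the exponential-family form in this abstract, the Bernoulli case used in the application can be written out explicitly. This is the standard logistic identity, not a formula quoted from the paper; the symbol $p_i$ (the success probability of $y_i$) is introduced here only for illustration.

$$h(y_i, f_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = \exp[y_i f_i - \log(1 + e^{f_i})], \qquad y_i \in \{0, 1\}, \qquad f_i = \log\frac{p_i}{1 - p_i}$$

In this case $b(f) = \log(1 + e^{f})$ and $c(y) = 0$, so the negative log likelihood in the penalized criterion is $\mathsf{L}(y, f) = \sum_{i=1}^{n} [\log(1 + e^{f_i}) - y_i f_i]$, and $f$ is the logit of the risk being modeled.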

Smoothing Spline Models with Correlated Random Errors
Yuedong Wang|Journal of the American Statistical Association|1998
Cited by 234

Spline-smoothing techniques are commonly used to estimate the mean function in a nonparametric regression model. Their performances depend greatly on the choice of smoothing parameters. Many methods of selecting smoothing parameters such as generalized maximum likelihood (GML), generalized cross-validation (GCV), and unbiased risk (UBR), have been developed under the assumption of independent observations. They tend to underestimate smoothing parameters when data are correlated. In this article, I assume that observations are correlated and that the correlation matrix depends on a parsimonious set of parameters. I extend the GML, GCV, and UBR methods to estimate the smoothing parameters and the correlation parameters simultaneously. I also relate a smoothing spline model to three mixed-effects models. These relationships show that the smoothing spline estimates evaluated at design points are best linear unbiased prediction (BLUP) estimates and that the GML estimates of the smoothing parameters and the correlation parameters are restricted maximum likelihood (REML) estimates. They also provide a way to fit a spline model with correlated errors using the SAS procedure proc mixed. Simulations are conducted to evaluate and compare the performance of the GML, GCV, UBR methods and the method proposed by Diggle and Hutchinson. The GML method is recommended, because it is stable and works well in all simulations. It performs better than other methods, especially when the sample size is not large. I illustrate my methods with applications to time series data and to spatial data.
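
To make the role of the smoothing parameter concrete, here is a minimal sketch of GCV-based selection under the independent-errors assumption that the abstract describes as the usual starting point. It is not the article's method: it does not estimate correlation parameters, and it swaps in a discrete second-difference penalty on equally spaced points as a stand-in for a cubic smoothing spline. All function names, the simulated data, and the grid of candidate values are hypothetical.

```python
import numpy as np


def second_difference_penalty(n):
    # D is the (n-2) x n second-difference operator; D.T @ D penalizes roughness.
    D = np.diff(np.eye(n), n=2, axis=0)
    return D.T @ D


def gcv_score(y, penalty, lam):
    n = len(y)
    # Hat matrix of the penalized least-squares smoother: A = (I + lam * P)^{-1}.
    A = np.linalg.solve(np.eye(n) + lam * penalty, np.eye(n))
    fit = A @ y
    resid = y - fit
    # GCV(lam) = (RSS / n) / (1 - tr(A) / n)^2, assuming independent errors.
    score = (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2
    return score, fit


def select_lambda_by_gcv(y, lambdas):
    penalty = second_difference_penalty(len(y))
    scores = [gcv_score(y, penalty, lam)[0] for lam in lambdas]
    best = int(np.argmin(scores))
    return lambdas[best], gcv_score(y, penalty, lambdas[best])[1]


# Simulated data (assumption): a sine curve plus independent Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
truth = np.sin(2 * np.pi * t)
y = truth + rng.normal(scale=0.3, size=t.size)

lambdas = np.logspace(-2, 4, 25)
best_lam, fit = select_lambda_by_gcv(y, lambdas)
print("GCV-selected smoothing parameter:", best_lam)
```

With correlated errors, the abstract notes that criteria of this kind tend to choose too small a smoothing parameter, which is the motivation for estimating the smoothing and correlation parameters jointly.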

Transcriptional Characterizations of Differences between Eutopic and Ectopic Endometrium
Yan Wu, Andre Kajdacsy‐Balla, Estil Strawn et al.|Endocrinology|2005
Cited by 207|Open Access

Endometriosis, defined as the presence of endometrial glandular and stromal cells outside the uterine cavity, is a common gynecological disease with poorly understood pathogenesis. Using laser capture microdissection and a cDNA microarray with 9600 genes/expressed sequence tags (ESTs), we have conducted a comprehensive profiling of gene expression differences between the ectopic and eutopic endometrium taken from 12 women with endometriosis adjusted for menstrual phase and the location of the lesions. With dye-swapping and replicated arrays, we found 904 genes/ESTs that are differentially expressed. We validated the gene expression using real-time RT-PCR. We found that the expression patterns of these genes/ESTs correctly classified the 12 patients into ovarian and nonovarian endometriosis. We identified gene clusters that are location-specific. In addition, we identified several biological themes using Expression Analysis Systematic Explorer. Finally, we identified 79 pathways with over 100 genes with known functions, which include oxidative stress, focal adhesion, Wnt signaling, and MAPK signaling. The identification of these genes and their associated pathways provides new insight. Our findings will stimulate future investigations on molecular genetic mechanisms underlying the pathogenesis of endometriosis.

Mixed Effects Smoothing Spline Analysis of Variance
Yuedong Wang|Journal of the Royal Statistical Society Series B (Statistical Methodology)|1998
Cited by 193|Open Access

We propose a general family of nonparametric mixed effects models. Smoothing splines are used to model the fixed effects and are estimated by maximizing the penalized likelihood function. The random effects are generic and are modelled parametrically by assuming that the covariance function depends on a parsimonious set of parameters. These parameters and the smoothing parameter are estimated simultaneously by the generalized maximum likelihood method. We derive a connection between a nonparametric mixed effects model and a linear mixed effects model. This connection suggests a way of fitting a nonparametric mixed effects model by using existing programs. The classical two-way mixed models and growth curve models are used as examples to demonstrate how to use smoothing spline analysis-of-variance decompositions to build nonparametric mixed effects models. Similarly to the classical analysis of variance, components of these nonparametric mixed effects models can be interpreted as main effects and interactions. The penalized likelihood estimates of the fixed effects in a two-way mixed model are extensions of James–Stein shrinkage estimates to correlated observations. In an example three nested nonparametric mixed effects models are fitted to a longitudinal data set.