Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema

Peter J. Castaldi (Brigham and Women's Hospital), Jennifer Dy (Northeastern University), James C. Ross (Brigham and Women's Hospital), Yale Chang (Northeastern University), George R. Washko (Harvard University), Douglas Curran‐Everett (National Jewish Health), André Williams (National Jewish Health), David A. Lynch (University of Colorado Denver), Barry J. Make (National Jewish Health), James D. Crapo (University of Colorado Denver), R.P. Bowler (National Jewish Health), Elizabeth A. Regan (National Jewish Health), John E. Hokanson (University of Colorado Denver), Gregory L. Kinney (Colorado School of Public Health), MeiLan K. Han (Michigan Medicine), Xavier Soler (University of California, San Diego), Joseph W Ramsdell (University of California, San Diego), R. Graham Barr (Columbia University Irving Medical Center), Marilyn G. Foreman (Morehouse School of Medicine), Edwin J.R. van Beek (The Queen's Medical Research Institute), Richard Casaburi (UCLA Medical Center), G.J. Criner (Temple University), Sharon M. Lutz (University of Colorado Denver), S. I. Rennard (University of Colorado Anschutz Medical Campus), Stephanie A. Santorico (University of Colorado Denver), Frank C. Sciurba (University of Pittsburgh), Dawn L. DeMeo (Brigham and Women's Hospital), Craig P. Hersh (Harvard University), Edwin K. Silverman (Brigham and Women's Hospital), Michael H. Cho (Harvard University)
Thorax
February 21, 2014
Cited by 205 · Open Access

Abstract

BACKGROUND: There is notable heterogeneity in the clinical presentation of patients with COPD. To characterise this heterogeneity, we sought to identify subgroups of smokers by applying cluster analysis to data from the COPDGene study.

METHODS: We applied a clustering method, k-means, to data from 10 192 smokers in the COPDGene study. After splitting the sample into a training and validation set, we evaluated three sets of input features across a range of k (user-specified number of clusters). Stable solutions were tested for association with four COPD-related measures and five genetic variants previously associated with COPD at genome-wide significance. The results were confirmed in the validation set.

FINDINGS: We identified four clusters that can be characterised as (1) relatively resistant smokers (ie, no/mild obstruction and minimal emphysema despite heavy smoking), (2) mild upper zone emphysema-predominant, (3) airway disease-predominant and (4) severe emphysema. All clusters are strongly associated with COPD-related clinical characteristics, including exacerbations and dyspnoea (p<0.001). We found strong genetic associations between the mild upper zone emphysema group and rs1980057 near HHIP, and between the severe emphysema group and rs8034191 in the chromosome 15q region (p<0.001). All significant associations were replicated at p<0.05 in the validation sample (12/12 associations with clinical measures and 2/2 genetic associations).

INTERPRETATION: Cluster analysis identifies four subgroups of smokers that show robust associations with clinical characteristics of COPD and known COPD-associated genetic variants.
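
The general workflow described in the methods (split the sample, fit k-means on the training set across a range of k, then carry the fitted centroids over to the validation set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic two-dimensional points rather than COPDGene features, the helper names (`kmeans`, `nearest`, `inertia`) are hypothetical, and the within-cluster sum of squares is used only as a generic fit summary, not as the stability criterion the authors applied.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep centroid of an empty cluster
        if new_centroids == centroids:  # assignments no longer change the centroids
            break
        centroids = new_centroids
    return centroids

def inertia(points, centroids):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)

# Synthetic stand-in data (NOT study data): four Gaussian blobs in 2-D.
rng = random.Random(42)
centers = [(0, 0), (5, 5), (0, 5), (5, 0)]
data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for cx, cy in centers for _ in range(100)]
rng.shuffle(data)

# Split into training and validation halves.
train, valid = data[:200], data[200:]

# Fit on the training set for a range of k; score both sets with the same centroids.
results = {}
for k in range(2, 7):
    cents = kmeans(train, k)
    results[k] = (inertia(train, cents), inertia(valid, cents))

for k, (train_fit, valid_fit) in sorted(results.items()):
    print(k, round(train_fit, 1), round(valid_fit, 1))
```

Scoring the validation set with centroids learned only from the training set keeps the two samples independent, which is the property that makes a replication check meaningful. A real analysis would also standardise features and use multiple random initialisations.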

