C

Chengkuan Chen

Broad Institute

Publishes on AI in cancer detection, Radiomics and Machine Learning in Medical Imaging, Cancer Genomics and Diagnostics. 14 papers and 1.7k citations.

14Publications
1.7kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Richard J. Chen, Chengkuan Chen, Yicong Li et al.|2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)|2022
Cited by 504

Vision Transformers (ViTs) and their multi-scale and hierarchical variations have been successful at capturing image representations but their use has been generally studied for low-resolution images (e.g. 256 × 256, 384 × 384). For gigapixel whole-slide imaging (WSI) in computational pathology, WSIs can be as large as 150000 × 150000 pixels at 20 × magnification and exhibit a hierarchical structure of visual tokens across varying resolutions: from 16 × 16 images capturing individual cells, to 4096 × 4096 images characterizing interactions within the tissue microenvironment. We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. HIPT is pretrained across 33 cancer types using 10,678 gigapixel WSIs, 408,218 4096 × 4096 images, and 104M 256 × 256 images. We benchmark HIPT representations on 9 slide-level tasks, and demonstrate that: 1) HIPT with hierarchical pretraining outperforms current state-of-the-art methods for cancer subtyping and survival prediction, 2) self-supervised ViTs are able to model important inductive biases about the hierarchical structure of phenotypes in the tumor microenvironment.

A multimodal generative AI copilot for human pathology
Cited by 345Open Access

Abstract Computational pathology 1,2 has witnessed considerable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders 3,4 . However, despite the explosive growth of generative artificial intelligence (AI), there have been few studies on building general-purpose multimodal AI assistants and copilots 5 tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We built PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and fine-tuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question and answer turns. We compare PathChat with several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4 (ref. 6 ). PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases with diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat may potentially find impactful applications in pathology education, research and human-in-the-loop clinical decision-making.

Fast and scalable search of whole-slide images via self-supervised deep learning
Chengkuan Chen, Ming Y. Lu, Drew F. K. Williamson et al.|Nature Biomedical Engineering|2022
Cited by 114Open Access

The adoption of digital pathology has enabled the curation of large repositories of gigapixel whole-slide images (WSIs). Computationally identifying WSIs with similar morphologic features within large repositories without requiring supervised training can have significant applications. However, the retrieval speeds of algorithms for searching similar WSIs often scale with the repository size, which limits their clinical and research potential. Here we show that self-supervised deep learning can be leveraged to search for and retrieve WSIs at speeds that are independent of repository size. The algorithm, which we named SISH (for self-supervised image search for histology) and provide as an open-source package, requires only slide-level annotations for training, encodes WSIs into meaningful discrete latent representations and leverages a tree data structure for fast searching followed by an uncertainty-based ranking algorithm for WSI retrieval. We evaluated SISH on multiple tasks (including retrieval tasks based on tissue-patch queries) and on datasets spanning over 22,000 patient cases and 56 disease subtypes. SISH can also be used to aid the diagnosis of rare cancer types for which the number of available WSIs is often insufficient to train supervised deep-learning models.

A multimodal whole-slide foundation model for pathology
Tong Ding, Sophia J. Wagner, Andrew H. Song et al.|Nature Medicine|2025
Cited by 56Open Access

The field of computational pathology has been transformed with recent advances in foundation models that encode histopathology region-of-interests (ROIs) into versatile and transferable feature representations via self-supervised learning. However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose Transformer-based pathology Image and Text Alignment Network (TITAN), a multimodal whole-slide foundation model pretrained using 335,645 whole-slide images via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated from a multimodal generative AI copilot for pathology. Without any fine-tuning or requiring clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings, including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval and pathology report generation. Pretrained using 335,645 whole-slide images, a foundation model is developed to provide representations for slide- and patient-level tasks. It is capable of performing clinical tasks and generating reports even in data-scarce scenarios, such as rare cancer diagnosis and survival prediction, without requiring further fine-tuning.