Jacob Pfau

Artificial Intelligence in Dermatology: A Primer

Albert T. Young, Mulin Xiong, Jacob Pfau et al.|Journal of Investigative Dermatology|2020

Cited by 236|Open Access

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper, Xander Davies, Claudia Shi et al.|arXiv (Cornell University)|2023

Cited by 89|Open Access

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.

Stress testing reveals gaps in clinic readiness of image-based diagnostic artificial intelligence models

Albert T. Young, Kristen Fernandez, Jacob Pfau et al.|npj Digital Medicine|2021

Cited by 51|Open Access

Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational "stress tests". Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5-22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.

Goal Misgeneralization in Deep Reinforcement Learning

Lauro Langosco, Jack C. Koch, Lee Sharkey et al.|arXiv (Cornell University)|2021

Cited by 22|Open Access

We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We formalize this distinction between capability and goal generalization, provide the first empirical demonstrations of goal misgeneralization, and present a partial characterization of its causes.

Artificial Intelligence in Teledermatology

Mulin Xiong, Jacob Pfau, Albert T. Young et al.|Current Dermatology Reports|2019

Cited by 18

Top publications by citations