Generative Prompt Controlled Diffusion for weakly supervised semantic segmentation
Abstract
Weakly supervised semantic segmentation (WSSS), which aims to train segmentation models using only image-level labels, has received significant attention. Existing approaches mainly focus on generating high-quality pseudo labels from the available images and their image-level labels. However, when the available dataset is small, the quality of these pseudo labels degrades significantly. In this paper, we tackle this challenge from a different perspective by introducing a novel data augmentation approach called Generative Prompt Controlled Diffusion (GPCD). GPCD expands the existing labeled dataset with diverse synthetic images produced by controlled diffusion guided by Generative Pre-trained Transformer (GPT) prompts. In this process, the existing images and image-level labels provide the control information, while GPT enriches the prompts so that the generated images have diverse backgrounds. Moreover, we integrate data source information into the Vision Transformer (ViT) framework as tokens, which helps downstream WSSS models recognize whether an image is original or generated. GPCD outperforms existing state-of-the-art methods, and its advantage is most pronounced when training data is scarce. Our source code will be released.

• We are the first to use conditional diffusion with GPT prompts to generate diverse images for augmenting WSSS with image-level labels.
• We introduce an image selection strategy that retains high-quality generated images and filters out noisy ones, so that they do not harm training.
• We propose data source tokens that distinguish original from generated images, enriching the ViT tokens for better adaptation.
• Our framework outperforms state-of-the-art (SOTA) methods, achieving a 5.3% segmentation improvement on PASCAL VOC 2012 when only 5% of the training data is used.
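The abstract states only that data source information is added to the ViT as tokens so the model can tell original images from generated ones. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one learnable vector per source (here 0 = original, 1 = generated) that is prepended to the patch/class token sequence. The class name `SourceTokenEmbedding`, the constants `ORIGINAL` and `GENERATED`, the insertion position, and the dimensions are all hypothetical.

```python
import numpy as np

# Hypothetical source IDs (assumption): 0 = original dataset image, 1 = GPCD-generated image.
ORIGINAL, GENERATED = 0, 1


class SourceTokenEmbedding:
    """Illustrative sketch: prepend a per-source embedding to a ViT token sequence."""

    def __init__(self, embed_dim: int, num_sources: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One vector per data source. In a trained model these would be learned
        # jointly with the ViT; here they are only randomly initialized.
        self.source_table = rng.normal(0.0, 0.02, size=(num_sources, embed_dim))

    def __call__(self, tokens: np.ndarray, source_ids: np.ndarray) -> np.ndarray:
        """tokens: (B, N, D) class + patch tokens; source_ids: (B,) integer IDs.

        Returns a (B, N + 1, D) array with the source token at position 0.
        """
        src = self.source_table[source_ids][:, None, :]  # (B, 1, D)
        return np.concatenate([src, tokens], axis=1)


# Usage: a batch mixing original and generated images (ViT-S/16-like shapes, assumed).
B, N, D = 4, 197, 384
tokens = np.zeros((B, N, D))
source_ids = np.array([ORIGINAL, GENERATED, GENERATED, ORIGINAL])
embed = SourceTokenEmbedding(D)
out = embed(tokens, source_ids)
print(out.shape)  # (4, 198, 384)
```

Prepending a separate token, rather than adding the source vector to every patch embedding, keeps the original patch tokens unchanged and lets self-attention decide how much each patch attends to the source signal. This is one plausible design among several, and the paper may use a different one.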