T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos

Kai Kang(Chinese University of Hong Kong), Hongsheng Li(Chinese University of Hong Kong), Junjie Yan(Group Sense (China)), Xingyu Zeng(Group Sense (China)), Bin Yang(University of Toronto), Tong Xiao(Chinese University of Hong Kong), Cong Zhang(Shanghai Jiao Tong University), Zhe Wang(Chinese University of Hong Kong), Ruohui Wang(Chinese University of Hong Kong), Xiaogang Wang(Chinese University of Hong Kong), Wanli Ouyang(Chinese University of Hong Kong)
IEEE Transactions on Circuits and Systems for Video Technology
August 7, 2017
Cited by 569 | Open Access

Abstract

The state-of-the-art performance for object detection has been significantly improved over the past two years. Besides the introduction of powerful deep neural networks, such as GoogleNet and VGG, novel object detection frameworks, such as R-CNN and its successors, Fast R-CNN and Faster R-CNN, play an essential role in improving the state of the art. Despite their effectiveness on still images, those frameworks are not specifically designed for object detection from videos. The temporal and contextual information in videos is not fully investigated or utilized. In this paper, we propose a deep learning framework that incorporates temporal and contextual information from tubelets obtained in videos, which dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos. It is called T-CNN, i.e., tubelets with convolutional neural networks. The proposed framework won the newly introduced object-detection-from-video task with provided data in the ImageNet Large-Scale Visual Recognition Challenge 2015. Code is publicly available at https://github.com/myfavouritekk/T-CNN.

