Ego4D: Around the World in 3,000 Hours of Egocentric Video

Kristen Grauman(The University of Texas at Austin), Andrew Westbury, Eugene H. Byrne, Zachary Chavis(University of Minnesota System), Antonino Furnari(University of Catania), Rohit Girdhar, Jackson Hamburger, Hao Jiang(META Health), Miao Liu(Georgia Institute of Technology), Xingyu Liu(Carnegie Mellon University), Miguel Martin, Tushar Nagarajan(The University of Texas at Austin), Ilija Radosavovic(Berkeley College), Santhosh Kumar Ramakrishnan(The University of Texas at Austin), Fiona Ryan(Georgia Institute of Technology), Jayant Sharma(University of Minnesota System), Michael Wray(University of Bristol), Mengmeng Xu(King Abdullah University of Science and Technology), Zhongcong Xu(National University of Singapore), Chen Zhao(King Abdullah University of Science and Technology), Siddhant Bansal(International Institute of Information Technology, Hyderabad), Dhruv Batra, Vincent Cartillier(Georgia Institute of Technology), Sean Crane(Carnegie Mellon University), Tien Do(University of Minnesota System), Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni(University of Bristol), Qichen Fu(Carnegie Mellon University), Abrham Gebreselasie(Carnegie Mellon University Africa), Cristina González(Universidad de Los Andes), James Hillis(META Health), Xuhua Huang(Carnegie Mellon University), Yifei Huang(The University of Tokyo), Wenqi Jia(Georgia Institute of Technology), Weslie Khoo(Indiana University), Jachym Kolar, Satwik Kottur, Anurag Kumar(META Health), Federico Landini, Chao Li(META Health), Yanghao Li, Zhenqiang Li(The University of Tokyo), Karttikeya Mangalam(Berkeley College), Raghava Modhugu(International Institute of Information Technology, Hyderabad), Jonathan Munro(University of Bristol), Tullie Murrell, Takumi Nishiyasu(The University of Tokyo), Will Price(University of Bristol), Paola Ruiz Puentes(Universidad de Los Andes), Merey Ramazanova(King Abdullah University of Science and Technology), Leda Sarı(META Health), Kiran Somasundaram(META Health), Audrey Southerland(Georgia Institute of Technology), Yusuke Sugano(The University of Tokyo), Ruijie Tao(National University of Singapore), Minh Thanh Vo(META Health), Yuchen Wang(Indiana University), Xindi Wu(Carnegie Mellon University), Takuma Yagi(The University of Tokyo), Ziwei Zhao(Indiana University), Yunyi Zhu(National University of Singapore), Pablo Arbeláez(Universidad de Los Andes), David Crandall(Indiana University), Dima Damen(University of Bristol), Giovanni Maria Farinella(University of Catania), Christian Fuegen, Bernard Ghanem(King Abdullah University of Science and Technology), Vamsi Krishna Ithapu(META Health), C. V. Jawahar(International Institute of Information Technology, Hyderabad), Hanbyul Joo, Kris Kitani(Carnegie Mellon University), Haizhou Li(National University of Singapore), Richard Newcombe(META Health), Aude Oliva(Moscow Institute of Thermal Technology), Hyun Soo Park(University of Minnesota System), James M. Rehg(Georgia Institute of Technology), Yoichi Sato(The University of Tokyo), Jianbo Shi(California University of Pennsylvania), Mike Zheng Shou(National University of Singapore), Antonio Torralba(Moscow Institute of Thermal Technology), Lorenzo Torresani(Dartmouth Hospital), Mingfei Yan(META Health), Jitendra Malik(Berkeley College)
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
June 1, 2022
Cited by 544. Open Access.

Abstract

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/
