The University of Bristol is part of an international consortium of 13 universities which, in partnership with Facebook AI, has collaborated to advance egocentric perception.
As a result of this initiative, the consortium has built the world's largest egocentric dataset, captured using head-mounted cameras.
Advances in artificial intelligence (AI) and augmented reality (AR) require learning from the same data that humans process to perceive the world. Our eyes allow us to explore places, understand people, manipulate objects and enjoy activities – from the mundane act of opening a door to the exciting interaction of a soccer match with friends.
Egocentric Live 4D Perception (Ego4D) is a large-scale dataset that compiles 3,025 hours of footage from the wearable cameras of 855 participants in nine countries: the United Kingdom, India, Japan, Singapore, Saudi Arabia, Colombia, Rwanda, Italy and the United States. The data captures a wide range of activities from the “egocentric” point of view – that is, from the point of view of the person performing the activity. The University of Bristol is the sole UK representative in this diverse and international effort, contributing 270 hours from 82 participants who recorded footage of their chosen activities of daily living – such as playing a musical instrument, gardening, grooming their pets or assembling furniture.
“In the not-so-distant future, you might be wearing smart AR glasses that walk you through a recipe or how to fix your bike – they might even remind you where you left your keys,” said the University of Bristol’s lead researcher, Professor of Computer Vision Dima Damen.
“However, for AI to advance, it must understand the world and the experiences within it. AI attempts to learn about all aspects of human intelligence by digesting the data we perceive. To enable such automated learning, we must capture and record our daily experiences ‘through our eyes’. This is what Ego4D offers.”
In addition to the captured footage, a suite of benchmarks is available to researchers. A benchmark is a problem definition with manually collected labels against which models can be compared. The Ego4D benchmarks relate to understanding places, spaces, ongoing actions, future actions and social interactions.
“Our five exciting new benchmarks provide a common goal for researchers to build fundamental research towards real-world perception of visual and social contexts,” explains Professor Kristen Grauman, technical lead at Facebook AI.
The ambitious project was inspired by the University of Bristol’s successful EPIC-KITCHENS dataset, which recorded participants’ daily cooking activities in their homes and was, until now, the largest egocentric computer vision dataset. EPIC-KITCHENS pioneered the ‘pause and narrate’ approach, which gives a near-exact time for where each action takes place in long, varied videos. Using this approach, the Ego4D consortium collected 2.5 million timestamped statements of ongoing actions in the videos, which is crucial for benchmarking the collected data.
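Conceptually, a ‘pause and narrate’ annotation is just a free-form sentence pinned to the video timestamp at which the narrator paused. The sketch below illustrates that idea; the field names and record layout are assumptions for illustration only, not the actual Ego4D release format.

```python
# Illustrative sketch of timestamped narrations (hypothetical layout,
# not the released Ego4D schema): each record pairs a video timestamp
# with a sentence describing the ongoing action.

from dataclasses import dataclass


@dataclass
class Narration:
    timestamp_s: float  # seconds from the start of the video
    text: str           # what the camera wearer is doing at that moment


def narrations_between(narrations, start_s, end_s):
    """Return the narrations whose timestamps fall inside [start_s, end_s)."""
    return [n for n in narrations if start_s <= n.timestamp_s < end_s]


narrations = [
    Narration(12.4, "opens the cupboard"),
    Narration(15.9, "takes out a pan"),
    Narration(41.2, "turns on the hob"),
]

# Actions narrated during the first 20 seconds of the video
print([n.text for n in narrations_between(narrations, 0.0, 20.0)])
# → ['opens the cupboard', 'takes out a pan']
```

Timestamped records like these are what make benchmarking possible: a model's predicted actions can be compared against the narrated actions at the same points in the video.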
Ego4D is a huge and diverse dataset, with benchmarks, that will prove invaluable to researchers working in the fields of augmented reality, assistive technology and robotics. The datasets will be publicly available in November this year for researchers who sign the Ego4D data usage agreement.
EGO4D team at the University of Bristol:
Prof Dima Damen – Professor of Computer Vision
Dr Michael Wray – Postdoctoral Research Fellow
Mr. Will Price – PhD student
Mr. Jonathan Munro – PhD student
Mr. Adriano Fragomeni – PhD student
Members of the consortium:
- University of Bristol, UK
- Carnegie Mellon University (Pittsburgh, USA, and Rwanda)
- Georgia Tech, USA
- Indiana University, USA
- International Institute of Information Technology, Hyderabad, India
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
- Massachusetts Institute of Technology, USA
- National University of Singapore, Singapore
- Universidad de los Andes, Colombia
- University of Catania, Italy
- University of Minnesota, USA
- University of Pennsylvania, USA
- University of Tokyo, Japan
EPIC-KITCHENS is a collaboration with the University of Toronto (Canada) and the University of Catania (Italy), led by the University of Bristol, to collect and annotate what was until now the largest egocentric dataset (over 20 million frames), capturing 45 individuals in their own homes over several consecutive days.
The dataset was collected in 4 different countries and was narrated in 6 languages to help address vision and language challenges. It offers a series of challenges ranging from object recognition to action prediction and activity modeling in a realistic, unscripted daily setting.
The size of the publicly available datasets is crucial for the advancement of this field, which is of paramount importance for robotics, health and augmented reality.
Learn more about EPIC-KITCHENS in our blog: EPIC-KITCHENS: bringing useful AI closer to reality