Semantic Deepening of Video-Based Human Activity Understanding (映像に基づく人物行動理解の意味的深化)

Funding Source

Japan Society for the Promotion of Science (JSPS)

Principal Investigator

Yoichi Sato (佐藤洋一)

Host Institution

The University of Tokyo (東京大学)

Fiscal Year

2024

Start Date

Not disclosed

Project Number

24K02956

Research Period

Unknown / Unknown

Project Level

National

Award Amount

JPY 18,590,000

Discipline

Perceptual information processing (知覚情報処理関連)

Discipline Code

Not disclosed

Grant Category

Grant-in-Aid for Scientific Research (B) (基盤研究(B))

Keywords

First-person (egocentric) video analysis (一人称視点映像解析)

Co-Investigator

Ryosuke Furuta (古田諒佑)

Participating Institutions

Not disclosed

Project Abstract (Outline of Research at the Start): This research aims to realize video-based human activity understanding grounded in interpretation at a deep semantic level, built on two pillars: deep semantic understanding of complex activities by leveraging large language models as external knowledge, and detailed understanding of hand-object interactions based on 3D affordances. In doing so, it seeks to overcome the principal limitations of existing human activity understanding technology: first, that interpretation remains at the superficial level of body motion and fails to capture the purpose of an action, the causal relations within complex activities, or the reasons and intentions behind behavior; and second, that the analysis of hand-object interactions, which is key to detailed activity understanding, has so far remained two-dimensional and qualitative.

  • 1. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    • Keywords:
    • Video understanding; First-person video; Egocentric; Video-language; 3D; Body pose; NETWORKS; DATASET
    • Grauman, Kristen;Westbury, Andrew;Torresani, Lorenzo;Kitani, Kris;Malik, Jitendra;Afouras, Triantafyllos;Ashutosh, Kumar;Baiyya, Vijay;Bansal, Siddhant;Boote, Bikram;Byrne, Eugene;Chavis, Zach;Chen, Joya;Cheng, Feng;Chu, Fu-Jen;Crane, Sean;Dasgupta, Avijit;Dong, Jing;Escobar, Maria;Forigua, Cristhian;Gebreselasie, Abrham;Haresh, Sanjay;Huang, Jing;Islam, Md Mohaiminul;Jain, Suyog;Khirodkar, Rawal;Kukreja, Devansh;Liang, Kevin J.;Liu, Jia-Wei;Majumder, Sagnik;Mao, Yongsen;Martin, Miguel;Mavroudi, Effrosyni;Nagarajan, Tushar;Ragusa, Francesco;Ramakrishnan, Santhosh Kumar;Seminara, Luigi;Somayazulu, Arjun;Song, Yale;Su, Shan;Xue, Zihui;Zhang, Edward;Zhang, Jinxu;Castillo, Angela;Chen, Changan;Fu, Xinzhu;Furuta, Ryosuke;Gonzalez, Cristina;Gupta, Prince;Hu, Jiabo;Huang, Yifei;Huang, Yiming;Khoo, Weslie;Kumar, Anush;Kuo, Robert;Lakhavani, Sach;Liu, Miao;Luo, Mi;Luo, Zhengyi;Meredith, Brighid;Miller, Austin;Oguntola, Oluwatumininu;Pan, Xiaqing;Peng, Penny;Pramanick, Shraman;Ramazanova, Merey;Ryan, Fiona;Shan, Wei;Somasundaram, Kiran;Song, Chenan;Southerland, Audrey;Tateno, Masatoshi;Wang, Huiyu;Wang, Yuchen;Yagi, Takuma;Yan, Mingfei;Yang, Xitong;Yu, Zecheng;Zha, Shengxin Cindy;Zhao, Chen;Zhao, Ziwei;Zhu, Zhifan;Zhuo, Jeff;Arbelaez, Pablo;Bertasius, Gedas;Crandall, David;Damen, Dima;Engel, Jakob;Farinella, Giovanni Maria;Furnari, Antonino;Ghanem, Bernard;Hoffman, Judy;Jawahar, C. V.;Newcombe, Richard;Park, Hyun Soo;Rehg, James M.;Sato, Yoichi;Savva, Manolis;Shi, Jianbo;Shou, Mike Zheng;Wray, Michael
    • INTERNATIONAL JOURNAL OF COMPUTER VISION
    • 2025
    • Journal article

    We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions-including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. https://ego-exo4d-data.org/

    ...