CAREER: Toward Video2Sim: Turning Real World Videos into Simulations

Funding Source

U.S. National Science Foundation (NSF)

Principal Investigator

Jia Deng

Awardee Institution

PRINCETON UNIVERSITY

Award Year

2019

Award Date

Not disclosed

Award Number

1942981

Project Period

Unknown / Unknown

Funding Level

National

Award Amount

USD 319,332.00

Discipline

Not disclosed

Discipline Code

Not disclosed

Grant Type

Continuing grant

Keywords

Robust Intelligence ; CAREER-Faculty Erly Career Dev ; ROBUST INTELLIGENCE

Participants

Not disclosed

Participating Institutions

Not disclosed

Project Abstract: This project develops new technology toward Video2Sim: automatically converting a video into a virtual world, where scenes are reconstructed, actions are re-enacted, and alternative outcomes are simulated by a computer. Such a system does not yet exist due to the limitations of existing technology, and as a result, virtual worlds need to be manually and laboriously constructed. Video2Sim is useful because virtual worlds can be used to train and evaluate AI systems. For example, videos of traffic accidents can be converted into simulations to test autonomous cars, or videos of kitchen scenes to test home robots. Simulation is more scalable and cost-effective than real-world experiments and is particularly suited for machine learning algorithms that require a lot of training data. Furthermore, such an automated system can leverage a large number of videos to provide comprehensive coverage of rare events, which is essential for evaluating and assuring the safety of autonomous systems. Therefore, Video2Sim has the potential to benefit a broad range of applications including robotics, healthcare, and transportation. Research in this project is integrated with K-12, undergraduate, and graduate education through research training, course development, and outreach events. This research develops key techniques toward a Video2Sim system with a focus on 3D shape and motion. This effort is organized into two thrusts: (1) reconstructing 3D shape and motion and (2) simulating dynamics and behavior. The goal of thrust 1 is to recover the 3D shape and motion of a full scene from a monocular video, such that we can re-render the scene and re-enact the events from an arbitrary view. The focus is on developing methods to recover detailed 3D shape and 3D motion from arbitrary unconstrained videos. The goal of thrust 2 is to recover the underlying dynamics of a scene, such that we can not only re-enact the actual events but also simulate alternative outcomes. The focus is on developing methods to infer not only physical properties of passive objects but also behavior models of agents, that is, entities that do not just move passively according to external forces but can plan and initiate their own actions. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
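Both thrusts hinge on recovering per-pixel 3D geometry from video and re-rendering it from new viewpoints. The snippet below is a purely illustrative sketch of that re-rendering step, not the project's method: it back-projects a recovered depth map through a pinhole camera and reprojects the points into a hypothetical new camera pose. All function names, camera parameters, and data are placeholders.

```python
# Minimal sketch (not the project's pipeline): given a per-pixel depth map recovered
# from one video frame, re-project the pixels into a new camera pose so the scene can
# be "re-rendered from an arbitrary view" in point-cloud form.
import numpy as np

def backproject(depth, K):
    """Lift an HxW depth map to 3D points in the camera frame (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW homogeneous pixels
    rays = np.linalg.inv(K) @ pix                                      # unit-depth viewing rays
    return (rays * depth.reshape(1, -1)).T                             # HW x 3 points

def reproject(points, K, R, t):
    """Project 3D points into a new view with rotation R and translation t."""
    cam = R @ points.T + t.reshape(3, 1)        # points in the new camera frame
    pix = K @ cam
    return (pix[:2] / pix[2]).T                 # HW x 2 pixel coordinates

# Toy example: a flat scene 2 m away and a small sideways camera shift (placeholder values).
K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])
depth = np.full((240, 320), 2.0)
pts = backproject(depth, K)
uv_new = reproject(pts, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(uv_new.shape)  # (76800, 2)
```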

  • 1. View Synthesis with Sculpted Neural Points

    • Keywords:
    • Rendering (computer graphics);Implicit representation;Neural representations;New approaches;Novel techniques;Parameterized;Point-based methods;Point-based rendering;Point-clouds;View synthesis;Visual qualities
    • Zuo, Yiming;Deng, Jia
    • 11th International Conference on Learning Representations, ICLR 2023
    • 2023
    • May 1, 2023 - May 5, 2023
    • Kigali, Rwanda
    • Conference

    We address the task of view synthesis, generating novel views of a scene given a set of images as input. In many recent works such as NeRF (Mildenhall et al., 2020), the scene geometry is parameterized using neural implicit representations (i.e., MLPs). Implicit neural representations have achieved impressive visual quality but have drawbacks in computational efficiency. In this work, we propose a new approach that performs view synthesis using point clouds. It is the first point-based method that achieves better visual quality than NeRF while being 100× faster in rendering speed. Our approach builds on existing works on differentiable point-based rendering but introduces a novel technique we call "Sculpted Neural Points (SNP)", which significantly improves the robustness to errors and holes in the reconstructed point cloud. We further propose to use view-dependent point features based on spherical harmonics to capture non-Lambertian surfaces, and new designs in the point-based rendering pipeline that further boost the performance. Finally, we show that our system supports fine-grained scene editing. Code is available at https://github.com/princeton-vl/SNP. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
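The abstract above mentions view-dependent point features based on spherical harmonics. The sketch below is a hedged illustration of that general idea only: per-point degree-1 real spherical-harmonic coefficients are combined with the viewing direction to give a view-dependent color. The function names, coefficient layout, and SH degree are assumptions for illustration; the actual SNP design lives in the repository linked in the abstract.

```python
# Illustrative sketch of view-dependent point color via spherical harmonics (degree 1).
import numpy as np

def sh_basis_deg1(dirs):
    """Real spherical-harmonic basis up to degree 1 for unit view directions (N x 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.28209479177387814          # constant Y_0^0 term
    c1 = 0.4886025119029199           # linear Y_1^m terms
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=-1)  # N x 4

def shade_points(sh_coeffs, view_dirs):
    """Per-point RGB from SH coefficients (N x 3 x 4) and unit view directions (N x 3)."""
    basis = sh_basis_deg1(view_dirs)                  # N x 4 basis values
    return np.einsum('ncb,nb->nc', sh_coeffs, basis)  # N x 3 view-dependent colors

# Toy usage: 1000 points with random coefficients viewed along +z (placeholder data).
pts_sh = np.random.randn(1000, 3, 4) * 0.1
dirs = np.tile(np.array([[0.0, 0.0, 1.0]]), (1000, 1))
rgb = shade_points(pts_sh, dirs)
print(rgb.shape)  # (1000, 3)
```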

  • 2. Infinite Photorealistic Worlds Using Procedural Generation

    • Keywords:
    • Optical flows;3D scenes;Dataset and evaluation;External sources;Natural phenomenon;Natural world;Object and scenes;Objects detection;Photo-realistic;Semantic segmentation;Training data
    • Raistrick, Alexander;Lipson, Lahav;Ma, Zeyu;Mei, Lingjie;Wang, Mingzhe;Zuo, Yiming;Kayan, Karhan;Wen, Hongyu;Han, Beining;Wang, Yihan;Newell, Alejandro;Law, Hei;Goyal, Ankit;Yang, Kaiyu;Deng, Jia
    • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
    • 2023
    • June 18, 2023 - June 22, 2023
    • Vancouver, BC, Canada
    • Conference

    We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Please visit infinigen.org for videos, code and pre-generated data. © 2023 IEEE.
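To give a concrete sense of what "generated from scratch via randomized mathematical rules" can mean, the toy sketch below samples branching parameters for a recursive tree-like structure from a seeded random generator, so that every seed yields a different asset. It is not Infinigen code and uses none of its API; see infinigen.org for the real (Blender-based) generators.

```python
# Toy procedural rule: a recursively branching structure with sampled parameters.
import random

def grow_branch(length, depth, rng):
    """Return nested branch descriptions produced by a random branching rule."""
    if depth == 0 or length < 0.05:
        return []
    children = []
    for _ in range(rng.randint(2, 4)):                   # sampled branching factor
        child_len = length * rng.uniform(0.5, 0.8)       # sampled length decay
        children.append({
            "length": child_len,
            "angle": rng.uniform(15, 45),                # sampled branching angle (degrees)
            "children": grow_branch(child_len, depth - 1, rng),
        })
    return children

rng = random.Random(0)                                   # the seed controls the variation
tree = {"length": 1.0, "children": grow_branch(1.0, depth=4, rng=rng)}
print(len(tree["children"]))
```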

  • 3. Coupled Iterative Refinement for 6D Multi-Object Pose Estimation

    • Keywords: (not listed)
    • Lipson, Lahav;Teed, Zachary;Goyal, Ankit;Deng, Jia
    • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
    • 2022
    • June 19, 2022 - June 24, 2022
    • New Orleans, LA, United States
    • Conference

    We address the task of 6D multi-object pose: given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object. We propose a new approach to 6D object pose estimation which consists of an end-to-end differentiable architecture that makes use of geometric knowledge. Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy. We use a novel differentiable layer to perform pose refinement by solving an optimization problem we refer to as Bidirectional Depth-Augmented Perspective-N-Point (BD-PnP). Our method achieves state-of-the-art accuracy on standard 6D Object Pose benchmarks. Code is available at https://github.com/princeton-vl/Coupled-Iterative-Refinement. © 2022 IEEE.
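For context on the kind of objective such pose refinement builds on, the sketch below runs a conventional least-squares refinement of a 6-DoF pose (axis-angle plus translation) against 2D-3D correspondences by minimizing reprojection error. It is not the paper's differentiable BD-PnP layer; all data and parameter values are toy placeholders.

```python
# Conventional reprojection-error pose refinement (not BD-PnP), as a minimal sketch.
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Axis-angle vector (3,) to rotation matrix (3, 3)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    S = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])  # skew matrix
    return np.eye(3) + np.sin(theta) * S + (1 - np.cos(theta)) * (S @ S)

def residuals(pose, pts3d, pts2d, K):
    """Stacked 2D reprojection residuals for pose = [rvec, tvec]."""
    R, t = rodrigues(pose[:3]), pose[3:]
    cam = pts3d @ R.T + t
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return (uv - pts2d).ravel()

# Toy data: observations generated under a ground-truth pose, then refined from a perturbed guess.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
pts3d = np.random.rand(20, 3)
gt = np.array([0.1, -0.2, 0.05, 0.02, 0.01, 1.5])
pts2d = residuals(gt, pts3d, np.zeros((20, 2)), K).reshape(20, 2)  # exact projections
fit = least_squares(residuals, x0=gt + 0.05, args=(pts3d, pts2d, K))
print(np.round(fit.x - gt, 4))  # pose error after refinement, close to zero
```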

  • 4. Multiview Stereo with Cascaded Epipolar RAFT

    • Keywords:
    • 3D modeling;Benchmarking;Computer vision;Three dimensional computer graphics;3D vision;3D models;3D modeling;Depth map;Epipolar;Multi-view stereo;Multi-views;Multiresolution fusion;New approaches;Point clouds
    • Ma, Zeyu;Teed, Zachary;Deng, Jia
    • 17th European Conference on Computer Vision, ECCV 2022
    • 2022
    • October 23, 2022 - October 27, 2022
    • Tel Aviv, Israel
    • Conference

    We address multiview stereo (MVS), an important 3D vision task that reconstructs a 3D model such as a dense point cloud from multiple calibrated images. We propose CER-MVS (Cascaded Epipolar RAFT Multiview Stereo), a new approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture developed for optical flow. CER-MVS introduces five new changes to RAFT: epipolar cost volumes, cost volume cascading, multiview fusion of cost volumes, dynamic supervision, and multiresolution fusion of depth maps. CER-MVS is significantly different from prior work in multiview stereo. Unlike prior work, which operates by updating a 3D cost volume, CER-MVS operates by updating a disparity field. Furthermore, we propose an adaptive thresholding method to balance the completeness and accuracy of the reconstructed point clouds. Experiments show that our approach achieves state-of-the-art performance on the DTU and Tanks-and-Temples benchmarks (both intermediate and advanced set). Code is available at https://github.com/princeton-vl/CER-MVS. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
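As background for the epipolar cost volumes mentioned above, the sketch below builds a simple plane-sweep-style photometric cost volume between a reference and a source view over a set of inverse-depth hypotheses. It only illustrates the basic construction, with nearest-neighbor warping and toy data; CER-MVS's cascaded cost volumes and RAFT-style updates are in the repository linked in the abstract.

```python
# Plane-sweep photometric cost volume: one cost slice per inverse-depth hypothesis.
import numpy as np

def cost_volume(ref, src, K, R, t, inv_depths):
    """Return a (D, H, W) cost volume for D inverse-depth hypotheses."""
    h, w = ref.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    vol = np.zeros((len(inv_depths), h, w))
    for i, inv_d in enumerate(inv_depths):
        pts = rays / max(inv_d, 1e-6)                  # back-project at the hypothesized depth
        proj = K @ (R @ pts + t.reshape(3, 1))         # reproject into the source view
        us = np.clip((proj[0] / proj[2]).round(), 0, w - 1).astype(int)
        vs = np.clip((proj[1] / proj[2]).round(), 0, h - 1).astype(int)
        warped = src[vs, us].reshape(h, w)             # nearest-neighbor warp of the source image
        vol[i] = np.abs(ref - warped)                  # absolute photometric difference
    return vol

# Toy usage: identical images and an identity pose give zero cost everywhere.
K = np.array([[300.0, 0, 80], [0, 300.0, 60], [0, 0, 1]])
img = np.random.rand(120, 160)
vol = cost_volume(img, img, K, np.eye(3), np.zeros(3), inv_depths=np.linspace(0.2, 2.0, 8))
print(vol.shape, vol.max())  # (8, 120, 160) 0.0
```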
