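Research on 3D Face Reconstruction and Recognition in Complex Scenarios

Project source

National Natural Science Foundation of China (NSFC)

Principal investigator

Zhao Qijun (赵启军)

Funded institution

Sichuan University

Project number

61773270

Approval year

2017

Approval date

Not disclosed

Research period

Unknown / Unknown

Project level

National

Funding amount

CNY 610,000

Discipline

Information Sciences - Artificial Intelligence - Pattern Recognition and Data Mining

Discipline code

F-F06-F0605

Grant category

General Program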

Keywords

3D Face Reconstruction ; Unconstrained Face Recognition ; Complex Scenarios ; Cascaded Regression ; Pose and Expression Robustness

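Participants

Chen Hu (陈虎); Wang Yang (王洋); Liu Feng (刘峰); Zeng Dan (曾丹); Liu Yuting (刘雨婷); Shi Zehao (施泽浩); Liang Jie (梁洁); Hu Jun (胡俊); Luo Xu (骆旭)

Participating institutions

Sichuan University; Southern University of Science and Technology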

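Project proposal abstract: Face recognition has advanced considerably thanks to big data and deep learning, but its performance remains unsatisfactory in complex scenarios with pronounced pose and expression variations. 3D face reconstruction methods have been used to correct face pose and to assist 2D face recognition, which has improved recognition rates to some extent. However, existing 3D face reconstruction methods generally rely on fitting parametric models: they are computationally complex, are not robust to pose and expression, and are independent of the learning of facial feature representations, which limits how much they can improve face recognition. This project targets 3D face reconstruction and recognition in complex scenarios. Focusing on key problems such as the mapping between 3D and 2D face shape and texture, the correlation between 3D face reconstruction and identity recognition, and joint learning models, it adopts cascaded regression and deep learning as its technical approach to study 3D face reconstruction from single images, from mugshot images, and from image sets, as well as multi-task learning of 3D face reconstruction and recognition. It will build 3D face reconstruction models for complex scenarios and multi-task learning models for 3D face reconstruction, alignment, and recognition, and will propose new methods for 3D face reconstruction and recognition in complex scenarios. The results are expected to extend research on 3D face reconstruction and to promote the development and application of unconstrained face recognition.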

Application Abstract: 3D reconstruction methods have recently been applied to recover 3D face shapes from 2D images, which are then used to assist 2D face recognition. This effectively improves face recognition accuracy. However, existing 3D face reconstruction methods mostly have relatively high complexity and are fragile to the pose and expression variations commonly seen in complex scenarios. Moreover, they are isolated from the facial feature representation learning process. This project aims to enhance the robustness of 3D face reconstruction to pose and expression variations, and to establish an end-to-end learning framework for joint 3D face reconstruction and face recognition. To this end, it will propose a variety of 3D face reconstruction methods based on single images, mugshot images, and 2D face image sets, as well as methods for joint 3D face reconstruction and recognition. Cascaded regression will be employed to predict 3D face shapes from 2D facial landmarks. Statistical facial texture models will be used to fuse textures from multiple face images. Deep convolutional neural networks will be used to implement the multi-task learning framework for joint 3D face reconstruction and recognition. The proposed methods are expected to extend 3D face reconstruction research and to enhance face recognition performance in real-world complex scenarios.
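
The abstract names cascaded regression from 2D facial landmarks to 3D face shapes but gives no implementation details. The following is only a minimal sketch of the general idea, not the project's method. Everything in it is an assumption made for illustration: the orthographic `project` function, the ridge-regularized linear regressors, the toy linear shape model in `synthetic_faces`, and all function names.

```python
import numpy as np


def project(shapes):
    """Orthographic projection (assumed): keep x and y of every 3D point."""
    return shapes[..., :2]


def train_cascade(landmarks_2d, shapes_3d, n_stages=3, ridge=1e-3):
    """Learn one ridge regressor per stage, mapping 2D residuals to 3D updates."""
    n, p, _ = shapes_3d.shape
    mean_shape = shapes_3d.mean(axis=0)
    current = np.repeat(mean_shape[None], n, axis=0)  # every sample starts at the mean
    regressors = []
    for _ in range(n_stages):
        # Feature: gap between observed landmarks and the projected estimate.
        feats = (landmarks_2d - project(current)).reshape(n, -1)   # (n, 2p)
        target = (shapes_3d - current).reshape(n, -1)              # (n, 3p)
        gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
        reg = np.linalg.solve(gram, feats.T @ target)              # (2p, 3p)
        regressors.append(reg)
        current = current + (feats @ reg).reshape(n, p, 3)
    return mean_shape, regressors


def predict(landmarks_2d, mean_shape, regressors):
    """Run the learned cascade on new 2D landmarks."""
    n, p = landmarks_2d.shape[0], mean_shape.shape[0]
    current = np.repeat(mean_shape[None], n, axis=0)
    for reg in regressors:
        feats = (landmarks_2d - project(current)).reshape(n, -1)
        current = current + (feats @ reg).reshape(n, p, 3)
    return current


def synthetic_faces(n, mean, basis, rng):
    """Toy data (hypothetical): mean shape plus a random mix of basis shapes."""
    coeffs = rng.normal(size=(n, basis.shape[0]))
    shapes = mean + np.tensordot(coeffs, basis, axes=1)  # (n, p, 3)
    return project(shapes), shapes
```

Each stage refines the current 3D estimate using the remaining 2D landmark error, which is the defining trait of a cascade. On this toy linear data the first stage already does almost all of the work; real images would call for nonlinear features and a pose model.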

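Funded province

Sichuan Province

Project final report (full text)

Face recognition has advanced considerably thanks to big data and deep learning, but its performance remains unsatisfactory in complex scenarios with pronounced pose and expression variations. 3D face reconstruction methods have been used to correct face pose and to assist 2D face recognition, which has improved recognition rates to some extent. However, existing 3D face reconstruction methods generally rely on fitting parametric models: they are computationally complex, are not robust to pose and expression, and are independent of the learning of facial feature representations, which limits how much they can improve face recognition. This project targeted 3D face reconstruction and recognition in complex scenarios. Focusing on key problems such as the mapping between 3D and 2D face shape and texture, the correlation between 3D face reconstruction and identity recognition, and joint learning models, it studied 3D face reconstruction from single images, from mugshot images, and from image sets, as well as multi-task learning of 3D face reconstruction and recognition; built 3D face reconstruction models for complex scenarios and multi-task learning models for 3D face reconstruction, alignment, and recognition; proposed new methods for 3D face reconstruction and recognition in complex scenarios; and also explored several related technical areas. The project team carried out its research in strict accordance with national regulations and the established plan, and achieved a series of results, mainly the following: it built a multi-dimensional face dataset covering on the order of a thousand subjects, containing 3D face models of different precision and 2D face images from different scenes; it proposed, for the first time, a new method that jointly solves 3D face reconstruction and 2D face alignment, and applied the reconstructed normalized 3D face shapes to assist unconstrained 2D face recognition; it proposed a regression-based method for 3D face reconstruction from mugshot images and applied it to arbitrary-view face recognition in police criminal investigation; it introduced facial feature disentanglement, for the first time, into 3D face reconstruction from arbitrary image sets, effectively improving reconstruction accuracy; and it realized, for the first time, multi-task learning of 3D face reconstruction and face recognition, improving both 3D reconstruction accuracy and unconstrained face recognition accuracy. Beyond these main results, the team also made progress in areas related to face recognition, for example 3D-modeling-based landmark localization in 2D face images of arbitrary pose, face de-occlusion and identity recognition of occluded faces, 3D-reconstruction-based unconstrained facial expression recognition, face recognition from low-quality 3D data, facial attribute recognition from 3D data, 3D point cloud segmentation and dense 3D face alignment, object detection and counting in surveillance scenes, salient object detection and fine-grained image classification, 3D reconstruction of birds, and automatic assessment of facial paralysis severity based on facial geometry analysis. As of December 31, 2021, the project had published 34 academic papers (including 1 TPAMI paper, 2 papers each in CAS Zone 1 and Zone 2 journals, 2 CCF-A conference papers, and 12 CCF-C conference papers), filed 4 invention patent applications, published 2 monographs and 2 translated books, and trained 5 doctoral students and 18 master's students, exceeding the planned targets. Project funds were used appropriately, and the results are expected to extend research on 3D face reconstruction and to promote the development and application of unconstrained face recognition.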


1.Geometric self-supervision for monocular 3D animal pose estimation

Keywords:
Adversarial machine learning; Invertebrates; Self-supervised learning; 3D animal pose estimation; 3D pose estimation; Camera rotations; Data scarcity; Geometric consistency constraints; Geometric self-supervision; Monocular 3D pose estimation; Pose estimation; Unsupervised method; View consistency

Dai, Xiaowei; Li, Shuiwang; Zhao, Qijun; Yang, Hongyu
Pattern Recognition
2025
Vol. 162
Journal article

The limited research on 3D animal pose estimation is attributed to data scarcity and perspective ambiguities, despite its significant applications in various fields including biology, medicine, and animation. To resolve data scarcity, we put forward an unsupervised method for estimating 3D animal pose when only 2D poses are available. To overcome perspective ambiguities, we propose canonical pose, camera, and view consistency losses to represent geometric consistency constraints for self-supervised learning. Specifically, the input 2D pose is fed into the pose generator network and camera network, and then regressed to the 3D canonical pose and camera rotation, respectively. In the training phase, the regressed 3D canonical pose is subjected to random re-projection to synthesize new 2D poses, which are also decomposed into 3D canonical pose and camera rotation to form geometric consistency constraints. Experimental results demonstrate that the proposed method achieves the best performance in unsupervised monocular 3D animal pose estimation. The corresponding code is available at: https://github.com/maicao2018/GeoSelfPose. © 2025 Elsevier Ltd
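
The random re-projection step described in the abstract can be illustrated with a small sketch. It is not the code from the linked repository. The orthographic camera, the QR-based rotation sampler, and the `lift` callable (a hypothetical stand-in for the pose generator and camera networks, returning a canonical 3D pose and a rotation) are all assumptions made here for illustration.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix from the QR factors of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # sign fix so the draw is not biased by QR conventions
    if np.linalg.det(q) < 0:         # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def reproject(pose_3d, rotation):
    """Rotate a (J, 3) canonical pose and drop depth (assumed orthographic camera)."""
    return (pose_3d @ rotation.T)[:, :2]


def canonical_consistency_loss(lift, pose_2d, rng):
    """Lift, re-project under a random view, lift again, and compare canonical poses."""
    canonical, _ = lift(pose_2d)                       # hypothetical network call
    new_view = reproject(canonical, random_rotation(rng))
    canonical_again, _ = lift(new_view)
    return float(np.mean((canonical - canonical_again) ** 2))
```

A lifter that returns the same canonical pose for every view of the same subject drives this loss to zero, which is the kind of geometric consistency the abstract describes. The camera and view consistency terms it also mentions are omitted here.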

...

2.A geometry-aware generative model for face morphing attacks

Keywords:
Adversarial machine learning; Adversarial networks; Attack detection; Automated face recognition; Digital manipulation; Face images; Face morphing; Face recognition systems; Generative model; Morphing; Morphing attack

Deng, Zongyong; Zhao, Qijun; Ye, Libin; He, Qiaoyun; He, Zuyuan; Huang, Jie
Knowledge-Based Systems
2025
Vol. 314
Journal article

Automated face recognition systems are vulnerable to various attacks, such as adversarial attacks, digital manipulation, and physical spoofs. As a special case of digital manipulation attacks, face morphing draws increasing concern because such attacks generalize well across diverse face recognition systems. However, the threat of face morphing attacks is underestimated because of the following characteristics of state-of-the-art morphing methods: (i) their generated face images have low visual quality with artifacts, (ii) they fail to guarantee high similarity with contributing subjects, and (iii) they do not explicitly consider countering face morphing detection methods when constructing morphing attacks. Based on the observation that facial geometry information is vital in face recognition, we present in this paper a geometry-aware generative model (GAGM), which can realize more threatening attacks against human experts, face recognition, and morphing attack detection. GAGM synthesizes morphs driven by both facial geometry and texture based on dual invertible networks, resulting in visually realistic and highly deceptive morphed face images. To circumvent morphing-attack detection, GAGM implements a fine-grained adversarial attack strategy to mislead the detection methods. Visualization results demonstrate that GAGM, compared to existing techniques, is capable of generating visually faultless facial morphs. Meanwhile, extensive quantitative experiments show that GAGM can significantly increase the attack success rate against face recognition and deceive various morphing attack detection models. © 2025 Elsevier B.V.

...

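3.Low-quality 3D face recognition with joint soft-threshold denoising and video data fusion

Keywords:
3D face recognition; low-quality 3D face; soft-threshold denoising; joint gradual loss function; video data fusion
Funding: National Natural Science Foundation of China (61773270); Jiaxing University "Hundred Youth Plan" (CD70621004); Scientific Research Project of the Zhejiang Provincial Department of Education (Y202249424)
Album: Information Technology; Topic: Computer Software and Computer Applications; Classification number: TP391.41

Sang Gaoli (桑高丽); Xiao Shudi (肖述笛); Zhao Qijun (赵启军)
Venue, year, volume, and issue not listed
Journal article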

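Objective: Low-quality 3D face recognition has become a hot topic in pattern recognition in recent years. Unlike traditional high-quality 3D face recognition, its main difficulties are low data quality and heavy noise. To address the heavy noise in low-quality 3D face data and the difficulty of extracting effective features from a single, limited depth frame, a low-quality 3D face recognition method is proposed that combines soft-threshold denoising with video data fusion. Methods: First, to handle the noise in low-quality 3D faces, a plug-and-play soft-threshold denoising module is proposed that denoises features while the network extracts them. To make the extracted features more discriminative, a joint gradual loss function combining softmax and ArcFace (additive angular margin loss for deep face recognition) is proposed. To make better use of multi-frame low-quality video data for improving face data quality, a video data fusion module based on gated recurrent units is proposed, which effectively fuses complementary information across video frames and further improves recognition accuracy. Results: The method is compared with recent methods on two public datasets. Under the open-set and closed-set evaluation protocols of Lock3DFace (low-cost Kinect 3D faces), the average recognition rate is 0.28% and 3.13% higher, respectively, than that of the second-best method; under the open-set evaluation protocol of ExtendedMulti-Dim, it is 1.03% higher than that of the second-best method. Conclusion: The proposed method not only mitigates the effect of low-quality noise but also effectively fuses the complementary information in multi-frame video data, substantially improving low-quality 3D face recognition accuracy.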
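
The abstract does not spell out the denoising module, so the following only illustrates the standard soft-thresholding operator that such a module would presumably build on. The function name `soft_threshold` and the idea of one threshold per feature channel are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def soft_threshold(features, tau):
    """Shrink every value toward zero by tau; values inside [-tau, tau] become zero."""
    return np.sign(features) * np.maximum(np.abs(features) - tau, 0.0)


# Per-channel thresholds (assumed): tau of shape (C, 1, 1) broadcasts
# over a (C, H, W) feature map.
feature_map = np.array([[[0.2, -1.5]], [[3.0, -0.4]]])   # shape (2, 1, 2)
tau = np.array([0.5, 1.0]).reshape(2, 1, 1)
denoised = soft_threshold(feature_map, tau)
```

Small activations, which are more likely to be noise, are set to zero, while large activations are kept but reduced by the threshold. In a network the thresholds would typically be learned rather than fixed.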

...

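4.Kinect-based human pose estimation optimization and animation generation

Keywords:
Human pose estimation; Kinect; virtual human animation; simulation; occlusion prevention

Zhao Wei (赵威); Li Yi (李毅)
Journal of Computer Applications (计算机应用)
2022
Issue 9
Journal article

To generate more accurate and fluid virtual-human animation, 3D human pose data are captured with a Kinect device while a monocular 3D human pose estimation algorithm infers skeleton-joint data from the Kinect color stream, which refines the pose estimate in real time and drives a virtual character model to generate animation. First, a spatio-temporal (abstract truncated in the source)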

...

5.Light field salient object detection: A review and benchmark

Keywords:
light field salient object detection (SOD); deep learning; benchmarking

Keren Fu; Yao Jiang; Ge-Peng Ji; Tao Zhou; Qijun Zhao; Deng-Ping Fan
Computational Visual Media (English edition)
2022
Issue 4
Journal article

...

6.POOLING SCORES OF NEIGHBORING POINTS FOR IMPROVED 3D POINT CLOUD SEGMENTATION

Zhao Chenxi (赵晨曦); Zhou Weihao (周玮皓); Lu Li (卢莉); Zhao Qijun (赵启军)
Venue, year, volume, and issue not listed
Journal article

7.Siamese Network for RGB-D Salient Object Detection and Beyond

Keywords:
Siamese network; RGB-D SOD; saliency detection; salient object detection; RGB-D semantic segmentation; IMAGE; SEGMENTATION; DEEP; FUSION; MODEL; CONVOLUTION; FRAMEWORK; CONTRAST; FEATURES; ENERGY

Fu, Keren; Fan, Deng-Ping; Ji, Ge-Peng; Zhao, Qijun; Shen, Jianbing; Zhu, Ce
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2022
Vol. 44
Issue 9
Journal article

Existing RGB-D salient object detection (SOD) models usually treat RGB and depth as independent information and design separate networks for feature extraction from each. Such schemes can easily be constrained by a limited amount of training data or over-reliance on an elaborately designed training process. Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture. In this paper, we propose two effective components: joint learning (JL), and densely cooperative fusion (DCF). The JL module provides robust saliency feature learning by exploiting cross-modal commonality via a Siamese network, while the DCF module is introduced for complementary feature discovery. Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector with good generalization. As a result, JL-DCF significantly advances the state-of-the-art models by an average of ~2.0% (max F-measure) across seven challenging datasets. In addition, we show that JL-DCF is readily applicable to other related multi-modal detection tasks, including RGB-T (thermal infrared) SOD and video SOD, achieving comparable or even better performance against state-of-the-art methods. We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models on the task of RGB-D SOD. These facts further confirm that the proposed framework could offer a potential solution for various applications and provide more insight into the cross-modal complementarity task.
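
The abstract reports gains in max F-measure without defining the metric. The sketch below shows one common way to compute it for a single saliency map: binarize the map at many thresholds and keep the best F-score. The weighting `beta2 = 0.3` and the 255 evenly spaced thresholds are conventions often used in salient object detection benchmarks and are assumed here, not taken from the paper. The function name is hypothetical.

```python
import numpy as np


def max_f_measure(saliency, ground_truth, beta2=0.3, n_thresholds=255):
    """Best F-measure over binarization thresholds for one saliency map in [0, 1]."""
    gt = ground_truth.astype(bool)
    best = 0.0
    for t in np.linspace(0.0, 1.0, n_thresholds, endpoint=False):
        pred = saliency > t
        true_pos = np.logical_and(pred, gt).sum()
        if true_pos == 0:            # precision and recall are both zero (or undefined)
            continue
        precision = true_pos / pred.sum()
        recall = true_pos / gt.sum()
        f_score = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, f_score)
    return best
```

Setting `beta2` below 1 weights precision more heavily than recall. Benchmarks usually average the per-threshold scores over a whole dataset before taking the maximum, which this single-image sketch does not do.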

...

8.Learning residue-aware correlation filters and refining scale for real-time UAV tracking

Keywords:
Air navigation; Aircraft detection; Antennas; Deep learning; Efficiency; Aviation security; Correlation filters; Discriminative scale estimation; Filter-based; GrabCut; ITS applications; Real-time; Residue-aware correlation filter; Scale estimation; Unmanned aerial vehicle tracking

Li, Shuiwang; Liu, Yuting; Zhao, Qijun; Feng, Ziliang
Pattern Recognition
2022
Vol. 127
Journal article

Unmanned aerial vehicle (UAV)-based tracking has applications in agriculture, aviation, navigation, transportation, and public security, and has developed rapidly in recent years. However, because of limited computing resources and battery capacity, low-power requirements, and the maximum load of a UAV, deploying deep learning-based tracking algorithms on UAVs is currently not feasible, and discriminative correlation filter (DCF)-based trackers have therefore stood out in the UAV tracking community for their high efficiency and appealing robustness on a single CPU. When confronted with difficult challenges, however, the efficiency and accuracy of existing DCF-based approaches are still unsatisfactory. Inspired by the good optimization properties associated with residue representation, in this paper we exploit the residue nature inherent to videos and propose residue-aware correlation filters, which demonstrate better convergence properties in filter learning. In addition, we propose a scale refinement strategy to improve the widely adopted discriminative scale estimation in DCF-based trackers, which greatly affects the precision and accuracy of the trackers, since accumulated scale error degrades the appearance model as online updating goes on. Extensive experiments are conducted on four UAV benchmarks, namely UAV123@10fps, DTB70, UAVDT, and Vistrone2018 (VisDrone2018-test-dev). The results show that our method achieves state-of-the-art performance in UAV tracking.

...

9.Light field salient object detection: A review and benchmark

Keywords:
Deep learning; Object detection; Object recognition; Comprehensive information; Detection models; Field record; Light fields; Natural scenes; Research topics; Saliency detection; Salient object detection

Fu, Keren; Jiang, Yao; Ji, Ge-Peng; Zhou, Tao; Zhao, Qijun; Fan, Deng-Ping
Computational Visual Media
2022
Vol. 8
Issue 4
Journal article

Salient object detection (SOD) is a long-standing research topic in computer vision with increasing interest in the past decade. Since light fields record comprehensive information of natural scenes that benefit SOD in a number of ways, using light field inputs to improve saliency detection over conventional RGB inputs is an emerging trend. This paper provides the first comprehensive review and a benchmark for light field SOD, which has long been lacking in the saliency community. Firstly, we introduce light fields, including theory and data forms, and then review existing studies on light field SOD, covering ten traditional models, seven deep learning-based models, a comparative study, and a brief review. Existing datasets for light field SOD are also summarized. Secondly, we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, providing insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models. Due to the inconsistency of current datasets, we further generate complete data and supplement focal stacks, depth maps, and multi-view images for them, making them consistent and uniform. Our supplemental data make a universal benchmark possible. Lastly, light field SOD is a specialised problem, because of its diverse data representations and high dependency on acquisition hardware, so it differs greatly from other saliency detection tasks. We provide nine observations on challenges and future directions, and outline several open issues. All the materials including models, datasets, benchmarking results, and supplemented light field datasets are publicly available at https://github.com/kerenfu/LFSOD-Survey. © 2022, The Author(s).

...

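10.Facial expression image generation with label distribution

Keywords:
Label distribution; image generation; generative adversarial networks

Yang Jingbo (杨静波); Zhao Qijun (赵启军); Lü Zejun (吕泽均)
Modern Computer (现代计算机)
2021
Issue 12
Journal article

With the development of generative adversarial networks in image generation, the quality of generated facial expression images has improved markedly. However, current methods are usually based on traditional expression classification and ignore the complexity and diversity of expressions, and the databases available for generating diverse expressions are small. To address this problem, a facial expression generation method that introduces label distribution is proposed. With limited data, the method processes expression labels as label distributions, generates facial expression images with a generative adversarial network, and is validated on the Oulu-CASIA and CFEED databases.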

...
