Research on 3D Face Reconstruction and Recognition in Complex Scenarios

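Funding source

National Natural Science Foundation of China (NSFC)

Principal investigator

Zhao Qijun (赵启军)

Funded institution

Sichuan University

Project number

61773270

Approval year

2017

Approval date

Not disclosed

Research period

Unknown / Unknown

Project level

National

Funding amount

CNY 610,000 (61.00万元)

Discipline

Information Science - Artificial Intelligence - Pattern Recognition and Data Mining

Discipline code

F-F06-F0605

Grant category

General Program (面上项目)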

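Keywords

3D Face Reconstruction ; Unconstrained Face Recognition ; Complex Scenarios ; Cascaded Regression ; Pose and Expression Robustness

Participants

Chen Hu (陈虎); Wang Yang (王洋); Liu Feng (刘峰); Zeng Dan (曾丹); Liu Yuting (刘雨婷); Shi Zehao (施泽浩); Liang Jie (梁洁); Hu Jun (胡俊); Luo Xu (骆旭)

Participating institutions

Sichuan University; Southern University of Science and Technology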

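Project proposal abstract: Face recognition has made great strides with the help of big data and deep learning, but its performance remains unsatisfactory in complex scenarios with pronounced pose and expression variations. 3D face reconstruction methods have been used to correct face pose and assist 2D face recognition, improving recognition rates to some extent. However, existing 3D face reconstruction methods generally rely on fitting parametric models: they have high complexity, are not robust to pose and expression, and are independent of the learning of facial feature representations, which limits how much they can improve face recognition. This project targets 3D face reconstruction and recognition in complex scenarios. Focusing on key problems such as the mapping between 3D and 2D face shape/texture, the correlation between 3D face reconstruction and identity recognition, and joint learning models, it adopts cascaded regression and deep learning as its technical approach and studies single-image 3D face reconstruction, mugshot-based 3D face reconstruction, image-set 3D face reconstruction, and multi-task learning of 3D face reconstruction and recognition. It will build 3D face reconstruction models for complex scenarios and a multi-task learning model for 3D face reconstruction, alignment, and recognition, and will propose new methods for 3D face reconstruction and recognition in complex scenarios. The project's results will extend research on 3D face reconstruction and promote the development and application of unconstrained face recognition.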

Application Abstract: 3D reconstruction methods have recently been applied to recover 3D face shapes from 2D images, which are then used to assist 2D face recognition. This effectively improves face recognition accuracy. However, existing 3D face reconstruction methods mostly have relatively high complexity and are sensitive to pose and expression variations that are commonly seen in complex scenarios. Moreover, they are isolated from the facial feature representation learning process. This project aims to enhance the robustness of 3D face reconstruction to pose and expression variations, and to establish an end-to-end learning framework for joint 3D face reconstruction and face recognition. To this end, it will propose a variety of 3D face reconstruction methods based on single images, mugshot images, and 2D face image sets, and methods for joint 3D face reconstruction and recognition. Cascaded regression will be employed to predict 3D face shapes from 2D facial landmarks. Statistical facial texture models will be used to fuse textures in multiple face images. Deep convolutional neural networks will be utilized to implement the multi-task learning framework for joint 3D face reconstruction and recognition. The proposed methods are expected to extend 3D face reconstruction research and to enhance face recognition performance in real-world complex scenarios.
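
The cascaded regression idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the orthographic camera, the flat `[x, y, z, ...]` shape layout, and the function names `project_orthographic` and `cascaded_regression` are all assumptions made for illustration, and the stage regressors here are plain matrices that would in practice be learned from training data.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project_orthographic(shape_3d: Sequence[float]) -> Vector:
    """Toy camera (assumption): drop z from a flat [x0, y0, z0, x1, ...] vector."""
    out: Vector = []
    for i in range(0, len(shape_3d), 3):
        out.extend([shape_3d[i], shape_3d[i + 1]])
    return out


def mat_vec(matrix: Matrix, vec: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cascaded_regression(
    landmarks_2d: Sequence[float],
    initial_shape: Sequence[float],
    regressors: Sequence[Matrix],
    project: Callable[[Sequence[float]], Vector] = project_orthographic,
) -> Vector:
    """Refine a 3D shape stage by stage.

    Each stage maps the 2D landmark residual (observed minus projected)
    to an additive update of the current 3D shape estimate.
    """
    shape = list(initial_shape)
    for regressor in regressors:
        residual = [o - p for o, p in zip(landmarks_2d, project(shape))]
        update = mat_vec(regressor, residual)
        shape = [s + u for s, u in zip(shape, update)]
    return shape
```

With a single landmark and three identical hand-set stages that each move the shape halfway toward the observation, the projected shape approaches the observed landmark while the depth coordinate is left untouched.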

Funded province

Sichuan Province

Project final report (full text)

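Face recognition has made great strides with the help of big data and deep learning, but its performance remains unsatisfactory in complex scenarios with pronounced pose and expression variations. 3D face reconstruction methods have been used to correct face pose and assist 2D face recognition, improving recognition rates to some extent. However, existing 3D face reconstruction methods generally rely on fitting parametric models: they have high complexity, are not robust to pose and expression, and are independent of the learning of facial feature representations, which limits how much they can improve face recognition. Targeting 3D face reconstruction and recognition in complex scenarios, and focusing on key problems such as the mapping between 3D and 2D face shape/texture, the correlation between 3D face reconstruction and identity recognition, and joint learning models, the project studied single-image 3D face reconstruction, mugshot-based 3D face reconstruction, image-set 3D face reconstruction, and multi-task learning of 3D face reconstruction and recognition. It built 3D face reconstruction models for complex scenarios and a multi-task learning model for 3D face reconstruction, alignment, and recognition, proposed new methods for 3D face reconstruction and recognition in complex scenarios, and also explored related technical areas. The project team carried out its research in strict accordance with national regulations and the established plan and achieved a series of advances and results, mainly: building a multi-dimensional face dataset on the scale of a thousand subjects, containing 3D face models of different precision and 2D face images from different scenes; proposing for the first time a new method that jointly solves 3D face reconstruction and 2D face alignment, and applying the reconstructed normalized 3D face shapes to assist unconstrained 2D face recognition; proposing a regression-based method for 3D face reconstruction from mugshots and applying it to arbitrary-view face recognition in police criminal investigation; introducing facial feature disentanglement for the first time to 3D face reconstruction from arbitrary image sets, which effectively improved reconstruction accuracy; and realizing for the first time multi-task learning of 3D face reconstruction and face recognition, improving both 3D reconstruction accuracy and unconstrained face recognition accuracy. Beyond these main results, the team also made progress in areas related to face recognition, for example 3D-modeling-based landmark localization for 2D face images in arbitrary poses, face de-occlusion and identity recognition of occluded faces, 3D-reconstruction-based unconstrained facial expression recognition, face recognition from low-quality 3D data, facial attribute recognition from 3D data, 3D point cloud segmentation and dense 3D face alignment, object detection and counting in surveillance scenes, salient object detection and fine-grained image classification, 3D reconstruction of birds, and automatic assessment of facial paralysis severity based on facial geometry analysis. As of December 31, 2021, the project had published 34 academic papers (including 1 TPAMI paper, 2 papers each in CAS Zone 1 and Zone 2 journals, 2 CCF-A conference papers, and 12 CCF-C conference papers), filed 4 invention patent applications, published 2 monographs and 2 translated works, and trained 5 doctoral and 18 master's students, exceeding its planned targets. Project funds were used reasonably, and the results are expected to extend research on 3D face reconstruction and promote the development and application of unconstrained face recognition.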


1.Geometric self-supervision for monocular 3D animal pose estimation

Keywords:
Adversarial machine learning;Invertebrates;Self-supervised learning;3d animal pose estimation;3D pose estimation;Camera rotations;Data scarcity;Geometric consistency constraints;Geometric self-supervision;Monocular 3d pose estimation;Pose-estimation;Unsupervised method;View consistency

Dai, Xiaowei;Li, Shuiwang;Zhao, Qijun;Yang, Hongyu
《Pattern Recognition》
2025
Vol. 162
Issue: not listed
Journal

The limited research on 3D animal pose estimation is attributed to data scarcity and perspective ambiguities, despite its significant applications in various fields including biology, medicine, and animation. To resolve data scarcity, we put forward an unsupervised method for estimating 3D animal pose when only 2D poses are available. To overcome perspective ambiguities, we propose canonical pose, camera, and view consistency losses to represent geometric consistency constraints for self-supervised learning. Specifically, the input 2D pose is fed into the pose generator network and camera network, and then regressed to the 3D canonical pose and camera rotation, respectively. In the training phase, the regressed 3D canonical pose is subjected to random re-projection to synthesize new 2D poses, which are also decomposed into 3D canonical pose and camera rotation to form geometric consistency constraints. Experimental results demonstrate that the proposed method achieves the best performance in unsupervised monocular 3D animal pose estimation. The corresponding code is available at: https://github.com/maicao2018/GeoSelfPose. © 2025 Elsevier Ltd
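
The random re-projection step described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the released GeoSelfPose code: rotations are restricted to the vertical axis, the camera is orthographic, and the names `rotate_y`, `reproject`, and `canonical_consistency_loss` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def rotate_y(points: Sequence[Point3], angle: float) -> List[Point3]:
    """Rotate 3D joints about the y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project(points: Sequence[Point3]) -> List[Point2]:
    """Orthographic projection (assumption): keep x and y, drop depth."""
    return [(x, y) for x, y, _ in points]


def reproject(canonical_pose: Sequence[Point3], rng: random.Random):
    """Synthesize a new 2D pose by viewing the canonical pose from a random angle."""
    angle = rng.uniform(-math.pi, math.pi)
    return project(rotate_y(canonical_pose, angle)), angle


def canonical_consistency_loss(pose_a: Sequence[Point3], pose_b: Sequence[Point3]) -> float:
    """Mean squared joint distance between two canonical pose estimates."""
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(pose_a, pose_b)
    )
    return total / len(pose_a)
```

In a training loop, the synthesized 2D pose would be lifted again by the same networks, and the loss would compare the newly lifted canonical pose with the one that produced the re-projection.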

...

2.A geometry-aware generative model for face morphing attacks

Keywords:
Adversarial machine learning;Adversarial networks;Attack detection;Automated face recognition;Digital manipulation;Face images;Face Morphing;Face recognition systems;Generative model;Morphing;Morphing attack

Deng, Zongyong;Zhao, Qijun;Ye, Libin;He, Qiaoyun;He, Zuyuan;Huang, Jie
《Knowledge-Based Systems》
2025
Vol. 314
Issue: not listed
Journal

Automated face recognition systems are vulnerable to various attacks, such as adversarial attacks, digital manipulation and physical spoofs. As a special case of digital manipulation attacks, face morphing draws increasing concern because such attacks generalize well across diverse face recognition systems. However, the threat of face morphing attacks is underestimated due to the following characteristics of state-of-the-art morphing methods: (i) their generated face images have low visual quality with artifacts, (ii) they fail to guarantee high similarity with contributing subjects, and (iii) they do not explicitly consider countering face morphing detection methods when constructing morphing attacks. Based on the observation that facial geometry information is vital in face recognition, we present in this paper a geometry-aware generative model (GAGM), which can realize more threatening attacks against human experts, face recognition and morphing attack detection. GAGM synthesizes morphs driven by both facial geometry and texture based on dual invertible networks, resulting in visually realistic and highly deceptive morphed face images. To circumvent morphing-attack detection, GAGM implements a fine-grained adversarial attack strategy to mislead the detection methods. Visualization results demonstrate that GAGM, compared to existing techniques, is capable of generating visually faultless facial morphs. Meanwhile, extensive quantitative experiments show that GAGM can significantly increase the attack success rate against face recognition and deceive various morphing attack detection models. © 2025 Elsevier B.V.

...

3.QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation

Keywords:
Computer aided design;Decoding;Diffusion;Distributed computer systems;Errors;Mobile telecommunication systems;Polonium compounds;Three dimensional computer graphics;3D content;3d generation;Content creation;Diffusion model;High quality;Local geometry;Mesh;Metric matrix;Quadric error metrics;Shape representation

Li, Jiaqi;Wang, Ruowei;Liu, Yu;Zhao, Qijun
《2025 IEEE International Conference on Multimedia and Expo, ICME 2025》
2025
June 30, 2025 - July 4, 2025
Nantes, France
Conference

Mesh generation plays a crucial role in 3D content creation, as meshes are widely used in various industrial applications. Recent works have achieved impressive results but still face several issues, such as unrealistic patterns or pits on surfaces, missing thin parts, and incomplete structures. Most of these problems stem from the choice of shape representation or the capabilities of the generative network. To alleviate these, we extend PoNQ, a Quadric Error Metrics (QEM)-based representation, and propose a novel model, QEMesh, for high-quality mesh generation. PoNQ divides the shape surface into tiny patches, each represented by a point with its normal and QEM matrix, which preserves fine local geometry information. In our QEMesh, we regard these elements as generable parameters and design a unique latent diffusion model containing a novel multi-decoder VAE for PoNQ parameter generation. Given the latent code generated by the diffusion model, three parameter decoders produce several PoNQ parameters within each voxel cell, and an occupancy decoder predicts which voxel cells contain parameters that form the final shape. Extensive evaluations demonstrate that our method generates results with watertight surfaces and is comparable to state-of-the-art methods in several main metrics. © 2025 IEEE.
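
For readers unfamiliar with quadric error metrics, the sketch below shows the classic plane quadric on which QEM-based representations build: a 4x4 matrix built from a point and its unit normal, whose quadratic form gives the squared distance of a query point to the tangent plane. It illustrates the general QEM idea only and is not the PoNQ or QEMesh formulation; the function names are hypothetical.

```python
from typing import List, Sequence

Matrix4 = List[List[float]]


def plane_quadric(point: Sequence[float], normal: Sequence[float]) -> Matrix4:
    """Quadric of the plane through `point` with unit `normal`.

    With q = (a, b, c, d) and d = -normal . point, the quadric is q q^T.
    """
    a, b, c = normal
    d = -(a * point[0] + b * point[1] + c * point[2])
    q = (a, b, c, d)
    return [[qi * qj for qj in q] for qi in q]


def add_quadrics(q1: Matrix4, q2: Matrix4) -> Matrix4:
    """Quadrics are additive: the sum measures error against both planes."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(q1, q2)]


def quadric_error(quadric: Matrix4, vertex: Sequence[float]) -> float:
    """Evaluate h^T Q h with h = (x, y, z, 1): summed squared plane distances."""
    h = (vertex[0], vertex[1], vertex[2], 1.0)
    return sum(h[i] * quadric[i][j] * h[j] for i in range(4) for j in range(4))
```

Because quadrics add, a patch can accumulate the quadrics of several surface samples and still evaluate the total error of a candidate vertex with one matrix product.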

...

4.Identity-Agnostic Learning for Deepfake Face Detection

Keywords: (none listed)

Zhou, Xuan;Deng, Zongyong;Zhao, Qijun
《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
2025
April 6, 2025 - April 11, 2025
Hyderabad, India
Conference

Despite the promising results obtained by existing deepfake face detection methods for within-dataset detection, they often fail to generalize effectively to new datasets. We hypothesize that identity, a significant feature in facial recognition, is a key factor affecting deepfake detection models' cross-dataset performance. In the feature space learned by a real/fake classifier, facial features may cluster based on identity rather than their authenticity, which undermines the classifier's ability to distinguish between real and fake images. This paper introduces a novel training approach called Identity-Agnostic Learning (IAL) for deepfake face detection. IAL trains the detection model in an identity-agnostic manner and thus guides the model to pay attention to identity-irrelevant features. Experimental results demonstrate that our method effectively enhances the overall generalizability of deepfake face detection models. © 2025 IEEE.

...

5.MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion

Keywords:
3D modeling;Image texture;Three dimensional computer graphics;3D models;3D object;3D reconstruction;3d-modeling;Diffusion model;Multi-word;Single images;Standing problems;Textual description;Textual inversion

Liu, Yu;Wang, Ruowei;Li, Jiaqi;Xu, Zixiang;Zhao, Qijun
《7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024》
2025
October 18, 2024 - October 20, 2024
Urumqi, China
Conference

Reconstructing 3D models from single-view images is a long-standing problem in computer vision. The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models. However, existing methods focus on capturing a single key attribute of the image (e.g., object type, artistic style) and fail to consider the multi-perspective information required for accurate 3D reconstruction, such as object shape and material properties. Besides, the reliance on Neural Radiance Fields hinders their ability to reconstruct intricate surfaces and texture details. In this work, we propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction. Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text description capturing the image’s characteristics. Then, we use this description and the image to generate a 3D model with FlexiCubes. Additionally, MTFusion enhances FlexiCubes by employing a special decoder network for Signed Distance Functions, leading to faster training and finer surface representation. Extensive evaluations demonstrate that our MTFusion surpasses existing image-to-3D methods on a wide range of synthetic and real-world images. Furthermore, the ablation study proves the effectiveness of our network designs. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

...

6.MUPO-Net: A Multilevel Dual-domain Progressive Enhancement Network with Embedded Attention for CT Metal Artifact Reduction

Keywords: (none listed)

Yao, Xiaoli;Tan, Jia;Deng, Zijian;Xiong, Deng;Zhao, Qijun;Wu, Min
《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
2025
April 6, 2025 - April 11, 2025
Hyderabad, India
Conference

Metal implants in patients cause severe streaking artifacts in computed tomography (CT) images, significantly compromising image quality. Deep learning methods have been successfully applied to metal artifact reduction (MAR) in CT, but often result in overly smooth images, failing to reconstruct complex details accurately. In this paper, we propose a multilevel dual-domain progressive enhancement network with embedded attention for MAR, termed MUPO-Net. Our approach constructs a Contrast Weight Mapping (CWM) module that generates a weighted heatmap, allocating weights to different regions based on the influence of metal artifacts, and an ASR-Net (Attention-Embedded Sinogram Restoration Network) that utilizes these weights to better remove artifacts in sinogram domain. Additionally, an Image Detail Enhancement Network (IDE-Net) is proposed to restore fine texture details in CT images through multi-scale feature fusion. Extensive experiments on both synthetic and clinical datasets demonstrate the superior effectiveness of MUPO-Net compared to the state-of-the-art MAR techniques. © 2025 IEEE.

...

7.Granularity-Aware Contrastive Learning for Fine-Grained Action Recognition

Keywords:
Artificial intelligence;Classification (of information);Contrastive Learning;Learning systems;Action recognition;Fine grained;Language model;Learning paradigms;Performance;Pre-training;Target labels;Video representation learning;Video representations;Vision-language model

Zhang, Hailun;Wang, Xinrui;Zhao, Qijun
《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
2025
April 6, 2025 - April 11, 2025
Hyderabad, India
Conference

The contrastive learning paradigm has been widely used for image-language pre-training and extended to video-text tuning. These approaches aim to maximize the similarity between positive sample pairs while minimizing that of negative ones through an alignment objective. Their performance is highly affected by the definition of positive and negative pairs, which depends on the granularity of label classification. This effect is particularly apparent in video action recognition, where different fine-grained actions may belong to a shared coarse label. Therefore, indiscriminately treating a video sample and labels that are not identical at the fine-grained level but share the same coarse label as negative pairs leads to pushing the sample apart from the cluster of its basic coarse action. Such conflict can potentially prevent the model from pulling the sample and its target label closer. For a balanced understanding of coarse and fine-grained distinctions, we propose the Granularity-Aware Contrastive Learning (GACon) framework to improve contrastive learning for fine-grained action recognition. This is achieved through (i) a refined definition of sample-label relation and alignment objectives, and (ii) the exchange of coarse and fine-grained information between two granularity-distinct experts. Experiments on four benchmarks of fine-grained action recognition show the superiority of our proposed GACon compared to existing approaches. © 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
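
The conflict described in this abstract can be made concrete with a small sketch. It is not the GACon objective, whose exact definition is not given here: it is one simple way to stop pushing a sample away from labels that share its coarse class, namely dropping those labels from the InfoNCE denominator. The function name `masked_info_nce` and the `coarse_of` mapping are assumptions for illustration.

```python
import math
from typing import Sequence


def masked_info_nce(
    similarities: Sequence[float],
    target: int,
    coarse_of: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss for one video against a list of fine-grained labels.

    Labels that share the target's coarse class (other than the target
    itself) are excluded from the negatives instead of being pushed away.
    """
    keep = [
        j
        for j in range(len(similarities))
        if j == target or coarse_of[j] != coarse_of[target]
    ]
    logits = [similarities[j] / temperature for j in keep]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_denominator - similarities[target] / temperature
```

When every fine label has its own coarse class, the function reduces to the ordinary InfoNCE loss, so the mask only changes behavior for samples whose siblings under the same coarse label would otherwise act as negatives.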

...

8.Hierarchical Generative Network for Face Morphing Attacks

Keywords:
Computer vision;Face recognition;Image enhancement;Attack methods;Face Morphing;Face recognition systems;Facial regions;Global consistency;Global informations;Human observers;Morphing;Multiple identities;System verifications

He, Zuyuan;Deng, Zongyong;He, Qiaoyun;Zhao, Qijun
《18th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2024》
2024
May 27, 2024 - May 31, 2024
Istanbul, Turkey
Conference

Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise the identity preservation capability. Consequently, these attacks fail to bypass FRSs verification well while still managing to deceive human observers. These methods typically rely on global information from contributing images, ignoring the detailed information from effective facial regions. To address the above issues, we propose a novel morphing attack method to improve the quality of morphed images and better preserve the contributing identities. Our proposed method leverages the hierarchical generative network to capture both local detailed and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face to improve the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRSs' vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios. © 2024 IEEE.

...

9.GenUDC: High Quality 3D Mesh Generation with Unsigned Dual Contouring Representation

Keywords:
3D modeling;Complex networks;Contour followers;Digital elevation model;3d generation;3d mesh generations;Complexes structure;Contouring;Diffusion model;Dual contouring;High quality;Mesh;Mesh representation;Tetrahedral grids

Wang, Ruowei;Li, Jiaqi;Zeng, Dan;Ma, Xueqi;Xu, Zixiang;Zhang, Jianwei;Zhao, Qijun
《32nd ACM International Conference on Multimedia, MM 2024》
2024
October 28, 2024 - November 1, 2024
Melbourne, VIC, Australia
Conference

Generating high-quality meshes with complex structures and realistic surfaces is the primary goal of 3D generative models. Existing methods typically employ sequence data or deformable tetrahedral grids for mesh generation. However, sequence-based methods have difficulty producing complex structures with many faces due to memory limits. The deformable tetrahedral grid-based method MeshDiffusion fails to recover realistic surfaces due to the inherent ambiguity in deformable grids. We propose the GenUDC framework to address these challenges by leveraging the Unsigned Dual Contouring (UDC) as the mesh representation. UDC discretizes a mesh in a regular grid and divides it into the face and vertex parts, recovering both complex structures and fine details. As a result, the one-to-one mapping between UDC and mesh resolves the ambiguity problem. In addition, GenUDC adopts a two-stage, coarse-to-fine generative process for 3D mesh generation. It first generates the face part as a rough shape and then the vertex part to craft a detailed shape. Extensive evaluations demonstrate the superiority of UDC as a mesh representation and the favorable performance of GenUDC in mesh generation. The code and trained models are available at https://github.com/TrepangCat/GenUDC. © 2024 ACM.

...

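10.Research on Key Technologies for Crowd Counting with Low Annotation Cost

Keywords:
Annotation cost;Crowd counting;Point annotation;Region-level count annotation;Unlabeled data

Liu Yuting (刘雨婷)
Advisor: Yang Hongyu (杨红雨), Sichuan University
Year: not listed
Dissertation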

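As one of the key tasks of intelligent video surveillance systems, crowd counting has important application value in public safety and commercial settings, and in recent years it has become a research hotspot in machine vision and artificial intelligence. Its main goal is to estimate the total number of people in a scene from an input image of a crowd. Crowd counting can automatically and efficiently assist crowd supervision in public places and help prevent abnormal events such as congestion and stampedes. The technique can also be applied to related areas such as vehicle counting, urban planning, and ecological resource allocation. Deep-learning-based crowd counting has advanced markedly in recent years. However, most current work aims to improve counting accuracy under visual challenges such as mutual occlusion, scale variation, and non-uniform crowd distribution, and rarely examines the trade-off between annotation cost and counting accuracy. Current crowd counting algorithms are annotated by exhaustively marking every person in the scene one by one; for dense crowd images with more than a thousand people, this is extremely time-consuming and labor-intensive. This dissertation analyzes the high annotation cost of crowd counting algorithms in depth and focuses on achieving effective crowd counting with low-cost point annotations, region-level count annotations, and unlabeled data. In summary, its main contributions are:

(1) A "point in, box out" crowd detection and counting algorithm. Counting methods based on density-map regression cannot output the scale and position of individuals, yet these are essential in higher-level crowd analysis tasks such as crowd tracking, simulation, and abnormal behavior prediction. Object detection methods can output individual position and scale, but they depend on complex, costly bounding-box annotations. This dissertation therefore proposes a "point in, box out" framework that uses simple, low-cost head-point annotations; the algorithm accurately predicts the scale and position of individuals in crowd scenes and counts them. It makes effective use of prior knowledge about crowd scenes: it first generates pseudo bounding-box annotations for heads from the weak head-point annotations, then updates the pseudo annotations online, iteratively selecting the more accurate pseudo boxes, and trains the detection network with a locally scale-constrained regression loss together with a curriculum learning strategy. Experiments on multiple crowd counting datasets demonstrate the effectiveness of the proposed algorithm. In addition, experiments on object counting datasets such as WIDER FACE (faces) and TRANCOS (vehicles) demonstrate its generality.

(2) A crowd counting algorithm based on probabilistic ordinal classification. Existing methods use instance-level annotations in the form of head points or head bounding boxes, which is inefficient in practical deployment. This dissertation therefore designs a framework based on probabilistic ordinal classification, which converts sub-region counts into density-level classes and learns a classification network. The algorithm exploits the latent ordinal relationship among classes, combining an ordinal prior constraint with probabilistic modeling of latent representations to optimize the learning of the classification network's latent representations. It also exploits count statistics, proposing a classifier with learnable weight magnitudes to address biased learning under imbalanced sample distributions. At test time, it converts the classification network's predictions into count values to perform crowd counting. Experiments on general crowd counting datasets such as ShanghaiTech and UCF-QNRF demonstrate the effectiveness of the proposed method.

(3) A cross-domain crowd counting algorithm based on bidirectional detection-regression knowledge transfer. The performance of existing methods drops sharply when they are transferred directly to data from unseen scenes (the target domain). To improve cross-domain performance while avoiding costly data annotation, this dissertation studies a new cross-domain method that uses unlabeled target-domain data. It proposes a framework based on bidirectional knowledge transfer between detection and regression that fully exploits the complementarity of the two types of counting model: it models the bidirectional transformation of knowledge between the detection and regression models on source-domain data and transfers their complementary knowledge on target-domain data, improving the counting performance of both in the target domain. The dissertation first analyzes the complementary nature of the two models' outputs, then discusses in detail how to solve for the bidirectional transformation between detection and regression outputs on the source domain and how to learn the transfer of complementary knowledge on the target domain. Cross-domain experiments on multiple datasets demonstrate the effectiveness of the proposed method.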
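
The count-to-class conversion in contribution (2) can be sketched as follows. This is a minimal illustration, not the dissertation's method: the bin edges in `BIN_EDGES`, the use of the bin mean as the representative count, and all function names are assumptions, and the ordinal prior and learnable classifier are not modeled.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical density levels: class k covers counts in [edges[k], edges[k + 1]),
# and the last class is open-ended (15 or more people in the region).
BIN_EDGES = (0, 1, 3, 7, 15)


def count_to_class(count: int, edges: Sequence[int] = BIN_EDGES) -> int:
    """Map a sub-region head count to an ordered density-level class."""
    if count < edges[0]:
        raise ValueError("count must be non-negative")
    return bisect_right(edges, count) - 1


def class_to_count(level: int, edges: Sequence[int] = BIN_EDGES) -> float:
    """Map a predicted class back to a representative count.

    Uses the mean of the integer counts in the bin; the open-ended top
    class falls back to its lower bound.
    """
    if level == len(edges) - 1:
        return float(edges[-1])
    low, high = edges[level], edges[level + 1]
    return (low + high - 1) / 2


def estimate_image_count(region_classes: Sequence[int], edges: Sequence[int] = BIN_EDGES) -> float:
    """Sum the representative counts of all sub-regions of an image."""
    return sum(class_to_count(k, edges) for k in region_classes)
```

Training labels would come from `count_to_class` applied to region-level count annotations, and at test time `estimate_image_count` turns the per-region class predictions into a total for the image.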

...
