Research on 3D Face Reconstruction and Recognition in Complex Scenes
Project Source
Principal Investigator
Funded Institution
Project Number
Approval Year
Approval Date
Research Period
Project Level
Funding Amount
Discipline
Discipline Code
Fund Category
Keywords
Participants
Participating Institutions
Funded Province
Project Closure Report (Full Text)
1. QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation
- Keywords:
- Computer aided design;Decoding;Diffusion;Distributed computer systems;Errors;Mobile telecommunication systems;Polonium compounds;Three dimensional computer graphics;3D content;3d generation;Content creation;Diffusion model;High quality;Local geometry;Mesh;Metric matrix;Quadric error metrics;Shape representation
- Li, Jiaqi;Wang, Ruowei;Liu, Yu;Zhao, Qijun
- 《2025 IEEE International Conference on Multimedia and Expo, ICME 2025》
- 2025
- June 30, 2025 - July 4, 2025
- Nantes, France
- Conference
Mesh generation plays a crucial role in 3D content creation, as meshes are widely used in various industrial applications. Recent works have achieved impressive results but still face several issues, such as unrealistic patterns or pits on surfaces, missing thin parts, and incomplete structures. Most of these problems stem from the choice of shape representation or the capabilities of the generative network. To alleviate these issues, we extend PoNQ, a Quadric Error Metrics (QEM)-based representation, and propose a novel model, QEMesh, for high-quality mesh generation. PoNQ divides the shape surface into tiny patches, each represented by a point with its normal and QEM matrix, which preserves fine local geometry information. In our QEMesh, we regard these elements as generable parameters and design a unique latent diffusion model containing a novel multi-decoder VAE for PoNQ parameter generation. Given the latent code generated by the diffusion model, three parameter decoders produce several PoNQ parameters within each voxel cell, and an occupancy decoder predicts which voxel cells contain parameters to form the final shape. Extensive evaluations demonstrate that our method generates results with watertight surfaces and is comparable to state-of-the-art methods in several main metrics. © 2025 IEEE.
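The QEM matrices that PoNQ-style representations attach to each surface point build on the classic Garland-Heckbert quadric idea. As a minimal sketch (not the paper's implementation): a plane a·x + b·y + c·z + d = 0 with unit normal yields a 4x4 quadric K = p pᵀ with p = (a, b, c, d), the squared distance of a homogeneous point v = (x, y, z, 1) to the plane is vᵀKv, and quadrics of several planes simply add, so a single matrix summarizes local geometry:

```python
# Minimal Quadric Error Metrics (QEM) sketch with plain Python lists.

def plane_quadric(a, b, c, d):
    """4x4 quadric K = p p^T for the plane a*x + b*y + c*z + d = 0 (unit normal)."""
    p = (a, b, c, d)
    return [[p[i] * p[j] for j in range(4)] for i in range(4)]

def add_quadrics(K1, K2):
    """Quadrics of multiple planes accumulate by element-wise addition."""
    return [[K1[i][j] + K2[i][j] for j in range(4)] for i in range(4)]

def quadric_error(K, x, y, z):
    """v^T K v for homogeneous v = (x, y, z, 1): summed squared plane distances."""
    v = (x, y, z, 1.0)
    return sum(v[i] * K[i][j] * v[j] for i in range(4) for j in range(4))

# Two axis-aligned planes through the origin: z = 0 and y = 0.
K = add_quadrics(plane_quadric(0, 0, 1, 0), plane_quadric(0, 1, 0, 0))
print(quadric_error(K, 0.0, 0.0, 0.0))  # on both planes -> 0.0
print(quadric_error(K, 0.0, 3.0, 4.0))  # 3^2 + 4^2 = 25.0
```

One 4x4 matrix thus encodes the aggregate squared distance to all local tangent planes, which is why a point plus its normal and QEM matrix can preserve fine local geometry.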
2. Identity-Agnostic Learning for Deepfake Face Detection
- Keywords:
- Zhou, Xuan;Deng, Zongyong;Zhao, Qijun
- 《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
- 2025
- April 6, 2025 - April 11, 2025
- Hyderabad, India
- Conference
Despite the promising results obtained by existing deepfake face detection methods for within-dataset detection, they often fail to generalize effectively to new datasets. We hypothesize that identity, a significant feature in facial recognition, is a key factor affecting deepfake detection models' cross-dataset performance. In the feature space learned by a real/fake classifier, facial features may cluster based on identity rather than their authenticity, which undermines the classifier's ability to distinguish between real and fake images. This paper introduces a novel training approach called Identity-Agnostic Learning (IAL) for deepfake face detection. IAL trains the detection model in an identity-agnostic manner, thus guiding the model to pay attention to identity-irrelevant features. Experimental results demonstrate that our method effectively enhances the overall generalizability of deepfake face detection models. © 2025 IEEE.
3. MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion
- Keywords:
- 3D modeling;Image texture;Three dimensional computer graphics;3D models;3D object;3D reconstruction;3d-modeling;Diffusion model;Multi-word;Single images;Standing problems;Textual description;Textual inversion
- Liu, Yu;Wang, Ruowei;Li, Jiaqi;Xu, Zixiang;Zhao, Qijun
- 《7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024》
- 2025
- October 18, 2024 - October 20, 2024
- Urumqi, China
- Conference
Reconstructing 3D models from single-view images is a long-standing problem in computer vision. The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models. However, existing methods focus on capturing a single key attribute of the image (e.g., object type, artistic style) and fail to consider the multi-perspective information required for accurate 3D reconstruction, such as object shape and material properties. Besides, the reliance on Neural Radiance Fields hinders their ability to reconstruct intricate surfaces and texture details. In this work, we propose MTFusion, which leverages both image data and textual descriptions for high-fidelity 3D reconstruction. Our approach consists of two stages. First, we adopt a novel multi-word textual inversion technique to extract a detailed text description capturing the image’s characteristics. Then, we use this description and the image to generate a 3D model with FlexiCubes. Additionally, MTFusion enhances FlexiCubes by employing a special decoder network for Signed Distance Functions, leading to faster training and finer surface representation. Extensive evaluations demonstrate that our MTFusion surpasses existing image-to-3D methods on a wide range of synthetic and real-world images. Furthermore, the ablation study proves the effectiveness of our network designs. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
4. MUPO-Net: A Multilevel Dual-domain Progressive Enhancement Network with Embedded Attention for CT Metal Artifact Reduction
- Keywords:
- Yao, Xiaoli;Tan, Jia;Deng, Zijian;Xiong, Deng;Zhao, Qijun;Wu, Min
- 《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
- 2025
- April 6, 2025 - April 11, 2025
- Hyderabad, India
- Conference
Metal implants in patients cause severe streaking artifacts in computed tomography (CT) images, significantly compromising image quality. Deep learning methods have been successfully applied to metal artifact reduction (MAR) in CT, but often result in overly smooth images, failing to reconstruct complex details accurately. In this paper, we propose a multilevel dual-domain progressive enhancement network with embedded attention for MAR, termed MUPO-Net. Our approach constructs a Contrast Weight Mapping (CWM) module that generates a weighted heatmap, allocating weights to different regions based on the influence of metal artifacts, and an Attention-Embedded Sinogram Restoration Network (ASR-Net) that utilizes these weights to better remove artifacts in the sinogram domain. Additionally, an Image Detail Enhancement Network (IDE-Net) is proposed to restore fine texture details in CT images through multi-scale feature fusion. Extensive experiments on both synthetic and clinical datasets demonstrate the superior effectiveness of MUPO-Net compared to state-of-the-art MAR techniques. © 2025 IEEE.
5. Granularity-Aware Contrastive Learning for Fine-Grained Action Recognition
- Keywords:
- Artificial intelligence;Classification (of information);Contrastive Learning;Learning systems;Action recognition;Fine grained;Language model;Learning paradigms;Performance;Pre-training;Target labels;Video representation learning;Video representations;Vision-language model
- Zhang, Hailun;Wang, Xinrui;Zhao, Qijun
- 《2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025》
- 2025
- April 6, 2025 - April 11, 2025
- Hyderabad, India
- Conference
The contrastive learning paradigm has been widely used for image-language pre-training and extended to video-text tuning. These approaches aim to maximize the similarity between positive sample pairs while minimizing that of negative ones through an alignment objective. Their performance is highly affected by the definition of positive and negative pairs, which depends on the granularity of label classification. This effect is particularly apparent in video action recognition, where different fine-grained actions may belong to a shared coarse label. Therefore, indiscriminately treating a video sample and labels that are not identical at the fine-grained level but share the same coarse label as negative pairs pushes the sample apart from the cluster of its basic coarse action. Such conflict can potentially prevent the model from pulling the sample and its target label closer. For a balanced understanding of coarse and fine-grained distinctions, we propose the Granularity-Aware Contrastive Learning (GACon) framework to improve contrastive learning for fine-grained action recognition. This is achieved through (i) a refined definition of sample-label relations and alignment objectives, and (ii) the exchange of coarse and fine-grained information between two granularity-distinct experts. Experiments on four benchmarks of fine-grained action recognition show the superiority of our proposed GACon compared to existing approaches. © 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
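The granularity conflict described above can be illustrated with a toy InfoNCE-style loss. This is a hedged sketch, not the paper's exact objective: labels that share the target's coarse action are simply masked out of the negative set instead of being pushed away.

```python
# Toy demonstration: masking same-coarse-label "siblings" out of the
# contrastive denominator avoids penalizing similarity to related actions.
import math

def info_nce(sim_to_labels, target, negative_mask, temperature=0.1):
    """InfoNCE-style loss over label similarities; negative_mask[i]=False
    drops label i from the denominator (the target is always kept)."""
    exp = [math.exp(s / temperature) for s in sim_to_labels]
    denom = exp[target] + sum(e for i, e in enumerate(exp)
                              if i != target and negative_mask[i])
    return -math.log(exp[target] / denom)

# Labels 0 and 1 are fine-grained siblings under one coarse action
# (e.g. two swimming styles); label 2 is a different coarse action.
# The video is also fairly similar to its sibling label 1.
sims, target = [0.8, 0.7, 0.1], 0

naive = info_nce(sims, target, [True, True, True])   # sibling as negative
aware = info_nce(sims, target, [True, False, True])  # sibling masked out

print(naive > aware)  # True: the naive loss punishes coarse-level coherence
```

Masking is only half of the framework sketched in the abstract; the exchange between granularity-distinct experts has no counterpart in this toy.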
6. Hierarchical Generative Network for Face Morphing Attacks
- Keywords:
- Computer vision;Face recognition;Image enhancement;Attack methods;Face Morphing;Face recognition systems;Facial regions;Global consistency;Global informations;Human observers;Morphing;Multiple identities;System verifications
- He, Zuyuan;Deng, Zongyong;He, Qiaoyun;Zhao, Qijun
- 《18th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2024》
- 2024
- May 27, 2024 - May 31, 2024
- Istanbul, Turkey
- Conference
Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise identity preservation capability. Consequently, these attacks struggle to simultaneously bypass FRS verification and deceive human observers. These methods typically rely on global information from the contributing images, ignoring detailed information from effective facial regions. To address the above issues, we propose a novel morphing attack method to improve the quality of morphed images and better preserve the contributing identities. Our proposed method leverages a hierarchical generative network to capture both local detail and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face to improve the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRSs' vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios. © 2024 IEEE.
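The mask-guided blending step mentioned above can be sketched in one line of arithmetic. This is a minimal illustration, not the paper's module: pixels inside the face mask come from the generated image, pixels outside it from a clean contributing image, which suppresses background artifacts. Real systems use soft masks and feathering; here, 2x2 grayscale "images" are nested lists with mask values in [0, 1].

```python
# Mask-guided blending: out = mask * morphed + (1 - mask) * clean, per pixel.

def blend(morphed, clean, mask):
    """Element-wise convex combination of two images steered by a mask."""
    return [[mask[i][j] * morphed[i][j] + (1 - mask[i][j]) * clean[i][j]
             for j in range(len(morphed[0]))] for i in range(len(morphed))]

morphed = [[200, 200], [200, 200]]   # generated face with background artifacts
clean   = [[ 50,  50], [ 50,  50]]   # artifact-free contributing image
mask    = [[1.0, 0.0], [0.0, 0.0]]   # only the top-left pixel is "face"

print(blend(morphed, clean, mask))  # [[200.0, 50.0], [50.0, 50.0]]
```

Only the masked pixel retains the generated content; everything outside the face region reverts to the clean source, which is exactly the artifact-removal effect the abstract describes.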
7. GenUDC: High Quality 3D Mesh Generation with Unsigned Dual Contouring Representation
- Keywords:
- 3D modeling;Complex networks;Contour followers;Digital elevation model;3d generation;3d mesh generations;Complexes structure;Contouring;Diffusion model;Dual contouring;High quality;Mesh;Mesh representation;Tetrahedral grids
- Wang, Ruowei;Li, Jiaqi;Zeng, Dan;Ma, Xueqi;Xu, Zixiang;Zhang, Jianwei;Zhao, Qijun
- 《32nd ACM International Conference on Multimedia, MM 2024》
- 2024
- October 28, 2024 - November 1, 2024
- Melbourne, VIC, Australia
- Conference
Generating high-quality meshes with complex structures and realistic surfaces is the primary goal of 3D generative models. Existing methods typically employ sequence data or deformable tetrahedral grids for mesh generation. However, sequence-based methods have difficulty producing complex structures with many faces due to memory limits. The deformable tetrahedral grid-based method MeshDiffusion fails to recover realistic surfaces due to the inherent ambiguity in deformable grids. We propose the GenUDC framework to address these challenges by leveraging the Unsigned Dual Contouring (UDC) as the mesh representation. UDC discretizes a mesh in a regular grid and divides it into the face and vertex parts, recovering both complex structures and fine details. As a result, the one-to-one mapping between UDC and mesh resolves the ambiguity problem. In addition, GenUDC adopts a two-stage, coarse-to-fine generative process for 3D mesh generation. It first generates the face part as a rough shape and then the vertex part to craft a detailed shape. Extensive evaluations demonstrate the superiority of UDC as a mesh representation and the favorable performance of GenUDC in mesh generation. The code and trained models are available at https://github.com/TrepangCat/GenUDC. © 2024 ACM.
8. Animal Pose Refinement in 2D Images with 3D Constraints
- Keywords:
- Computer vision;Random errors;2D images;Animal images;Complex background;Learn+;Pose corresponding;Pose prior knowledge;Pose refinement;Pose-estimation;Synthetic data;Wild animals
- Dai, Xiaowei;Li, Shuiwang;Zhao, Qijun;Yang, Hongyu
- 《33rd British Machine Vision Conference Proceedings, BMVC 2022》
- 2022
- November 21, 2022 - November 24, 2022
- London, United Kingdom
- Conference
Animal pose estimation has many potential applications in various fields. However, uncontrollable illumination, complex backgrounds, and random occlusions in in-the-wild animal images often lead to large errors in pose estimation. To address this problem, we propose a method for refining the initial animal pose with 3D prior constraints. First, we learn a 3D pose dictionary from synthetic data, with each atom providing 3D pose prior knowledge. Then, the 3D pose dictionary is used to linearly represent the potential 3D pose corresponding to the 2D pose that has been initially estimated for the animal in the 2D image. Finally, the representation coefficients are optimized to minimize the difference between the initially estimated 2D pose and the 2D projection of the potential 3D pose. Moreover, to deal with data scarcity, we construct 2D and 3D animal pose datasets, which are used to evaluate algorithm performance and learn the 3D pose dictionary, respectively. Experimental results show that the proposed method is capable of utilizing 3D pose knowledge well and is effective in improving 2D animal pose estimation. © 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
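The optimization described above can be sketched in a few lines. This is a hedged simplification of the paper's formulation: the latent 3D pose is a linear combination of dictionary atoms, and the coefficients are fitted by gradient descent so that its orthographic projection (dropping z) matches the initially estimated 2D pose. The toy problem uses 2 joints and 2 atoms.

```python
# Refining a 2D pose with a 3D pose dictionary (simplified sketch).

def project(pose3d):
    """Orthographic projection: drop the z coordinate of each joint."""
    return [(x, y) for x, y, z in pose3d]

def combine(atoms, coeffs):
    """Linear combination sum_k c_k * atom_k of 3D pose atoms."""
    return [tuple(sum(c * atom[j][d] for c, atom in zip(coeffs, atoms))
                  for d in range(3)) for j in range(len(atoms[0]))]

def refine(atoms, pose2d, steps=2000, lr=0.05):
    """Gradient descent on ||pose2d - project(sum_k c_k atom_k)||^2."""
    coeffs = [0.0] * len(atoms)
    for _ in range(steps):
        proj = project(combine(atoms, coeffs))
        for k, atom in enumerate(atoms):
            grad = sum(2 * (proj[j][d] - pose2d[j][d]) * atom[j][d]
                       for j in range(len(pose2d)) for d in range(2))
            coeffs[k] -= lr * grad
    return coeffs

atoms = [[(1.0, 0.0, 0.5), (0.0, 1.0, 0.5)],   # two toy 3D pose atoms
         [(0.0, 1.0, 0.2), (1.0, 0.0, 0.2)]]
pose2d = [(1.0, 2.0), (2.0, 1.0)]              # initially estimated 2D pose

c = refine(atoms, pose2d)
refined2d = project(combine(atoms, c))          # prior-consistent 2D pose
```

Because every candidate 3D pose is constrained to the span of the dictionary, the refined 2D pose inherits the dictionary's prior knowledge; the paper's version adds further constraints and a learned dictionary, which this sketch omits.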
9. Weakly Labeled Semi-Supervised Sound Event Detection with Multi-Scale Residual Attention
- Keywords:
- Classification (of information);Computer vision;Convolutional neural networks;Events classification;Feature information;Frequency scale;Multi-scale features;Multi-scales;Multiple scale;Semi-supervised;Sound event detection;Sound events;Time frequency
- Tang, Maolin;Zhao, Qijun;Liu, Zhengxi
- 《2021 International Joint Conference on Neural Networks, IJCNN 2021》
- 2021
- July 18, 2021 - July 22, 2021
- Virtual, Shenzhen, China
- Conference
Different sound events have different time-frequency scale characteristics, which are useful for sound event detection (SED) but not yet effectively exploited. In this paper, we aim to adaptively select multi-scale feature information that is conducive to the classification of sound events. We propose a novel module, namely multi-scale residual attention (MSRA), which is composed of a multi-scale residual convolutional block and a selective multi-scale attention block. The multi-scale residual convolution block extracts features at multiple scales, from which the selective multi-scale attention block adaptively selects the features that are helpful for event classification. Experimental results show that our method outperforms the state-of-the-art model by 3.7% on Task 4 of the DCASE 2018 Challenge dataset. © 2021 IEEE.
10. Towards Silhouette-Aware Human Detection in Depth Images
- Keywords:
- Automata theory;Learning systems;Textures;Deep learning;Image texture;Dataset;Depth;Depth image;Detection;Detection models;Human detection;RGB images;Silhouette-aware;Synthesis method;Training data
- Luo, Huan;Li, Shuiwang;Zhao, Qijun
- 《2021 International Joint Conference on Neural Networks, IJCNN 2021》
- 2021
- July 18, 2021 - July 22, 2021
- Virtual, Shenzhen, China
- Conference
Detecting humans in depth images attracts increasing attention thanks to the advantage of the depth modality in privacy protection. However, this task is challenging because the amount of available training data is limited to date and depth images, unlike RGB images, lack rich texture features. Although many image synthesis methods and deep learning methods have been proposed and proven successful, especially for RGB images, directly applying them to depth images is unsatisfactory because of the intrinsic differences between the modalities. Given that the human silhouette becomes an essential discriminative cue in depth images in the absence of texture information, yet is not well utilized by existing methods, in this paper we propose a silhouette-aware network (SAN) to train the detection model and a depth image synthesis method that suppresses spurious silhouettes to augment the training data. Besides, to further increase the diversity of training data, we collect a dataset of scene depth images (SDI), including both indoor and outdoor scenes, as background images when synthesizing training data. Experimental results show that (i) our proposed synthesis method can generate more realistic depth images and thus benefits the training of detection models, (ii) our collected SDI dataset can effectively enhance data diversity and thus improves the effectiveness of the obtained detection models, and (iii) our proposed silhouette-aware network (SAN) can effectively boost human detection accuracy. Our dataset is available at https://pan.baidu.com/s/13hpuziavBNjS8KATClpXww, password: r9id. © 2021 IEEE.
