Research on an Intelligent Detection and Evaluation System for Physiological Function Based on Geometric Algebra
Project source
Principal investigator
Funded institution
Approval year
Approval date
Project number
Project level
Research period
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions
Funded province
Project completion report (full text)
1. Dynamic prototype with discriminative representation for rapid adaptation in new organ segmentation
- Keywords:
- Image segmentation;Learning systems;Attention mechanisms;Discriminative representation;Few-shot segmentation;Medical domains;Organ segmentation;Prototype learning;Prototype-based learning;Rapid adaptation;Self-attention mechanism;Shot segmentation
- Wang, Hailing;Chen, Yu;Zhang, Xinyue;Cao, Guitao;Cao, Wenming
- Pattern Recognition
- 2026
- Vol. 173
- Issue
- Journal
Recent work in label-efficient prototype-based learning has demonstrated significant potential for rapid adaptation in new organ segmentation. However, a prevalent challenge in prototype extraction within the medical domain is semantic bias. To address this issue, we propose a Dynamic Prototype with Discriminative Representation Network (DPDRNet) to enhance the effectiveness of semantic class prototypes for new organs. Specifically, we introduce a self-attention mechanism to generate dynamic prototypes, enhancing the efficient utilization of local information. This is accomplished by capturing interdependencies among pixel-level prototypes from limited labeled samples. Subsequently, we design a prototype contrastive learning method to maintain the discriminative representation of the dynamic prototypes in the high-level feature space. This method enhances the correlation between dynamic prototypes and foreground features while simultaneously increasing their distinction from background features. By incorporating a self-attention mechanism with contrastive learning, the proposed dynamic prototypes exhibit enhanced generalization capabilities, facilitating more precise segmentation of new organ structures. Experimental results demonstrate that our method achieves effective performance on Cardiac and Abdominal MRI segmentation tasks. © 2025 Elsevier Ltd
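The prototype pipeline the abstract builds on, averaging support features inside the labeled mask and labeling query pixels by similarity to the resulting prototype, can be sketched as below. This is a minimal illustration of generic prototype-based few-shot segmentation, not the DPDRNet architecture; the function names and the cosine threshold are assumptions for illustration.

```python
import math

def masked_average_prototype(features, mask):
    # Average the feature vectors of the pixels inside the support mask.
    selected = [f for f, m in zip(features, mask) if m == 1]
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def segment(query_features, prototype, threshold=0.5):
    # Label a query pixel as foreground when it is close to the prototype.
    return [1 if cosine(f, prototype) > threshold else 0 for f in query_features]
```

The dynamic-prototype idea in the paper would refine this static average with self-attention over the pixel-level prototypes before the comparison step.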
2. Dual-decoder collaborative learning with multi-hybrid view augmentation for self-supervised 3D action recognition
- Keywords:
- Skeleton-based action recognition; Self-supervised representation learning; Contrastive learning; Masked autoencoder; Masked skeleton modeling
- Cao, Wenming;Wu, Yingfei;Yin, Xinpeng
- Pattern Recognition
- 2026
- Vol. 172
- Issue
- Journal
Self-supervised methods, including contrastive learning and masked skeleton modeling, have demonstrated considerable potential in the field of skeleton-based action recognition. While contrastive learning captures fine-grained details at the instance level, masked skeleton modeling emphasizes joint-level features. Recent studies have begun to combine these two approaches. However, existing combination methods primarily focus on integrating the tasks within the skeleton space. Moreover, existing contrastive learning methods often fail to exploit the comprehensive interaction information in skeletal structures, resulting in suboptimal performance when recognizing actions involving multiple individuals. To overcome these limitations, we introduce the Dual-Decoder Collaborative Learning (DDC) with Multi-Hybrid View Augmentation (MHGNA) method, which connects these two tasks across multiple spaces. Specifically, the masked skeleton modeling task provides diverse views for the contrastive learning task in the skeleton space, while the contrastive method aligns the features generated by both tasks within the feature space. We further present an innovative view augmentation method that enhances the model's capacity to understand human interaction relationships by shuffling and replacing data across temporal, spatial, and personal dimensions. Extensive experiments on four downstream tasks across three large-scale datasets demonstrate that DDC exhibits stronger representational capabilities compared to state-of-the-art methods. Our code is available at https://github.com/Yingfei-Wu/DDC.
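The contrastive side of such methods is commonly realized with the InfoNCE objective: the anchor view should score higher against its positive view than against the negatives. Below is a generic single-anchor sketch, not DDC's actual loss; the temperature value and cosine similarity are common but assumed choices.

```python
import math

def info_nce(anchor, candidates, pos_index, temperature=0.1):
    # Single-anchor InfoNCE: -log softmax(sim(anchor, positive) / T)
    # over all candidate views (the positive plus the negatives).
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    logits = [cos(anchor, c) / temperature for c in candidates]
    m = max(logits)  # stabilise the softmax numerically
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[pos_index]
```

When the anchor matches its positive view, the loss is near zero; a mismatched positive drives it up, which is what pushes the two tasks' features to align in the shared space.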
3. Dual Knowledge-Aware Guidance for Source-Free Domain Adaptive Fundus Image Segmentation
- Keywords:
- Balancing;Calibration;Domain Knowledge;Knowledge management;Knowledge transfer;Semantics;Boundary information;Domain adaptation;Domain-invariant knowledge;Domain-specific knowledge;Fundus image;Images segmentations;Pseudo-label calibration;Source models;Source-free domain adaptation;Target domain
- Chen, Yu;Wang, Hailing;Wu, Chunwei;Cao, Guitao
- 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
- 2026
- September 23, 2025 - September 27, 2025
- Daejeon, Korea, Republic of
- Conference
Source-free domain adaptation (SFDA), where only a pre-trained source model is available to adapt to the target domain, has gained widespread application in the medical field. Most existing methods overlook low-quality pseudo-labels, i.e., pseudo-labels with boundary semantic confusion, when learning target domain-specific knowledge, leading to the loss of crucial boundary information. Furthermore, focusing solely on this specific knowledge can drive the model to shift in an uncontrollable direction, resulting in model degradation. To address these issues, we propose Dual Knowledge-aware Guidance (DKG), a novel SFDA method that integrates domain-specific knowledge with domain-invariant knowledge to improve transfer performance. Specifically, a pseudo-label calibration scheme is proposed to reduce semantic bias in high-uncertainty pixels, preserving the boundary information of target domain-specific knowledge. To ensure stable training, we propose a domain-invariant knowledge-based loss strategy, leveraging a confidence-guided mechanism and a consistency constraint. Additionally, we introduce a dynamic balancing loss to address class imbalance. Extensive experiments on cross-domain fundus image segmentation show that DKG achieves state-of-the-art performance. Code is available at https://github.com/Hanshuqian/DKG. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
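A common way to realize a confidence-guided pseudo-label scheme like the one described is to rank pixels by predictive entropy and route only the most uncertain ones to a calibration step. The sketch below illustrates that generic idea; the `ratio` cutoff and function names are hypothetical, not taken from DKG.

```python
import math

def entropy(probs):
    # Shannon entropy of one pixel's predicted class distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def calibration_mask(prob_maps, ratio=0.2):
    # Flag the top-`ratio` highest-entropy pixels as uncertain; only those
    # would be re-labelled by a calibration step, while the remaining
    # pixels keep their argmax pseudo-label.
    ents = [entropy(p) for p in prob_maps]
    k = max(1, int(len(ents) * ratio))
    cutoff = sorted(ents, reverse=True)[k - 1]
    return [e >= cutoff for e in ents]
```

High-entropy pixels concentrate at organ boundaries, which is why targeting them preserves the boundary information the abstract emphasizes.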
4. Spectral–spatial representation progressive learning via segmented attention for 3D skeleton-based motion prediction
- Keywords:
- Algebra;Arts computing;Bone;Extraction;Motion estimation;Spectrum analysis;Three dimensional computer graphics;3D skeleton;Feature information;Motion generation;Motion prediction;Progressive learning;Recombination factors;Soft attention;Spatial representations;Spectra's;Spectral–spatial representation
- Cao, Wenming;Zhang, Jianhua;Zhong, Jianqi
- Applied Soft Computing
- 2025
- Vol. 184
- Issue
- Journal
Recently, GCN-based methods have demonstrated impressive performance in human behavior prediction tasks. We believe that human motion modeling can be explained as motion correlation extraction from the combined analysis of active and static motion parts. However, existing methodologies fail to address the issue that feature information associated with static regions may overshadow feature information from dynamic regions, ultimately affecting the extraction of network features. Moreover, the unique low-pass feature pre-retention processing mechanism of GCNs on the spectrum causes the pose in some sequences to remain unchanged during the prediction process, further hurting the prediction. In this paper, we propose a Spectral–Spatial Representation Progressive Learning network to solve the problems above. First, we propose a segmented attention block that compares the input observation sequence with a static contrast standard to obtain the motion region and the rest region. Then, we design the Spectrum Deconstruction Recombination Factor (SDRF) block to extract the global bandpass spectrum of human bone joints. The joint features of different regions are encoded by graph convolution and by high-frequency feature filter coding based on geometric algebra. Specifically, a spectral–spatial interaction block is presented in each SDRF, focusing on the diversity of the motion sequence in the frequency domain and the spatial domain map, and realizing fine extraction of historical pose sequence features at both the spatial and spectral levels. Experimental results demonstrate that our approach outperforms state-of-the-art algorithms by 2.4%, 5.3% and 4.7% in terms of 3D mean per joint position error on the Human3.6M, CMU Mocap and 3DPW datasets, respectively. © 2025
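The low-pass behavior of GCNs that the abstract discusses can be made concrete with the graph Laplacian: applying L = D - A to a per-joint signal extracts its high-frequency (neighbor-deviating) component, which vanishes for signals that are smooth over the skeleton graph. This is a minimal sketch of generic spectral graph filtering, not the paper's SDRF block.

```python
def graph_laplacian(adj):
    # Combinatorial Laplacian L = D - A of an undirected joint graph.
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]

def high_frequency(adj, signal):
    # L @ x gives each node's deviation from its neighbourhood: a smooth
    # (low-frequency) signal over the graph maps to (near-)zero, so a GCN
    # that suppresses this component tends to freeze the predicted pose.
    L = graph_laplacian(adj)
    n = len(signal)
    return [sum(L[i][j] * signal[j] for j in range(n)) for i in range(n)]
```

On a three-joint chain, a constant signal yields a zero high-frequency component, while an oscillating signal survives the filter, which is the distinction a bandpass design exploits.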
5. SS-Mixer: MLP-Based 3D Human Motion Prediction with Spatial-Spectral Attention
- Keywords:
- Complex networks;Convolution;Dynamics;Forecasting;Low pass filters;Mixer circuits;Mixers (machinery);Mixing;Motion capture;Motion estimation;Network layers;Active motion;Convolutional networks;Graph convolutional network;Human motions;Mixing mechanisms;Motion generation;Motion prediction;Multilayer perceptrons;Spatial-spectral mixing mechanism;Spectral mixing
- Zhang, Jianhua;Zhong, Jianqi;Cao, Wenming
- 6th Asia-Pacific Conference on Image Processing, Electronics and Computers, IPEC 2025
- 2025
- May 16, 2025 - May 18, 2025
- Dalian, China
- Conference
Traditional graph convolutional network (GCN)-based methods for 3D human motion prediction have demonstrated great potential. However, these methods face two critical limitations: first, they require a large number of trainable parameters due to their complex network structure; second, they fail to differentiate between active motion regions and static regions, leading to suboptimal feature extraction. To address these issues, we propose Spatial-Spectral MLPs (SS-Mixer), a novel architecture designed to efficiently capture spatial and spectral features for human motion prediction. SS-Mixer introduces an attention-based segmentation mechanism to distinguish active motion regions from static regions, allowing the network to prioritize critical features. Furthermore, we decompose the input skeleton into multiple scales, modeling the dynamics of each part independently to enhance feature diversity. By incorporating a hybrid spatial-spectral mixing mechanism, SS-Mixer captures the diversity in motion sequences across both spatial and spectral domains, improving prediction performance. The integration of spectral decomposition into the mixing process addresses the low-pass filtering issue in GCNs, ensuring robust representation learning for dynamic motions. Extensive experiments on challenging datasets, including Human3.6M and 3DPW, demonstrate the superiority of SS-Mixer: our model achieves strong performance in terms of 3D mean per joint position error (MPJPE), with significant improvements over state-of-the-art methods. These results highlight the effectiveness of SS-Mixer in balancing computational efficiency and predictive accuracy while addressing the limitations of existing GCN-based approaches. © 2025 IOS Press.
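The MLP-based mixing idea rests on Mixer-style layers: one matrix multiplication mixes information across joints (spatial tokens), another across channels. A minimal sketch under that assumption follows; it is not the actual SS-Mixer layer, which additionally includes spectral decomposition and attention-based segmentation.

```python
def matmul(A, B):
    # Plain dense matrix product on lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mixer_layer(x, w_token, w_channel):
    # One Mixer-style layer on a joints x channels matrix: left-multiplying
    # mixes information across joints (spatial), right-multiplying mixes
    # across channels -- no convolution or attention required.
    return matmul(matmul(w_token, x), w_channel)
```

The appeal over GCNs is parameter economy: the two small weight matrices replace a stack of graph convolutions while still letting every joint interact with every other joint.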
6. Progressively deeper attention networks for 3D human motion prediction
- Keywords:
- Human motion prediction; Transformer; GCNs; Motion dependencies learning
- Huang, Jiangtao;He, Dong;Cao, Wenming;Zhong, Jianqi
- Multimedia Systems
- 2025
- Vol. 31
- Issue 5
- Journal
Human motion prediction is a significant challenge with broad applications in fields such as robotics, human-computer interaction, and healthcare. Despite the progress achieved by recent deep learning approaches, existing methods often struggle to effectively capture the complex spatial relationships and long-term temporal dependencies inherent in human motion. To address the issue, we propose the Progressive Deeper Attention Network (PDANet), which incorporates multiple GCN-Attention modules of varying depths. This architecture enables the model to extract more comprehensive information from sequential data. Additionally, we enhance the model's performance through two key improvements: (1) the introduction of joint-relative velocity and temporally perturbed features to distinguish complex motion semantics between dynamic and static joints; and (2) the design of a Multi-Dimensional Joint Fusion (MDJF) module, which employs the Gumbel Softmax method to dynamically learn the optimal fusion strategy for multi-semantic sequences. Extensive experiments demonstrate the effectiveness of our model. The proposed approach outperforms state-of-the-art methods by 2.8%, 4.7%, and 18.8% in terms of MPJPE for human motion prediction on the Human3.6M, AMASS, and 3DPW datasets, respectively.
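The Gumbel Softmax method mentioned for the MDJF module samples near-one-hot fusion weights while remaining compatible with gradient-based training. Below is a minimal standalone sketch of the trick itself; the temperature value and usage are illustrative, not PDANet's settings.

```python
import math, random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    # Perturb the logits with Gumbel(0, 1) noise, then apply a tempered
    # softmax: low temperatures push the output toward a one-hot vector
    # while keeping the operation differentiable with respect to logits.
    eps = 1e-12  # guard against log(0) when rng.random() returns 0.0
    noise = [-math.log(-math.log(max(rng.random(), eps))) for _ in logits]
    scores = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In a fusion module, the returned weights would select which semantic sequence (e.g., positional, velocity, or perturbed features) dominates the combined representation.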
7. FLCL: Feature-Level Contrastive Learning for Few-Shot Image Classification
- Keywords:
- Contrastive learning; Few shot learning; Training; Feature extraction; Measurement; Metalearning; Data augmentation; Vectors; Data models; Adaptation models; few-shot learning; data augmentation; image classification
- Cao, Wenming;Zeng, Jiewen;Liu, Qifan
- IEEE Transactions on Emerging Topics in Computing
- 2025
- Vol. 13
- Issue 3
- Journal
Few-shot classification is the task of recognizing unseen classes using a limited number of samples. In this paper, we propose a new contrastive learning method called Feature-Level Contrastive Learning (FLCL). FLCL conducts contrastive learning at the feature level and leverages the subtle relationships between positive and negative samples to achieve more effective classification. Additionally, we address the challenges of requiring a large number of negative samples and the difficulty of selecting high-quality negative samples in traditional contrastive learning methods. For feature learning, we design a Feature Enhancement Coding (FEC) module to analyze the interactions and correlations between nonlinear features, enhancing the quality of feature representations. In the metric stage, we propose a centered hypersphere projection metric to map feature vectors onto the hypersphere, improving the comparison between the support and query sets. Experimental results on four few-shot classification benchmark datasets demonstrate that our method, while simple in design, outperforms previous methods and achieves state-of-the-art performance. A detailed ablation study further confirms the effectiveness of each component of our model.
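The centered hypersphere projection metric can be pictured as shifting each feature by a center estimate, scaling it onto the unit sphere, and comparing support and query features by angular distance. This is a minimal sketch of that geometric idea; the choice of center here is an assumption, not FLCL's exact formulation.

```python
import math

def project_to_hypersphere(v, center):
    # Shift the feature by the center estimate, then scale it to unit norm
    # so that all comparisons happen on the surface of the hypersphere.
    shifted = [a - c for a, c in zip(v, center)]
    norm = math.sqrt(sum(a * a for a in shifted)) or 1.0
    return [a / norm for a in shifted]

def sphere_distance(u, v):
    # Geodesic (angular) distance between two unit vectors.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)
```

Projecting onto the sphere removes magnitude differences between support and query embeddings, so the metric responds only to direction, which tends to stabilize comparisons when samples are scarce.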
8. Linearformer: Tri-Net Multi-Layer DVF Medical Image Registration
- Keywords:
- Angiography;Deep neural networks;Electroencephalography;Functional neuroimaging;Image registration;Linearization;Mammography;Multilayer neural networks;Transillumination;Accurate registration;Brain MRI;Convolutional neural network;Deep learning;Deformable medical image registration;Linearformer;Medical image registration;Multi-layers;Similarity measure;Transformer modeling
- Anwar, Muhammad;Yan, Zhiyue;Cao, Wenming
- Expert Systems
- 2025
- Vol. 42
- Issue 7
- Journal
In medical imaging, accurate registration is crucial for reliable analysis. While transformer models demonstrate potential, their application to large datasets like OASIS is constrained by substantial memory requirements, quadratic complexity and the challenge of managing complex deformations. To overcome these challenges, Linearformer is introduced, an efficient transformer-based model with Linear-ProbSparse self-attention for optimised time and memory, along with TNM DVF, a Pyramid-based framework for unsupervised non-rigid registration. Evaluated on OASIS and LPBA40 brain MRI datasets, the model outperforms state-of-the-art methods in Dice score and Jacobian metrics, surpassing TransMatch by 0.6% and 1.9% on the two datasets while maintaining a comparable voxel folding percentage. © 2025 John Wiley & Sons Ltd.
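Linear-complexity attention variants such as the one named here generally exploit associativity: with a suitable kernel feature map, Q(KᵀV) equals (QKᵀ)V but costs O(n·d²) instead of O(n²·d) in sequence length n. The sketch below shows only this associativity idea with an identity kernel and no softmax; the actual Linear-ProbSparse self-attention is more involved.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def quadratic_attention(Q, K, V):
    # (Q K^T) V: materialises the n x n attention matrix, O(n^2 * d).
    return matmul(matmul(Q, transpose(K)), V)

def linear_attention(Q, K, V):
    # Q (K^T V): same result for a linear kernel, but the intermediate
    # K^T V is only d x d, giving O(n * d^2) time and memory.
    return matmul(Q, matmul(transpose(K), V))
```

For volumetric registration, where n grows with the number of voxels, this reordering is what makes transformer-style attention tractable on datasets like OASIS.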
9. Asymmetric Context-Guided Adaptive Alignment Network for Skeleton-Based Action Recognition
- Keywords:
- Skeleton; Image reconstruction; Transformers; Three-dimensional displays; Data models; Adaptation models; Feature extraction; Computational modeling; Solid modeling; Representation learning; Self-supervised learning; skeleton-based action recognition; masked modeling; alignment
- Cao, Wenming;Qian, Liangxi;Zhang, Yicha;Li, Xuelong;Yin, Xinpeng
- IEEE Transactions on Circuits and Systems for Video Technology
- 2025
- Vol. 35
- Issue 6
- Journal
In skeleton-based action recognition, self-supervised pre-training paradigms have been extensively investigated. In particular, masked-autoencoder-like methods based on masked target reconstruction, which are committed to choosing a better reconstruction target, have pushed pre-training performance to a new height. In this work, we propose an asymmetric context-guided adaptive alignment network (ACA(2)Net) for self-supervised skeleton-based action recognition, utilizing a transformer-based teacher encoder to guide the student encoder toward richer action contextual information. To tackle the misalignment arising from this asymmetry, we devise an adaptive alignment module to better align the student representations with the teacher's. Additionally, considering that the differential operation for temporal motion might lose prior information related to changes of direction, we propose a motion compass-aware masking strategy with a fusion prior supplemented by motion and direction intensity. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate that our proposed ACA(2)Net outperforms previous MAE-like methods.
10. Progressive Feature Reconstruction Network for Zero-Shot Learning
- Keywords:
- Visualization; Semantics; Image reconstruction; Feature extraction; Zero-shot learning; Whales; Training; Vectors; Data models; Benchmark testing; Zero-shot learning; feature reconstruction; attribute information
- Hu, Linchun;Cao, Wenming;Zhang, Zhenqi;Liang, Yuchuang
- IEEE Transactions on Circuits and Systems for Video Technology
- 2025
- Vol. 35
- Issue 6
- Journal
Zero-shot learning (ZSL) aims to transfer the knowledge learned on the seen classes to the unseen classes through semantic knowledge. However, to ensure a model's versatility on different datasets, existing methods divide the image into blocks of the same size, resulting in the loss of information between attributes. More importantly, existing methods ignore that not every image contains all attributes corresponding to its class. In this paper, we propose a progressive feature reconstruction network, called PFRN. PFRN consists of an attribute relation sub-net and an attention-based feature reconstruction sub-net. Specifically, the attribute relation sub-net first adopts the attribute-related region module to obtain the attribute-related regions in the visual features, which are input to the attribute relation discovery module to find the relationships between attributes. The attention-based feature reconstruction sub-net obtains fine-grained attribute-based features via the attribute attention module and uses the feature reconstruction module to randomly drop some attributes and reconstruct new visual features for the missing attributes. The new visual features are fed back into the network for training. Finally, the attribute information learned by the attribute relation sub-net is fused into the visual embedding learned by the attention-based feature reconstruction sub-net, and an ideal visual-semantic interaction is performed with the semantic vector used for ZSL classification. Extensive experiments on three ZSL benchmark datasets demonstrate the significant generalization performance of our proposed method over the state-of-the-art methods.
