Research on an Intelligent Detection and Evaluation System for Physiological Function Based on Geometric Algebra
Project source
Principal investigator
Funded institution
Approval year
Approval date
Project number
Project level
Research period
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions
Funded province
Project completion report (full text)
1. Dynamic prototype with discriminative representation for rapid adaptation in new organ segmentation
- Keywords:
- Image segmentation; Learning systems; Attention mechanisms; Discriminative representation; Few-shot segmentation; Medical domains; Organ segmentation; Prototype learning; Prototype-based learning; Rapid adaptation; Self-attention mechanism; Shot segmentation
- Wang, Hailing; Chen, Yu; Zhang, Xinyue; Cao, Guitao; Cao, Wenming
- Pattern Recognition
- 2026
- Vol. 173
- Issue
- Journal
Recent work in label-efficient prototype-based learning has demonstrated significant potential for rapid adaptation in new organ segmentation. However, a prevalent challenge in prototype extraction within the medical domain is semantic bias. To address this issue, we propose a Dynamic Prototype with Discriminative Representation Network (DPDRNet) to enhance the effectiveness of semantic class prototypes for new organs. Specifically, we introduce a self-attention mechanism to generate a dynamic prototype, enhancing the efficient utilization of local information. This is accomplished by capturing interdependencies among pixel-level prototypes from limited labeled samples. Subsequently, we design a prototype contrastive learning method to maintain the discriminative representation of the dynamic prototype in the high-level feature space. This method enhances the correlation between the dynamic prototype and foreground features while simultaneously increasing its distinction from background features. By incorporating a self-attention mechanism with contrastive learning, the proposed dynamic prototype exhibits enhanced generalization capabilities, facilitating more precise segmentation of new organ structures. Experimental results demonstrate that our method achieves effective performance on cardiac and abdominal MRI segmentation tasks.
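The abstract gives no equations, but the core idea it describes — refining a class prototype with self-attention over pixel-level support features, then matching query pixels by similarity — can be sketched generically. The code below is an illustrative simplification with invented shapes and random data, not DPDRNet's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(P):
    """One-head self-attention over pixel-level prototypes P of shape (n, d)."""
    d = P.shape[1]
    scores = P @ P.T / np.sqrt(d)                 # pairwise interdependencies
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ P                                  # attended prototypes

# Toy support set: 20 labeled foreground pixels with 8-dim embeddings
support_fg = rng.normal(size=(20, 8))
attended = self_attention(support_fg)
proto = attended.mean(axis=0)                     # dynamic class prototype

# Query pixels are segmented by cosine similarity to the prototype
query = rng.normal(size=(5, 8))
cos = (query @ proto) / (np.linalg.norm(query, axis=1) * np.linalg.norm(proto))
mask = cos > 0.0                                  # predicted foreground pixels
```

The self-attention step is what makes the prototype "dynamic": each pixel-level vector is re-weighted by its relations to the others before pooling, rather than being averaged blindly, which matches the interdependency-capturing role the abstract assigns to it.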
2. Dual-decoder collaborative learning with multi-hybrid view augmentation for self-supervised 3D action recognition
- Keywords:
- Skeleton-based action recognition; Self-supervised representation learning; Contrastive learning; Masked autoencoder; Masked skeleton modeling
- Cao, Wenming; Wu, Yingfei; Yin, Xinpeng
- PATTERN RECOGNITION
- 2026
- Vol. 172
- Issue
- Journal
Self-supervised methods, including contrastive learning and masked skeleton modeling, have demonstrated considerable potential in the field of skeleton-based action recognition. While contrastive learning captures fine-grained details at the instance level, masked skeleton modeling emphasizes joint-level features. Recent studies have begun to combine these two approaches. However, existing combination methods primarily focus on integrating the tasks within the skeleton space. Moreover, existing contrastive learning methods often fail to exploit the comprehensive interaction information in skeletal structures, resulting in suboptimal performance when recognizing actions involving multiple individuals. To overcome these limitations, we introduce the Dual-Decoder Collaborative Learning (DDC) with Multi-Hybrid View Augmentation (MHGNA) method, which connects these two tasks across multiple spaces. Specifically, the masked skeleton modeling task provides diverse views for the contrastive learning task in the skeleton space, while the contrastive method aligns the features generated by both tasks within the feature space. We further present an innovative view augmentation method that enhances the model's capacity to understand human interaction relationships by shuffling and replacing data across temporal, spatial, and personal dimensions. Extensive experiments on four downstream tasks across three large-scale datasets demonstrate that DDC exhibits stronger representational capabilities compared to state-of-the-art methods. Our code is available at https://github.com/Yingfei-Wu/DDC.
3. Spectral–spatial representation progressive learning via segmented attention for 3D skeleton-based motion prediction
- Keywords:
- Algebra; Arts computing; Bone; Extraction; Motion estimation; Spectrum analysis; Three dimensional computer graphics; 3D skeleton; Feature information; Motion generation; Motion prediction; Progressive learning; Recombination factors; Soft attention; Spatial representations; Spectra; Spectral–spatial representation
- Cao, Wenming; Zhang, Jianhua; Zhong, Jianqi
- Applied Soft Computing
- 2025
- Vol. 184
- Issue
- Journal
Recently, GCN-based methods have demonstrated impressive performance in human behavior prediction tasks. We believe that human motion modeling can be explained as motion correlation extraction from the combined analysis of active and static motion parts. However, existing methodologies fail to address the issue that feature information associated with static regions may overshadow feature information from dynamic regions, ultimately affecting the extraction of network features. Moreover, the unique low-pass feature pre-retention mechanism of GCNs on the spectrum causes the pose of some sequences to remain unchanged during the prediction process, further hurting the prediction. In this paper, we propose a Spectral–Spatial Representation Progressive Learning network to solve the problems above. First, we propose a segmented attention block to compare the input observation sequence with a static contrast standard, separating the motion region from the rest region. Then, we design the Spectrum Deconstruction Recombination Factor (SDRF) block to extract the global bandpass spectrum of human bone joints. The joint features of the different regions are encoded by graph convolution and by high-frequency feature filter coding based on geometric algebra. Specifically, a spectral–spatial interaction block within each SDRF focuses on the diversity of the frequency-domain and spatial-domain maps of the motion sequence, and realizes fine extraction of historical pose sequence features at both the spatial and spectral levels. Experimental results demonstrate that our approach outperforms state-of-the-art algorithms by 2.4%, 5.3% and 4.7% in terms of 3D mean per joint position error on the Human3.6M, CMU Mocap and 3DPW datasets, respectively.
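The SDRF block's "global bandpass spectrum" is described only at a high level; as background, a generic graph-spectral bandpass filter over a toy skeleton graph (the standard graph Fourier construction, without the paper's geometric-algebra coding, and with an invented 4-joint chain) looks like this:

```python
import numpy as np

# Toy skeleton graph: 4 joints in a chain 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                     # combinatorial graph Laplacian

evals, evecs = np.linalg.eigh(L)              # graph spectrum, ascending order

def bandpass(x, lo, hi):
    """Keep only graph frequencies in [lo, hi); x is a per-joint signal (4,)."""
    coeffs = evecs.T @ x                      # graph Fourier transform
    keep = (evals >= lo) & (evals < hi)       # select the frequency band
    return evecs @ (coeffs * keep)            # inverse transform of that band

x = np.array([1.0, 0.5, -0.5, -1.0])          # per-joint feature (e.g., velocity)
mid = bandpass(x, 0.5, 3.0)                   # suppress DC and the highest mode
```

Discarding the DC (lowest) mode while keeping mid-band modes is one way to stop static-pose energy from dominating, which is the spectral failure mode the abstract attributes to plain low-pass GCN filtering.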
4. Progressively deeper attention networks for 3D human motion prediction
- Keywords:
- Human motion prediction; Transformer; GCNs; Motion dependencies learning
- Huang, Jiangtao; He, Dong; Cao, Wenming; Zhong, Jianqi
- MULTIMEDIA SYSTEMS
- 2025
- Vol. 31
- Issue 5
- Journal
Human motion prediction is a significant challenge with broad applications in fields such as robotics, human-computer interaction, and healthcare. Despite the progress achieved by recent deep learning approaches, existing methods often struggle to effectively capture the complex spatial relationships and long-term temporal dependencies inherent in human motion. To address this issue, we propose the Progressive Deeper Attention Network (PDANet), which incorporates multiple GCN-Attention modules of varying depths. This architecture enables the model to extract more comprehensive information from sequential data. Additionally, we enhance the model's performance through two key improvements: (1) the introduction of joint-relative velocity and temporally perturbed features to distinguish complex motion semantics between dynamic and static joints; and (2) the design of a Multi-Dimensional Joint Fusion (MDJF) module, which employs the Gumbel-Softmax method to dynamically learn the optimal fusion strategy for multi-semantic sequences. Extensive experiments demonstrate the effectiveness of our model. The proposed approach outperforms state-of-the-art methods by 2.8%, 4.7%, and 18.8% in terms of MPJPE for human motion prediction on the Human3.6M, AMASS, and 3DPW datasets, respectively.
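The MDJF module's use of Gumbel-Softmax is named but not specified; the standard Gumbel-Softmax relaxation it presumably builds on can be sketched as follows. The three "streams" and their logits are invented for illustration and are not PDANet's actual inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Differentiable approximation to sampling a one-hot choice from softmax(logits)."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau                                # lower tau -> more one-hot
    e = np.exp(y - y.max())
    return e / e.sum()

# Logits scoring three candidate semantic streams (e.g., position, velocity, perturbed)
logits = np.array([2.0, 0.5, 0.1])
weights = gumbel_softmax(logits, tau=0.5)     # near one-hot selection weights

streams = rng.normal(size=(3, 6))             # three feature streams of dim 6
fused = weights @ streams                     # soft, differentiable fusion
```

The point of the relaxation is that the discrete choice of fusion strategy stays differentiable, so the selection weights can be trained end-to-end with the rest of the network.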
5. FLCL: Feature-Level Contrastive Learning for Few-Shot Image Classification
- Keywords:
- Contrastive learning; Few-shot learning; Training; Feature extraction; Measurement; Metalearning; Data augmentation; Vectors; Data models; Adaptation models; few-shot learning; data augmentation; image classification
- Cao, Wenming; Zeng, Jiewen; Liu, Qifan
- IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING
- 2025
- Vol. 13
- Issue 3
- Journal
Few-shot classification is the task of recognizing unseen classes using a limited number of samples. In this paper, we propose a new contrastive learning method called Feature-Level Contrastive Learning (FLCL). FLCL conducts contrastive learning at the feature level and leverages the subtle relationships between positive and negative samples to achieve more effective classification. Additionally, we address the challenges of requiring a large number of negative samples and the difficulty of selecting high-quality negative samples in traditional contrastive learning methods. For feature learning, we design a Feature Enhancement Coding (FEC) module to analyze the interactions and correlations between nonlinear features, enhancing the quality of feature representations. In the metric stage, we propose a centered hypersphere projection metric to map feature vectors onto the hypersphere, improving the comparison between the support and query sets. Experimental results on four few-shot classification benchmark datasets demonstrate that our method, while simple in design, outperforms previous methods and achieves state-of-the-art performance. A detailed ablation study further confirms the effectiveness of each component of our model.
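The centered hypersphere projection metric is not spelled out in the abstract; a plausible minimal reading — center the features, project them onto the unit hypersphere, then compare support prototypes with queries by cosine similarity — can be sketched as below. All shapes, the random data, and the choice of the support mean as the center are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def hypersphere_project(X, center):
    """Center feature vectors, then project them onto the unit hypersphere."""
    Z = X - center
    return Z / np.linalg.norm(Z, axis=-1, keepdims=True)

support = rng.normal(size=(5, 16))            # 5 class prototypes (5-way, dim 16)
query = rng.normal(size=(3, 16))              # 3 query samples

center = support.mean(axis=0)                 # shared center from the support set
S = hypersphere_project(support, center)
Q = hypersphere_project(query, center)

sims = Q @ S.T                                # cosine similarities on the sphere
pred = sims.argmax(axis=1)                    # nearest prototype = predicted class
```

Projecting onto the sphere makes the metric scale-invariant, so the comparison between support and query sets depends only on direction, not on feature magnitude.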
6. Linearformer: Tri-Net Multi-Layer DVF Medical Image Registration
- Keywords:
- Angiography; Deep neural networks; Electroencephalography; Functional neuroimaging; Image registration; Linearization; Mammography; Multilayer neural networks; Transillumination; Accurate registration; Brain MRI; Convolutional neural network; Deep learning; Deformable medical image registration; Linearformer; Medical image registration; Multi-layers; Similarity measure; Transformer modeling
- Anwar, Muhammad; Yan, Zhiyue; Cao, Wenming
- Expert Systems
- 2025
- Vol. 42
- Issue 7
- Journal
In medical imaging, accurate registration is crucial for reliable analysis. While transformer models demonstrate potential, their application to large datasets like OASIS is constrained by substantial memory requirements, quadratic complexity, and the challenge of managing complex deformations. To overcome these challenges, we introduce Linearformer, an efficient transformer-based model with Linear-ProbSparse self-attention for optimised time and memory, along with TNM DVF, a pyramid-based framework for unsupervised non-rigid registration. Evaluated on the OASIS and LPBA40 brain MRI datasets, the model outperforms state-of-the-art methods in Dice score and Jacobian metrics, surpassing TransMatch by 0.6% and 1.9% on the two datasets while maintaining a comparable voxel-folding percentage.
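The abstract does not define Linear-ProbSparse self-attention; for context, the generic kernel-feature-map trick that reduces attention from O(n²) to O(n) time and memory (as in linear transformers broadly, not necessarily the authors' exact variant) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def elu1(x):
    """Positive feature map phi(x) = elu(x) + 1, a common linear-attention kernel."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n) attention: phi(Q) (phi(K)^T V), instead of softmax(Q K^T) V."""
    Qp, Kp = elu1(Q), elu1(K)
    KV = Kp.T @ V                             # (d, d_v): independent of sequence length
    Z = Qp @ Kp.sum(axis=0)                   # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

n, d = 64, 8                                  # toy sequence of 64 tokens, dim 8
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
out = linear_attention(Q, K, V)
```

Because the (d, d) matrix `KV` is computed once and reused for every query, cost grows linearly in sequence length, which is exactly the property that makes transformer registration feasible on large volumes.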
7. Asymmetric Context-Guided Adaptive Alignment Network for Skeleton-Based Action Recognition
- Keywords:
- Skeleton; Image reconstruction; Transformers; Three-dimensional displays; Data models; Adaptation models; Feature extraction; Computational modeling; Solid modeling; Representation learning; Self-supervised learning; skeleton-based action recognition; masked modeling; alignment
- Cao, Wenming; Qian, Liangxi; Zhang, Yicha; Li, Xuelong; Yin, Xinpeng
- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
- 2025
- Vol. 35
- Issue 6
- Journal
In skeleton-based action recognition, self-supervised pre-training paradigms have been extensively investigated. In particular, masked-autoencoder-like methods based on masked target reconstruction, which aim to choose a better target for reconstruction, have pushed pre-training performance to a new height. In this work, we propose an asymmetric context-guided adaptive alignment network (ACA²Net) for self-supervised skeleton-based action recognition, utilizing a transformer-based teacher encoder to guide the student encoder toward richer action contextual information. To tackle the misalignment arising from the asymmetry, we devise an adaptive alignment module to better align the student representations with the teacher's. Additionally, considering that the differential operation for temporal motion might lose priors related to changes of direction, we propose a motion compass-aware masking strategy with a fused prior supplemented by motion and direction intensity. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate that our proposed ACA²Net outperforms previous MAE-like methods.
8. Progressive Feature Reconstruction Network for Zero-Shot Learning
- Keywords:
- Visualization; Semantics; Image reconstruction; Feature extraction; Zero-shot learning; Whales; Training; Vectors; Data models; Benchmark testing; Zero-shot learning; feature reconstruction; attribute information
- Hu, Linchun; Cao, Wenming; Zhang, Zhenqi; Liang, Yuchuang
- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
- 2025
- Vol. 35
- Issue 6
- Journal
Zero-shot learning (ZSL) aims to transfer knowledge learned from seen classes to unseen classes through semantic knowledge. However, to ensure the model's versatility on different datasets, existing methods divide the image into blocks of the same size, resulting in the loss of information between attributes. More importantly, existing methods ignore that not every image contains all attributes corresponding to its class. In this paper, we propose a progressive feature reconstruction network, called PFRN. PFRN consists of an attribute relation sub-net and an attention-based feature reconstruction sub-net. Specifically, the attribute relation sub-net first adopts the attribute-related region module to obtain the attribute-related regions in the visual features, which are input to the attribute relation discovery module to find the relationships between attributes. The attention-based feature reconstruction sub-net obtains fine-grained attribute-based features via the attribute attention module and uses the feature reconstruction module to randomly drop some attributes and reconstruct new visual features for the missing attributes. The new visual features are fed back into the network for training. Finally, the attribute information learned by the attribute relation sub-net is fused into the visual embedding learned by the attention-based feature reconstruction sub-net, and visual-semantic interaction is performed with the semantic vector used for ZSL classification. Extensive experiments on three ZSL benchmark datasets demonstrate the significant generalization performance of our proposed method over state-of-the-art methods.
9. Optimizing human motion prediction through decoupled motion spatio-temporal trends
- Keywords:
- 3D human motion forecasting; Deep learning; Time series
- Pan, Huan; Ji, Ruiya; Cao, Wenming; Huang, Zhao; Zhong, Jianqi
- MULTIMEDIA SYSTEMS
- 2025
- Vol. 31
- Issue 2
- Journal
Recent advancements in deep learning and artificial intelligence have underscored the importance of human motion prediction in fields such as intelligent robotics, autonomous driving, and human-computer interaction. Current human motion prediction methods primarily focus on network structure and feature extraction innovations, often overlooking the underlying logic of spatio-temporal changes in motion data. This oversight can result in potential conflicts within the coupled modeling of spatial and temporal dependencies, potentially obscuring the spatio-temporal logic of human motion. In this paper, we address this issue by decoupling the spatio-temporal features, employing time series modeling for preliminary prediction, and introducing velocity data as a learning branch to capture joint dependencies. This velocity-based information more clearly represents quantitative indices related to human movement, enhancing the model's pattern recognition capability. We map the trajectory change rules to the joint change trends for future moments, thereby refining the prediction results. Additionally, we enhance local semantic information through a patching method and ensure the independence of multi-scale representations of spatial and temporal dimensions using a two-branch framework. We propose a multi-layer perceptron (MLP)-based network structure, DCMixer, designed to learn multi-scale dynamic information and perform internal feature extraction. Our approach achieves spatio-temporal fusion with greater kinematic logic, significantly improving model performance. We evaluated our method on three public datasets, demonstrating superior prediction performance compared to state-of-the-art methods. The code is publicly available at https://github.com/Dabanshou/STTSN.
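The "patching method" for local semantic information is only named in the abstract; a generic non-overlapping temporal patching of a motion sequence, alongside the velocity branch the abstract mentions, might look like the sketch below. Patch lengths, dimensions, and the random data are illustrative, not DCMixer's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def patch(series, patch_len):
    """Split a (T, d) sequence into non-overlapping (T // patch_len, patch_len * d) patches."""
    T, d = series.shape
    n = T // patch_len                        # drop any trailing remainder frames
    return series[:n * patch_len].reshape(n, patch_len * d)

T, d = 50, 3                                  # 50 frames of 3-D joint coordinates
seq = rng.normal(size=(T, d))
vel = np.diff(seq, axis=0)                    # velocity branch: frame-to-frame deltas

patches = patch(seq, patch_len=10)            # (5, 30): local temporal tokens
vel_patches = patch(vel, patch_len=7)         # (7, 21): velocity tokens
```

Each patch becomes one token carrying a local window of motion, so a downstream MLP can mix information within and across patches while the position and velocity branches stay decoupled, in the spirit of the two-branch framework described above.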
10. STHRA: selective transformer hierarchical reciprocal attention-based deformable medical image registration
- Keywords:
- Similarity measures; Deformable medical image registration; Convolutional neural networks; Transformer; Deep learning
- Anwar, Muhammad; Yan, Zhiyue; Cao, Wenming; Hussain, Naeem
- MULTIMEDIA SYSTEMS
- 2025
- Vol. 31
- Issue 2
- Journal
Deformable medical image registration is an essential process that requires extracting and aligning characteristics from two images to establish exact correspondence, a necessity for accurate registration. Recent experiments demonstrate that the Transformer can improve predictive capacities. Nevertheless, there are significant obstacles when directly applying it to large datasets like OASIS, including substantial memory needs, quadratic temporal complexity, and intrinsic limitations of the encoder-decoder architecture. Even with the development of advanced registration models, achieving precise and effective deformable registration remains difficult, particularly in situations with significant volumetric deformations. We use a Selective Transformer (ST) and Hierarchical Reciprocal Attention (HRA) to address these challenges. To minimize computational complexity and optimize resource allocation for more effective processing, ST estimates the diversity of voxels and selects those with a broad range of diversity. Using an encoder-decoder architecture, HRA uses high-level features to link layers, allowing information to flow from a high level to a lower level and vice versa. We use Reciprocal Attention (RA) instead of skip connections to facilitate the flow of information between the feature extractor and the feature reconstructor. This method maximizes the model's capacity to accurately capture and anticipate deformations by thoroughly integrating complex spatial data and abstract representations. We benchmarked our model against established registration techniques on two well-known pre-aligned brain MRI datasets, OASIS and LPBA40. Our evaluations consistently demonstrate that our network surpasses state-of-the-art methods across various metrics, including Dice score and Jacobian.
