Research on an Intelligent Detection and Evaluation System for Physiological Function Based on Geometric Algebra
Project source
Principal investigator
Funded institution
Approval year
Approval date
Project number
Research period
Project level
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions
Funded province
Project final report (full text)
1. Dual Knowledge-Aware Guidance for Source-Free Domain Adaptive Fundus Image Segmentation
- Keywords:
- Balancing;Calibration;Domain Knowledge;Knowledge management;Knowledge transfer;Semantics;Boundary information;Domain adaptation;Domain-invariant knowledge;Domain-specific knowledge;Fundus image;Images segmentations;Pseudo-label calibration;Source models;Source-free domain adaptation;Target domain
- Chen, Yu;Wang, Hailing;Wu, Chunwei;Cao, Guitao
- 《28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025》
- 2026
- September 23, 2025 - September 27, 2025
- Daejeon, Republic of Korea
- Conference
Source-free domain adaptation (SFDA), where only a pre-trained source model is available to adapt to the target domain, has gained widespread application in the medical field. Most existing methods overlook low-quality pseudo-labels, i.e., pseudo-labels with boundary semantic confusion, when learning target domain-specific knowledge, leading to the loss of crucial boundary information. Furthermore, focusing solely on the specific knowledge can drive the model to shift in an uncontrollable direction, resulting in model degradation. To address these issues, we propose Dual Knowledge-aware Guidance (DKG), a novel SFDA method that integrates domain-specific knowledge with domain-invariant knowledge to improve transfer performance. Specifically, a pseudo-label calibration scheme is proposed to reduce semantic bias in high-uncertainty pixels, preserving the boundary information of target domain-specific knowledge. To ensure stable training, we propose a domain-invariant knowledge-based loss strategy, leveraging a confidence-guided mechanism and a consistency constraint. Additionally, we introduce a dynamic balancing loss to address class imbalance. Extensive experiments on cross-domain fundus image segmentation show that DKG achieves state-of-the-art performance. Code is available at https://github.com/Hanshuqian/DKG. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
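The abstract above does not spell out how "high-uncertainty pixels" are identified. A common generic recipe in SFDA segmentation pipelines (not necessarily DKG's own calibration scheme) is to score each pixel by the normalized entropy of its predicted class distribution and exclude high-entropy pixels from the pseudo-labels; `threshold` and `ignore_index` below are illustrative values:

```python
import numpy as np

def uncertain_pixel_mask(probs, threshold=0.5):
    """Flag high-uncertainty pixels by the normalized entropy of the
    per-pixel class distribution (1.0 = uniform, 0.0 = one-hot).

    probs: (H, W, C) softmax probabilities; returns an (H, W) bool mask.
    """
    entropy = -(probs * np.log(probs + 1e-8)).sum(axis=-1)
    normalized = entropy / np.log(probs.shape[-1])
    return normalized > threshold

def pseudo_labels(probs, threshold=0.5, ignore_index=255):
    """Argmax pseudo-labels with high-uncertainty pixels marked ignored."""
    labels = probs.argmax(axis=-1)
    labels[uncertain_pixel_mask(probs, threshold)] = ignore_index
    return labels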
2. SS-Mixer: MLP-Based 3D Human Motion Prediction with Spatial-Spectral Attention
- Keywords:
- Complex networks;Convolution;Dynamics;Forecasting;Low pass filters;Mixer circuits;Mixers (machinery);Mixing;Motion capture;Motion estimation ;Network layers;Active motion;Convolutional networks;Graph convolutional network;Human motions;Mixing mechanisms;Motion generation;Motion prediction;Multilayers perceptrons;Spatial-spectral mixing mechanism;Spectral mixing
- Zhang, Jianhua;Zhong, Jianqi;Cao, Wenming
- 《6th Asia-Pacific Conference on Image Processing, Electronics and Computers, IPEC 2025》
- 2025
- May 16, 2025 - May 18, 2025
- Dalian, China
- Conference
Traditional graph convolutional network (GCN)-based methods for 3D human motion prediction have demonstrated great potential. However, these methods face two critical limitations: first, they require a large number of trainable parameters due to their complex network structure; second, they fail to differentiate between active motion regions and static regions, leading to suboptimal feature extraction. To address these issues, we propose Spatial-Spectral MLPs (SS-Mixer), a novel architecture designed to efficiently capture spatial and spectral features for human motion prediction. SS-Mixer introduces an attention-based segmentation mechanism to distinguish active motion regions from static regions, allowing the network to prioritize critical features. Furthermore, we decompose the input skeleton into multiple scales, modeling the dynamics of each part independently to enhance feature diversity. By incorporating a hybrid spatial-spectral mixing mechanism, SS-Mixer captures the diversity in motion sequences across both spatial and spectral domains, improving prediction performance. The integration of spectral decomposition into the mixing process addresses the low-pass filtering issue in GCNs, ensuring robust representation learning for dynamic motions. Extensive experiments on challenging benchmarks, including Human3.6M and 3DPW, demonstrate the superiority of SS-Mixer: it achieves outstanding 3D mean per joint position error (MPJPE) across these datasets, with significant improvements over state-of-the-art methods. These results highlight the effectiveness of SS-Mixer in balancing computational efficiency and predictive accuracy while addressing the limitations of existing GCN-based approaches. © 2025 IOS Press.
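MPJPE, the metric reported above, is the standard measure for this task: the mean Euclidean distance between predicted and ground-truth 3D joint positions. A minimal reference implementation:

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per joint position error: average Euclidean distance between
    predicted and ground-truth 3D joints.

    pred, target: arrays of shape (..., num_joints, 3), e.g. in millimeters.
    """
    return np.linalg.norm(pred - target, axis=-1).mean()

# Toy example: 2 frames, 3 joints; prediction offset by 10 mm along x.
target = np.zeros((2, 3, 3))
pred = target.copy()
pred[..., 0] += 10.0
print(mpjpe(pred, target))  # 10.0
```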
3. Counterfactual Thinking Driven Emotion Regulation for Image Sentiment Recognition
- Keywords:
- Emotion Recognition;Feature Selection;Psychology computing;Quality control;Affective Computing;Counterfactuals;Effective tool;Emotion predictions;Emotion regulations;Generation tools;Psychological theory;Recognition methods;Region-based;Regulation networks
- Zhang, Xinyue;Wang, Zhaoxia;Wang, Hailing;Cao, Guitao
- 《34th International Joint Conference on Artificial Intelligence, IJCAI 2025》
- 2025
- August 16, 2025 - August 22, 2025
- Montreal, QC, Canada
- Conference
Image sentiment recognition (ISR) facilitates the practical application of affective computing on rapidly growing social platforms. Nowadays, region-based ISR methods that use affective regions to guide emotion prediction have gained significant attention. However, existing methods lack a causality-based mechanism to guide affective region generation and effective tools to quantitatively evaluate their quality. Inspired by the psychological theory of Emotion Regulation, we propose a counterfactual thinking driven emotion regulation network (CTERNet), which simulates the Emotion Regulation Theory by modeling the entire process of ISR based on human causality-driven mechanisms. Specifically, we first use multi-scale perception for feature extraction to simulate the stage of situation selection. Next, we combine situation modification, attentional deployment, and cognitive change into a counterfactual thinking based cognitive reappraisal module, which learns both affective regions (factual) and other potential affective regions (counterfactual). In the response modulation stage, we compare the factual and counterfactual outcomes to encourage the network to discover the most emotionally representative regions, thereby quantifying the quality of affective regions for ISR tasks. Experimental results demonstrate that our method outperforms or matches the state-of-the-art approaches, proving its effectiveness in addressing the key challenges of region-based ISR. © 2025 International Joint Conferences on Artificial Intelligence. All rights reserved.
4. C2BA: Cross-Domain Consistency and Bidirectional Alignment for Cross-Modal Domain-Incremental Learning
- Keywords:
- Computer vision;Domain Knowledge;Learning systems;Modal analysis;Cross-domain;Cross-modal;Cross-modal attention;Domain consistency;Domain-incremental learning;Global knowledge;Incremental learning;Language model;Modal domain;Vision-language model
- Huang, Weiyi;Xi, Xidong;Wang, Hailing;Cao, Guitao
- 《2025 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2025》
- 2025
- October 5, 2025 - October 8, 2025
- Vienna, Austria (hybrid)
- Conference
In cross-modal domain-incremental learning, the primary challenge lies in learning from varying data distributions while maintaining performance on prior domains. However, existing methods often overlook the importance of shared knowledge across domains, and the interaction between modalities remains insufficient. To address these issues, we propose Cross-Domain Consistency and Bidirectional Alignment (C2BA), a novel framework that enhances the model's generalization ability and improves cross-modal integration in vision-language models (VLMs) through two key components. We design a Cross-domain Global Consistency Constraint (CGCC) to stabilize domain-invariant representations during incremental training, preventing excessive shifts of shared distributions toward new domains. In addition, we design a Bidirectional Cross-Modal Attention (BCMA) module, which enables effective interaction between visual and textual features through a bidirectional attention mechanism, thereby reducing cross-modal discrepancies. Experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art exemplar-free and even exemplar-based approaches, achieving superior generalization and cross-modal interaction. © 2025 IEEE.
5. Refining Long-Term Predictions: Two-Stage Spatial-Temporal Feature Learning for 3D Human Motion Prediction
- Keywords:
- 3D skeleton;Auto-regressive;Feature learning;GCN;Human motion prediction;Human motions;Hybrid regressive mechanism;Long-term prediction;Motion prediction;Spatial-temporal features
- Cao, Wenming;Yang, Yixin;Zhong, Jianqi;Zhang, Yicha
- 《2025 IEEE International Conference on Big Data and Smart Computing, BigComp 2025》
- 2025
- February 9, 2025 - February 12, 2025
- Kota Kinabalu, Malaysia
- Conference
3D skeleton-based human motion prediction is critical for human-machine interaction but remains challenging. Recent RNN-based approaches achieve good performance but suffer from error accumulation due to their sequential prediction. To overcome this, we propose a Hybrid Regressive Network with Better Guesses Decision, combining autoregressive and non-autoregressive strategies to improve accuracy. The Better Guesses Decision unit enhances long-term forecasting through Better Guess Learning and Better Prediction Decision. Our Multimapping Parsing Unit maps motion sequences into geometric algebra and Euclidean spaces, providing comprehensive modeling of motion dependencies. Experiments on the Human3.6M dataset show that our method achieves state-of-the-art performance. © 2025 IEEE.
6. A Novel Framework for Inverse Problems: Fixed-Point Iteration Using Consistency Models
- Keywords:
- Wang, Xinke;Cao, Guitao;Wang, Hailing
- 《2025 International Joint Conference on Neural Networks, IJCNN 2025》
- 2025
- June 30, 2025 - July 5, 2025
- Rome, Italy
- Conference
Inverse problems play a crucial role in science and engineering, especially in the field of computer vision, where tasks such as deblurring, super-resolution, and colorization can be formally modeled as inverse problems. Consistency models excel in generation speed while maintaining high quality, making them a promising family of generative models. However, existing sampling methods struggle to achieve high-quality results when applying consistency models to image inverse problems. To address this limitation, we propose the Consistency Inverse Reconstruction Sampling (CIRS) framework, which incorporates two modes: CIRS-Hybrid and CIRS-Pure. In CIRS-Hybrid, the posterior formula of inverse problems is utilized by estimating the prior term using a diffusion denoiser and the likelihood term with a consistency model, enabling reconstruction under dual-model guidance. To overcome the complexities of dual-model tuning and inefficiencies caused by employing a diffusion denoiser, we introduce CIRS-Pure, which relies solely on a consistency model. By eliminating the iterative noise addition and denoising steps, the iterative procedure is transformed into a fixed-point iteration, achieving efficient and high-quality restoration. Extensive experiments demonstrate that CIRS-Pure outperforms state-of-the-art methods in zero-shot image restoration tasks such as image deblurring and colorization while achieving competitive performance in super-resolution. © 2025 IEEE.
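The abstract's CIRS-Pure mode casts restoration as a fixed-point iteration, i.e. repeatedly applying an update operator until the iterate stops changing. The paper's actual operator is not reproduced here; the underlying generic scheme is the classic x_{k+1} = g(x_k) loop, sketched on a scalar toy problem:

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = g(x_k) until successive iterates agree within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Classic scalar example: the unique solution of x = cos(x), about 0.739085.
root = fixed_point(math.cos, 1.0)
print(round(root, 6))  # 0.739085
```

In CIRS-Pure the scalar map is replaced by an image-space operator built from a consistency model, but the convergence loop has the same shape.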
7. A Generic Autoregressive Predictive Feedback Framework for Skeleton-Based Action Recognition
- Keywords:
- Feedback;Action recognition;Auto-regressive;Autoregressive predictive;Global models;Long-term temporal dependence;Motion sequences;Ordering constraints;State-space models;Stationary components;Temporal dependence
- Yin, Xinpeng;Hu, Jing;Cao, Wenming
- 《17th Asian Conference on Computer Vision, ACCV 2024》
- 2025
- December 8, 2024 - December 12, 2024
- Hanoi, Vietnam
- Conference
Prior works in skeleton-based action recognition have struggled to overcome sequence order constraints while achieving comprehensive global modeling of temporal dependencies. Moreover, most focus on enhancing the model's fitting ability across different temporal scales, overlooking the non-stationary temporal characteristics inherent in motion sequences. In this paper, we explore the adaptation of state-space modeling (SSM), typically suited for stationary sequences, to motion sequences. Addressing the challenge posed by the trending nature of motion sequences and the stability requirement of SSM, we integrate SSM into a generalized Autoregressive Predictive Feedback (APF) framework. Our approach involves segmenting motion sequences into trend and stationary components. We introduce the Non-Independent Multi-channel Processing (NiMc-P) module to capture implicit relationships among 3D coordinates and propose the Independent Multi-joint SSM (IMj-S) module to model temporal dependencies within the stationary components. Throughout this process, state-space matrices drive the feedback mechanism. Experiments conducted on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets demonstrate the efficiency and versatility of APF. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
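The abstract's split of motion sequences into trend and stationary components is not detailed; one minimal decomposition in that spirit (an assumption for illustration, not the paper's NiMc-P/IMj-S design) is a moving-average trend plus a residual:

```python
import numpy as np

def decompose(x, window=5):
    """Split a 1D sequence into a moving-average trend and a residual
    (approximately stationary) component. Edge padding keeps the trend
    the same length as the input; window should be odd."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    return trend, x - trend
```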
8. Self-supported Prototype Rectification for Few-shot Medical Image Segmentation
- Keywords:
- Electric rectifiers;Medical imaging;Structured Query Language;Few-shot learning;Intra class;Labeled images;Many to many;Medical image segmentation;Prototype rectification;Query images;Self-support;Semantic segmentation;Semantics Information
- Li, Zhaoxu;Wang, Hailing;Cao, Guitao
- 《2024 International Joint Conference on Neural Networks, IJCNN 2024》
- 2024
- June 30, 2024 - July 5, 2024
- Yokohama, Japan
- Conference
Few-shot semantic segmentation aims to quickly adapt pixel-wise prediction to novel classes with only a few labeled images. Recent works rely on prototypical learning, where prototypes obtained from support images are applied to the segmentation of query images. However, there are inherent intra-class appearance differences between support images and query images, and the prototypes extracted from a small number of support images contain limited deep semantic information, which makes it difficult to accurately guide the segmentation of query images. To alleviate this problem, we propose a Self-Supported Prototype Rectification Network. Specifically, we introduce a Pseudo Mask Generation (PMG) module to generate a pseudo query mask by means of many-to-many prototype matching. We design a Prototype Rectification (PR) module with a learnable parameter to balance the self-supported rectified prototype between the support prototype obtained from the support image and the query prototype extracted from query features with the pseudo query mask. Furthermore, we introduce a prototype-based multi-class segmentation approach to mitigate confusion-area prediction among different organs for query images in multi-organ segmentation scenarios. Our method outperforms other SOTAs on two widely used datasets: CHAOST2 and MS-CMR. © 2024 IEEE.
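Prototypical few-shot segmentation of the kind described above typically builds a class prototype by masked average pooling over the support features and scores query pixels by cosine similarity to it. The following generic sketch illustrates those two standard steps (not the paper's PMG/PR modules):

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Masked average pooling: average the feature vectors of pixels
    inside a binary foreground mask to obtain a class prototype.

    features: (H, W, C) feature map; mask: (H, W) binary mask.
    """
    weights = mask[..., None]
    return (features * weights).sum(axis=(0, 1)) / (mask.sum() + 1e-8)

def cosine_score_map(features, prototype):
    """Per-pixel cosine similarity between query features and a prototype."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return f @ p
```

Thresholding (or softmaxing over per-class) score maps then yields the segmentation of the query image.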
9. DLE: Document Illumination Correction with Dynamic Light Estimation
- Keywords:
- Image enhancement;Photodegradation;Photomasks;Adversarial networks;Background light;Document images;Down-stream;Illumination correction;Image degradation;Light estimations;Multi-modal;Natural environments;Subnetworks
- Quan, Jiahao;Wang, Hailing;Wu, Chunwei;Cao, Guitao
- 《2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024》
- 2024
- October 6, 2024 - October 10, 2024
- Kuching, Malaysia
- Conference
Document images captured with mobile devices in natural environments are often affected by various types of illumination degradation. This degradation diminishes the clarity and readability of document images, thereby complicating their use in downstream OCR tasks. Existing methods typically address only one or a few degradation types and do not consider the diversity of image degradation. Additionally, they often rely on a pre-trained, fixed sub-network to estimate background light or shadows, which lacks flexibility and adaptability. To overcome these challenges, this study proposes a novel framework named DLE, which comprises a two-loop generative adversarial network and a multi-modal discriminator. Specifically, to improve the quality of image representation, a mask extractor is embedded before the image input generator. This forces the model to focus on the distinct features in the image, enhancing the representation of anomalously illuminated and degraded regions. The mask extractor generates a luminance mask to evaluate the difference in illumination between the input and target images. Subsequently, the consistency loss computation incorporates a dynamic optimization of the mask extractor, strengthening its ability to estimate the illumination-degraded parts. Moreover, a pre-trained vision-language model is introduced into the multi-modal discriminator, leveraging its robust cross-modal alignment capability to improve the semantic consistency of the generated images with the preset input text. Extensive experiments demonstrate that our approach achieves SOTA performance in terms of edit distance (ED) and character error rate (CER). © 2024 IEEE.
10. HWSformer: History Window Serialization Based Transformer for Semantic Enrichment Driven Stock Market Prediction
- Keywords:
- Commerce;Costs;Electronic trading;Financial markets;Marketplaces;Natural language processing systems;Prediction models;Semantics;Time series;Performance;Price index;Semantic enrichment;Stock index forecasting;Stock market prediction;Stock price;Stock price index forecasting;Time-series data;Transformer modeling;Transformer-based
- Hu, Yisheng;Cao, Guitao;Cheng, Dawei
- 《2024 International Joint Conference on Neural Networks, IJCNN 2024》
- 2024
- June 30, 2024 - July 5, 2024
- Yokohama, Japan
- Conference
After the Transformer model demonstrated excellent performance in natural language processing (NLP) and computer vision tasks, researchers began to explore Transformer models for time-series prediction. Because of the significant role of the stock market in the global economy, stock market prediction is of paramount importance for investors. Stock index forecasting is one branch of stock market forecasting, and researchers have also turned to the Transformer there. However, given the limited semantic information in time-series data and the characteristics of the self-attention mechanism, the Transformer model has not gained widespread adoption in stock index forecasting. In this paper, we propose a history window serialization based Transformer model (HWSformer) specifically designed for predicting stock price indices. Our innovation is to introduce a historical window serialization layer to address the limited semantic richness of time-series data, which undermines the effectiveness of self-attention. Additionally, to capture the original distribution accurately and retain valuable non-stationary information, we incorporate the Reversible Instance Normalization (RevIN) method. We conducted experiments on 12 stock price index datasets collected from multiple countries and demonstrated that HWSformer outperforms traditional Transformer models by approximately 20%, with varying degrees of improvement over other recent Transformer variants. © 2024 IEEE.
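RevIN, which the abstract incorporates, normalizes each input window by its own mean and standard deviation and re-applies the same statistics to the model's output, so the forecaster sees scale-free inputs while predictions return to the original scale. A minimal sketch that omits RevIN's optional learnable affine parameters:

```python
import numpy as np

class RevIN:
    """Reversible instance normalization: per-window standardization on the
    way in, inverted with the stored statistics on the way out."""

    def __init__(self, eps=1e-5):
        self.eps = eps

    def normalize(self, x):
        # x: (batch, length) windows of a univariate series.
        self.mean = x.mean(axis=-1, keepdims=True)
        self.std = x.std(axis=-1, keepdims=True)
        return (x - self.mean) / (self.std + self.eps)

    def denormalize(self, y):
        # Invert with the statistics saved by the matching normalize() call.
        return y * (self.std + self.eps) + self.mean
```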
