Adaptive Video Streaming with Layered Neural Codecs for Both Machine and Human Vision
Project source
Principal investigator
Funded institution
Approval year
Approval date
Project number
Research period
Project level
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions
1. Style-Aware Music-to-Dance Generation via Multi-Stage Unit Decomposition and Recombination
- Keywords:
- Audio acoustics;Cluster analysis;Clustering algorithms;Computer music;Mapping;Signal processing;Audio features;Complex mapping;Feature motion;Human motions;Multi-stage framework;Multi-stages;Music-to-dance;Recombination-based generation;Specific learning;Style-specific learning
- Gao, Yufei;Wu, Qian;He, Keren;Zhou, Jinjia
- 32nd International Conference on Neural Information Processing, ICONIP 2025
- 2026
- November 20, 2025 - November 24, 2025
- Okinawa, Japan
- Conference
Generating dance animations from music presents significant challenges in artificial intelligence, requiring systems to capture the complex mapping between audio features and human motion. We introduce a novel multi-stage decomposition-recombination network for music-driven dance synthesis that addresses two critical limitations in existing approaches. First, unlike end-to-end deep learning models that directly transform music into posture data—often resulting in unrealistic movements—our approach decomposes both music and dance into fundamental units and learns mappings between them, preserving the naturalness of human motion. Second, we establish that style-specific training delivers more distinctive and stylistically coherent choreography than mixed-style approaches. Our framework implements a three-stage process: (1) an accumulation stage that constructs music and dance unit dictionaries through clustering techniques, (2) a learning stage that trains style-specific mapping models between these units, and (3) a creation stage that recombines units to generate coherent dance sequences. Quantitative evaluations show that our method outperforms the baseline approaches by 26.9% in Fréchet Inception Distance (FID) and 13.1% in music-dance correspondence scores. Qualitative analysis confirms our framework's ability to generate dance sequences with both improved temporal coherence and clear stylistic characteristics across breaking, hip-hop, locking, and street jazz styles. The proposed approach offers a significant advancement in realistic, style-specific music-to-dance synthesis. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
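The abstract's three-stage pipeline (accumulate unit dictionaries by clustering, learn unit-to-unit mappings, recombine units at creation time) can be sketched as a toy on 1-D features. This is a minimal illustration, not the authors' implementation: the k-means, the frequency-count mapping, and all function names here are assumptions for exposition only.

```python
import random

def accumulate_units(features, k, iters=20, seed=0):
    """Stage 1 (toy): cluster 1-D feature frames into k centroids
    with a minimal k-means, yielding a 'unit dictionary'."""
    rng = random.Random(seed)
    centroids = rng.sample(features, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            j = min(range(k), key=lambda i: abs(f - centroids[i]))
            buckets[j].append(f)
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

def assign(f, centroids):
    """Index of the nearest unit in a dictionary."""
    return min(range(len(centroids)), key=lambda i: abs(f - centroids[i]))

def learn_mapping(music_frames, dance_frames, music_units, dance_units):
    """Stage 2 (toy): count co-occurring (music unit -> dance unit)
    pairs and keep the most frequent dance unit per music unit."""
    counts = {}
    for m, d in zip(music_frames, dance_frames):
        mi, di = assign(m, music_units), assign(d, dance_units)
        counts.setdefault(mi, {}).setdefault(di, 0)
        counts[mi][di] += 1
    return {mi: max(ds, key=ds.get) for mi, ds in counts.items()}

def create(music_frames, music_units, dance_units, mapping):
    """Stage 3 (toy): recombine dance units by looking up the
    learned mapping for each music frame's nearest unit."""
    return [dance_units[mapping[assign(m, music_units)]] for m in music_frames]
```

The point of the decomposition is that generation only ever emits units observed in real dance data, which is how the paper argues naturalness is preserved relative to direct end-to-end regression.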
2. High-Frequency Semantic Enhancement in Compressed Scenarios for Robust Visual and Machine Vision Applications
- Keywords:
- Computer vision;Image coding;Machine Perception;Machine vision;Man machine systems;Object detection;Object recognition;Semantic Segmentation;Semantics;Growing demand;High frequency HF;Human vision;Machine-vision;Post-processing;Post-processing techniques;Semantic enhancements;Video coding for machine;Video processing;Vision applications
- He, Keren;Fu, Chen;Gao, Guangwei;Zhou, Jinjia
- 32nd IEEE International Conference on Image Processing, ICIP 2025
- 2025
- September 14, 2025 - September 17, 2025
- Anchorage, AK, United States
- Conference
With the growing demand for video processing in both human and machine vision, optimizing post-processing techniques has become a crucial challenge. To address the limitations of current post-processing techniques in these domains, this paper introduces a novel post-processing method that enhances high-frequency information through semantic enhancement, significantly improving performance in both domains. We propose a High Semantic Extraction (HSE) model to capture more recognizable details, and design a High-Frequency Semantic Fusion (HFSF) strategy that preserves critical details while suppressing noise. Experimental results demonstrate that our method effectively enhances performance in object detection, semantic segmentation, and video quality, achieving a significant advancement in optimizing video processing for both human and machine vision. ©2025 IEEE.
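The core idea the abstract describes (extract high-frequency detail, then fuse it back while suppressing compression noise) can be sketched on a 1-D signal. This is a hand-rolled illustration under stated assumptions: the box-blur residual, the soft-threshold denoising, and the parameter names `alpha`/`tau` are all hypothetical stand-ins, not the paper's HSE model or HFSF strategy.

```python
def box_blur(x, r=1):
    """Moving-average low-pass filter (edge-clamped window)."""
    n = len(x)
    return [sum(x[max(0, i - r):min(n, i + r + 1)]) /
            (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

def high_freq(x, r=1):
    """High-frequency residual: signal minus its low-pass version."""
    return [xi - li for xi, li in zip(x, box_blur(x, r))]

def fuse(decoded, alpha=0.8, tau=0.05, r=1):
    """Fusion sketch: amplify high-frequency detail by alpha, but
    soft-threshold residuals below tau (treated as codec noise)
    before adding them back to the decoded signal."""
    def shrink(h):
        # soft threshold: zero out |h| <= tau, shrink the rest toward 0
        return (abs(h) - tau) * (1 if h > 0 else -1) if abs(h) > tau else 0.0
    return [d + alpha * shrink(h) for d, h in zip(decoded, high_freq(decoded, r))]
```

On a blurred step edge, the fused output increases edge contrast while leaving flat regions untouched, which is the qualitative behavior the paper targets for both detection accuracy and perceived quality.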
...
