High-Definition Autostereoscopic 3D Video Content Generation and Coding

Funding Source

National Natural Science Foundation of China (NSFC)

Principal Investigator

Jiang Gangyi

Host Institution

Ningbo University

Approval Year

2013

Approval Date

Not disclosed

Grant Number

U1301257

Project Level

National

Research Period

Unknown / Unknown

Funding Amount

RMB 2.55 million

Discipline

Joint Fund Area - Electronic Information

Discipline Code

L-L05

Grant Category

Joint Fund Project - Key Support Project - NSFC-Guangdong Joint Fund

Keywords

3D video content generation; 3D visual comfort; autostereoscopic display; 3D video quality of experience (QoE); 3D video coding

Participants

An Ping; Zhang Yongbing; Shao Feng; Zhang Lei; Han Jun; Wang Xiaodong; Feng Nina; Jiang Zhidi

Participating Institutions

Shanghai University; Graduate School at Shenzhen, Tsinghua University

Project Proposal Abstract: High-definition autostereoscopic 3D video systems can provide new visual experiences such as stereoscopic perception and a sense of presence; high-quality 3D content generation and highly efficient coding are the keys to bringing such systems into practical use. Compared with single-view video systems, however, important problems remain to be solved: possible visual fatigue when watching 3D programs, the scarcity of 3D content and the complexity of producing it, the huge volume of 3D data, and the user's 3D visual quality of experience (QoE) for the system as a whole. Existing 3D content acquisition and reconstruction methods seldom consider the visual comfort of autostereoscopic display or the perceptual degradation caused by 3D video coding distortion, and each stage of the system is rarely designed from the standpoint of overall QoE. Starting from the factors that influence 3D visual comfort, the perceptual characteristics of coding distortion, and user QoE, this project will design subjective perception experiments and statistically analyze the influence of each factor, and on that basis establish mathematical models that quantitatively analyze and objectively describe 3D visual comfort, 3D perceptual distortion, and 3D visual QoE. It will then propose theories and methods for 3D content acquisition and reconstruction constrained by the visual comfort model, highly efficient 3D video coding based on the perceptual distortion metric, and 3D system design based on the user QoE prediction model, so as to obtain 3D content with optimal user QoE and highly efficient 3D video compression.

Province of Funded Institution

Zhejiang Province

Project Completion Report (Full Text)

High-definition autostereoscopic 3D video systems exploit the binocular perception characteristics of human vision to create stereoscopic and immersive impressions, letting viewers perceive the world more realistically and intuitively; they represent the direction of next-generation video technology. This project addressed the scientific problems of user visual quality of experience (QoE), 3D content generation, and 3D video coding. Starting from the visual-perception factors that affect 3D content distortion and 3D visual comfort, the project designed subjective visual perception experiments and statistically analyzed the influence of each factor; quantitatively analyzed and described 3D visual distortion metrics, visual comfort, and other aspects of 3D visual QoE; and proposed theories and methods for QoE evaluation based on human visual perception. These were applied to 3D content acquisition and reconstruction constrained by 3D visual comfort evaluation, highly efficient 3D video coding based on perceptual distortion models, and 3D video system integration based on QoE evaluation, providing theories and methods for high-quality 3D content generation, highly efficient 3D video coding, and high-performance 3D video system design, and yielding related patented technologies. Application-oriented 3D video prototype systems were built, including a real-time binocular 3D video prototype based on color-plus-depth and a high-fidelity real-time 3D imaging and display system. The project published 125 academic papers, including 72 in international SCI journals, 19 full-length papers in top journals such as IEEE Transactions and Optics Express, and 41 papers at authoritative international conferences in the field, along with one academic monograph. It obtained 35 granted invention patents (including 4 granted US patents). Parts of the results won 3 provincial and ministerial science and technology awards (one each of first, second, and third prize) and shared in one National Science and Technology Progress Award (second prize). Key members of the project team were awarded an NSFC Excellent Young Scientists Fund project, a Zhejiang Provincial Natural Science Foundation Distinguished Young Scholars project, and a "Guangdong Special Support Program" young top-notch talent project in science and technology innovation, among others. The project trained 37 doctoral and master's graduates and built an outstanding research team in the field of 3D video.

  • 1. Deep Light Field Spatial Super-Resolution Using Heterogeneous Imaging

    • Keywords:
    • Cameras; Spatial resolution; Super-resolution; Visualization; Image reconstruction; Light fields; Training; Light field; heterogeneous imaging; spatial super-resolution; pyramid reconstruction
    • Chen, Yeyao;Jiang, Gangyi;Yu, Mei;Xu, Haiyong;Ho, Yo-Sung
    • 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》
    • 2023
    • Vol. 29
    • No. 10
    • Journal article

    Light field (LF) imaging expands traditional imaging techniques by simultaneously capturing the intensity and direction information of light rays, and promotes many visual applications. However, owing to the inherent trade-off between the spatial and angular dimensions, LF images acquired by LF cameras usually suffer from low spatial resolution. Many current approaches increase the spatial resolution by exploring the four-dimensional (4D) structure of the LF images, but they have difficulties in recovering fine textures at a large upscaling factor. To address this challenge, this paper proposes a new deep learning-based LF spatial super-resolution method using heterogeneous imaging (LFSSR-HI). The designed heterogeneous imaging system uses an extra high-resolution (HR) traditional camera to capture the abundant spatial information in addition to the LF camera imaging, where the auxiliary information from the HR camera is utilized to super-resolve the LF image. Specifically, an LF feature alignment module is constructed to learn the correspondence between the 4D LF image and the 2D HR image to realize information alignment. Subsequently, a multi-level spatial-angular feature enhancement module is designed to gradually embed the aligned HR information into the rough LF features. Finally, the enhanced LF features are reconstructed into a super-resolved LF image using a simple feature decoder. To improve the flexibility of the proposed method, a pyramid reconstruction strategy is leveraged to generate multi-scale super-resolution results in one forward inference. The experimental results show that the proposed LFSSR-HI method achieves significant advantages over the state-of-the-art methods in both qualitative and quantitative comparisons. Furthermore, the proposed method preserves more accurate angular consistency.

    ...
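The heterogeneous-imaging idea above can be illustrated with a much simplified, non-learned sketch: upsample a low-resolution light field view and inject the high-frequency detail of a spatially aligned high-resolution guide image. All function names and the box-blur detail extractor are assumptions for illustration; the paper itself uses learned feature alignment and enhancement modules, not this classical pipeline.

```python
def box_blur(img):
    """3x3 box blur with edge clamping, used to split off low frequencies."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            patch = [img[cy][cx]
                     for cy in range(max(0, y - 1), min(h, y + 2))
                     for cx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(patch) / len(patch)
    return out

def upsample(img, s):
    """Nearest-neighbour upsampling by an integer factor s."""
    return [[img[y // s][x // s] for x in range(len(img[0]) * s)]
            for y in range(len(img) * s)]

def super_resolve_view(lr_view, hr_guide, s):
    """Upsample the LF view and add the guide image's high-frequency detail."""
    up = upsample(lr_view, s)
    low = box_blur(hr_guide)
    return [[up[y][x] + (hr_guide[y][x] - low[y][x])
             for x in range(len(up[0]))]
            for y in range(len(up))]
```

With a flat guide image the injected detail is zero and the result reduces to plain upsampling; with a textured guide, its edges are transferred onto the upsampled view.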
  • 2. Research on Depth Distortion Models in 3D Video Coding

    • Keywords:
    • 3DV; depth video; virtual view distortion; perceptual coding
    • Funding: NSFC grants 60832003, 61172096, and U1301257 (this project)
    • DOI: 10.27300/d.cnki.gshau.2019.000538
    • Advisor: An Ping
    • Dissertation

    With the rapid development of computer and communication technology, three-dimensional video (3DV) is gradually replacing two-dimensional video (2DV) as the next-generation mainstream video technology. Watching 3DV gives viewers a rich sense of depth and immersion. The rise of autostereoscopic display technology not only frees viewers from glasses but also provides interactive viewpoint selection, with the system presenting the 3DV of whichever viewpoint the user requests. The enormous data volume of multi-view video challenges the transmission infrastructure. 3DV in the depth-enhanced data format consists of the color and depth videos of a small number of reference views, from which multi-view video is provided at the receiver by synthesizing virtual views; because this format reduces the amount of multi-view data to be transmitted, it has attracted wide research attention. Since the depth video controls virtual view synthesis, studying how depth compression distortion affects the quality of synthesized virtual views is of great significance. On the one hand, properly controlling depth distortion during 3DV coding can improve virtual view quality and thus the visual QoE of 3DV. On the other hand, not all depth distortion degrades perceived virtual view quality: by exploiting depth perception characteristics and suppressing depth distortion below the just noticeable depth difference (JNDD) threshold, 3DV coding efficiency can be effectively improved. This dissertation investigates the mechanism of depth distortion in 3DV coding; the main contributions and innovations are as follows. First, the influence of depth distortion on virtual view distortion is studied and a depth-based virtual view distortion model is established. The depth map is divided into flat and non-flat blocks: for flat blocks the virtual view distortion is computed as a whole in the frequency domain, while for non-flat blocks occlusion changes are analyzed pixel by pixel to compute the distortion cost, considering not only the distortion of wrongly occluded pixels but also the fold-over distortion produced by wrongly disoccluded pixels. Edge regions occupy only a small proportion of the depth map but affect virtual view distortion significantly; to classify depth coding blocks accurately, a disparity-based classification criterion with a threshold function is adopted, whose classification threshold adapts to the capture and scene parameters. The proposed model improves estimation performance, reducing the average gap between predicted and measured mean squared error to 2.9. Second, based on the physiology of stereoscopic vision in the human visual system (HVS) and its depth perception characteristics, a modified JNDD (MJNDD) model, a just noticeable disparity difference (JNDiD) model, and a just noticeable perceived depth difference (JNPDD) model are established. The MJNDD model uses a three-segment linear function and predicts more accurately than existing two- and four-segment models, reaching a Pearson linear correlation coefficient (PLCC) of 0.99 against subjective test data. The JNDiD model assumes that vergence dominates in the vergence-accommodation conflict and provides a basis for a unified JNDiD representation across display and viewing conditions. The JNPDD model uses the JNDD threshold under natural scenes as a link to connect the JNDD threshold functions under various display and viewing conditions into a family of functions; JNPDD thresholds are computed from display and viewing parameters and can be used across displays. Finally, a virtual-view-distortion-oriented perceptual coding algorithm is proposed, which uses the depth-based virtual view distortion model to modify the distortion measure in the depth coding rate-distortion criterion and uses the JNDiD model to filter depth prediction residuals. Experimental results show that the algorithm improves 3DV coding performance, reducing bitrate while maintaining perceptual quality, and confirms at the application level the effectiveness of the proposed depth-based virtual view distortion model and JNDiD model.

    ...
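The dissertation's MJNDD model is a three-segment linear function of depth value; the breakpoints and slopes below are hypothetical placeholders, not the fitted values from its subjective experiments. The second function shows the typical use in coding: suppressing depth prediction residuals that fall below the just-noticeable threshold.

```python
def jndd_threshold(d):
    """Hypothetical three-segment piecewise-linear JNDD threshold
    as a function of depth value d in [0, 255]; segments join continuously."""
    if d < 64:
        return 21.0 - 0.2 * d           # thresholds shrink as depth value rises
    elif d < 192:
        return 8.2                      # roughly constant mid-range plateau
    else:
        return 8.2 + 0.15 * (d - 192)   # thresholds grow again near the far plane

def filter_depth_residual(residual, depth):
    """Zero out depth prediction residuals below the JNDD threshold,
    since the resulting depth error is not perceptible."""
    return 0 if abs(residual) < jndd_threshold(depth) else residual
```

Note that the two outer segments meet the plateau exactly at the breakpoints (21.0 - 0.2 * 64 = 8.2), so the model is continuous.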
  • 3. Image Saliency Detection Based on SLIC with Fused Texture and Histogram Features

    • Keywords:
    • SLIC algorithm; color features; spatial position features; texture features; histogram; saliency detection
    • Funding: National Key Technology R&D Program (No. 2012BAH67F01); NSFC (No. U1301257); Zhejiang Provincial Natural Science Foundation (No. LY17F010005)
    • Ding Hua; Wang Xiaodong; Zhang Lianjun; Chen Xiaoai; Lai Peixia
    • Journal article

    To address the problem that saliency maps based on color histograms fail to highlight edge contours and texture details, this paper combines color features, spatial position features, texture features, and histograms, and proposes an image saliency detection method based on SLIC with fused texture and histogram features. The method first segments the image into superpixels with the SLIC algorithm and extracts a saliency map based on color and spatial position; it then extracts a color-histogram-based saliency map and a texture-feature-based saliency map; finally, the saliency maps from the first two stages are fused into the final saliency map. In addition, the salient object in the image is obtained by simple threshold segmentation. Experimental results show that the proposed algorithm clearly outperforms classical saliency detection algorithms.

    ...
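The multi-stage pipeline described above ends in a fusion and segmentation step, which can be sketched as a weighted combination of normalized saliency maps followed by thresholding. The maps are flattened to per-pixel lists here, and the fusion weights and threshold are assumptions for illustration; the paper does not state them in this abstract.

```python
def normalize(m):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(m), max(m)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m]

def fuse_saliency(color_spatial, hist, texture, w=(0.4, 0.3, 0.3)):
    """Weighted fusion of three normalized per-pixel saliency maps."""
    maps = [normalize(color_spatial), normalize(hist), normalize(texture)]
    return [sum(w[i] * maps[i][k] for i in range(3))
            for k in range(len(maps[0]))]

def segment(sal, t=0.5):
    """Binary salient-object mask by simple thresholding."""
    return [1 if v >= t else 0 for v in sal]
```

Pixels that rank high in all three maps survive the fusion and end up inside the binary mask.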
  • 4. A Novel No-Reference Quality Assessment Metric for Stereoscopic Images with Consideration of Comprehensive 3D Quality Information

    • Keywords:
    • machine learning; natural scene statistics; no reference; spatial domain; stereo visual information; stereoscopic image quality assessment; transform domain
    • Shen, Liquan;Yao, Yang;Geng, Xianqiu;Fang, Ruigang;Wu, Dapeng
    • 《Sensors》
    • 2023
    • Vol. 23
    • No. 13
    • Journal article

    Recently, stereoscopic image quality assessment has attracted a lot of attention. However, compared with 2D image quality assessment, it is much more difficult to assess the quality of stereoscopic images due to the limited understanding of 3D visual perception. This paper proposes a novel no-reference quality assessment metric for stereoscopic images using natural scene statistics with consideration of both the quality of the cyclopean image and 3D visual perceptual information (binocular fusion and binocular rivalry). In the proposed method, not only is the quality of the cyclopean image considered, but binocular rivalry and other intrinsic 3D visual properties are also exploited. Specifically, in order to improve the objective quality of the cyclopean image, features of the cyclopean images in both the spatial domain and the transform domain are extracted based on the natural scene statistics (NSS) model. Furthermore, to better capture the intrinsic properties of the stereoscopic image, the binocular rivalry effect and other 3D visual properties are also considered during feature extraction. Following adaptive feature pruning using principal component analysis, improved metric accuracy is obtained. The experimental results show that the proposed metric achieves good and consistent alignment with subjective assessment of stereoscopic images in comparison with existing methods, with the highest SROCC (0.952) and PLCC (0.962) scores being acquired on the LIVE 3D database Phase I.

    ...
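The spatial-domain NSS features mentioned above are commonly computed from mean-subtracted, contrast-normalized (MSCN) coefficients. A minimal 1D sketch follows; the real computation uses a 2D Gaussian-weighted window over the cyclopean image, and the window size and stabilizing constant here are assumptions.

```python
def mscn(signal, win=3, c=1.0):
    """Mean-subtracted, contrast-normalized coefficients of a 1D signal,
    using a sliding local window of size win and stabilizer c."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - win // 2), min(n, i + win // 2 + 1)
        patch = signal[lo:hi]
        mu = sum(patch) / len(patch)               # local mean
        var = sum((v - mu) ** 2 for v in patch) / len(patch)  # local variance
        out.append((signal[i] - mu) / (var ** 0.5 + c))
    return out
```

A perfectly flat signal yields all-zero MSCN coefficients; for natural images the coefficient histogram is approximately Gaussian, and distortions change its shape, which is what the NSS features measure.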
  • 5. Research on Deep Learning-Based Video Coding Techniques

    • Keywords:
    • video coding; deep learning; CNN; GAN; HEVC
    • Jin Zhipeng
    • Advisor: An Ping, Shanghai University
    • Dissertation

    Since the 1980s, video coding technology has developed vigorously and is widely used in distance education, telemedicine, videophony, video conferencing, video on demand, interactive video games, security surveillance, virtual reality, and other fields, strongly driving the development of the information industry. Ubiquitous video applications generate massive amounts of video data, and in recent years the pursuit of clearer, smoother, and more realistic visual experiences has made video data grow explosively, placing higher demands on compression efficiency. Compared with the previous international standard H.264/AVC, High Efficiency Video Coding (HEVC) improves the compression efficiency of 1080p content by about 50%. Within the HEVC framework, intra/inter prediction, in-loop filtering, and fast encoding are the three key technical areas that determine compression ratio, perceptual quality, and encoding speed. Although much work has improved these algorithms, HEVC performance is still not optimal, largely because of the limitations of hand-crafted feature extraction and modeling. In recent years, with the resurgence of deep learning and its broad success in computer vision, video coding has opened a new research area of end-to-end automatic modeling: deep-learning-based predictive coding can effectively improve compression ratio, and deep-learning-based in-loop filtering can effectively improve the visual quality of decoded pictures; both have broad application and commercial value in live streaming, video transmission, and related fields. Despite these prospects, deep-learning-based video coding is still at an early stage, without a complete body of results; careful research is still needed on network architectures, interpretability, training set construction, training methods, computational efficiency, and cross-platform code compatibility. Studying efficient deep-learning-based video coding algorithms and completing their theory therefore has important theoretical and practical significance. This dissertation studies key deep-learning-based video coding techniques; the main contributions and innovations are as follows. 1. A loop filtering framework based on perceptual adversarial and progressive networks (MPRNet) is proposed, outperforming traditional loop filters such as deblocking, sample adaptive offset, and adaptive loop filtering. It improves performance on three levels: the decoded picture (before loop filtering) is fed into the network for stage-by-stage progressive enhancement, effectively mitigating blocking, ringing, blurring, and other coding artifacts while improving coding efficiency and subjective quality; adversarial training combining mean-squared-error, perceptual, and adversarial losses effectively improves visual quality; and a scalable multi-stage progressive CNN, with per-stage MSE losses controlling each stage's fitting, achieves coarse-to-fine enhancement and balances computation against enhancement quality. 2. An intra prediction framework based on convolutional autoencoders (IPCED) is proposed, performing intra prediction in an end-to-end data-driven way to reduce prediction residuals and improve rate-distortion performance. It improves intra prediction in three ways: borrowing from image inpainting, it predicts the fourth-quadrant block from three reference blocks and introduces a GAN adversarial loss with joint optimization to improve accuracy; a multi-level skip-connected convolutional encoder fuses deep global and shallow local information to strengthen the learning and representation of reference-block textures; and a multi-level deconvolutional decoder reconstructs texture (i.e., performs intra prediction) stage by stage, enriching the predicted texture and improving accuracy. 3. A CNN-based fast coding-unit partition decision framework is proposed, which recasts QTBT partition optimization as multi-class classification, judging the texture complexity (i.e., the partition depth range) of a 32x32 coding block as a whole rather than deciding level by level whether to split into sub-blocks, which significantly speeds up the decision. Its three innovations are: directly predicting the shallowest and deepest partition depths of a 32x32 block, which copes with the large variety of QTBT coding units in JVET while maintaining good classification accuracy; a new objective function designed for the task, including a hinge loss and a class penalty term, which effectively improves classification accuracy; and an end-to-end learning system that extracts classification features directly from coding units, without hand-designed features or temporal/spatial correlation information, which aids parallel intra coding and independent decoding. In summary, this dissertation takes deep-learning-based video coding as its research object and studies in depth the key techniques of deep-learning-based loop filtering, intra prediction, and fast encoding. Experimental results show that the proposed algorithms all effectively improve video coding efficiency.

    ...
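A non-learned baseline for the third contribution (predicting a 32x32 block's partition depth range from its texture complexity) can be sketched with plain variance thresholds. The thresholds and the depth ranges returned are hypothetical; the dissertation replaces this kind of hand-crafted rule with an end-to-end CNN classifier.

```python
def block_variance(block):
    """Sample variance of a 2D block of pixel values."""
    flat = [v for row in block for v in row]
    mu = sum(flat) / len(flat)
    return sum((v - mu) ** 2 for v in flat) / len(flat)

def predict_depth_range(block, t_low=25.0, t_high=400.0):
    """Map texture complexity to a (shallowest, deepest) partition depth range.
    Smooth blocks need few splits; complex blocks need deep partitioning.
    Thresholds are illustrative placeholders, not tuned values."""
    var = block_variance(block)
    if var < t_low:
        return (0, 1)
    elif var < t_high:
        return (1, 2)
    return (2, 3)
```

The encoder would then evaluate only the partition depths inside the predicted range, skipping the exhaustive level-by-level split search.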
  • 6. A Vegetable Image Recognition Method Based on Transfer Learning

    • Keywords:
    • vegetable image recognition; convolutional neural network; transfer learning; small sample
    • Funding: National Key Technology R&D Program (2012BAH67F01); NSFC (U1301257); Zhejiang Provincial Natural Science Foundation (LY17F010005)
    • Lai Peixia; Wang Xiaodong; Zhang Lianjun
    • Journal article

    To address the shortage of labeled samples in vegetable recognition, an image recognition method based on transfer learning is proposed. First, the original dataset is enlarged with data augmentation and fed to a model pretrained on a large-scale dataset. Because the domain specificity of high-level features harms generalization during transfer, two adaptation layers are added and the network is retrained after parameter initialization to obtain a base model; this base model is then further tuned with a layer-freezing transfer strategy to obtain the final network for vegetable image recognition. Experiments show that transfer strategies based on the two small networks CaffeNet and ResNet10 handle small-sample vegetable image recognition well, with trained-model accuracies of 94.97% and 96.69%, respectively. Compared with other transfer algorithms and traditional neural network methods, the algorithm is more accurate and more robust.

    ...
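The two-stage transfer strategy described above (retrain the network with added adaptation layers, then fine-tune with frozen backbone parameters) boils down to a freeze/unfreeze schedule. This framework-agnostic sketch models layers as name/flag pairs only; the layer names are hypothetical, and a real implementation would toggle the corresponding flags in its deep learning framework.

```python
def set_trainable(layers, prefix, trainable):
    """Toggle the trainable flag of every layer whose name starts with prefix."""
    for layer in layers:
        if layer["name"].startswith(prefix):
            layer["trainable"] = trainable

def trainable_names(layers):
    return [l["name"] for l in layers if l["trainable"]]

# a pretrained backbone (conv1..conv5) plus two newly added adaptation layers
layers = ([{"name": "conv%d" % i, "trainable": True} for i in range(1, 6)]
          + [{"name": "adapt1", "trainable": True},
             {"name": "adapt2", "trainable": True}])

# stage 1: retrain the whole network, backbone initialized from the pretrained model
set_trainable(layers, "", True)
stage1 = trainable_names(layers)        # all seven layers update

# stage 2: freeze the backbone and fine-tune only the adaptation layers
set_trainable(layers, "conv", False)
stage2 = trainable_names(layers)        # only the adaptation layers update
```

Freezing the backbone in the second stage prevents the small vegetable dataset from overwriting the general features learned on the large-scale source dataset.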
  • 7. No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform

    • Keywords:
    • Feature extraction; Image coding; Frequency-domain analysis; Tensors; Principal component analysis; Periodic structures; Information filters; Light field image quality assessment; no-reference; 4D discrete cosine transform; sub-aperture gradient image array; spatial-angular quality
    • Xiang, Jianjun;Jiang, Gangyi;Yu, Mei;Jiang, Zhidi;Ho, Yo-Sung
    • 《IEEE TRANSACTIONS ON MULTIMEDIA》
    • 2023
    • Vol. 25
    • Journal article

    Light field imaging can simultaneously capture the intensity and direction information of light rays in the real world. A light field image (LFI), with its four-dimensional (4D) data, suffers quality degradation during compression, reconstruction, and processing, so how to evaluate the visual quality of an LFI is an open question. This paper proposes a no-reference LFI quality assessment metric based on a high-dimensional sparse transform. First, the LFI's sub-aperture gradient image array (SAGIA), which is still a 4D signal, is generated by high-pass filtering between adjacent sub-aperture images (SAIs). Then, the SAGIA is transformed with the 4D discrete cosine transform (4D-DCT), whose coefficients characterize the angular and spatial information of the LFI, and the logarithmic amplitudes of the coefficients at the same position of the SAGIA's transformed 4D blocks are averaged as the coefficient energy. Subsequently, the 4D-DCT coefficients are divided into spatial-angular frequency bands and spatial-angular orientation bands, and the corresponding energy features are extracted by aggregating the coefficient energy of each band. In addition, the coefficient amplitudes at the same position of the blocks are fitted with a Weibull distribution; the fitted parameters of each position are concatenated and reduced with principal component analysis to obtain compact features. Finally, the extracted features are pooled to predict the visual quality of the distorted LFIs. The experimental results demonstrate that, compared with state-of-the-art image quality assessment and LFI quality assessment methods, the proposed method is more consistent with subjective evaluation on three LFI databases.

    ...
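The frequency-band energy features above generalize the familiar 2D DCT case. As a down-scaled illustration, the sketch below computes an orthonormal 2D DCT-II and pools AC coefficient energies into bands by frequency index u+v; the paper does this over 4D-DCT coefficients with spatial-angular bands, and the three-band split here is an assumption.

```python
import math

def dct2(block):
    """Orthonormal 2D DCT-II of an n-by-n block (pure-Python, O(n^4))."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[y][x]
                    * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
                    * math.cos(math.pi * (2 * x + 1) * v / (2 * n))
                    for y in range(n) for x in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def band_energies(coef):
    """Pool squared AC coefficients into low/mid/high bands by u+v."""
    n = len(coef)
    bands = [0.0, 0.0, 0.0]
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC coefficient
            band = min(2, (u + v) * 3 // (2 * n - 1))
            bands[band] += coef[u][v] ** 2
    return bands
```

For a constant block, all signal energy sits in the DC coefficient and every band energy is (numerically) zero; distortion shifts energy between bands, which is what the extracted features capture.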
  • 8. Multi-Angle Projection Based Blind Omnidirectional Image Quality Assessment

    • Keywords:
    • Feature extraction; Distortion; Quality assessment; Image quality; Image color analysis; Visualization; Resists; Omnidirectional image; blind quality assessment; multi-angle projection; tensor space
    • Jiang, Hao;Jiang, Gangyi;Yu, Mei;Luo, Ting;Xu, Haiyong
    • 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》
    • 2022
    • Vol. 32
    • No. 7
    • Journal article

    Most of the existing blind omnidirectional image quality assessment (BOIQA) methods are based on a data-driven approach in which end-to-end neural networks or deep learning tools are mainly used for feature extraction; however, such methods usually lack interpretability, and it is difficult to discover the perceptual mechanism behind them. In this paper, from the perspective of perception modeling, we propose a novel multi-angle projection based BOIQA (MP-BOIQA) method. Considering the omnidirectional content and the near-eye display characteristics of head-mounted displays, multiple color cubemap projection images with respect to different viewpoints are grouped into color omnidirectional distortion (COD) units so as to simulate the user's viewing behavior in subjective quality assessment. In the designed multi-angle projection based feature extractor, tensor decomposition is applied to each COD unit for dimensionality reduction, and piecewise exponential fitting is used to obtain the distribution of the mean-subtracted contrast-normalized coefficients of the unit's feature matrices in the tensor domain. Finally, the extracted features are pooled with a random forest. The experimental results on three omnidirectional image quality datasets show that the MP-BOIQA method delivers highly competitive performance compared with representative full-reference quality assessment methods as well as state-of-the-art BOIQA methods.

    ...