High-Definition Autostereoscopic 3D Video Content Generation and Coding

Project Source

National Natural Science Foundation of China (NSFC)

Principal Investigator

Gangyi Jiang (蒋刚毅)

Host Institution

Ningbo University

Approval Year

2013

Approval Date

Not disclosed

Project Number

U1301257

Project Period

Unknown / Unknown

Project Level

National

Funding Amount

RMB 2.55 million

Discipline

Joint Fund Area - Electronic Information

Discipline Code

L-L05

Fund Category

Joint Fund Project - Key Support Project - NSFC-Guangdong Joint Fund

Keywords

3D video content generation; 3D visual comfort; autostereoscopic display; 3D video quality of experience (QoE); 3D video coding

Participants

Ping An; Yongbing Zhang; Feng Shao; Lei Zhang; Jun Han; Xiaodong Wang; Nina Feng; Zhidi Jiang

Participating Institutions

Shanghai University; Graduate School at Shenzhen, Tsinghua University

Application Abstract: High-definition autostereoscopic 3D video systems can provide new visual experiences such as stereoscopic perception and a sense of immersion; high-quality 3D content generation and efficient coding are the keys to bringing such systems into practical use. Compared with single-view video systems, however, several important problems remain: possible visual fatigue when watching 3D programs, the scarcity of 3D content and the complexity of producing it, the huge volume of 3D data, and the user's overall 3D visual quality of experience (QoE). Existing 3D content acquisition and reconstruction methods have seldom considered the viewing comfort of autostereoscopic displays or the perceptual degradation caused by 3D video coding distortion, and have seldom designed each part of the system from the standpoint of 3D QoE. Starting from the factors that influence 3D visual comfort, the perceptual characteristics of coding distortion, and user QoE, this project will design subjective perception experiments and statistically analyze the influence of each factor to establish mathematical models that quantitatively describe and objectively characterize 3D visual comfort, 3D perceptual distortion, and 3D visual QoE. On this basis, theories and methods will be proposed for 3D content acquisition and reconstruction constrained by the visual comfort model, efficient 3D video coding based on the perceptual distortion metric, and 3D system design based on the QoE prediction model, so as to obtain 3D content with optimal user QoE and highly efficient 3D video compression.

Funded Province

Zhejiang Province

Project Completion Report (Full Text)

High-definition autostereoscopic 3D video systems exploit the binocular perception of the human visual system to create stereoscopic and immersive impressions, letting viewers perceive the world more realistically and intuitively; they represent the direction of next-generation video technology. This project addressed the scientific problems of user visual quality of experience (QoE), 3D content generation, and 3D video coding. Starting from the visual-perception factors that govern 3D content distortion and 3D visual comfort, subjective perception experiments were designed and the influence of each factor was statistically analyzed to quantitatively characterize 3D QoE measures such as 3D visual distortion and visual comfort. A theory and methods for QoE evaluation based on human visual perception were proposed and applied to 3D content acquisition and reconstruction constrained by 3D visual comfort evaluation, efficient 3D video coding based on perceptual distortion models, and 3D video system integration guided by QoE evaluation, providing transferable theories and methods for high-quality 3D content generation, efficient 3D video coding, and high-performance 3D video system design, along with related patented techniques. Application-oriented 3D video prototype systems were built, including a real-time color-plus-depth binocular 3D video prototype and a high-fidelity real-time 3D imaging and display system. The project published 125 academic papers, including 72 in international SCI journals (19 full-length papers in top journals such as IEEE Transactions and Optics Express) and 41 at authoritative international conferences in the field, plus one academic monograph. Thirty-five invention patents were granted (including 4 US patents). Part of the results won three provincial/ministerial science and technology awards (one each of first, second, and third prize) and contributed to one National Science and Technology Progress Award (second class). Key members of the team were awarded the NSFC Excellent Young Scientists Fund, the Zhejiang Provincial Natural Science Foundation Fund for Distinguished Young Scholars, and the Guangdong Special Support Program for top young talents in scientific and technological innovation. The project trained 37 doctoral and master's graduates and established an outstanding research team in the 3D video field.

  • 1. Deep Light Field Spatial Super-Resolution Using Heterogeneous Imaging

    • Keywords:
    • Cameras; Spatial resolution; Super-resolution; Visualization; Image reconstruction; Light fields; Training; Light field; heterogeneous imaging; spatial super-resolution; pyramid reconstruction; RESOLUTION; CAMERAS
    • Chen, Yeyao; Jiang, Gangyi; Yu, Mei; Xu, Haiyong; Ho, Yo-Sung
    • 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》
    • 2023
    • Vol. 29
    • No. 10
    • Journal article

    Light field (LF) imaging expands traditional imaging techniques by simultaneously capturing the intensity and direction information of light rays, and promotes many visual applications. However, owing to the inherent trade-off between the spatial and angular dimensions, LF images acquired by LF cameras usually suffer from low spatial resolution. Many current approaches increase the spatial resolution by exploring the four-dimensional (4D) structure of the LF images, but they have difficulties in recovering fine textures at a large upscaling factor. To address this challenge, this paper proposes a new deep learning-based LF spatial super-resolution method using heterogeneous imaging (LFSSR-HI). The designed heterogeneous imaging system uses an extra high-resolution (HR) traditional camera to capture the abundant spatial information in addition to the LF camera imaging, where the auxiliary information from the HR camera is utilized to super-resolve the LF image. Specifically, an LF feature alignment module is constructed to learn the correspondence between the 4D LF image and the 2D HR image to realize information alignment. Subsequently, a multi-level spatial-angular feature enhancement module is designed to gradually embed the aligned HR information into the rough LF features. Finally, the enhanced LF features are reconstructed into a super-resolved LF image using a simple feature decoder. To improve the flexibility of the proposed method, a pyramid reconstruction strategy is leveraged to generate multi-scale super-resolution results in one forward inference. The experimental results show that the proposed LFSSR-HI method achieves significant advantages over the state-of-the-art methods in both qualitative and quantitative comparisons. Furthermore, the proposed method preserves more accurate angular consistency.

  • 2. Research on Depth Distortion Models in 3D Video Coding

    • Keywords:
    • 3DV; depth video; virtual view distortion; perceptual coding
    • Funding: NSFC projects 60832003, 61172096, and U1301257 (this project, "High-Definition Autostereoscopic 3D Video Content Generation and Coding")
    • DOI: 10.27300/d.cnki.gshau.2019.000538
    • Supervisor: Ping An
    • Dissertation

    With the rapid development of computing and communication technology, three-dimensional video (3DV) is gradually replacing two-dimensional video (2DV) as the next mainstream video technology; watching 3DV gives viewers a rich sense of depth and immersion. The rise of autostereoscopic display technology not only frees viewers from glasses but also offers interactive viewpoint selection, with the system rendering the 3DV of the viewpoint the user requests. The huge data volume of multiview video challenges the transmission infrastructure. 3DV in the depth-enhanced data format consists of the color and depth videos of a few reference views, and provides multiview video at the receiver by synthesizing virtual views; because this format reduces the transmitted data volume, it has attracted researchers' attention. Since the depth video steers virtual-view synthesis, studying how depth compression distortion affects the quality of synthesized virtual views is of great significance. On one hand, properly controlling depth distortion during 3DV encoding improves virtual-view quality and thus the visual QoE of 3DV. On the other hand, not all depth distortion degrades the perceived quality of virtual views: exploiting depth perception characteristics to suppress depth distortion below the just noticeable depth difference (JNDD) threshold can effectively improve 3DV coding efficiency. This dissertation studies the mechanism of depth distortion in 3DV coding in depth; its main contributions are as follows. First, the influence of depth distortion on virtual-view distortion was studied and a depth-based virtual-view distortion model was established. The depth map is partitioned into flat and non-flat blocks: the virtual-view distortion of flat blocks is computed as a whole in the frequency domain, while non-flat blocks are analyzed pixel by pixel for changes in occlusion relationships to compute the distortion cost; in non-flat blocks, not only the distortion of wrongly occluded pixels but also the fold-over distortion caused by wrongly disoccluded pixels is considered. Edge regions occupy only a small proportion of the depth map but affect virtual-view distortion significantly; to classify depth-map blocks accurately, a disparity-based classification criterion with a threshold function was adopted, with the classification threshold adjusted according to the camera and scene parameter sets. The proposed model improves estimation performance, reducing the average gap between predicted and measured mean squared error to 2.9. Next, based on the physiology of stereoscopic vision and the depth perception characteristics of the human visual system (HVS), a modified JNDD (MJNDD) model, a just noticeable disparity difference (JNDiD) model, and a just noticeable perceived depth difference (JNPDD) model were established. The MJNDD model uses a three-segment piecewise-linear function and predicts more accurately than existing two- and four-segment models, achieving a Pearson linear correlation coefficient (PLCC) of 0.99 against subjective test data. The JNDiD model assumes that vergence dominates in the vergence-accommodation conflict, providing a basis for a unified representation of JNDiD under different display and viewing conditions. The JNPDD model uses the JNDD threshold of natural scenes as a bridge to link the JNDD threshold functions of various display and viewing conditions into a function family; JNPDD thresholds are computed from display and viewing parameters and can be used across displays. Finally, a perceptual coding algorithm oriented to virtual-view distortion was proposed, which uses the depth-based virtual-view distortion model to modify the distortion measure in the depth coding rate-distortion criterion and uses the JNDiD model to filter depth prediction residuals. Experiments show that the algorithm improves 3DV coding performance, reducing bitrate while preserving perceived visual quality, and confirms the effectiveness of the proposed depth-based virtual-view distortion model and JNDiD model at the application level.
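The three-segment piecewise-linear shape of the MJNDD model can be illustrated with a small sketch. The breakpoints and slopes below are purely hypothetical placeholders chosen to form a continuous curve, since the dissertation's fitted parameters are not reproduced here.

```python
# Hypothetical sketch of a three-segment piecewise-linear JNDD threshold as a
# function of background depth level in [0, 256). The segment boundaries and
# slopes are illustrative placeholders, not the dissertation's fitted values.

def jndd_threshold(depth, segments=((0, 64, -0.10, 21.0),
                                    (64, 192, 0.00, 14.6),
                                    (192, 256, 0.15, 14.6))):
    """Evaluate a piecewise-linear threshold.

    Each segment is (start, end, slope, value_at_start); within a segment
    the threshold is value_at_start + slope * (depth - start).
    """
    for start, end, slope, v0 in segments:
        if start <= depth < end:
            return v0 + slope * (depth - start)
    raise ValueError("depth must lie in [0, 256)")
```

With these placeholder segments the curve is continuous at the breakpoints (21.0 - 0.10 * 64 = 14.6 at depth 64), which is the property a fitted three-segment model would also satisfy.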

  • 3. Image Saliency Detection Based on SLIC Fusing Texture and Histograms

    • Keywords:
    • SLIC algorithm; color features; spatial location features; texture features; histogram; saliency detection
    • Funding: National Key Technology R&D Program (No. 2012BAH67F01); NSFC (No. U1301257); Zhejiang Provincial Natural Science Foundation (No. LY17F010005)
    • Hua Ding; Xiaodong Wang; Lianjun Zhang; Xiao'ai Chen; Peixia Lai
    • Journal article

    To address the problem that color-histogram-based saliency maps fail to highlight edge contours and texture details, an image saliency detection method based on SLIC that fuses texture and histogram information is proposed, combining the image's color features, spatial location features, texture features, and histogram. The method first segments the image into superpixels with the SLIC algorithm and extracts a saliency map based on color and spatial location; it then extracts a color-histogram-based saliency map and a texture-based saliency map; finally, the maps from the two stages are fused into the final saliency map. In addition, the salient object in the image is obtained by simple threshold segmentation. Experimental results show that the proposed algorithm clearly outperforms classical saliency detection algorithms.
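The final fusion-and-threshold stage described above can be sketched in a few lines. The min-max normalization, the equal weighting of the maps, and the fixed 0.5 threshold below are illustrative assumptions, not the paper's exact fusion formula.

```python
# Illustrative sketch of the fusion stage: rescale each per-pixel saliency
# map to [0, 1], average them with equal weights (an assumption), and
# threshold the fused map to get a binary salient-object mask.

def normalize(sal_map):
    """Linearly rescale a 2D saliency map to the range [0, 1]."""
    vals = [v for row in sal_map for v in row]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # avoid division by zero on constant maps
    return [[(v - lo) / span for v in row] for row in sal_map]

def fuse_and_threshold(maps, thresh=0.5):
    """Average several normalized saliency maps and binarize the result."""
    norm = [normalize(m) for m in maps]
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[sum(n[y][x] for n in norm) / len(norm) for x in range(w)]
             for y in range(h)]
    mask = [[1 if v >= thresh else 0 for v in row] for row in fused]
    return fused, mask
```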

  • 4. A Novel No-Reference Quality Assessment Metric for Stereoscopic Images with Consideration of Comprehensive 3D Quality Information

    • Keywords:
    • machine learning; natural scene statistics; no reference; spatial domain; stereo visual information; stereoscopic image quality assessment; transform domain
    • Shen, Liquan; Yao, Yang; Geng, Xianqiu; Fang, Ruigang; Wu, Dapeng
    • 《Sensors》
    • 2023
    • Vol. 23
    • No. 13
    • Journal article

    Recently, stereoscopic image quality assessment has attracted a lot of attention. However, compared with 2D image quality assessment, it is much more difficult to assess the quality of stereoscopic images due to the limited understanding of 3D visual perception. This paper proposes a novel no-reference quality assessment metric for stereoscopic images using natural scene statistics, with consideration of both the quality of the cyclopean image and 3D visual perceptual information (binocular fusion and binocular rivalry). In the proposed method, not only is the quality of the cyclopean image considered, but binocular rivalry and other intrinsic 3D visual properties are also exploited. Specifically, to capture the objective quality of the cyclopean image, features of the cyclopean image in both the spatial domain and the transform domain are extracted based on the natural scene statistics (NSS) model. Furthermore, to better characterize the intrinsic properties of the stereoscopic image, the binocular rivalry effect and other 3D visual properties are also considered during feature extraction. Following adaptive feature pruning using principal component analysis, improved metric accuracy is obtained. The experimental results show that the proposed metric achieves good and consistent alignment with subjective assessment of stereoscopic images in comparison with existing methods, with the highest SROCC (0.952) and PLCC (0.962) scores acquired on the LIVE 3D database Phase I.
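Spatial-domain NSS features of the kind mentioned above typically start from mean-subtracted contrast-normalized (MSCN) coefficients. The sketch below is a simplified illustration: it uses a uniform 3x3 window instead of the Gaussian-weighted window common in practice, and the stabilizing constant C = 1 is an arbitrary choice.

```python
# Illustrative MSCN computation: I_hat = (I - mu) / (sigma + C), where mu and
# sigma are the local mean and standard deviation over a uniform 3x3
# neighborhood (real NSS metrics typically use a Gaussian-weighted window).

def mscn(image, c=1.0):
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # gather the 3x3 neighborhood, clipped at the image border
            nb = [image[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))]
            mu = sum(nb) / len(nb)
            sigma = (sum((v - mu) ** 2 for v in nb) / len(nb)) ** 0.5
            out[y][x] = (image[y][x] - mu) / (sigma + c)
    return out
```

On natural, undistorted images the histogram of these coefficients is close to Gaussian; NSS-based metrics fit a parametric distribution to it and use the fitted parameters as quality features.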

  • 5. A Vegetable Image Recognition Method Based on Transfer Learning

    • Keywords:
    • vegetable image recognition; convolutional neural network; transfer learning; small sample
    • Funding: National Key Technology R&D Program (2012BAH67F01); NSFC (U1301257); Zhejiang Provincial Natural Science Foundation (LY17F010005)
    • Peixia Lai; Xiaodong Wang; Lianjun Zhang
    • Journal article

    To address the shortage of labeled samples in vegetable recognition, an image recognition method based on transfer learning is proposed. First, the original dataset is enlarged by data augmentation and fed into a model pre-trained on a large-scale dataset. To counter the poor generalization caused by the domain specificity of high-level features during transfer, two adaptation layers are added and the network is retrained after parameter initialization to obtain a base model; the base model is then further fine-tuned with a layer-freezing transfer strategy to obtain the final network for vegetable image recognition. Experiments show that the transfer strategy based on the two small networks CaffeNet and ResNet10 handles small-sample vegetable image recognition well, with model accuracies of 94.97% and 96.69%, respectively. Compared with other transfer algorithms and traditional neural network methods, the algorithm is more accurate and more robust.

  • 6. No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform

    • Keywords:
    • Feature extraction; Image coding; Frequency-domain analysis; Tensors; Principal component analysis; Periodic structures; Information filters; Light field image quality assessment; no-reference; 4D discrete cosine transform; sub-aperture gradient image array; spatial-angular quality
    • Xiang, Jianjun; Jiang, Gangyi; Yu, Mei; Jiang, Zhidi; Ho, Yo-Sung
    • 《IEEE TRANSACTIONS ON MULTIMEDIA》
    • 2023
    • Vol. 25
    • Journal article

    Light field imaging can simultaneously capture the intensity and direction information of light rays in the real world. A light field image (LFI), with its four-dimensional (4D) data, suffers quality degradation during compression, reconstruction, and processing, and how to evaluate its visual quality is an open problem. This paper proposes a no-reference LFI quality assessment metric based on a high-dimensional sparse transform. First, the LFI's sub-aperture gradient image array (SAGIA), which is still a 4D signal, is generated by high-pass filtering between adjacent sub-aperture images (SAIs). Then, the SAGIA is transformed with the 4D discrete cosine transform (4D-DCT), whose coefficients characterize the angular and spatial information of the LFI, and the logarithmic amplitudes of the coefficients at the same position of the SAGIA's transformed 4D blocks are averaged as the coefficient energy. Subsequently, the 4D-DCT coefficients are divided into spatial-angular frequency bands and spatial-angular orientation bands, and the corresponding energy features are extracted by aggregating the coefficient energy of each band. In addition, the coefficient amplitudes at the same position of the blocks are fitted with the Weibull distribution; the fitted parameters of each position are concatenated and reduced with principal component analysis to obtain compact features. Finally, the extracted features are pooled to predict the visual quality of distorted LFIs. The experimental results demonstrate that the proposed method is more consistent with subjective evaluation on three LFI databases than state-of-the-art image quality assessment and LFI quality assessment methods.

  • 7. Multi-Angle Projection Based Blind Omnidirectional Image Quality Assessment

    • Keywords:
    • Feature extraction; Distortion; Quality assessment; Image quality; Image color analysis; Visualization; Resists; Omnidirectional image; blind quality assessment; multi-angle projection; tensor space; STATISTICS
    • Jiang, Hao; Jiang, Gangyi; Yu, Mei; Luo, Ting; Xu, Haiyong
    • 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》
    • 2022
    • Vol. 32
    • No. 7
    • Journal article

    Most existing blind omnidirectional image quality assessment (BOIQA) methods are data-driven, mainly using end-to-end neural networks or deep learning tools for feature extraction; however, they usually lack interpretability, and it is difficult to discover the perceptual mechanism behind them. In this paper, from the perspective of perception modeling, we propose a novel multi-angle projection based BOIQA (MP-BOIQA) method. Considering the omnibearing and near-eye display characteristics of head-mounted displays, multiple color cubemap projection images with respect to different viewpoints are grouped into color omnidirectional distortion (COD) units so as to simulate the user's viewing behavior in subjective quality assessment. In the designed multi-angle projection based feature extractor, tensor decomposition is applied to each COD unit for dimensionality reduction, and piecewise exponential fitting is used to obtain the distribution of mean-subtracted contrast-normalized coefficients of the unit's feature matrices in the tensor domain. Finally, the extracted features are pooled with a random forest. The experimental results on three omnidirectional image quality datasets show that the MP-BOIQA method delivers highly competitive performance compared with representative full-reference quality assessment methods as well as state-of-the-art BOIQA methods.

  • 9. A Selective Encryption Scheme for HEVC Based on the Logistic Map and Arnold Transform

    • Keywords:
    • High Efficiency Video Coding; Logistic map; Arnold transform; selective encryption; transform unit; syntax element
    • Yizhao Zhou; Xiaodong Wang; Lianjun Zhang; Qiongqiong Lan
    • 《Journal of Computer Applications》
    • 2019
    • No. 10
    • Journal article

    To protect video information effectively, a scheme combining transform-coefficient scrambling and syntax-element encryption is proposed according to the characteristics of H.265/High Efficiency Video Coding (HEVC). For transform units (TUs), 4×4 TUs are scrambled with the Arnold transform; meanwhile, a shift cipher is designed, initialized according to the approximate distribution of TU DC coefficients, and encryption maps generated by the Arnold transform are used to shift-encrypt the DC coefficients of 8×8, 16×16, and 32×32 TUs. Syntax elements that use bypass coding during entropy coding are encrypted with a Logistic chaotic sequence. After encryption, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the video drop by 26.1 dB and 0.51 on average, while the compression ratio decreases by only 1.126% and encoding time increases by only 0.170%. Experimental results show that, while achieving a good encryption effect with little impact on bitrate, the scheme incurs little extra coding overhead and is suitable for real-time video applications.
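As a rough illustration of the two ingredients named above, the sketch below scrambles a 4×4 block with the Arnold cat map and derives a chaotic bit sequence from the logistic map. This is a toy sketch, not the paper's implementation: the logistic parameters (r, x0), the single-iteration default, and the thresholding of the chaotic orbit into bits are all illustrative choices, and the real scheme operates on HEVC transform coefficients and bypass-coded syntax elements inside the encoder.

```python
# Toy sketch: Arnold cat-map scrambling of an NxN coefficient block plus a
# logistic-map keystream for encrypting bypass-coded syntax bits.

def arnold_scramble(block, iterations=1):
    """Scramble an NxN block with the Arnold cat map
    (x, y) -> ((x + y) mod N, (x + 2y) mod N)."""
    n = len(block)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                nx, ny = (x + y) % n, (x + 2 * y) % n
                out[ny][nx] = block[y][x]
        block = out
    return block

def logistic_keystream(x0, r, nbits):
    """Generate a chaotic bit sequence from the logistic map x <- r*x*(1-x),
    thresholding each iterate at 0.5 (an illustrative bit-extraction rule)."""
    bits, x = [], x0
    for _ in range(nbits):
        x = r * x * (1 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits
```

Because the cat map is a bijection on the block's coordinates, scrambling only permutes coefficients, and the map is periodic: for a 4×4 block this particular map returns the block to its original arrangement after three iterations, which is how the legitimate decoder can descramble.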

  • 10. Collaborative Representation Cascade for Single-Image Super-Resolution

    • Keywords:
    • Image reconstruction; Learning systems; Optical resolving power; Mapping; Multilayers; Image enhancement; Recovery; Bicubic interpolation; Collaborative representations; Enhancement framework; Feature space; Interpolated images; Number of principal components; Reconstructed image; Super resolution
    • Zhang, Yongbing; Zhang, Yulun; Zhang, Jian; Xu, Dong; Fu, Yun; Wang, Yisen; Ji, Xiangyang; Dai, Qionghai
    • 《IEEE Transactions on Systems, Man, and Cybernetics: Systems》
    • 2019
    • Vol. 49
    • No. 5
    • Journal article

    Most recent learning-based single-image super-resolution methods first interpolate the low-resolution (LR) input, from which overlapped LR features are then extracted to reconstruct their high-resolution (HR) counterparts and the final HR image. However, most of them neglect to exploit the intermediate recovered HR image to further enhance image quality. We conduct principal component analysis (PCA) to reduce the LR feature dimension, and find that the number of principal components retained in the LR feature space of reconstructed images is larger than that of images interpolated with bicubic interpolation. Based on this observation, we present a simple yet effective framework named collaborative representation cascade (CRC) that learns multilayer mapping models between LR and HR feature pairs. In particular, we extract features from the intermediate recovered image to upscale and enhance the LR input progressively. In the learning phase, for each cascade layer, we use the intermediate recovered results and their original HR counterparts to learn a single-layer mapping model, then use this model to super-resolve the original LR inputs; the intermediate HR outputs are regarded as training inputs for the next cascade layer, until the multilayer mapping models are obtained. In the reconstruction phase, we extract multiple sets of LR features from the LR image and the intermediate recovered images, and in each cascade layer the corresponding mapping model is applied to pursue the HR image. Our experiments on several commonly used image SR testing datasets show that the proposed CRC method achieves state-of-the-art image SR results, and CRC can also serve as a general image enhancement framework.
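The cascade idea described above, learning a mapping, applying it, then training the next layer on the intermediate outputs, can be illustrated with one-dimensional least-squares "layers". The toy data and the scalar linear form below are stand-ins for the paper's collaborative-representation mappings over feature patches.

```python
# Toy sketch of a mapping cascade: each layer fits y = a*x + b by least
# squares on the previous layer's outputs, mimicking CRC's idea of training
# layer k on the intermediate reconstructions produced by layer k-1.

def fit_linear(xs, ys):
    """Closed-form least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def train_cascade(lr, hr, layers=2):
    """Train `layers` successive mappings from LR toward HR values."""
    models, cur = [], lr
    for _ in range(layers):
        a, b = fit_linear(cur, hr)      # fit this layer on current inputs
        cur = [a * x + b for x in cur]  # intermediate outputs feed layer k+1
        models.append((a, b))
    return models

def apply_cascade(models, x):
    """Run a value through every learned layer in order."""
    for a, b in models:
        x = a * x + b
    return x
```

In this toy setting a single linear layer already fits the data exactly, so later layers reduce to near-identity maps; in the paper, each layer is a nonlinear patch-wise mapping, so successive layers genuinely refine the reconstruction.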
