HD Glasses-Free 3D Video Content Generation and Coding
Project Completion Report (Full Text)
1. Deep Light Field Spatial Super-Resolution Using Heterogeneous Imaging
- Keywords: Cameras; Spatial resolution; Superresolution; Visualization; Image reconstruction; Light fields; Training; Light field; heterogeneous imaging; spatial super-resolution; pyramid reconstruction; RESOLUTION; CAMERAS
- Chen, Yeyao;Jiang, Gangyi;Yu, Mei;Xu, Haiyong;Ho, Yo-Sung
- 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》
- 2023
- Vol. 29
- No. 10
- Journal article
Light field (LF) imaging expands traditional imaging techniques by simultaneously capturing the intensity and direction information of light rays, and promotes many visual applications. However, owing to the inherent trade-off between the spatial and angular dimensions, LF images acquired by LF cameras usually suffer from low spatial resolution. Many current approaches increase the spatial resolution by exploring the four-dimensional (4D) structure of the LF images, but they have difficulties in recovering fine textures at a large upscaling factor. To address this challenge, this paper proposes a new deep learning-based LF spatial super-resolution method using heterogeneous imaging (LFSSR-HI). The designed heterogeneous imaging system uses an extra high-resolution (HR) traditional camera to capture the abundant spatial information in addition to the LF camera imaging, where the auxiliary information from the HR camera is utilized to super-resolve the LF image. Specifically, an LF feature alignment module is constructed to learn the correspondence between the 4D LF image and the 2D HR image to realize information alignment. Subsequently, a multi-level spatial-angular feature enhancement module is designed to gradually embed the aligned HR information into the rough LF features. Finally, the enhanced LF features are reconstructed into a super-resolved LF image using a simple feature decoder. To improve the flexibility of the proposed method, a pyramid reconstruction strategy is leveraged to generate multi-scale super-resolution results in one forward inference. The experimental results show that the proposed LFSSR-HI method achieves significant advantages over the state-of-the-art methods in both qualitative and quantitative comparisons. Furthermore, the proposed method preserves more accurate angular consistency.
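The spatial-angular trade-off above arises because both dimensions share one sensor. As a minimal illustration of the 4D LF structure (standard lenslet decoding, not the authors' pipeline; the array layout and function name are assumptions), the sketch below rearranges a lenslet-style capture into its [U, V, H, W] sub-aperture image array:

```python
import numpy as np

def extract_sub_aperture_images(lenslet, u_res, v_res):
    """Rearrange a lenslet-style capture of shape (H*u_res, W*v_res) into a
    [U, V, H, W] sub-aperture image array: pixel (u, v) inside each
    micro-lens block belongs to angular view (u, v)."""
    H = lenslet.shape[0] // u_res
    W = lenslet.shape[1] // v_res
    # Split rows into (y, u) and columns into (x, v), then bring (u, v) to the front.
    views = lenslet.reshape(H, u_res, W, v_res)
    return views.transpose(1, 3, 0, 2)  # -> [U, V, H, W]

# A 6x8 lenslet image with 2x2 micro-lenses yields four 3x4 views.
lf = np.arange(6 * 8).reshape(6, 8)
sai = extract_sub_aperture_images(lf, 2, 2)
```

Each of the U*V views has only H x W pixels, which is the low spatial resolution the paper's HR guidance camera compensates for.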
2. Research on Depth Distortion Models for 3D Video Coding
- Keywords: 3DV; depth video; virtual view distortion; perceptual coding
- Funding: National Natural Science Foundation of China projects No. 60832003 ("Fundamental theory and key techniques of free-viewpoint multi-view video coding and 3D stereoscopic display"), No. 61172096 ("Rendering-quality-oriented depth extraction and coding"), and No. U1301257 ("HD glasses-free 3D video content generation and coding")
- Advisor: An Ping
- DOI: 10.27300/d.cnki.gshau.2019.000538
- Dissertation
With the rapid development of computer and communication technology, three-dimensional video (3DV) is gradually replacing two-dimensional video (2DV) as the next-generation mainstream video technology. Viewers of 3DV gain a rich sense of depth and immersion. The rise of autostereoscopic display technology not only frees viewers from glasses but also offers interactive viewpoint selection: the system renders the 3DV of the viewpoint a user requests. The enormous data volume of multi-view video, however, challenges transmission infrastructure. 3DV in the depth-enhanced data format consists of the color and depth videos of a few reference views and provides multi-view video at the receiver by synthesizing virtual views. Because it greatly reduces the amount of multi-view data to transmit, the depth-enhanced format has attracted wide research attention. Since depth video steers virtual view synthesis, studying how depth compression distortion affects synthesized virtual view quality is of great significance. On the one hand, properly controlling depth distortion during 3DV coding improves virtual view quality and thus the visual quality of experience. On the other hand, not all depth distortion degrades perceived virtual view quality: exploiting depth perception characteristics to suppress depth distortion below the just noticeable depth difference (JNDD) threshold can effectively improve 3DV coding efficiency. This dissertation studies the mechanism of depth distortion in 3DV coding in depth; the main contributions and innovations are as follows. First, the effect of depth distortion on virtual view distortion is investigated and a depth-based virtual view distortion model is built. The depth map is divided into flat and non-flat blocks: the virtual view distortion of flat blocks is computed as a whole in the frequency domain, while non-flat blocks are analyzed pixel by pixel for occlusion changes to compute the distortion cost. For non-flat blocks, the model accounts not only for the distortion of falsely occluded pixels but also for the crease distortion caused by falsely disoccluded pixels. Although edge regions occupy a small fraction of the depth map, their effect on virtual view distortion is significant. To classify depth map blocks accurately, a disparity-based classification criterion with a threshold function is adopted, and the classification threshold is adjusted according to the capture and scene parameters. The proposed model improves estimation performance, reducing the average difference between the predicted and measured mean squared error to 2.9. Second, based on the physiology of stereoscopic vision and the depth perception characteristics of the human visual system (HVS), a modified JNDD (MJNDD) model, a just noticeable disparity difference (JNDiD) model, and a just noticeable perceived depth difference (JNPDD) model are established. The MJNDD model uses a three-segment linear function and predicts more accurately than existing two- and four-segment models, reaching a Pearson linear correlation coefficient (PLCC) of 0.99 with subjective test data. The JNDiD model assumes that vergence dominates in the vergence-accommodation conflict, providing a basis for a unified representation of the JNDiD model across display and viewing conditions. The JNPDD model uses the JNDD threshold of natural scenes as a link to connect the JNDD threshold functions under various display and viewing conditions into a function family; the JNPDD threshold is computed from display and viewing parameters and can therefore be used across displays. Finally, a virtual-view-distortion-oriented perceptual coding algorithm is proposed, which applies the depth-based virtual view distortion model to modify the distortion measure of the depth coding rate-distortion criterion and applies the JNDiD model to filter the depth prediction residuals. Experimental results show that the algorithm improves 3DV coding performance, reducing bitrate while preserving perceptual quality, and thus confirms, at the application level, the effectiveness of the proposed depth-based virtual view distortion model and JNDiD model.
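The JNDD-based suppression described above can be sketched as a piecewise-linear threshold applied to depth residuals. All breakpoints and threshold levels below are illustrative placeholders, not the fitted values from the dissertation:

```python
import numpy as np

def mjndd_threshold(depth, bp1=64.0, bp2=192.0, t0=21.0, t1=19.0, t2=18.0, t3=20.0):
    """Three-segment piecewise-linear JNDD threshold as a function of depth value.

    Breakpoints (bp1, bp2) and levels (t0..t3) are hypothetical placeholders.
    np.interp over four x-points gives exactly three linear segments.
    """
    depth = np.asarray(depth, dtype=float)
    return np.interp(depth, [0.0, bp1, bp2, 255.0], [t0, t1, t2, t3])

def suppress_depth_residual(residual, depth, **kw):
    """Zero out depth-coding residuals below the JNDD threshold (perceptual filtering)."""
    thr = mjndd_threshold(depth, **kw)
    return np.where(np.abs(residual) < thr, 0.0, residual)
```

Only residuals that would produce a visible depth change survive the filter, which is the mechanism by which sub-threshold distortion is traded for bitrate.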
3. Image Saliency Detection Based on SLIC with Fused Texture and Histogram Features
- Keywords: SLIC algorithm; color features; spatial position features; texture features; histogram; saliency detection
- Funding: National Key Technology R&D Program (No. 2012BAH67F01); National Natural Science Foundation of China (No. U1301257); Zhejiang Provincial Natural Science Foundation (No. LY17F010005)
- 丁华;王晓东;章联军;陈晓爱;赖佩霞
- Journal article
To address the problem that saliency maps based on color histograms cannot highlight edge contours and texture details, an image saliency detection method based on SLIC with fused texture and histogram features is proposed, combining color features, spatial position features, texture features, and histograms. The method first segments the image into superpixels with the SLIC algorithm and extracts a saliency map based on color and spatial position; it then extracts a color-histogram-based saliency map and a texture-feature-based saliency map; finally, the saliency maps from these stages are fused into the final saliency map. In addition, the salient object is extracted with a simple threshold segmentation. Experimental results show that the proposed algorithm clearly outperforms classic saliency detection algorithms.
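The fusion and threshold-segmentation stages can be sketched as follows. The stage-wise saliency maps are assumed to be given, and the fusion weights and threshold are hypothetical, not taken from the paper:

```python
import numpy as np

def normalize(m):
    """Min-max normalize a saliency map to [0, 1]."""
    m = np.asarray(m, dtype=float)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def fuse_saliency(color_spatial, histogram, texture, weights=(0.4, 0.3, 0.3)):
    """Weighted fusion of the three stage-wise saliency maps, renormalized."""
    maps = [normalize(m) for m in (color_spatial, histogram, texture)]
    fused = sum(w * m for w, m in zip(weights, maps))
    return normalize(fused)

def segment(saliency, thresh=0.5):
    """Simple threshold segmentation of the salient object (binary mask)."""
    return (saliency >= thresh).astype(np.uint8)
```

Normalizing each map before fusing keeps one stage from dominating simply because its raw scores span a larger range.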
4. A Novel No-Reference Quality Assessment Metric for Stereoscopic Images with Consideration of Comprehensive 3D Quality Information
- Keywords: machine learning; natural scene statistics; no reference; spatial domain; stereo visual information; stereoscopic image quality assessment; transform domain
- Shen, Liquan;Yao, Yang;Geng, Xianqiu;Fang, Ruigang;Wu, Dapeng
- 《Sensors》
- 2023
- Vol. 23
- No. 13
- Journal article
Recently, stereoscopic image quality assessment has attracted a lot of attention. However, compared with 2D image quality assessment, it is much more difficult to assess the quality of stereoscopic images due to the limited understanding of 3D visual perception. This paper proposes a novel no-reference quality assessment metric for stereoscopic images using natural scene statistics, with consideration of both the quality of the cyclopean image and 3D visual perceptual information (binocular fusion and binocular rivalry). In the proposed method, not only the quality of the cyclopean image but also binocular rivalry and other intrinsic 3D visual properties are exploited. Specifically, to capture the objective quality of the cyclopean image, features of the cyclopean image in both the spatial domain and the transform domain are extracted based on the natural scene statistics (NSS) model. Furthermore, to better model the intrinsic properties of the stereoscopic image, the binocular rivalry effect and other 3D visual properties are also considered during feature extraction. Adaptive feature pruning using principal component analysis further improves metric accuracy. The experimental results show that the proposed metric achieves a good and consistent alignment with subjective assessment of stereoscopic images in comparison with existing methods, acquiring the highest SROCC (0.952) and PLCC (0.962) scores on Phase I of the LIVE 3D database.
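A core ingredient of spatial-domain NSS features is the mean-subtracted contrast-normalized (MSCN) coefficient map. A minimal BRISQUE-style sketch (the window scale and stabilizing constant are conventional choices, not necessarily the paper's exact settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, c=1.0):
    """MSCN coefficients of a grayscale image.

    The local mean and local standard deviation are estimated with a Gaussian
    window; each pixel is then divisively normalized by its local contrast.
    """
    image = np.asarray(image, dtype=float)
    mu = gaussian_filter(image, sigma)
    # Local variance = E[x^2] - (E[x])^2, clipped to avoid negatives from rounding.
    var = np.maximum(gaussian_filter(image * image, sigma) - mu * mu, 0.0)
    return (image - mu) / (np.sqrt(var) + c)
```

For a pristine natural image the MSCN histogram is close to Gaussian; distortions change its shape, which is what the fitted NSS features measure.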
5. Research on Deep Learning-Based Video Coding Techniques
- Keywords: video coding; deep learning; CNN; GAN; HEVC
- 金智鹏
- Advisor: An Ping, Shanghai University
- Dissertation
Since the 1980s, video coding technology has developed rapidly and is widely used in distance education, telemedicine, video telephony, video conferencing, video on demand, interactive video games, security surveillance, virtual reality, and other fields, greatly driving the development of the whole information industry. Ubiquitous video applications generate massive video data, and in recent years the demand for clearer, smoother, and more realistic visual experiences has made video data grow explosively, placing higher requirements on video compression efficiency. Compared with the previous international standard H.264/AVC, the High Efficiency Video Coding (HEVC) standard improves the compression efficiency of 1080p content by about 50%. Within the HEVC framework, intra/inter prediction, in-loop filtering, and fast encoding are the three key technical areas that secure its compression ratio, perceptual quality, and encoding speed. Although much work has improved intra/inter prediction, in-loop filtering, and fast encoding algorithms, HEVC coding performance is still not optimal, in large part because of the limits of hand-crafted feature extraction and feature modeling. In recent years, with the resurgence of deep learning and its broad success in computer vision, video coding has opened a new research direction of end-to-end automatic modeling. Deep learning-based predictive coding can effectively raise compression ratios, and deep learning-based in-loop filtering can effectively improve the visual comfort of decoded pictures; both have broad application and commercial value in live streaming, video transmission, and related fields. Despite these prospects, deep learning-based video coding is still at an early stage: the research results have not yet formed a complete system, and careful study is still needed on network architecture, interpretability, training set construction, training methods, computational efficiency, and cross-platform code compatibility. Research on efficient deep learning-based video coding algorithms, and on completing the corresponding theory, is therefore of important theoretical and practical significance. This dissertation studies the key technologies of deep learning-based video coding; the main contributions and innovations are as follows. 1. A perceptual-adversarial, progressive in-loop filtering framework (MPRNet) is proposed that outperforms traditional in-loop filters such as deblocking, sample adaptive offset, and adaptive loop filtering. It improves performance on three levels: the decoded (pre-filtering) image is fed into a deep network for stage-by-stage progressive enhancement, effectively mitigating coding-induced distortions such as blocking, ringing, and blur and improving coding efficiency and subjective quality; adversarial training that combines mean squared error, perceptual, and adversarial losses further improves visual quality; and a scalable multi-stage progressive CNN, with per-stage MSE losses controlling each stage's fitting, achieves coarse-to-fine enhancement and balances computation against enhancement quality. 2. A convolutional autoencoder-based intra prediction framework (IPCED) is proposed that performs intra prediction in an end-to-end, data-driven way, effectively reducing prediction residuals and improving rate-distortion performance. It improves intra prediction in three ways: borrowing from image inpainting, it predicts the fourth-quadrant block from three reference blocks and introduces a GAN adversarial loss with joint optimization to raise prediction accuracy; a multi-level skip-connected convolutional encoder fuses deep global information with shallow local information to strengthen the learning and representation of reference-block textures; and a multi-level deconvolutional decoder reconstructs texture (i.e., performs intra prediction) level by level, enriching the texture of the prediction and raising accuracy. 3. A CNN-based fast coding unit partition decision framework is proposed that casts QTBT partition optimization as a multi-class classification problem, judging the texture complexity (i.e., the partition depth range) of a 32x32 coding block as a whole rather than deciding level by level whether to split into sub-blocks, which significantly speeds up the decision. Its three innovations: it directly predicts the shallowest and deepest partition depths of a 32x32 block, effectively handling the large variety of QTBT coding units in JVET while keeping good classification accuracy; a new objective function designed for the task, comprising a hinge loss and a class penalty term, effectively improves classification accuracy; and the framework is an end-to-end learning system that learns and extracts classification features directly from coding units, without hand-designed features or temporal/spatial correlation information, which benefits parallel intra coding and independent decoding. In summary, this dissertation takes deep learning-based video coding as its research object and studies in-loop filtering, intra prediction, and fast encoding in depth. Experimental results show that the proposed algorithms all effectively improve video coding efficiency.
6. A Transfer Learning-Based Method for Vegetable Image Recognition
- Keywords: vegetable image recognition; convolutional neural network; transfer learning; small sample
- Funding: National Key Technology R&D Program (No. 2012BAH67F01); National Natural Science Foundation of China (No. U1301257); Zhejiang Provincial Natural Science Foundation (No. LY17F010005)
- 赖佩霞;王晓东;章联军
- Journal article
To address the shortage of labelled samples in vegetable recognition, an image recognition method based on transfer learning is proposed. First, the original dataset is enlarged with data augmentation and fed into a model pre-trained on a large-scale dataset. Because the domain specificity of high-level features harms generalization during transfer, two adaptation layers are added and the network is retrained after parameter initialization to obtain a base model; the base model is then further fine-tuned with a parameter-freezing transfer scheme to obtain the final network for vegetable image recognition. Experiments show that transfer strategies based on the two small networks CaffeNet and ResNet10 handle small-sample vegetable image recognition well, with model accuracies of 94.97% and 96.69%, respectively. Compared with other transfer algorithms and traditional neural network methods, the algorithm achieves higher recognition accuracy and stronger robustness.
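The data augmentation step can be sketched minimally as random cropping plus horizontal flipping. The paper does not specify its exact transform set, so this is an assumed stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop_size):
    """Produce one augmented sample: random crop + optional horizontal flip.

    `image` is an (H, W) or (H, W, C) array; `crop_size` is (crop_h, crop_w).
    Calling this repeatedly on each labelled image multiplies the training set.
    """
    h, w = image.shape[:2]
    ch, cw = crop_size
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    patch = image[y:y + ch, x:x + cw]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # mirror along the width axis
    return patch
```

Each call yields a different crop/flip combination, which is how a small labelled set is expanded before fine-tuning the pre-trained model.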
7. No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform
- Keywords: Feature extraction; Image coding; Frequency-domain analysis; Tensors; Principal component analysis; Periodic structures; Information filters; Light field image quality assessment; no-reference; 4D discrete cosine transform; sub-aperture gradient image array; spatial-angular quality
- Xiang, Jianjun;Jiang, Gangyi;Yu, Mei;Jiang, Zhidi;Ho, Yo-Sung
- 《IEEE TRANSACTIONS ON MULTIMEDIA》
- 2023
- Vol. 25
- Journal article
Light field imaging can simultaneously capture the intensity and direction information of light rays in the real world. A light field image (LFI), with its four-dimensional (4D) data, suffers quality degradation during compression, reconstruction, and processing, and how to evaluate its visual quality is a challenging question. This paper proposes a no-reference LFI quality assessment metric based on a high-dimensional sparse transform. Firstly, the LFI's sub-aperture gradient image array (SAGIA), which is still a 4D signal, is generated by high-pass filtering between adjacent sub-aperture images (SAIs). Then, the SAGIA is transformed with the 4D discrete cosine transform (4D-DCT), whose coefficients characterize the angular and spatial information of the LFI, and the logarithmic amplitudes of the coefficients at the same position of the SAGIA's transformed 4D blocks are averaged as the coefficient energy. Subsequently, the 4D-DCT coefficients of the SAGIA are divided into spatial-angular frequency bands and spatial-angular orientation bands, and the corresponding energy features are extracted by pooling the coefficient energy of each band. In addition, the coefficient amplitudes at the same position of the blocks are fitted by the Weibull distribution; the fitted parameters of each position are then concatenated and reduced with principal component analysis to obtain compact features. Finally, the extracted features are pooled to predict the visual quality of the distorted LFIs. The experimental results on three LFI databases demonstrate that the proposed method is more consistent with subjective evaluation than state-of-the-art image quality assessment methods and LFI quality assessment methods.
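The 4D-DCT and coefficient-energy averaging steps can be sketched as follows, assuming fixed-size 4D blocks are already extracted; band grouping and the Weibull fit are omitted:

```python
import numpy as np
from scipy.fft import dctn

def block_4d_dct_log_energy(block):
    """4D-DCT of one spatio-angular block; log amplitude per coefficient."""
    coeffs = dctn(block, norm="ortho")  # type-II DCT applied over all four axes
    return np.log1p(np.abs(coeffs))

def mean_coefficient_energy(blocks):
    """Average the log amplitudes over blocks at the same coefficient position."""
    return np.mean([block_4d_dct_log_energy(b) for b in blocks], axis=0)
```

For a constant (flat) block, all energy collapses into the DC coefficient, while gradients and distortions spread energy into the spatial-angular AC bands that the metric's features summarize.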
8. Multi-Angle Projection Based Blind Omnidirectional Image Quality Assessment
- Keywords: Feature extraction; Distortion; Quality assessment; Image quality; Image color analysis; Visualization; Resists; Omnidirectional image; blind quality assessment; multi-angle projection; tensor space; STATISTICS
- Jiang, Hao;Jiang, Gangyi;Yu, Mei;Luo, Ting;Xu, Haiyong
- 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》
- 2022
- Vol. 32
- No. 7
- Journal article
Most existing blind omnidirectional image quality assessment (BOIQA) methods are based on a data-driven approach in which end-to-end neural networks or deep learning tools are mainly used for feature extraction. However, such methods usually lack interpretability, and it is difficult to discover the perceptual mechanism behind them. In this paper, from the perspective of perception modeling, we propose a novel multi-angle projection based BOIQA (MP-BOIQA) method. Considering the omnidirectional and near-eye display characteristics of head-mounted displays, multiple color cubemap projection images with respect to different viewpoints are grouped into color omnidirectional distortion (COD) units so as to simulate the user's viewing behavior in subjective quality assessment. In the designed multi-angle projection based feature extractor, tensor decomposition is applied to each COD unit for dimensionality reduction, and piecewise exponential fitting is used to obtain the distribution of the mean-subtracted contrast-normalized coefficients of the unit's feature matrices in the tensor domain. Finally, the extracted features are pooled with a random forest. The experimental results on three omnidirectional image quality datasets show that the MP-BOIQA method delivers highly competitive performance compared with representative full-reference quality assessment methods as well as state-of-the-art BOIQA methods.
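The tensor-decomposition dimensionality reduction on a COD unit can be approximated by a truncated SVD of a mode unfolding (a Tucker-1-style sketch under assumed shapes, not the paper's exact decomposition):

```python
import numpy as np

def mode_unfold(tensor, mode):
    """Mode-n unfolding of a tensor into a (shape[mode], -1) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def reduce_unit(cod_unit, rank):
    """Rank-r approximation of a COD unit's mode-0 unfolding via truncated SVD.

    Keeping only the top `rank` singular components yields the compact
    feature matrix on which the distribution fitting would operate.
    """
    mat = mode_unfold(cod_unit, 0)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank] * s[:rank] @ vt[:rank]
```

For a genuinely low-rank unit the truncation is lossless, so the reduction preserves the structure the subsequent statistical features depend on.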
9. An Objective Stereoscopic Video Quality Assessment Method Based on the Three-Dimensional Discrete Cosine Transform
- Patent
10. A Hardware Implementation Method for a Real-Time Stereoscopic Video Depth Estimation System
- Patent
