High-Definition Glasses-Free 3D Video Content Generation and Coding
Funding source
Principal investigator
Host institution
Approval year
Approval date
Project number
Research period
Project level
Funding amount
Discipline
Discipline code
Fund category
Keywords
Participants
Participating institutions
Funded province
Project Completion Report (Full Text)
1. Deep Light Field Spatial Super-Resolution Using Heterogeneous Imaging
- Keywords:
- Cameras; Spatial resolution; Superresolution; Visualization; Image reconstruction; Light fields; Training; Light field; heterogeneous imaging; spatial super-resolution; pyramid reconstruction; RESOLUTION; CAMERAS
- Chen, Yeyao;Jiang, Gangyi;Yu, Mei;Xu, Haiyong;Ho, Yo-Sung
- 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》
- 2023
- Vol. 29
- No. 10
- Journal article
Light field (LF) imaging expands traditional imaging techniques by simultaneously capturing the intensity and direction information of light rays, and promotes many visual applications. However, owing to the inherent trade-off between the spatial and angular dimensions, LF images acquired by LF cameras usually suffer from low spatial resolution. Many current approaches increase the spatial resolution by exploring the four-dimensional (4D) structure of the LF images, but they have difficulties in recovering fine textures at a large upscaling factor. To address this challenge, this paper proposes a new deep learning-based LF spatial super-resolution method using heterogeneous imaging (LFSSR-HI). The designed heterogeneous imaging system uses an extra high-resolution (HR) traditional camera to capture the abundant spatial information in addition to the LF camera imaging, where the auxiliary information from the HR camera is utilized to super-resolve the LF image. Specifically, an LF feature alignment module is constructed to learn the correspondence between the 4D LF image and the 2D HR image to realize information alignment. Subsequently, a multi-level spatial-angular feature enhancement module is designed to gradually embed the aligned HR information into the rough LF features. Finally, the enhanced LF features are reconstructed into a super-resolved LF image using a simple feature decoder. To improve the flexibility of the proposed method, a pyramid reconstruction strategy is leveraged to generate multi-scale super-resolution results in one forward inference. The experimental results show that the proposed LFSSR-HI method achieves significant advantages over the state-of-the-art methods in both qualitative and quantitative comparisons. Furthermore, the proposed method preserves more accurate angular consistency.
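As a rough illustration of the heterogeneous-imaging idea above (not the paper's learned network), the sketch below upsamples each low-resolution sub-aperture view and injects high-frequency detail extracted from the auxiliary HR camera image. The nearest-neighbour upsampler, box-filter high-pass, and fixed 2x factor are all simplifying assumptions.

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbour 2x upsampling (stand-in for a learned upsampler).
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def highpass(img):
    # Crude high-pass: image minus a 3x3 box blur, capturing fine texture.
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    blur = sum(pad[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0
    return img - blur

def guided_lf_sr(lf_views, hr_guide):
    # lf_views: list of low-resolution sub-aperture views, each (H, W).
    # hr_guide: one high-resolution image (2H, 2W) from the auxiliary camera.
    detail = highpass(hr_guide.astype(float))
    return [upsample2x(v.astype(float)) + detail for v in lf_views]
```

A learned alignment module would replace the naive assumption here that the HR guide is already registered to every sub-aperture view.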
2. Research on Depth Distortion Models in 3D Video Coding
- Keywords:
- 3DV; depth video; virtual view distortion; perceptual coding. Funding: National Natural Science Foundation of China project "Research on Fundamental Theory and Key Techniques of Free-Viewpoint Multi-View Video Coding and 3D Stereoscopic Display" (No. 60832003); NSFC project "Research on Rendering-Quality-Oriented Depth Extraction and Its Coding Methods" (No. 61172096); NSFC project "High-Definition Glasses-Free 3D Video Content Generation and Coding" (No. U1301257). DOI: 10.27300/d.cnki.gshau.2019.000538; CLC: TN919.81; Supervisor: An Ping
- Dissertation
With the rapid development of computer and communication technology, three-dimensional video (3DV) is gradually replacing two-dimensional video (2DV) as the next-generation mainstream video technology, offering viewers a rich sense of depth and immersion. The rise of autostereoscopic display not only frees viewers from wearing glasses but also enables interactive viewpoint selection, with the system rendering the 3DV of the requested viewpoint on demand. However, the enormous data volume of multi-view video challenges the transmission infrastructure. 3DV in the depth-enhanced data format consists of the color and depth videos of a small number of reference views and provides multi-view video at the receiver by synthesizing virtual views; because it greatly reduces the amount of multi-view data to be transmitted, this format has attracted wide research attention.

Depth video controls virtual view synthesis, so studying how depth compression distortion affects synthesized virtual view quality is of great importance. On one hand, properly controlling depth distortion during 3DV coding improves virtual view quality and thus the visual quality of experience. On the other hand, not all depth distortion degrades perceived virtual view quality: by exploiting depth perception characteristics and suppressing depth distortion below the just noticeable depth difference (JNDD) threshold, 3DV coding efficiency can be improved. This dissertation investigates the mechanism of depth distortion in 3DV coding; its main contributions and innovations are as follows.

First, the effect of depth distortion on virtual view distortion is studied, and a depth-based virtual view distortion model is established. The depth map is divided into flat and non-flat blocks: for flat blocks, virtual view distortion is computed as a whole in the frequency domain; for non-flat blocks, occlusion changes are analyzed pixel by pixel to compute the distortion cost, considering not only the distortion of wrongly occluded pixels but also the wrinkle distortion caused by wrongly disoccluded pixels. Although edge regions occupy only a small proportion of the depth map, they strongly affect virtual view distortion. To classify depth blocks accurately, a disparity-based classification criterion for depth coding blocks is adopted, with a threshold function whose classification threshold adapts to the capture and scene parameters. The proposed model improves estimation performance, reducing the average gap between predicted and measured mean squared error to 2.9.

Second, based on the physiology of stereoscopic vision and the depth perception characteristics of the human visual system (HVS), three models are established: a modified JNDD (MJNDD) model, a just noticeable disparity difference (JNDiD) model, and a just noticeable perceived depth difference (JNPDD) model. The MJNDD model uses a three-segment linear function and predicts more accurately than existing two- and four-segment models, reaching a Pearson linear correlation coefficient (PLCC) of 0.99 with subjective test data. The JNDiD model assumes that vergence dominates in the vergence-accommodation conflict, providing a basis for a unified JNDiD representation across display and viewing conditions. The JNPDD model uses the JNDD threshold in natural scenes as a link to connect the JNDD threshold functions under various display and viewing conditions into a function family; JNPDD thresholds are computed from display and viewing parameters and thus transfer across displays.

Finally, a virtual-view-distortion-oriented perceptual coding algorithm is proposed, which uses the depth-based virtual view distortion model to modify the distortion measure in the depth coding rate-distortion criterion and uses the JNDiD model to filter the depth prediction residual. Experimental results show that the algorithm improves 3DV coding performance, reducing the bitrate while maintaining perceptual quality, and thereby confirms the effectiveness of the proposed depth-based virtual view distortion model and JNDiD model at the application level.
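To make the JNDD-based residual filtering step concrete, here is a minimal sketch of a three-segment piecewise-linear threshold in the spirit of the MJNDD model. All breakpoints and slopes below are invented placeholders, since the abstract does not give the fitted coefficients.

```python
import numpy as np

def mjndd_threshold(depth, bp1=64.0, bp2=192.0, t0=21.0,
                    s1=-0.12, s2=0.0, s3=0.15):
    # Three-segment piecewise-linear JNDD threshold as a function of the
    # background depth value (0..255). Breakpoints/slopes are hypothetical.
    d = np.asarray(depth, dtype=float)
    seg1 = t0 + s1 * d                    # near segment
    t1 = t0 + s1 * bp1
    seg2 = t1 + s2 * (d - bp1)            # middle (flat) segment
    t2 = t1 + s2 * (bp2 - bp1)
    seg3 = t2 + s3 * (d - bp2)            # far segment
    return np.where(d < bp1, seg1, np.where(d < bp2, seg2, seg3))

def filter_depth_residual(residual, depth):
    # Perceptual filtering: suppress depth prediction residuals that fall
    # below the JNDD threshold, as they are assumed imperceptible.
    thr = mjndd_threshold(depth)
    return np.where(np.abs(residual) < thr, 0.0, residual)
```

The segments are constructed to be continuous at the breakpoints, which any fitted three-segment model would also need.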
3. Image Saliency Detection Based on SLIC with Fused Texture and Histogram Features
- Keywords:
- SLIC algorithm; color feature; spatial position feature; texture feature; histogram; saliency detection. Funding: National Key Technology R&D Program (No. 2012BAH67F01); National Natural Science Foundation of China (No. U1301257); Zhejiang Provincial Natural Science Foundation (No. LY17F010005). CLC: TP391.41
- 丁华;王晓东;章联军;陈晓爱;赖佩霞
- Journal article
To address the problem that saliency maps based on color histograms fail to highlight edge contours and texture details, an image saliency detection method based on SLIC with fused texture and histogram features is proposed, combining the image's color features, spatial position features, texture features, and histograms. The method first segments the image into superpixels with the SLIC algorithm and extracts a saliency map based on color and spatial position; it then extracts a color-histogram-based saliency map and a texture-based saliency map; finally, the saliency maps from the two stages are fused into the final saliency map. In addition, the salient object in the image is obtained by simple threshold segmentation. Experimental results show that the proposed algorithm clearly outperforms classical saliency detection algorithms.
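The fusion-and-threshold stages described above can be sketched as follows. The fusion weights and the mean-based threshold are illustrative assumptions; the abstract does not specify the paper's fusion rule.

```python
import numpy as np

def normalize(m):
    # Rescale a saliency map to [0, 1].
    m = m.astype(float)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def fuse_saliency(color_spatial, histogram, texture, weights=(0.4, 0.3, 0.3)):
    # Weighted fusion of the three per-pixel saliency maps; the weights
    # are hypothetical, not taken from the paper.
    maps = [normalize(m) for m in (color_spatial, histogram, texture)]
    return normalize(sum(w * m for w, m in zip(weights, maps)))

def segment_salient_object(saliency, k=2.0):
    # Simple adaptive threshold: keep pixels well above the mean saliency.
    thr = min(1.0, k * saliency.mean())
    return saliency >= thr
```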
4. A Novel No-Reference Quality Assessment Metric for Stereoscopic Images with Consideration of Comprehensive 3D Quality Information
- Keywords:
- machine learning; natural scene statistics; no reference; spatial domain; stereo visual information; stereoscopic image quality assessment; transform domain
- Shen, Liquan;Yao, Yang;Geng, Xianqiu;Fang, Ruigang;Wu, Dapeng
- 《Sensors》
- 2023
- Vol. 23
- No. 13
- Journal article
Recently, stereoscopic image quality assessment has attracted a lot of attention. However, compared with 2D image quality assessment, it is much more difficult to assess the quality of stereoscopic images due to the limited understanding of 3D visual perception. This paper proposes a novel no-reference quality assessment metric for stereoscopic images using natural scene statistics with consideration of both the quality of the cyclopean image and 3D visual perceptual information (binocular fusion and binocular rivalry). In the proposed method, not only is the quality of the cyclopean image considered, but binocular rivalry and other intrinsic 3D visual properties are also exploited. Specifically, in order to assess the objective quality of the cyclopean image, features of the cyclopean image in both the spatial domain and the transform domain are extracted based on the natural scene statistics (NSS) model. Furthermore, to better capture the intrinsic properties of the stereoscopic image, the binocular rivalry effect and other 3D visual properties are also considered during feature extraction. Following adaptive feature pruning using principal component analysis, improved metric accuracy is obtained. The experimental results show that the proposed metric achieves good and consistent alignment with subjective assessment of stereoscopic images in comparison with existing methods, with the highest SROCC (0.952) and PLCC (0.962) scores acquired on the LIVE 3D database Phase I.
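The adaptive feature pruning step can be sketched with a plain PCA that keeps the fewest principal components explaining a target share of variance; the 95% ratio is an assumed parameter, not taken from the paper.

```python
import numpy as np

def pca_prune(features, var_ratio=0.95):
    # Adaptive feature pruning with PCA: project NSS feature vectors onto
    # the fewest principal components explaining `var_ratio` of variance.
    # features: (n_samples, n_dims) matrix.
    x = features - features.mean(axis=0)
    cov = x.T @ x / (len(x) - 1)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1]            # sort descending
    vals, vecs = vals[order], vecs[:, order]
    explained = np.cumsum(vals) / vals.sum()
    k = int(np.searchsorted(explained, var_ratio)) + 1
    return x @ vecs[:, :k]
```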
5. A Vegetable Image Recognition Method Based on Transfer Learning
- Keywords:
- vegetable image recognition; convolutional neural network; transfer learning; small sample. Funding: National Key Technology R&D Program (2012BAH67F01); National Natural Science Foundation of China (U1301257); Zhejiang Provincial Natural Science Foundation (LY17F010005). CLC: TP391.41; TP18
- 赖佩霞;王晓东;章联军
- Journal article
To address the shortage of labeled samples in vegetable recognition, an image recognition method based on transfer learning is proposed. First, the original dataset is enlarged by data augmentation and fed into a model pre-trained on a large-scale dataset. Because the domain specificity of high-level features harms the network's generalization during transfer, two adaptation layers are added and the network is retrained after parameter initialization to obtain a base model; the base model is then further fine-tuned with a layer-freezing transfer strategy to obtain the final network for vegetable image recognition. Experiments show that the transfer strategy based on the two small networks CaffeNet and ResNet10 handles small-sample vegetable image recognition well, with model accuracies of 94.97% and 96.69%, respectively. Compared with other transfer algorithms and traditional neural network methods, the proposed algorithm achieves higher recognition accuracy and stronger robustness.
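The layer-freezing transfer described above can be illustrated with a toy model: a frozen linear "backbone" standing in for the pretrained network and a trainable softmax head, so gradient updates touch the head only. The shapes, learning rate, and random weights are arbitrary stand-ins for CaffeNet/ResNet10.

```python
import numpy as np

rng = np.random.default_rng(0)

W_backbone = rng.standard_normal((8, 16)) * 0.1   # "pretrained", frozen
W_head = np.zeros((16, 3))                        # new task head, trainable

def forward(x):
    # Frozen feature extraction (ReLU) followed by the trainable head.
    feats = np.maximum(x @ W_backbone, 0.0)
    return feats, feats @ W_head

def finetune_step(x, y_onehot, lr=0.1):
    # One gradient step of softmax cross-entropy, updating the head only;
    # the backbone is never touched, mimicking parameter freezing.
    global W_head
    feats, logits = forward(x)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
    grad = feats.T @ (p - y_onehot) / len(x)      # gradient w.r.t. head
    W_head -= lr * grad
```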
6. No-Reference Light Field Image Quality Assessment Using Four-Dimensional Sparse Transform
- Keywords:
- Feature extraction; Image coding; Frequency-domain analysis; Tensors; Principal component analysis; Periodic structures; Information filters; Light field image quality assessment; no-reference; 4D discrete cosine transform; sub-aperture gradient image array; spatial-angular quality
- Xiang, Jianjun;Jiang, Gangyi;Yu, Mei;Jiang, Zhidi;Ho, Yo-Sung
- 《IEEE TRANSACTIONS ON MULTIMEDIA》
- 2023
- Vol. 25
- Journal article
Light field imaging can simultaneously capture the intensity and direction information of light rays in the real world. A light field image (LFI), as four-dimensional (4D) data, suffers quality degradation during compression, reconstruction, and processing, and evaluating the visual quality of LFIs is a challenging problem. This paper proposes a no-reference LFI quality assessment metric based on a high-dimensional sparse transform. Firstly, the LFI's sub-aperture gradient image array (SAGIA), which is still a 4D signal, is generated by high-pass filtering between adjacent sub-aperture images (SAIs). Then, the SAGIA is transformed with the 4D discrete cosine transform (4D-DCT), whose coefficients characterize the angular and spatial information of the LFI, and the logarithmic amplitudes of the coefficients at the same position of the SAGIA's transformed 4D blocks are averaged as the coefficient energy. Subsequently, the 4D-DCT coefficients of the SAGIA are divided into spatial-angular frequency bands and spatial-angular orientation bands, and the corresponding energy features are extracted by aggregating the coefficient energy of each band. In addition, the coefficient amplitudes at the same position of the blocks are fitted with a Weibull distribution; the fitted parameters of each position are concatenated and reduced with principal component analysis to obtain compact features. Finally, the extracted features are pooled to predict the visual quality of distorted LFIs. The experimental results demonstrate that the proposed method is more consistent with subjective evaluation on three LFI databases than state-of-the-art image quality assessment methods and LFI quality assessment methods.
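The core 4D-DCT step can be sketched as a separable orthonormal DCT-II applied along each of the four light-field axes. The block size is arbitrary here, and this is a generic transform, not the paper's full feature pipeline.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix of size n x n.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def dct4d(block):
    # Separable 4D DCT-II applied along each axis of a (u, v, s, t) block,
    # decorrelating the angular and spatial dimensions jointly.
    out = block.astype(float)
    for axis in range(4):
        m = dct_matrix(out.shape[axis])
        out = np.moveaxis(np.tensordot(m, out, axes=(1, axis)), 0, axis)
    return out
```

Because the transform is orthonormal, it preserves signal energy, which is what makes band-wise coefficient-energy features meaningful.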
7. Multi-Angle Projection Based Blind Omnidirectional Image Quality Assessment
- Keywords:
- Feature extraction; Distortion; Quality assessment; Image quality; Image color analysis; Visualization; Resists; Omnidirectional image; blind quality assessment; multi-angle projection; tensor space; STATISTICS
- Jiang, Hao;Jiang, Gangyi;Yu, Mei;Luo, Ting;Xu, Haiyong
- 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》
- 2022
- Vol. 32
- No. 7
- Journal article
Most existing blind omnidirectional image quality assessment (BOIQA) methods are data-driven, relying mainly on end-to-end neural networks or deep learning tools for feature extraction; they usually lack interpretability, and the perceptual mechanism behind them is difficult to discover. In this paper, from the perspective of perception modeling, we propose a novel multi-angle projection based BOIQA (MP-BOIQA) method. Considering the omnidirectional viewing and near-eye display characteristics of head-mounted displays, multiple color cubemap projection images from different viewpoints are grouped into color omnidirectional distortion (COD) units so as to simulate the user's viewing behavior in subjective quality assessment. In the designed multi-angle projection based feature extractor, tensor decomposition is applied to each COD unit for dimensionality reduction, and piecewise exponential fitting is used to obtain the distribution of the mean subtracted contrast normalized coefficients of the unit's feature matrices in the tensor domain. Finally, the extracted features are pooled with a random forest. Experimental results on three omnidirectional image quality datasets show that the MP-BOIQA method delivers highly competitive performance compared with representative full-reference quality assessment methods as well as state-of-the-art BOIQA methods.
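The mean subtracted contrast normalized (MSCN) coefficients whose distribution is fitted above can be sketched as follows; a box window stands in for the Gaussian window typically used in such models, and the window radius is an assumption.

```python
import numpy as np

def box_blur(img, r=3):
    # Local mean via a (2r+1)^2 box filter with edge padding.
    h, w = img.shape
    pad = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(2 * r + 1):
        for j in range(2 * r + 1):
            out += pad[i:i + h, j:j + w]
    return out / (2 * r + 1) ** 2

def mscn(img, c=1.0):
    # Mean subtracted contrast normalized coefficients: divisively
    # normalize each pixel by its local mean and standard deviation.
    img = img.astype(float)
    mu = box_blur(img)
    sigma = np.sqrt(np.maximum(box_blur((img - mu) ** 2), 0.0))
    return (img - mu) / (sigma + c)
```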
8. FPGA implementation of Full HD real-time depth estimation
- Hejian Li;Ping An;Guowei Teng;Zhaoyang Zhang
- Journal article
9. An HEVC Selective Encryption Scheme Based on Logistic Maps and the Arnold Transform
- Keywords:
- High Efficiency Video Coding (HEVC); Logistic map; Arnold transform; selective encryption; transform unit; syntax element
- 周怡钊;王晓东;章联军;兰琼琼
- 《计算机应用》 (Journal of Computer Applications)
- 2019
- No. 10
- Journal article
To protect video information effectively, a scheme combining transform coefficient scrambling and syntax element encryption is proposed according to the characteristics of H.265/High Efficiency Video Coding (HEVC). For transform units (TUs), 4×4 TUs are scrambled with the Arnold transform; a shift cipher is also designed, initialized according to the approximate distribution of TU DC coefficients, and an encryption map generated by the Arnold transform is used to shift-encrypt the DC coefficients of 8×8, 16×16, and 32×32 TUs. Syntax elements that are bypass-coded during entropy coding are encrypted with a Logistic chaotic sequence. After encryption, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the video drop by 26.1 dB and 0.51 on average, while the compression ratio decreases by only 1.126% and encoding time increases by only 0.170%. Experimental results show that, while achieving a good encryption effect with little impact on bitrate, the proposed scheme incurs small extra coding overhead and is suitable for real-time video applications.
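The two primitives named above can be sketched directly: the Arnold cat map permutes positions inside an n×n transform unit, and the logistic map generates a chaotic keystream. The iteration count, map parameter mu, and seed below are illustrative, not the paper's values.

```python
import numpy as np

def arnold(block, iterations=1):
    # Arnold cat map scrambling of an n x n block:
    # (x, y) -> ((x + y) mod n, (x + 2y) mod n).
    n = block.shape[0]
    out = block.copy()
    for _ in range(iterations):
        scr = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                scr[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scr
    return out

def logistic_keystream(x0, length, mu=3.99):
    # Logistic map x_{k+1} = mu * x_k * (1 - x_k), thresholded to one
    # keystream bit per iteration (e.g. for XOR with bypass-coded bins).
    bits, x = [], x0
    for _ in range(length):
        x = mu * x * (1 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits
```

The cat map is a permutation, so scrambling is losslessly reversible; for n = 4 it returns to the identity after 3 iterations, which bounds the useful iteration count.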
10. Collaborative Representation Cascade for Single-Image Super-Resolution
- Keywords:
- Image reconstruction; Learning systems; Optical resolving power; Mapping; Multilayers; Image enhancement; Recovery; Bicubic interpolation; Collaborative representations; Enhancement framework; Feature space; Interpolated images; Number of principal components; Reconstructed image; Super resolution
- Zhang, Yongbing;Zhang, Yulun;Zhang, Jian;Xu, Dong;Fu, Yun;Wang, Yisen;Ji, Xiangyang;Dai, Qionghai
- 《IEEE Transactions on Systems, Man, and Cybernetics: Systems》
- 2019
- Vol. 49
- No. 5
- Journal article
Most recent learning-based single-image super-resolution methods first interpolate the low-resolution (LR) input, from which overlapped LR features are then extracted to reconstruct their high-resolution (HR) counterparts and the final HR image. However, most of them neglect to exploit the intermediate recovered HR image to further enhance image quality. We conduct principal component analysis (PCA) to reduce the LR feature dimension and find that the number of principal components retained in the LR feature space of reconstructed images is larger than that of images interpolated with bicubic interpolation. Based on this observation, we present a simple yet effective framework named collaborative representation cascade (CRC) that learns multilayer mapping models between LR and HR feature pairs. In particular, we extract features from the intermediate recovered image to progressively upscale and enhance the LR input. In the learning phase, for each cascade layer, we use the intermediate recovered results and their original HR counterparts to learn a single-layer mapping model, use this model to super-resolve the original LR inputs, and treat the intermediate HR outputs as training inputs for the next cascade layer, until multilayer mapping models are obtained. In the reconstruction phase, we extract multiple sets of LR features from the LR image and the intermediate recovered images; in each cascade layer, the corresponding mapping model is used to pursue the HR image. Our experiments on several commonly used image SR testing datasets show that the proposed CRC method achieves state-of-the-art image SR results, and CRC can also serve as a general image enhancement framework.
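The cascade's learning phase can be sketched with ridge-regression mappings, a common closed form for collaborative representation. Patch extraction, feature design, and the regularization weight are simplified assumptions; only the layer-by-layer cascade structure follows the description above.

```python
import numpy as np

def learn_mapping(lr_feats, hr_feats, lam=0.1):
    # One cascade layer: ridge-regression mapping from LR feature vectors
    # to HR feature vectors. lr_feats: (n, d_lr), hr_feats: (n, d_hr).
    a = lr_feats.T @ lr_feats + lam * np.eye(lr_feats.shape[1])
    return np.linalg.solve(a, lr_feats.T @ hr_feats)   # (d_lr, d_hr)

def cascade_train(lr_feats, hr_feats, layers=3):
    # Learning phase: each layer's intermediate recovered output becomes
    # the training input of the next layer, as in the CRC description.
    models, x = [], lr_feats
    for _ in range(layers):
        m = learn_mapping(x, hr_feats)
        models.append(m)
        x = x @ m                      # intermediate recovered features
    return models

def cascade_apply(models, lr_feats):
    # Reconstruction phase: apply the learned mappings layer by layer.
    x = lr_feats
    for m in models:
        x = x @ m
    return x
```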
