Research on Unsupervised Domain-Adaptive Object Detection Methods for Service Robots

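Project source

National Natural Science Foundation of China (NSFC)

Principal investigator

Ye Mao

Funded institution

University of Electronic Science and Technology of China

Year approved

2017

Approval date

Not disclosed

Grant number

61773093

Project level

National

Research period

Unknown / Unknown

Funding amount

CNY 660,000

Discipline

Information Sciences - Artificial Intelligence - Machine Perception and Machine Vision

Discipline code

F-F06-F0604

Funding category

General Program

Keywords

Object Detection; Deep Neural Network; Recurrent Neural Network; Generative Adversarial Nets

Participants

Li Fan; Xing Guanyu; Xu Pei; Ren Dongxiao; Li Xudong; Tang Song; Zhang Feng; Liu Dan; Gan Yan

Participating institutions

University of Electronic Science and Technology of China; Sichuan University; Zhejiang University of Science and Technology; University of Shanghai for Science and Technology; Nanjing University of Posts and Telecommunications; Chongqing University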

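Project proposal abstract: When a service robot works in a new scene, object detection performance usually degrades because the data distribution of the new scene differs from that of the source scene. Transfer learning is a good solution. However, almost all current domain-adaptive object detection methods require one of two things: the source samples must be retained during transfer, or a small number of new-scene samples must be labeled. Service robots usually cannot meet these requirements. To address this, the project builds on our previous research in deep learning and machine vision. It proposes unsupervised transfer of object detectors to new scenes through network control and consistent feature learning. The research topics are:
(1) transfer methods for object detection neural networks based on network control;
(2) transfer methods for object detection neural networks based on unsupervised consistent feature learning;
(3) unsupervised domain-adaptive object detection methods that combine context information and incorporate multi-modal information.
The innovations are:
(1) a transfer model for object detection networks based on unsupervised learning, which neither retains the source training set nor requires object labels in the new scene;
(2) a transfer learning framework based on a control network;
(3) a method for fusing context and multi-modal information based on network control and unsupervised learning.
The results are expected to enrich object detection methods and machine learning theory, and also to have significant social and economic value.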

Application Abstract: When a service robot works in a new scene, the performance of its object detector usually drops sharply because the data distributions of the source scene and the new scene are inconsistent. Transfer learning is a good solution. However, almost all present domain adaptation methods require that the source training set be retained or that some target samples be labeled, and service robots usually cannot meet these requirements. Building on our previous research in deep learning and machine vision, we propose to study unsupervised transfer methods based on network control and consistent feature learning. The research contents include:
(1) a network control framework for transferring neural-network-based object detectors;
(2) an unsupervised consistent feature learning model for transferring neural-network-based object detectors;
(3) unsupervised context feature learning methods and the corresponding object detector transfer methods;
(4) unsupervised multi-modal information learning methods and the corresponding object detector transfer methods.
The main innovations are:
(1) a network control framework for transfer learning;
(2) an unsupervised consistent feature learning model;
(3) methods that combine context and multi-modal information based on network control and unsupervised learning.
The results of these studies are expected to enrich robot object detection methods and machine learning theory, and also to make important contributions to social and economic development.

Funded province

Sichuan Province

Project final report

Research on Unsupervised Domain-Adaptive Object Detection Methods for Service Robots: Final Report (Full Text)

  • 1. Training generative adversarial networks by auxiliary adversarial example regulator

    • Keywords:
    • Image processing;Stability;Adversarial example;Baseline models;Images synthesis;Network regulation;Network-based modeling;Real images;Training framework
    • Gan, Yan;Ye, Mao;Liu, Dan;Liu, Yiguang
    • 《Applied Soft Computing》
    • 2023
    • Vol. 136
    • Journal article

    Many GAN variants have been proposed to address the difficulty of training Generative Adversarial Networks (GANs), including network-based models, loss-based methods, and training-based techniques. However, these models rarely improve training stability by reducing the instability of the generator and the discriminator at the same time. To do so, and inspired by the idea of network regulation, we design an auxiliary adversarial example regulator and propose a new training framework for GANs. We design a penalty that directly constrains and guides the generator as it generates images. We also use the auxiliary adversarial example regulator to gradually adjust the training of the discriminator. With the designed constraint and discriminator, the generated images get closer to real images. Experimental results show that the proposed method outperforms the baseline models. The code is available at https://github.com/AdleyGan/GAN-AE-P. © 2023 Elsevier B.V.

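  • 2. Research on Image Segmentation Models and Applications for Service Robots

    • Keywords:
    • Image segmentation;Sliding window;All-weather;Knowledge semantics;Transfer learning
    • Zhang, Yuxiao
    • Supervisor: Ye, Mao (University of Electronic Science and Technology of China)
    • Degree thesis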

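    With the rapid development of the market economy and of science and technology, service robots are now widely used in human production and daily life, where they play an important role. Image segmentation is one of the key technologies for service robots. It lets a robot classify every pixel of its visual images, which supports environment perception and tasks such as autonomous navigation, road recognition, and obstacle detection. Thanks to the rapid development of computer vision and artificial intelligence, image segmentation for service robots has made great progress in recent years. Several problems remain unsolved, however: all-weather image segmentation for service robots, transferable image segmentation for service robots, and the deployment of image segmentation on service robots. In-depth research on these problems, new segmentation algorithms, and new application techniques are therefore of great significance. This thesis studies image segmentation for service robots. It analyzes the state of research in China and abroad and the main difficulties in the field, then proposes new segmentation algorithms and application schemes. The results are:
    (1) A sliding-window-based image segmentation technique for service robots. A window slides over the robot's visual image and each window region is classified. The window classification results are then fused with threshold-based image segmentation to produce a binary segmentation of the image. Deployed on the NVIDIA Jetson TK1 embedded computing platform, it achieved fairly good segmentation results.
    (2) An all-weather image segmentation technique for service robots based on generative adversarial networks. A supervised generator transforms the robot's all-weather visual images before binary segmentation. Deployed on the NVIDIA Jetson TX2 embedded computing platform, it achieved fairly accurate segmentation while using few computing resources.
    (3) A transferable, semantics-based image segmentation technique for service robots. Knowledge semantics are introduced to perform transfer learning on the segmentation model. A model the robot learns in one fixed scene can then be applied to other scenes with good segmentation results.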
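The sliding-window step in contribution (1) can be sketched as follows. All of the following are assumptions made for illustration, because the thesis abstract does not specify them:
- the window size and stride;
- fusing the two masks by a pixel-wise AND;
- the hypothetical `mean_brightness_classifier`, which stands in for the trained window classifier the thesis would use.

```python
from typing import Callable, List

Image = List[List[float]]
Mask = List[List[int]]


def threshold_mask(img: Image, thresh: float) -> Mask:
    """Pixel-level binary mask: 1 where intensity >= thresh, else 0."""
    return [[1 if px >= thresh else 0 for px in row] for row in img]


def sliding_window_mask(img: Image, win: int, stride: int,
                        classify: Callable[[Image], bool]) -> Mask:
    """Set to 1 every pixel covered by at least one window accepted by `classify`."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            patch = [row[left:left + win] for row in img[top:top + win]]
            if classify(patch):
                for y in range(top, top + win):
                    for x in range(left, left + win):
                        mask[y][x] = 1
    return mask


def fuse(window_mask: Mask, pixel_mask: Mask) -> Mask:
    """Assumed fusion rule: keep a foreground pixel only if both masks agree (AND)."""
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(window_mask, pixel_mask)]


def mean_brightness_classifier(patch: Image) -> bool:
    """Hypothetical stand-in for a trained window classifier."""
    values = [px for row in patch for px in row]
    return sum(values) / len(values) > 0.5


# Usage: bright region on the left, plus one bright noise pixel at row 0, column 3.
img = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
img[0][3] = 0.8
segmented = fuse(sliding_window_mask(img, win=2, stride=2, classify=mean_brightness_classifier),
                 threshold_mask(img, thresh=0.5))
```

In this toy input, thresholding alone would keep the isolated noise pixel. The window classifier rejects the mostly dark window that contains it, so the fused mask drops it.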

  • 3. Multi-embedding space set-kernel and its application to multi-instance learning

    • Keywords:
    • Multi-instance learning; Multi-embedding space; Set-kernel; Similarity;CLASSIFICATION
    • Yang, Mei;Zhang, Yu-Xuan;Zhou, Zhengchun;Zeng, Wen-Xi;Min, Fan
    • 《NEUROCOMPUTING》
    • 2022
    • Vol. 512
    • Journal article

    Set-level problems become critical when we are interested in animals in pictures, links in web pages, and components in drugs. The key issue is to measure the similarity between two sets. This paper develops a data-dependent multi-embedding space set-kernel (MSK) with close to linear time complexity. It applies MSK to multi-instance learning (MIL), a typical set-level problem. Most current set-kernels are independent of the underlying data distribution. In contrast, MSK measures set similarity indirectly, through the relationship between embedding vectors. Each set's embedding vectors are new representations, with controlled dimensionality, in the multi-embedding space. Here, a multi-embedding space is a set of multiple subspaces built from the distribution of the data set. In addition, the MSK feature map is used to speed up the computation of similarity over the entire data set. Extensive experiments were conducted on 46 MIL data sets across five application domains. Compared with rival set-kernels, MSK has the lowest average classification loss and the highest stability. The linear time complexity is also verified. Source code is available at https://github.com/InkiInki/MSK. (c) 2022 Elsevier B.V. All rights reserved.
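For contrast with MSK, here is a minimal sketch of the kind of data-independent set-kernel the abstract compares against: the classic mean-map ("averaged instance kernel") set-kernel. This is not MSK. The RBF instance kernel and the `gamma` value are assumptions. Each comparison costs O(|A|·|B|) instance-kernel evaluations, and the abstract positions MSK's embedding vectors and feature map as a way to make set-level similarity cheaper to compute.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two instances."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def mean_map_set_kernel(A: Sequence[Vector], B: Sequence[Vector], gamma: float = 1.0) -> float:
    """k(A, B) = (1 / (|A||B|)) * sum over a in A, b in B of rbf(a, b).

    Data-independent: the value depends only on the two sets, not on the
    distribution of the rest of the data set.
    """
    total = sum(rbf(a, b, gamma) for a in A for b in B)
    return total / (len(A) * len(B))


# Usage: bag B is close to bag A, bag C is far away.
A = [[0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
C = [[5.0, 5.0]]
```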

  • 4. State estimation for memristive neural networks with mixed time-varying delays via multiple integral equality

    • Keywords:
    • Memristive neural networks; Mixed time-varying delays; Multiple integral equality; Less conservative stability criterion; GLOBAL EXPONENTIAL STABILITY
    • Chen, Lijuan;Li, Binbin;Zhang, Ruimei;Luo, Jinnan;Wen, Chuanbo;Zhong, Shouming
    • 《NEUROCOMPUTING》
    • 2022
    • Vol. 501
    • Journal article

    This paper investigates state estimation for memristive neural networks with leakage, discrete, and distributed delays. We first use the integration properties, taking simplified double integration as an example, to derive a multiple integral equality. We then develop a novel Lyapunov-Krasovskii functional (LKF) with multiple integral terms. The LKF improves the stability criteria expressed as linear matrix inequalities (LMIs) and shows potential efficiency advantages in practice. More importantly, a lemma and the principle of distributed integral calculation are used to establish a less conservative stability criterion. Numerical examples illustrate the effectiveness of the proposed methods. (c) 2022 Elsevier B.V. All rights reserved.
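The LKF/LMI machinery above generalizes the classic Lyapunov stability test for linear systems. The sketch below shows only that delay-free special case, as background. It has no delays, no memristive switching, and no multiple-integral terms, so it is not the paper's criterion. For a 2×2 system x' = Ax, A is asymptotically stable exactly when the equation AᵀP + PA = −I has a positive-definite solution P. The choice Q = I and the 2×2 restriction are assumptions made to keep the example dependency-free.

```python
from typing import List, Optional

Matrix = List[List[float]]


def _det3(m: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_lyapunov_2x2(A: Matrix) -> Optional[Matrix]:
    """Solve A^T P + P A = -I for symmetric P = [[p11, p12], [p12, p22]]."""
    (a, b), (c, d) = A
    # Entries (1,1), (1,2), (2,2) of A^T P + P A, as linear equations in (p11, p12, p22).
    M = [[2 * a, 2 * c, 0.0],
         [b, a + d, c],
         [0.0, 2 * b, 2 * d]]
    rhs = [-1.0, 0.0, -1.0]
    det = _det3(M)
    if abs(det) < 1e-12:
        return None  # no unique solution: some pair of eigenvalues sums to zero
    sol = []
    for col in range(3):  # Cramer's rule
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = rhs[r]
        sol.append(_det3(Mc) / det)
    p11, p12, p22 = sol
    return [[p11, p12], [p12, p22]]


def is_hurwitz_2x2(A: Matrix) -> bool:
    """x' = A x is asymptotically stable iff the Lyapunov solution P is positive definite."""
    P = solve_lyapunov_2x2(A)
    if P is None:
        return False
    # Sylvester's criterion for a symmetric 2x2 matrix.
    return P[0][0] > 0 and P[0][0] * P[1][1] - P[0][1] ** 2 > 0
```

The LKF approach replaces this quadratic function xᵀPx with a functional that also integrates over the delayed state history. That is why its conditions become LMIs, which have to be solved numerically rather than by closed-form linear algebra.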

  • 5. End-to-end video compression for surveillance and conference videos

    • Keywords:
    • Deep learning; End-to-end video compression; Surveillance and conference videos; Online update
    • Wang, Shenhao;Zhao, Yu;Gao, Han;Ye, Mao;Li, Shuai
    • 《MULTIMEDIA TOOLS AND APPLICATIONS》
    • 2022
    • Vol. 81
    • No. 29
    • Journal article

    The storage and transmission of surveillance and conference videos is an important branch of video compression. These videos have strong inter-frame correlation, so consecutive frames show considerable continuity at both the image level and the motion level. However, traditional video codec networks cannot fully exploit these characteristics during compression. Based on the DVC video codec framework, we therefore propose an "MV residual + MV optimization" coding strategy for surveillance and conference videos. The strategy further reduces the bit rate and improves the quality of compressed video frames. During the testing stage, an online update strategy adapts the network's parameters to each surveillance or conference video. Our contributions are:
    (1) an optical flow residual coding method for videos with strong inter-frame correlation;
    (2) optical flow optimization at the decoding end;
    (3) an online update strategy at the encoding end.
    Experiments show that our method outperforms the DVC framework, especially on the CUHK Square surveillance video, where it gains 1.2 dB.
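The intuition behind "MV residual" coding can be sketched with integer block motion vectors. The sketch assumes that the residual is taken against the previous frame's decoded motion and that plain subtraction stands in for the learned coder. Neither detail is stated in the abstract, and the paper's learned residual autoencoder and decoder-side MV optimization are not reproduced.

```python
from typing import List, Tuple

MV = Tuple[int, int]      # (dx, dy) motion vector of one block
MVField = List[MV]


def encode_mv_residual(prev_mv: MVField, cur_mv: MVField) -> MVField:
    """Encoder side: transmit only the change relative to the previous frame's motion."""
    if len(prev_mv) != len(cur_mv):
        raise ValueError("motion fields must cover the same blocks")
    return [(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev_mv, cur_mv)]


def decode_mv_residual(prev_mv: MVField, residual: MVField) -> MVField:
    """Decoder side: add the residual back onto the previously decoded motion."""
    return [(px + rx, py + ry) for (px, py), (rx, ry) in zip(prev_mv, residual)]


def l1_magnitude(field: MVField) -> int:
    """Rough proxy for coding cost: total absolute motion that must be transmitted."""
    return sum(abs(x) + abs(y) for x, y in field)


# Usage: a nearly static surveillance scene where motion barely changes between frames.
prev = [(2, 1), (2, 1), (3, 0)]
cur = [(2, 1), (3, 1), (3, 0)]
res = encode_mv_residual(prev, cur)
```

When inter-frame correlation is strong, the residual field is mostly zeros. A learned entropy coder can then spend far fewer bits on it than on the raw motion field.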

  • 6. Low-resolution human pose estimation (Open Access)

    • Wang, Chen ; Zhang, Feng ; Zhu, Xiatian ; Ge, Shuzhi Sam
    • 《Pattern Recognition》
    • 2022
    • Vol. 126
    • Journal article

    Human pose estimation has made significant progress on high-resolution images. However, low-resolution imagery brings nontrivial challenges that remain under-studied. To fill this gap, we first investigate existing methods. We find that the dominant heatmap-based methods suffer more severe performance degradation at low resolution, and that offset learning is an effective strategy. Building on this observation, we propose a novel Confidence-Aware Learning (CAL) method. CAL addresses two fundamental limitations of existing offset learning methods: inconsistency between training and testing, and the decoupling of heatmap and offset learning. Specifically, CAL selectively weighs the learning of heatmap and offset with respect to the ground truth and the most confident prediction, while capturing the statistical importance of model outputs in a mini-batch learning manner. Extensive experiments on the COCO benchmark show that our method significantly outperforms state-of-the-art methods for low-resolution human pose estimation. © 2022 Elsevier Ltd
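The abstract contrasts heatmap-only decoding with offset learning. The sketch below shows the generic offset-refinement decoding step under two assumptions: offsets are predicted per heatmap cell in cell units, and the heatmap is `stride` times smaller than the input image. Per the abstract, CAL changes how heatmap and offset learning are weighted during training. It is not this decoding rule.

```python
from typing import List, Tuple

Grid = List[List[float]]


def decode_keypoint(heatmap: Grid, offset_x: Grid, offset_y: Grid,
                    stride: int) -> Tuple[float, float]:
    """Return (x, y) in input-image pixels from a low-resolution heatmap plus offsets."""
    h, w = len(heatmap), len(heatmap[0])
    # 1. Coarse location: the heatmap cell with the highest score.
    cy, cx = max(((y, x) for y in range(h) for x in range(w)),
                 key=lambda yx: heatmap[yx[0]][yx[1]])
    # 2. Sub-cell refinement: add the offset predicted at that cell (in cell units),
    #    then scale back to input resolution.
    x = (cx + offset_x[cy][cx]) * stride
    y = (cy + offset_y[cy][cx]) * stride
    return x, y


# Usage: peak at row 1, column 2 of a 3x3 heatmap; stride 4.
heatmap = [[0.1, 0.2, 0.1],
           [0.2, 0.5, 0.9],
           [0.0, 0.1, 0.2]]
zeros = [[0.0] * 3 for _ in range(3)]
off_x = [[0.0] * 3 for _ in range(3)]
off_y = [[0.0] * 3 for _ in range(3)]
off_x[1][2], off_y[1][2] = 0.25, -0.5
```

Without offsets, the estimate is snapped to a multiple of `stride`. At low resolution, where one heatmap cell covers many input pixels, this quantization error is large, which matches the abstract's observation that offset learning helps.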

  • 7. Quality enhancement of compressed screen content video by cross-frame information fusion

    • Keywords:
    • Screen content video; Feature fusion; Edge information recovery
    • Huang, Jiawang;Cui, Jinzhong;Ye, Mao;Li, Shuai;Zhao, Yu
    • 《NEUROCOMPUTING》
    • 2022
    • Vol. 493
    • Journal article

    In recent years, with the rise of online learning platforms and live game streaming, screen content video has become an increasingly common special type of video, and its Internet traffic keeps growing. Effectively enhancing the quality of screen content video has therefore become an urgent problem. Several compressed video enhancement algorithms have been successful. However, compressed screen content video contains many areas of similar color, so traditional algorithms based on optical flow and deformable convolution cannot align its frames well. For screen content videos containing animations and games, we therefore propose a quality enhancement network based on cross-fusion of multi-frame information. It consists of a feature extraction module, a feature fusion module, an edge detail recovery module, and a reconstruction module. Our main contribution is an alignment-free quality enhancement framework based on cross-frame information fusion, in place of traditional alignment-based approaches. In our experiments, the method achieves the best results on 13 screen content videos containing animations and games, compressed by the SCC branch of HEVC/H.265. (c) 2021 Elsevier B.V. All rights reserved.

  • 8. An Efficient Perception Method for Garbage-Picking Tasks in Unstructured Scenes

    • 《Journal of Chinese Computer Systems》
    • 2022
    • Journal article

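    Perceiving the working scene is an important prerequisite for a robot to carry out assigned tasks efficiently. Thanks to advances in deep learning, existing methods achieve high-performance scene perception. Their high computational requirements, however, make them hard to deploy on low-compute platforms. This paper targets garbage-picking tasks performed by mobile robots in unstructured scenes. It builds a 12-class garbage recognition dataset and, on that basis, proposes an efficient perception method that combines deep learning with traditional machine learning. Building on YOLOv4 object detection, the method introduces a depth-information optimization method based on K-means++ clustering. It also estimates object angles by combining image morphological transformations with the Canny edge detection algorithm. Experimental results show that the method is accurate and runs in real time. It is also reasonably robust to interfering information in unstructured scenes, such as background and object material.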
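The abstract does not say how K-means++ is used to optimize depth. One plausible reading is sketched below, and every part of it is an assumption. The depth readings inside a detected bounding box are clustered in 1-D. The largest cluster is then taken as the object, so that background pixels leaking into the box are ignored. The function name `object_depth`, the choice k = 2, and the "largest cluster" rule are hypothetical. The k-means++ seeding itself follows the standard D² sampling scheme.

```python
import random
from typing import List, Sequence, Tuple


def kmeans_pp_init(values: Sequence[float], k: int, rng: random.Random) -> List[float]:
    """k-means++ seeding: each new centre is sampled with probability proportional to D(x)^2."""
    centers = [rng.choice(values)]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(d2) == 0:
            centers.append(rng.choice(values))  # every point already coincides with a centre
        else:
            centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers


def kmeans_1d(values: Sequence[float], k: int, iters: int = 20,
              seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Lloyd iterations on scalar values, starting from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(values, k, rng)
    clusters: List[List[float]] = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


def object_depth(depths: Sequence[float], k: int = 2, seed: int = 0) -> float:
    """Hypothetical rule: the largest depth cluster inside the box is the object."""
    centers, clusters = kmeans_1d(depths, k, seed=seed)
    largest = max(range(k), key=lambda i: len(clusters[i]))
    return centers[largest]


# Usage: five readings on an object about 1 m away, two leaking from a 3.5 m background.
depths = [1.00, 1.02, 0.98, 1.01, 0.99, 3.5, 3.6]
```

Clustering is more robust here than averaging every depth in the box. A plain mean of the usage data is about 1.73 m, which lies on neither the object nor the background.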

  • 9. Prototype-Based Multisource Domain Adaptation

    • Keywords:
    • Feature extraction; Prototypes; Data mining; Adaptation models;Semantics; Training; Research and development; Disentanglement;multisource domain adaptation (MDA); prototypes; reconstruction;RECOGNITION
    • Zhou, Lihua;Ye, Mao;Zhang, Dan;Zhu, Ce;Ji, Luping
    • 《IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS》
    • 2021
    • Vol. 33
    • No. 10
    • Journal article

    Unsupervised domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recently, multisource domain adaptation (MDA) has begun to attract attention; its performance should exceed what is achieved by simply mixing all source domains together for knowledge transfer. In this article, we propose a novel prototype-based method for MDA. Because the target domain has no labels, we use prototypes to transfer semantic category information from the source domains to the target domain. The method proceeds in these steps:
    (1) A feature extraction network is applied to both the source and target domains. Domain-invariant features and domain-specific features are then disentangled from the extracted features.
    (2) From these two kinds of features, the so-called inherent class prototypes and domain prototypes are estimated, respectively.
    (3) A prototype mapping to the extracted feature space is learned during feature reconstruction. The class prototypes for all source and target domains can then be constructed in the extracted feature space from the domain prototypes and the inherent class prototypes.
    (4) The extracted features of all domains are forced to be close to their corresponding class prototypes, which progressively adjusts the feature extraction network.
    (5) The inherent class prototypes are used as the classifier in the target domain.
    Our contribution is that, by constructing the corresponding class prototypes from the inherent class prototypes and domain prototypes, semantic category information is transferred from the source domains to the target domain. All source and target domains are aligned twice at the feature level: once to obtain better domain-invariant features, and once to bring features closer to the class prototypes. Several experiments on public data sets demonstrate the effectiveness of the method.
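The final step above, using class prototypes as the classifier, can be illustrated with a nearest-prototype rule. The sketch assumes that prototypes are per-class means of labeled feature vectors and that distance is Euclidean. The paper instead estimates inherent class prototypes through disentanglement and a learned prototype mapping, which this sketch does not reproduce.

```python
import math
from typing import Dict, Hashable, List, Sequence

Feature = Sequence[float]


def class_prototypes(features: Sequence[Feature],
                     labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Per-class mean feature vector (a simple stand-in for learned class prototypes)."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(feature: Feature, prototypes: Dict[Hashable, List[float]]) -> Hashable:
    """Nearest-prototype classification by Euclidean distance."""
    return min(prototypes, key=lambda y: math.dist(feature, prototypes[y]))


# Usage: two well-separated classes in a 2-D feature space.
feats = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["a", "a", "b", "b"]
protos = class_prototypes(feats, labels)
```

A prototype classifier needs no target labels at test time. This is why, in the abstract's method, it can serve as the target-domain classifier once features from all domains have been pulled toward the shared class prototypes.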
