Research on Robot Motor Skill Acquisition and Execution for Human-Robot Cooperation and Collaboration

Project source

National Natural Science Foundation of China (NSFC)

Principal investigator

Fu Jian

Funded institution

Wuhan University of Technology

Year of approval

2017

Approval date

Not disclosed

Project number

61773299

Research period

Unknown / Unknown

Project level

National

Funding amount

CNY 630,000

Discipline

Information Sciences - Automation - Automatic Detection Technology and Devices

Discipline code

F-F03-F0306

Funding category

General Program

Keywords

Learning from demonstration; human-robot collaboration; reinforcement learning; trajectory planning; motor skill acquisition

Participants

Xiang Kui; Pang Muye; Luo Fan; Chen Xiangcheng; Wei Da; Cao Ce; Liu Bing; Du Yucheng

Participating institutions

Wuhan University of Technology; Wuhan University; Anhui University

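Proposal abstract: Driven by large market demand and prospects, human-robot collaboration has in recent years become a frontier and active topic in robotics. How a robot can learn the personal preferences with which an individual completes a task, and adapt to them autonomously, is a pressing open problem and the main subject of this project. The project approaches it from the novel perspective of robot motor skill acquisition and execution.
1) Design concurrent behavior primitives that are redundant in joint space and operational space. Project human-robot collaboration data onto these primitives so that the data in the dual space cluster and are distributed by task category. On this basis, build an integrated model that unifies intention recognition, personal preference learning and adaptation.
2) Within a reinforcement learning framework, combine feature extraction through representation learning with learning guided by the collaborator's feedback. This lets the behavior primitives adapt to and be optimized against a set of performance indicators, autonomously and quickly.
3) Design operators that carry semantic information to compose behavior primitives in the spatial and temporal dimensions. Investigate how to segment and recognize human motion sequences by combining a behavior-motion template library with long short-term memory (LSTM) networks. Use these results to convert in both directions between discrete symbol sequences tied to the indicator set and continuous motion planning.
This research is expected to reveal the latent patterns and regularities by which humans acquire and improve collaborative motor skills, and to provide new methods and ideas for research on human-robot cooperation and collaboration.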

Application Abstract: Owing to huge market demand and promising prospects, human-robot collaboration has become a cutting-edge and active topic in robotics in recent years. How a collaborative robot can learn and adapt to people's preferences for a task is an urgent problem, and it is the problem addressed in this proposal. The principal investigator (PI) adopts a novel perspective, the robot's motor skill acquisition and execution, to conduct this research.
1) The PI will design concurrent behavior primitives with redundancy in the joint and operational spaces, and the collaborative data will be projected onto the kernel of these primitives. The data (weight coefficients) in the dual space therefore cluster by task and are distributed over distinct regions. In this way, an integrated interaction model for human intention identification, preference learning and adaptation is constructed.
2) Within a reinforcement learning framework, the behavior primitives adapt and are optimized autonomously to meet the target. This relies on feature extraction through representation learning and on reward shaping from human feedback.
3) Composition of the behavior primitives in their spatial and temporal dimensions is studied by introducing operators with semantic information. Segmentation and identification of human motion sequences using a behavior-motor library and LSTM networks are also investigated. Building on these results, two-way conversion between discrete symbol sequences with indicators and continuous motion planning is realized.
In this study, we will reveal the potential patterns and rules by which humans acquire and improve collaborative motor skills, and provide new methods and ideas for research on human-robot collaboration.

Funded province

Hubei Province

Final project report (full text)

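Driven by large market demand and prospects, human-robot collaboration has in recent years become a frontier and active topic in robotics. How a robot can learn an individual's personal preferences for completing a task and adapt to them autonomously is a pressing problem, and it is the main subject of this project. The project approaches this problem from the novel perspective of robot motor skill acquisition and execution. Specifically:
1) How can a robot autonomously gain the ability to complete new tasks by learning from demonstrated tasks? To answer this, we proposed the idea and approach of alternate learning in two spaces. The approach treats the three stages of current LfDRL-based motor skill acquisition in a unified way. The proposed iLWR-PI2-AL algorithm performs rolling optimization over policy representation, imitation learning and policy improvement.
2) Two problems were addressed here. The first was how to build movement primitives that turn spatiotemporally coupled information into a modulatable motion model satisfying preset constraints. The second was how, during human-robot interaction and collaboration, the robot can adjust online to changes in human behavioral intention. We proposed the MTiProMP model for multi-task human-robot interaction. Combined with decomposition and iterative strategies, it achieves intention-driven adaptive multi-task human-robot interaction, switching and collaboration.
3) When acquiring motor skills for different tasks, it is critical that the robot autonomously master the technique needed to complete the task. This technique appears as task-specific implicit patterns among the joints. We proposed a reinforcement learning framework with double-loop heuristic search, together with the PI2-CMA-KCCA algorithm, to accelerate motor skill acquisition for new tasks. The algorithm discovers and predicts correlation patterns between the movement primitives of different joints and within the primitives of each joint. This enables efficient policy search over behavior primitives.
Traditional research on robot manipulation and planning adopts different models and assumptions for each specific problem, so the resulting methods are heterogeneous. The human body, by contrast, realizes different motor skills through a homogeneous mechanism. Drawing on results from neuroscience, kinematics and cognitive science, this study proposes a new approach: achieving human-robot cooperation and collaboration by endowing robots with collaborative motor skills. The approach constructs collaborative behavior primitives and combines imitation learning with reinforcement learning. It realizes motor skill transfer (policy representation, imitation learning and policy improvement) and human-robot interaction and collaboration (time-indexed and state-indexed collaboration frameworks), making a useful exploration of robot motor skill acquisition. To some extent, the study reveals the underlying patterns and regularities by which humans acquire and improve collaborative motor skills. It also provides new methods and ideas for research on human-robot cooperation and collaboration.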


1. Research on Robot Trajectory Planning Based on Alternate Learning in Two Spaces

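Keywords:
motor skill acquisition; trajectory planning; reinforcement learning; basis recombination
Funding: NSFC General Program project "Research on Robot Motor Skill Acquisition and Execution for Human-Robot Cooperation and Collaboration" (61773299)
Series: Information Science and Technology; Subject: Automation Technology
DOI: 10.27381/d.cnki.gwlgu.2019.000311
CLC classification: TP242
Supervisor: Fu Jian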


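Robotics reflects a nation's level of science and technology and is one of today's most strategically important industries. As robotics develops, its application scenarios grow increasingly complex and variable. Traditional fixed-task robots can no longer meet production requirements, so robots must be given faster, more flexible responsiveness and more intelligent behavior. Acquiring and generalizing motor skills is an important way to endow robots with intelligence. Among such methods, motor skill acquisition based on the learning-from-demonstration-plus-reinforcement-learning framework (LfDRL) has been the most successful. This thesis builds on the three-stage LfDRL paradigm of policy representation, imitation learning and optimization. It targets the active problem of autonomously completing new tasks from demonstrated tasks under specified performance constraints. To this end, it proposes a motor skill learning method, iLWR-PI2-AL, based on improved locally weighted regression (iLWR), policy improvement with path integrals (PI2) and basis recombination. In the classical LWR-PI2 method, the basis functions stay fixed during training and may not suit a new task. The thesis therefore adds basis-function self-recombination and an iLWR dual-perturbation method. The algorithm then learns alternately in two spaces and gradually generalizes from familiar tasks to new ones.

The research content is as follows. First, the thesis reviews the state of robotics in China and abroad. Through an extensive literature review, it also surveys the closely related fields of imitation learning, reinforcement learning and deep learning methods that combine the two. Next, it introduces the basics of robot kinematics, including forward and inverse kinematics and the D-H convention, and discusses the common problems of motion decoupling and joint redundancy. It then derives a unified framework for imitation learning from the DMPs-iLWR and DMPs-GMR imitation learning methods. Building on imitation learning, it uses PI2 reinforcement learning for skill generalization and compares the strengths and weaknesses of iLWR-PI2 and GMR-PI2. It then proposes iLWR-PI2-AL, an iLWR-PI2 policy improvement method based on alternate learning in two spaces. The method searches for an optimal or suboptimal solution by alternately optimizing in the weight space and the basis space. Finally, the thesis explains the feasibility of two-space learning from a theoretical standpoint. The algorithm is validated on four platforms: a SCARA robot, a planar ten-link robot, a NAO robot and a UR5 robot. The first two are tested only in MATLAB simulation, while the last two are validated on physical robots after simulation. The results indicate that the algorithm performs well.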
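The imitation-learning stage can be illustrated with plain locally weighted regression (LWR). The sketch below is a simplification built on assumptions, not the thesis's iLWR or its DMP formulation. It fits one weighted local linear model per Gaussian basis function to a 1-D demonstrated trajectory, then blends the local models using normalized basis activations. The function names, the sine-wave demonstration, the 50 basis functions and the width choice are all hypothetical.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Unnormalized Gaussian activations psi_i(t), shape (len(t), len(centers))."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def lwr_fit(t, y, centers, width):
    """Fit one local linear model y ~ a_i + b_i * (t - c_i) per basis function,
    using psi_i(t) as the sample weights (weighted least squares)."""
    psi = gaussian_basis(t, centers, width)
    params = np.zeros((len(centers), 2))
    for i, c in enumerate(centers):
        x = np.column_stack([np.ones_like(t), t - c])   # local regressors
        w = psi[:, i]
        lhs = x.T @ (w[:, None] * x)                    # X^T W X
        rhs = x.T @ (w * y)                             # X^T W y
        params[i] = np.linalg.solve(lhs, rhs)
    return params

def lwr_predict(t, params, centers, width):
    """Blend the local linear models with normalized basis activations."""
    psi = gaussian_basis(t, centers, width)
    local = params[:, 0][None, :] + params[:, 1][None, :] * (t[:, None] - centers[None, :])
    return (psi * local).sum(axis=1) / psi.sum(axis=1)

# Hypothetical 1-D demonstration: one period of a sine wave.
t = np.linspace(0.0, 1.0, 500)
y_demo = np.sin(2.0 * np.pi * t)
centers = np.linspace(0.0, 1.0, 50)
width = centers[1] - centers[0]
params = lwr_fit(t, y_demo, centers, width)
y_hat = lwr_predict(t, params, centers, width)
print(np.max(np.abs(y_hat - y_demo)))  # maximum reconstruction error
```

In an LfDRL pipeline, the fitted per-basis parameters would play the role of the policy that the later PI2 stage perturbs.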


2.Robot Motor Skill Transfer with Alternate Learning in Two Spaces

Keywords:
Functions; Manipulators; Reinforcement learning; Quantum theory; Robot programming; Intelligent robots; Alternate learning in two spaces; Improved locally weighted regression; Locally weighted regression; Motor skill acquisition; Motor skills; Path integral; Policy improvement with path integral by dual perturbation (PI²-DP); Skill transfer; Skills acquisition; Trajectory Planning

Fu, Jian; Teng, Xiang; Cao, Ce; Ju, Zhaojie; Lou, Ping
IEEE Transactions on Neural Networks and Learning Systems, 2021, vol. 32, no. 10 (journal article)

Recent research achievements in learning from demonstration (LfD) show that reinforcement learning is effective for robots to improve their movement skills. The current challenge lies mainly in automatically generating new robot motions to perform new tasks. These tasks share a similar preassigned performance indicator with the demonstration tasks but differ from them. To address this issue, this article proposes a framework for policy representation, imitation learning and optimization in robot intelligent trajectory planning. The framework is based on improved locally weighted regression (iLWR) and policy improvement with path integral by dual perturbation (PI2-DP). In addition, reward-guided weight searching and adaptive evolution of the basis functions are performed alternately in two spaces, i.e., the basis function space and the weight space. The alternate learning process builds a sequence of two-tuples that join the demonstration task and the new task for motor skill transfer. The robot thus acquires motor skills gradually, progressing from tasks similar to the demonstration to dissimilar tasks with different performance metrics. Classical via-point trajectory planning experiments are performed with a SCARA manipulator, a planar 10-degree-of-freedom (DOF) manipulator and a UR robot. The results show that the proposed method is both feasible and effective.
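The alternation between the weight space and the basis-function space can be made concrete with a minimal sketch under stated assumptions. It uses the episodic, black-box simplification of PI2: a probability-weighted average of Gaussian perturbations, with costs normalized to [0, 1] and a temperature h. This is not the per-time-step PI2-DP update of the article. The via-point task, the normalized Gaussian-basis trajectory, the exploration decay schedule and all function names are hypothetical. Each round first updates the weights with the basis centres frozen, then updates the centres with the weights frozen.

```python
import numpy as np

def pi2_step(params, cost_fn, sigma, rng, n_samples=20, h=10.0):
    """Episodic PI2-style step: probability-weighted average of Gaussian perturbations."""
    eps = rng.normal(0.0, sigma, size=(n_samples, params.size))
    costs = np.array([cost_fn(params + e) for e in eps])
    span = costs.max() - costs.min()
    norm = (costs - costs.min()) / span if span > 0 else np.zeros(n_samples)
    prob = np.exp(-h * norm)        # low cost -> high probability
    prob /= prob.sum()
    return params + prob @ eps

def trajectory(t, weights, centers, width=0.1):
    """y(t) = sum_i psi_i(t) w_i / sum_i psi_i(t) with Gaussian psi_i centred at c_i."""
    psi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)
    return (psi @ weights) / psi.sum(axis=1)

# Hypothetical via-point task: pass through (0, 0), (0.5, 1) and (1, 0).
T_VIA = np.array([0.0, 0.5, 1.0])
Y_VIA = np.array([0.0, 1.0, 0.0])

def via_cost(weights, centers):
    return float(np.sum((trajectory(T_VIA, weights, centers) - Y_VIA) ** 2))

def alternate_learning(n_rounds=100, n_basis=8, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(n_basis)                       # weight space
    c = np.linspace(0.0, 1.0, n_basis)          # basis space (centres)
    sigma_w, sigma_c = 0.2, 0.02
    for _ in range(n_rounds):
        # Weight space: reward-guided search with the basis functions frozen.
        w = pi2_step(w, lambda ww: via_cost(ww, c), sigma_w, rng)
        # Basis space: move the centres with the weights frozen.
        c = pi2_step(c, lambda cc: via_cost(w, cc), sigma_c, rng)
        sigma_w *= 0.97                         # hypothetical exploration decay
        sigma_c *= 0.97
    return w, c

w, c = alternate_learning()
print(via_cost(w, c))
```

Keeping the two updates as separate calls to the same `pi2_step` mirrors the two-space idea: only the parameter block being searched changes between half-steps.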


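3. Robot Deep Reinforcement Learning Incorporating Task-Confidence Features, with Peg-in-Hole Experiments

Fu Jian; Liu Ruozhuo; Zhong Yadong; Wang Qifeng; Xiang Kui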

4.Adaptive Multi-Task Human-Robot Interaction Based on Human Behavioral Intention

IEEE Access, 2021, vol. 9 (journal article)

Learning from demonstrations with Probabilistic Movement Primitives (ProMPs) has been widely used in robot skill learning, especially in human-robot collaboration. ProMP has been extended to multi-task situations, inspired by the Gaussian mixture model, but it still treats each task independently. It therefore ignores a common scenario: the robot must switch adaptively between collaborative tasks to follow instantaneous changes in human intention. To solve this problem, we work within a Gaussian mixture model framework and propose three components. The first is an alternate-learning-based parameter estimation method. The second is a decomposition strategy with projection points based on empirical minimum variation. The third is a linear interpolation strategy for the weights. Alternate learning of the weights and parameters in multi-task ProMP (MTProMP) gives the robot a smooth composite trajectory that passes through the expected via points. The decomposition strategy determines how the desired via-point state is projected onto each ProMP component, so that the total deviation between the projection points and their respective priors is minimized. Linear interpolation adjusts the weights among sequential via points automatically. The proposed method and strategy are successfully extended to multi-task interaction ProMPs (MTiProMP). With MTProMP and MTiProMP, a robot can handle multiple tasks in industrial factories and collaborate with a worker, switching from one task to another as the worker's intention changes. Classical via-point trajectory planning experiments and human-robot collaboration experiments are performed on a Sawyer robot. The results show that MTProMP and MTiProMP with the proposed method and strategy perform better.
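The via-point mechanism behind ProMP-based adaptation is Gaussian conditioning of the weight distribution. Given a prior N(mu_w, Sigma_w), a basis vector psi(t) and a desired value y* with observation variance sigma_y^2, the conditioned mean is mu_w + Sigma_w psi (sigma_y^2 + psi^T Sigma_w psi)^-1 (y* - psi^T mu_w). The minimal single-DOF sketch below assumes a normalized Gaussian basis and a hypothetical zero-mean prior. It shows standard ProMP conditioning only, not the article's multi-task mixture, alternate parameter estimation or decomposition strategy. All names are illustrative.

```python
import numpy as np

def rbf_features(t, n_basis=10, width=0.05):
    """Normalized Gaussian basis vector psi(t) for a scalar phase t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    psi = np.exp(-0.5 * ((t - centers) / width) ** 2)
    return psi / psi.sum()

def condition_on_via_point(mu_w, sigma_w, t, y_star, obs_var=1e-6):
    """Condition a 1-D ProMP weight distribution N(mu_w, sigma_w) on passing
    through y_star at phase t, with observation variance obs_var."""
    psi = rbf_features(t, n_basis=mu_w.size)
    s = obs_var + psi @ sigma_w @ psi          # predicted variance at t (scalar)
    gain = sigma_w @ psi / s                   # Kalman-style gain, shape (n_basis,)
    mu_new = mu_w + gain * (y_star - psi @ mu_w)
    sigma_new = sigma_w - np.outer(gain, psi @ sigma_w)
    return mu_new, sigma_new

# Hypothetical prior: zero-mean weights with isotropic variance 0.5.
mu_w = np.zeros(10)
sigma_w = 0.5 * np.eye(10)
mu_c, sigma_c = condition_on_via_point(mu_w, sigma_w, t=0.3, y_star=1.0)
print(rbf_features(0.3) @ mu_c)   # mean at t = 0.3 is now approximately y_star
```

A small `obs_var` makes the conditioned trajectory pass almost exactly through the via point. A larger value trades accuracy at the via point for staying closer to the prior.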


5.Compound Heuristic Information Guided Policy Improvement for Robot Motor Skill Acquisition

Keywords:
robot learning; reinforcement learning; heuristic information; KCCA; PI2-CMA; primitives; models

Fu, Jian; Li, Cong; Teng, Xiang; Luo, Fan; Li, Boqun
Applied Sciences-Basel, 2020, vol. 10, no. 15 (journal article)

Discovering implicit patterns and using them as heuristic information to guide policy search is one of the key factors in speeding up robot motor skill acquisition. This paper proposes PI2-CMA-KCCA, a reinforcement learning algorithm for policy improvement guided by compound heuristic information. Its structure and workflow resemble a double closed-loop control system. The outer loop, realized by Kernel Canonical Correlation Analysis (KCCA), infers the implicit nonlinear heuristic information between the robot's joints. The inner loop, operated by Covariance Matrix Adaptation (CMA), discovers the hidden linear correlations between the basis functions within each joint. These patterns, which benefit learning the new task, automatically determine the mean and variance of the exploration perturbation for Path Integral Policy Improvement (PI2). Compared with classical PI2, PI2-CMA and PI2-KCCA, PI2-CMA-KCCA enables the robot to transfer trajectory planning from the demonstration to the new task, and it does so more efficiently. Classical via-point experiments on SCARA and Sawyer robots validate that the proposed method converges quickly and can find a solution for the new task.
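The covariance-adaptation idea of the inner loop can be sketched as follows. This is a PI2-CMA-style update written under stated assumptions. It samples parameters from N(mean, cov) and weights them with PI2's exponentiated normalized costs. It then re-estimates the covariance from the reward-weighted deviations around the old mean. Several pieces are hypothetical additions: the smoothing factor `alpha`, the eigenvalue floor, the quadratic test cost and the function names. The KCCA outer loop that infers inter-joint correlations is not reproduced.

```python
import numpy as np

def pi2_cma_step(mean, cov, cost_fn, rng, n_samples=20, h=10.0, alpha=0.5, min_eig=1e-8):
    """One PI2-CMA-style update of the search mean and exploration covariance."""
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])
    span = costs.max() - costs.min()
    norm = (costs - costs.min()) / span if span > 0 else np.zeros(n_samples)
    prob = np.exp(-h * norm)
    prob /= prob.sum()
    eps = samples - mean                       # deviations from the OLD mean
    new_mean = mean + prob @ eps
    cov_est = (prob[:, None] * eps).T @ eps    # sum_k P_k eps_k eps_k^T
    # Hypothetical smoothing and eigenvalue floor to avoid premature collapse.
    new_cov = (1.0 - alpha) * cov + alpha * cov_est
    new_cov = 0.5 * (new_cov + new_cov.T) + min_eig * np.eye(mean.size)
    return new_mean, new_cov

SCALES = np.array([1.0, 10.0])                 # hypothetical ill-conditioned cost
TARGET = np.ones(2)

def quad_cost(theta):
    return float(np.sum(SCALES * (theta - TARGET) ** 2))

rng = np.random.default_rng(1)
mean, cov = np.zeros(2), 0.1 * np.eye(2)
for _ in range(80):
    mean, cov = pi2_cma_step(mean, cov, quad_cost, rng)
print(quad_cost(mean), np.linalg.eigvalsh(cov))
```

Measuring deviations from the old mean is the design choice that matters here. When good samples lie consistently on one side, the estimated covariance grows along that direction; near the optimum it shrinks.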


6. Human-Robot Collaboration and Obstacle Avoidance Based on Dual-Space Concurrent ProMPs

Fu Jian; Wang Chaoqi; Li Cong; Luo Fan

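7. Robot Learning Method Based on ProMPs and PI2

Keywords:
robot learning; probabilistic movement primitives; path integral; PI2; Bayesian estimation; trajectory optimization

Fu Jian; Cao Ce; Shen Siyuan
Journal of Wuhan University of Science and Technology, 2019, no. 5 (journal article)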

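Robot learning methods based on traditional movement primitive models learn slowly and produce results of low accuracy. To address this, this paper proposes a probabilistic movement primitive (ProMPs) representation and imitation learning framework that incorporates a Bayesian estimation algorithm. It also uses an improved path integral (PI2) policy based on kernel canonical correlation analysis (KCCA) for trajectory optimization. ProMPs combined with Bayesian inference give the search a feasible starting point when the robot performs a new task that differs from the demonstrated one. The PI2 algorithm, with additional functional-index constraints, lets the robot obtain smooth via-point trajectories. The method is validated in via-point experiments on a UR5 robot platform and in the V-REP simulator. The results show that the proposed Bayesian ProMPs-PI2 learning method generalizes quickly and accurately from a demonstrated task to an unfamiliar one, allowing the robot to acquire new skills.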
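The representation-learning step of a ProMP can be sketched as estimating a Gaussian distribution over basis weights from several demonstrations. The paper's specific Bayesian estimation scheme is not described here, so this sketch assumes a simple stand-in. Each demonstration is fitted by ridge regression, which gives the MAP estimate under a zero-mean isotropic Gaussian prior on the weights. The sample mean and covariance of the fitted weights then form the ProMP. The basis layout, `lam`, the synthetic demonstrations and the function names are hypothetical.

```python
import numpy as np

def basis_matrix(t, n_basis=10, width=0.06):
    """Normalized Gaussian basis matrix Phi with shape (len(t), n_basis)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    psi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)
    return psi / psi.sum(axis=1, keepdims=True)

def learn_promp(t, demos, lam=1e-6):
    """Estimate the ProMP weight distribution N(mu_w, sigma_w) from demonstrations.

    demos has shape (n_demos, len(t)). Each demo's weights are the ridge
    solution (Phi^T Phi + lam I)^-1 Phi^T y, i.e. the MAP estimate under a
    zero-mean isotropic Gaussian prior on the weights.
    """
    phi = basis_matrix(t)
    a = phi.T @ phi + lam * np.eye(phi.shape[1])
    weights = np.linalg.solve(a, phi.T @ demos.T).T   # (n_demos, n_basis)
    mu_w = weights.mean(axis=0)
    sigma_w = np.cov(weights, rowvar=False)           # sample covariance
    return mu_w, sigma_w

# Hypothetical demonstrations generated from random weights on the same basis.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
phi = basis_matrix(t)
true_w = rng.normal(0.0, 1.0, size=(8, 10))
demos = true_w @ phi.T
mu_w, sigma_w = learn_promp(t, demos)
mean_traj = phi @ mu_w                                # mean ProMP trajectory
```

The resulting `mu_w` and `sigma_w` are the prior that via-point conditioning, or a PI2 search, would start from when moving to a new task.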


8.Robot Motion Skills Acquisition Method Based on GU-ProMPs and Reinforcement Learning

Fu Jian; Wang Chaoqi; Du Jinyu; Luo Fan; Yang Chenguang

9.Robot motor skill acquisition with learning in two spaces

Fu Jian; Cao Ce; Du Jinyu; Shen Siyuan

10.Concurrent probabilistic motion primitives for obstacle avoidance and human-robot collaboration

Fu Jian; Wang Chaoqi; Du Jinyu; Luo Fan
