Research on Robot Motor Skill Acquisition and Execution for Human-Robot Cooperation and Collaboration
Project final report (full text)
1.Research on Robot Trajectory Planning Based on Alternate Learning in Two Spaces
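- Keywords:
- motor skill acquisition; trajectory planning; reinforcement learning; basis function reorganization
- Funding: National Natural Science Foundation of China, General Program, "Research on Robot Motor Skill Acquisition and Execution for Human-Robot Cooperation and Collaboration" (61773299)
- Collection: Information Science and Technology; Topic: Automation Technology; DOI: 10.27381/d.cnki.gwlgu.2019.000311; Classification code: TP242; Advisor: Fu Jian (傅剑)
- Journal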
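Robotics reflects a country's level of science and technology and is one of the most strategically important industries today. As robotics develops, its application scenarios are becoming more complex and variable. Traditional fixed-operation robots can no longer meet production requirements, and robots are expected to react more flexibly and quickly and to behave more intelligently. Acquiring and generalizing motor skills is an important way to give robots intelligence, and motor skill acquisition based on the framework of learning from demonstration plus reinforcement learning (LfDRL) has been the most successful in application. Building on the three-stage LfDRL paradigm of policy representation, imitation learning, and optimization, this thesis addresses the active problem of autonomously completing new tasks from demonstrated tasks under specific performance constraints. It proposes a motor skill learning method (iLWR-PI²-AL) based on improved locally weighted regression (iLWR), policy improvement with path integrals (PI²), and basis function reorganization. In the classical LWR-PI² method the basis functions stay fixed during training and may not suit a new task. The thesis therefore adds basis function self-reorganization and an iLWR dual-perturbation method, so that the algorithm learns alternately in two spaces and gradually generalizes from familiar tasks to new ones.
The research content is as follows. First, the thesis surveys the state of robot development in China and abroad, and reviews the literature on imitation learning, reinforcement learning, and deep learning methods combined with the former two. Next, it introduces the basics of robot kinematics, including forward and inverse kinematics and the D-H coordinate representation, and discusses motion decoupling and joint redundancy, two problems common in robotics research. Then, based on the DMPs-iLWR and DMPs-GMR imitation learning methods, it summarizes a unified framework for imitation learning. Building on imitation learning, skill generalization is achieved through PI² reinforcement learning. The thesis compares the strengths and weaknesses of iLWR-PI² and GMR-PI² and proposes the iLWR-PI² policy improvement method based on alternate learning in two spaces (iLWR-PI²-AL), which searches for the optimal or a suboptimal solution of a task by alternating optimization in the weight space and the basis function space. The feasibility of two-space learning is then explained at the theoretical level. Finally, the proposed algorithm is validated on SCARA, planar ten-link, NAO, and UR5 robots: the first two are tested only in MATLAB simulation, while the latter two are verified on physical hardware after simulation. The results show that the algorithm performs well.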
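The abstract above relies on locally weighted regression (LWR) to fit basis-function weights from a demonstration. As a rough illustration only, the Python sketch below fits one weight per Gaussian basis function in the style commonly used with dynamic movement primitives. The function names, the phase variable, the basis centers and widths, and the linear toy target are all assumptions made for this example; the iLWR improvements and the basis reorganization described in the thesis are not reproduced here.

```python
import math


def gaussian_basis(x, centers, widths):
    """Evaluate the unnormalized Gaussian basis functions psi_i(x)."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]


def fit_lwr_weights(xs, targets, centers, widths):
    """Fit one weight per basis function by locally weighted regression.

    Each local model is the line f = w_i * x, and the samples are
    weighted by how strongly basis function i is activated at x.
    """
    weights = []
    for c, h in zip(centers, widths):
        num = 0.0
        den = 0.0
        for x, f in zip(xs, targets):
            psi = math.exp(-h * (x - c) ** 2)
            num += psi * x * f
            den += psi * x * x
        # Guard against a basis function that no sample activates.
        weights.append(num / den if den > 1e-12 else 0.0)
    return weights


def predict(x, weights, centers, widths):
    """Blend the local models with normalized basis activations."""
    psis = gaussian_basis(x, centers, widths)
    blended = sum(p * w for p, w in zip(psis, weights)) / sum(psis)
    return blended * x


# Toy data (assumed): a phase variable decaying from 1 toward 0 and a
# target that is exactly linear in the phase, so every weight should be 2.5.
xs = [math.exp(-3.0 * t / 99) for t in range(100)]
targets = [2.5 * x for x in xs]
centers = [math.exp(-3.0 * i / 4) for i in range(5)]
widths = [50.0] * 5

weights = fit_lwr_weights(xs, targets, centers, widths)
prediction = predict(0.5, weights, centers, widths)
```

Because each weight is fitted independently, the basis functions can be changed one at a time without refitting the others, which is one reason this representation pairs naturally with a separate search over the basis function space.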
2.Robot Motor Skill Transfer with Alternate Learning in Two Spaces
- Keywords:
- Functions;Manipulators;Reinforcement learning;Quantum theory;Robot programming;Intelligent robots;Alternate learning in two space;Improved locally weighted regression;Locally weighted regression;Motor skill acquisition;Motor skills;Path integral;Policy improvement with path integral by dual perturbation (PI²-DP);Skill transfer;Skills acquisition;Trajectory Planning
- Fu, Jian;Teng, Xiang;Cao, Ce;Ju, Zhaojie;Lou, Ping
- IEEE Transactions on Neural Networks and Learning Systems
- 2021
- Vol. 32
- No. 10
- Journal
Recent research in learning from demonstration (LfD) shows that reinforcement learning is effective for improving robots' movement skills. The main remaining challenge is how to generate new robot motions automatically for new tasks that share a similar preassigned performance indicator but differ from the demonstration tasks. To address this, the article proposes a framework that represents the policy and conducts imitation learning and optimization for intelligent robot trajectory planning, based on improved locally weighted regression (iLWR) and policy improvement with path integral by dual perturbation (PI²-DP). Reward-guided weight searching and adaptive evolution of the basis functions are performed alternately in two spaces, the weight space and the basis function space. The alternate learning process constructs a sequence of two-tuples that link the demonstration task and the new one for motor skill transfer, so that the robot gradually acquires motor skill, moving from tasks similar to the demonstration to dissimilar tasks with different performance metrics. Classical via-point trajectory planning experiments are performed with the SCARA manipulator, a 10-degree-of-freedom (DOF) planar manipulator, and the UR robot. The results show that the proposed method is feasible and effective.
3.Robot Deep Reinforcement Learning Fusing Task Confidence Features and Its Peg-in-Hole Experimental Study
- Fu Jian (傅剑); Liu Ruozhuo (刘若拙); Zhong Yadong (钟亚东); Wang Qifeng (王祺丰); Xiang Kui (向馗)
- Journal
4.Adaptive Multi-Task Human-Robot Interaction Based on Human Behavioral Intention
- IEEE Access
- 2021
- Vol. 9
- Journal
Learning from demonstrations with Probabilistic Movement Primitives (ProMPs) has been widely used in robot skill learning, especially in human-robot collaboration. Although ProMP has been extended to multi-task situations, inspired by the Gaussian mixture model, it still treats each task independently. It ignores the common scenario in which the robot must adaptively switch between collaborative tasks to follow instantaneous changes in human intention. To solve this problem, we propose, within a Gaussian mixture model framework, an alternate learning-based parameter estimation method and an empirical minimum variation-based decomposition strategy with projection points, combined with a linear interpolation strategy for the weights. Alternate learning of weights and parameters in multi-task ProMP (MTProMP) lets the robot plan a smooth composite trajectory that passes through the expected via points. The decomposition strategy determines how the desired via-point state is projected onto the individual ProMP components so that the total deviation between each projection point and its respective prior is minimized. Linear interpolation is used to adjust the weights between sequential via points automatically. The proposed method and strategy are also extended to multi-task interaction ProMPs (MTiProMP). With MTProMP and MTiProMP, the robot can be applied to multiple tasks in industrial factories and can collaborate with a worker, switching from one task to another as the human's intention changes. Classical via-point trajectory planning experiments and human-robot collaboration experiments are performed on the Sawyer robot. The experimental results show that MTProMP and MTiProMP with the proposed method and strategy perform better.
5.Compound Heuristic Information Guided Policy Improvement for Robot Motor Skill Acquisition
- Keywords:
- robot learning; reinforcement learning; heuristic information; KCCA;PI2-CMA;PRIMITIVES; MODELS
- Fu, Jian;Li, Cong;Teng, Xiang;Luo, Fan;Li, Boqun
- Applied Sciences-Basel
- 2020
- Vol. 10
- No. 15
- Journal
Discovering the implicit pattern and using it as heuristic information to guide the policy search is one of the core factors in speeding up robot motor skill acquisition. This paper proposes a compound heuristic information guided reinforcement learning algorithm, PI2-CMA-KCCA, for policy improvement. Its structure and workflow are similar to a double closed-loop control system. The outer loop, realized by Kernel Canonical Correlation Analysis (KCCA), infers the implicit nonlinear heuristic information between the joints of the robot. The inner loop, operated by Covariance Matrix Adaptation (CMA), discovers the hidden linear correlations between the basis functions within each joint of the robot. These patterns, which help in learning the new task, automatically determine the mean and variance of the exploring perturbation for Path Integral Policy Improvement (PI2). Compared with classical PI2, PI2-CMA, and PI2-KCCA, PI2-CMA-KCCA not only enables the robot to transfer trajectory planning from the demonstration to the new task, but also completes the transfer more efficiently. Classical via-point experiments on SCARA and Sawyer robots have validated that the proposed method converges quickly and can find a solution for the new task.
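The inner loop described above adapts the mean and covariance of the exploration noise from reward-weighted samples. A minimal Python sketch of that kind of update follows, assuming a simplified trajectory-level form of PI2 with CMA-style covariance adaptation (one scalar cost per rollout). The function name `pi2_cma_update`, the eliteness constant `h`, and the toy perturbations are hypothetical, and the KCCA outer loop is omitted entirely, so this is not the paper's algorithm.

```python
import math


def pi2_cma_update(theta, epsilons, costs, h=10.0):
    """One reward-weighted update of the policy mean and exploration covariance.

    theta    -- current parameter vector (list of floats)
    epsilons -- perturbations that were added to theta, one per rollout
    costs    -- scalar cost of each rollout (lower is better)
    h        -- eliteness constant: larger values favor the best rollouts
    """
    s_min, s_max = min(costs), max(costs)
    span = s_max - s_min
    if span < 1e-12:
        # All rollouts cost the same, so weight them uniformly.
        expo = [1.0] * len(costs)
    else:
        expo = [math.exp(-h * (s - s_min) / span) for s in costs]
    total = sum(expo)
    probs = [e / total for e in expo]

    n = len(theta)
    # Mean update: probability-weighted average of the perturbations.
    new_theta = [
        theta[i] + sum(p * eps[i] for p, eps in zip(probs, epsilons))
        for i in range(n)
    ]
    # Covariance update: probability-weighted outer products, so directions
    # that produced low cost are explored more in the next iteration.
    new_cov = [
        [sum(p * eps[i] * eps[j] for p, eps in zip(probs, epsilons))
         for j in range(n)]
        for i in range(n)
    ]
    return new_theta, new_cov, probs


# Toy example (assumed): two rollouts, the first one is clearly better.
theta = [0.0, 0.0]
epsilons = [[1.0, 0.0], [0.0, 1.0]]
costs = [0.0, 1.0]
new_theta, new_cov, probs = pi2_cma_update(theta, epsilons, costs)
```

In the toy example almost all probability mass goes to the first rollout, so the mean moves along the first axis and the new covariance concentrates exploration in that direction.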
6.Human-Robot Collaboration and Obstacle Avoidance Based on Two-Space Concurrent ProMP
- Fu Jian (傅剑); Wang Chaoqi (王超奇); Li Cong (李聪); Luo Fan (罗璠)
- Journal
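7.Robot Learning Method Based on ProMPs and PI²
- Keywords:
- robot learning; probabilistic movement primitives; path integral; PI²; Bayesian estimation; trajectory optimization
- Fu Jian (傅剑); Cao Ce (曹策); Shen Siyuan (申思远)
- Journal of Wuhan University of Science and Technology (《武汉科技大学学报》)
- 2019
- No. 05
- Journal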
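Robot learning methods based on traditional movement primitive models suffer from slow learning and low accuracy. To address this, the paper proposes a probabilistic movement primitives (ProMPs) representation and imitation learning framework that incorporates Bayesian estimation, and additionally uses an improved path integral policy, PI², based on kernel canonical correlation analysis (KCCA), for trajectory optimization. Combined with Bayesian inference, ProMPs provide a feasible starting point for the search when the robot must perform a new task that differs from the demonstrated one, while the PI² algorithm with an additional functional performance constraint lets the robot obtain a smooth trajectory through the via points. Via-point experiments on a UR5 robot platform and in the V-REP simulator show that the proposed Bayesian ProMPs-PI² learning method quickly and accurately generalizes from the demonstrated task to an unfamiliar task, enabling the robot to acquire new skills.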
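The Bayesian step mentioned above can be illustrated with the standard way a ProMP is conditioned on a single via point. The Python sketch below assumes a one-dimensional trajectory y(t) = phi(t)^T w with Gaussian weights w ~ N(mu, cov) and applies the usual Gaussian conditioning formulas. The function names, the normalized Gaussian basis, the prior, and the via point are assumptions for illustration, and the KCCA-based PI² refinement from the paper is not included.

```python
import math


def basis(t, n_basis=5, width=0.05):
    """Normalized Gaussian basis functions evaluated at phase t in [0, 1]."""
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    raw = [math.exp(-(t - c) ** 2 / (2.0 * width)) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def condition_on_via_point(mu, cov, phi, y_star, obs_var=1e-8):
    """Condition the weight distribution N(mu, cov) on y(t*) = y_star.

    gain    = cov @ phi / (obs_var + phi^T cov phi)
    new_mu  = mu + gain * (y_star - phi^T mu)
    new_cov = cov - gain (cov @ phi)^T   (cov is assumed symmetric)
    """
    n = len(mu)
    cov_phi = [sum(cov[i][j] * phi[j] for j in range(n)) for i in range(n)]
    s = obs_var + sum(phi[i] * cov_phi[i] for i in range(n))
    gain = [c / s for c in cov_phi]
    residual = y_star - sum(p * m for p, m in zip(phi, mu))
    new_mu = [m + g * residual for m, g in zip(mu, gain)]
    new_cov = [
        [cov[i][j] - gain[i] * cov_phi[j] for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_cov


# Toy prior (assumed): zero-mean weights with identity covariance.
n = 5
mu = [0.0] * n
cov = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Ask the trajectory to pass through y = 0.3 at phase t = 0.5.
phi = basis(0.5)
new_mu, new_cov = condition_on_via_point(mu, cov, phi, y_star=0.3)

via_mean = sum(p * m for p, m in zip(phi, new_mu))
via_var = sum(phi[i] * new_cov[i][j] * phi[j]
              for i in range(n) for j in range(n))
```

After conditioning, the mean trajectory passes through the via point and the variance there collapses to roughly `obs_var`, while the rest of the trajectory stays close to the prior. A conditioned mean of this kind is one plausible starting point for a subsequent policy search.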
8.Robot Motion Skills Acquisition Method Based on GU-ProMPs and Reinforcement Learning
- Fu Jian;Wang Chaoqi;Du Jinyu;Luo Fan;Yang Chenguang;
- Journal
9.Robot motor skill acquisition with learning in two spaces
- Fu Jian;Cao Ce;Du Jinyu;Shen Siyuan;
- Journal
10.Concurrent probabilistic motion primitives for obstacle avoidance and human-robot collaboration
- Fu Jian;Wang ChaoQi;Du JinYu;Luo Fan;
- Journal
