1
Using OpenRDK to learn walk parameters for the Humanoid Robot NAO
A. Cherubini, L. Iocchi, F. Giannone, M. Lombardo, G. Oriolo
2
Overview: environment
Robotic agent: NAO, a humanoid robot produced by Aldebaran
Application: robotic soccer
SDK
Simulator
3
Overview: (sub)tasks
Environment
Vision Module: process raw data from the environment
Modelling Module: elaborate raw data to obtain more reliable information
Behaviour Control Module: decide the best behaviour to accomplish the agent's goal
Motion Control Module: actuate the robot motors accordingly
At first!
4
Make Nao walk… how?
Nao is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (NaoQi Motion Proxy) and partially customized by tuning some parameters.
Main advantage: ready to use (…to be tuned).
…and a drawback: based on an unknown walk model. No flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.
5
SPQR Walking library development workflow
1. Develop the walk model using Matlab (SPQR Walk Model)
2. Test the walk model on the Webots simulator
3. Design and implement a C++ library for our RDK Soccer Agent (SPQR Walking Library)
4. Test our walking RDK agent on the Webots simulator and on the real NAO robot
5. Finally, tune the walk parameters (on the Webots simulator and on NAO)
6
A simple Walking RAgent for Nao
Two scenarios must be possible:
1. RAgent running on Webots
2. RAgent running on Nao
Show how much they have in common.
Explain the advantages of using RDK (the walking library can be developed and tested orthogonally to the rest of the code, yet can still be easily integrated with it).
7
A simple walking RAgent for Nao
Simple Behaviour Module: switches between two states, walk and stand
Motion Control Module: uses the SPQR Walking Library
NaoQi Adaptor: Smemy, NAO (NaoQi)
Webots Client: TCP channel, WEBOTS
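The two-state behaviour described on this slide can be sketched as a tiny state machine. This is only an illustration in Python, not the RDK module itself: the class name `SimpleBehaviour`, the `toggle` trigger, and the default walk command `(0.1, 0.0)` are all hypothetical, since the slide does not say what causes the switch or which command is sent.

```python
from enum import Enum


class State(Enum):
    STAND = "stand"
    WALK = "walk"


class SimpleBehaviour:
    """Hypothetical two-state behaviour: stand or walk."""

    def __init__(self, walk_command=(0.1, 0.0)):
        # walk_command is an assumed (v, omega) pair sent while walking.
        self.state = State.STAND
        self.walk_command = walk_command

    def toggle(self):
        # Switch between the two states; the real trigger is not given in the slides.
        self.state = State.WALK if self.state is State.STAND else State.STAND

    def velocity_command(self):
        # (0, 0) means "stand"; any other pair asks the motion module to walk.
        if self.state is State.WALK:
            return self.walk_command
        return (0.0, 0.0)
```

A motion control module would read `velocity_command()` at each cycle and turn the pair into joint targets.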
8
SPQR Walking Engine Model
NAO model characteristics: 21 degrees of freedom, no actuated trunk, no dynamic model available
Input: velocity commands (v, ω), where v is the linear velocity and ω is the angular velocity
We follow the “Static Walking Pattern”: use an a-priori definition of the desired trajectories, defined as follows:
Choose a set of output variables: 3D coordinates of selected points of the robot
Choose and parametrize the desired trajectories for these variables at each phase of the gait
9
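SPQR velocity commands
The Behavior Control Module sends velocity commands (v, ω) to the Motion Control Module, which outputs a joints matrix.
Gait states: Stand Position, Initial Half Step, Rectilinear Walk Swing, Curvilinear Walk Swing, Turn Step, Final Half Step
[State diagram; transition labels: (v,ω), (0,ω), (0,0), (v,0), (v,ω), (v,0), (0,0), (v,ω)]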
10
SPQR walking subtasks and parameters
Biped walking: double support phase and swing phase (SS%)
SPQR walk subtasks:
Foot trajectories in the xz plane: X_tot, X_sw0, X_ds, Z_st, Z_sw
Center of mass trajectory in lateral direction: Y_ft, Y_ss, Y_ds, K_r
Hip yaw/pitch control (turn): H_yp
Arm control: K_s
11
SPQR Walking Library Class Diagram
12
Walk tuning: main issues
Possible choices: by hand, or by using machine learning techniques
Machine learning seems the best solution:
Less human interaction
Explores the search space in a more systematic way
…but take care of some aspects:
You need to define an effective fitness function
You need to choose the right algorithm to explore the parameter space
Only a limited number of experiments can be done on a real robot
13
SPQR Learning System Architecture
Learner: uses the Learning library
RAgent: uses the Walking library; runs on the real Nao or on Webots
Exchanged data: iteration (experiments), data to evaluate the fitness (GPS), fitness
14
SPQR Learner
First iteration?
Yes: return the initial iteration and the iteration information
No: apply the chosen algorithm (strategy), then return the next iteration and the iteration information
Available strategies: Policy Gradient (e.g., PGPR), Nelder-Mead Simplex Method, Genetic Algorithm
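The learner flow above can be sketched as a small class with a pluggable strategy. This is a Python illustration under stated assumptions, not the SPQR learning library: the names `Learner`, `next_iteration`, and the toy strategy `best_only` are hypothetical, and the "iteration information" is reduced to a counter.

```python
class Learner:
    """Hypothetical learner: returns the policies to test in each iteration."""

    def __init__(self, initial_policies, strategy):
        # strategy(previous_policies, fitness_values) -> next policies.
        # It stands in for Policy Gradient, Nelder-Mead, or a genetic algorithm.
        self.initial_policies = initial_policies
        self.strategy = strategy
        self.iteration = 0

    def next_iteration(self, previous=None, fitness=None):
        if self.iteration == 0:
            # First iteration: nothing has been evaluated yet.
            policies = list(self.initial_policies)
        else:
            # Later iterations: let the chosen algorithm propose new policies.
            policies = self.strategy(previous, fitness)
        self.iteration += 1
        return policies, {"iteration": self.iteration}


def best_only(policies, fitness):
    # Toy strategy used only for illustration: keep the best policy.
    best_index = max(range(len(policies)), key=lambda j: fitness[j])
    return [policies[best_index]]
```

Keeping the strategy as a plain callable means the agent side (Webots or the real robot) never needs to know which algorithm is exploring the parameter space.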
15
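Policy Gradient (PG) iteration
Given a point $p$ in the parameter space $\mathbb{R}^K$:
Generate $n$ ($n = mK$) policies from $p$ (for each component of $p$: $p_i$, $p_i + \epsilon_i$, or $p_i - \epsilon_i$)
Evaluate the policies
For each $k \in \{1, \dots, K\}$, compute $F_k^{+}$, $F_k^{0}$, $F_k^{-}$
For each $k \in \{1, \dots, K\}$: if $F_k^{0} > F_k^{+}$ and $F_k^{0} > F_k^{-}$ then $\Delta_k = 0$, else $\Delta_k = F_k^{+} - F_k^{-}$
$\Delta^{*} = \mathrm{normalized}(\Delta)$
$p' = p + \eta \Delta^{*}$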
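One PG iteration can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `pg_iteration`, the per-parameter perturbation sizes `epsilon`, the step size `eta`, and the choice to average the fitness of all policies sharing the same perturbation sign for parameter k are assumptions made for illustration.

```python
import math
import random


def pg_iteration(p, epsilon, eta, n, fitness, rng=random):
    """Return (new_p, delta) after one policy gradient step from point p."""
    K = len(p)
    # Each policy perturbs every component by -epsilon, 0, or +epsilon.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(K)] for _ in range(n)]
    scores = [fitness([p[k] + s[k] * epsilon[k] for k in range(K)]) for s in signs]

    def average(k, sign):
        # Mean fitness of the policies whose k-th perturbation has this sign.
        values = [f for s, f in zip(signs, scores) if s[k] == sign]
        return sum(values) / len(values) if values else None

    delta = []
    for k in range(K):
        f_plus, f_zero, f_minus = average(k, 1), average(k, 0), average(k, -1)
        if f_plus is None or f_minus is None:
            delta.append(0.0)  # not enough samples to estimate this component
        elif f_zero is not None and f_zero > f_plus and f_zero > f_minus:
            delta.append(0.0)  # leaving the parameter unchanged looks best
        else:
            delta.append(f_plus - f_minus)

    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        return list(p), delta
    # Move a fixed distance eta along the normalized gradient estimate.
    return [p[k] + eta * delta[k] / norm for k in range(K)], delta
```

On the robot, `fitness` would run one walking experiment per policy, which is why the number of policies per iteration has to stay small.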
16
Enhancing PG: PGPR
At each iteration $i$, the gradient estimate $\Delta(i)$ can be used to obtain a metric for measuring the relevance of the parameters.
Given the relevance and a threshold $T$, PGPR prunes the less relevant parameters in the next iterations.
[Relevance formula not recovered; it includes a forgetting factor]
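The pruning idea can be illustrated with a short Python sketch. The relevance formula from the slide is not available here, so the update below (an exponentially weighted share of each parameter in the absolute gradient estimate) is a stand-in chosen for illustration only; `update_relevance`, `active_parameters`, and the default `forgetting` value are hypothetical.

```python
def update_relevance(relevance, delta, forgetting=0.5):
    """Blend the previous relevance with each parameter's share of |delta|.

    Assumed metric, not the one from the slides: a larger forgetting value
    keeps more of the old relevance and less of the newest gradient estimate.
    """
    total = sum(abs(d) for d in delta)
    if total == 0:
        share = [0.0] * len(delta)
    else:
        share = [abs(d) / total for d in delta]
    return [forgetting * r + (1.0 - forgetting) * s
            for r, s in zip(relevance, share)]


def active_parameters(relevance, threshold):
    # Indices of the parameters that are still perturbed in the next iteration.
    return [k for k, r in enumerate(relevance) if r >= threshold]
```

Pruned parameters keep their current value, so each later iteration spends its limited number of experiments on the parameters that still affect the fitness.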
17
Curvilinear biped walking experiment
The robot moves along a curve with radius R for a time t.
Fitness function: [formula not recovered], defined in terms of the radial error and the path length.
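The two quantities named on this slide can be computed from sampled robot positions, for example as in the Python sketch below. The definitions are assumptions: the radial error is taken as the mean absolute deviation from the desired radius, the path length as the sum of the distances between consecutive samples, and the way the two are combined into the fitness is deliberately left out because the formula is not available here.

```python
import math


def path_length(points):
    # Sum of straight-line distances between consecutive (x, y) samples.
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_radial_error(points, center, radius):
    # Average distance between each sample and the desired circle of the given radius.
    return sum(abs(math.dist(pt, center) - radius) for pt in points) / len(points)
```

In simulation the samples could come from a position sensor such as a GPS device; on the real robot an external tracking system would be needed.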
18
Simulators in learning tasks
Advantages: you can test the gait model and the learning algorithm without being biased by noise.
Limits: the results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may be less effective on the real robot (e.g., the simulator does not take asymmetries into account, and its models are not very accurate).
19
Results (1)
Five sessions of PG, 20 iterations each, all starting from the same initial configuration
SS%, K_s, Y_ft were set to hand-tuned values
16 policies for each iteration
Fitness increases in a regular way
Low variance among the five simulations
20
Results (2)
Final parameter sets for the five PG runs
Five runs of PGPR
[Plots; parameters shown: Z_sw, X_s, K_r, X_sw0]
21
Bibliography
A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, and G. Oriolo, “Policy Gradient Learning for a Humanoid Soccer Robot”, accepted for Journal of Robotics and Autonomous Systems.
A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara, “An extended policy gradient algorithm for robot task learning”, Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
A. Cherubini, F. Giannone, and L. Iocchi, “Layered learning for a soccer legged robot helped with a 3D simulator”, Proc. of 11th International Robocup Symposium, 2007.
http://openrdk.sourceforge.net
http://www.aldebaran-robotics.com/
http://spqr.dis.uniroma1.it
22
Any questions?