Developer's Word | AI Robot Control Series, Part Three: Voice Control

Looking forward to a world of AI, I want to bring AI into my life. I have always been a lazy person who would rather talk than move, so I am keen to try all kinds of AI robots, especially ones that can be controlled just by speaking, hoping that one day I can get things done by moving only my mouth and never my hands. Hence the following experiences.

The first one below was advertised as controllable by talking, so I couldn't wait to buy it and try it:

Talking to it can indeed make it follow me for a while, but I really don't want to have to turn right myself to guide it to turn right after I finish speaking…

Of course, there are also robots that do not require their little owner to perform the same action before they move, such as this one:

But why do I still have to move my hands and operate a remote control… that is not what a lazy person expects. Is it possible to control the robot by moving only my mouth and nothing else?

The answer is definitely yes!

Below is Horizon's intelligent voice-controlled robot series.

1. Function introduction

Let’s take a look at the final functionality:

The voice command word "go forward" makes the robot move forward

The voice command word "step back" makes the robot move backward

The voice command word "turn left" makes the robot turn left

The voice command word "turn right" makes the robot turn right

The voice command word "stop motion" makes the robot stop moving

Demo video (see the attachment at the end of the article for the full video):

Completely hands-free, isn't it?

From the moment speech begins to the robot's quick, accurate movement according to the command, the credit goes to the 5 TOPS BPU on the X3, which enables low-latency algorithm inference, and to the speech algorithm module, which provides far-field (3-5 m) noise reduction, echo cancellation, and high-accuracy ASR. Of course, far more than this can be achieved; the above is just the tip of the iceberg. Users can configure voice command words themselves, or define the robot's behavior directly based on the recognition results, relying on the high recognition accuracy to get a "what you say is what you get" effect.

Before you start assembling the robot, think for a moment: what basic modules does a voice-controlled robot need in order to move?

Voice input - microphone

Since it is voice control, an audio input module is needed, right? The X3 picks up audio with microphone-array hardware, and the collected audio is then processed by the intelligent voice analysis module. The Horizon X3 is adapted to a linear four-mic microphone array with audio loopback capture, which enables good echo cancellation.

Intelligent speech perception

The intelligent speech algorithm processes the raw audio: it denoises the audio, recognizes the speech, and outputs the DOA angle of the sound source. The denoised audio can be used for high-quality voice calls, the recognition results can be used to control various applications, and the DOA angle can be used to locate the sound source.
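As a toy illustration of how the DOA angle could be used downstream (the [0, 180] degree range of a linear array is standard, but the thresholds below are arbitrary assumptions, not values from Horizon's SDK):

```python
# Toy illustration: map a DOA (direction-of-arrival) angle from a linear
# microphone array to a coarse direction label. A linear array can only
# resolve angles in [0, 180] degrees; the 60/120 thresholds are arbitrary.
def doa_to_direction(angle_deg: float) -> str:
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("linear-array DOA is limited to [0, 180] degrees")
    if angle_deg < 60.0:
        return "right"
    if angle_deg > 120.0:
        return "left"
    return "front"
```

A downstream application could, for instance, turn the robot toward the speaker based on this label.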

Interaction

After the speech is recognized, different functional applications are defined and implemented for different utterances; for example, the voice command "go forward" makes the robot move forward.
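This command-word-to-motion mapping can be sketched as a small lookup table. A minimal illustration follows; the `Twist` stand-in below mimics the shape of `geometry_msgs/msg/Twist`, and the velocity values are illustrative, not the App's actual defaults:

```python
from dataclasses import dataclass, field

# Stand-in for geometry_msgs/msg/Twist (linear/angular velocity vectors),
# so the sketch runs without a ROS2 installation.
@dataclass
class Vector3:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

@dataclass
class Twist:
    linear: Vector3 = field(default_factory=Vector3)
    angular: Vector3 = field(default_factory=Vector3)

# Command word -> (linear.x in m/s, angular.z in rad/s). Values are
# illustrative placeholders, not the real audio_control configuration.
COMMANDS = {
    "go forward": (0.3, 0.0),
    "step back": (-0.3, 0.0),
    "turn left": (0.0, 0.5),
    "turn right": (0.0, -0.5),
    "stop motion": (0.0, 0.0),
}

def command_to_twist(word: str) -> Twist:
    """Translate a recognized command word into a Twist-shaped message."""
    vx, wz = COMMANDS[word]
    t = Twist()
    t.linear.x = vx
    t.angular.z = wz
    return t
```

In the real App this translation is done inside the voice control Node, which then publishes the resulting message on /cmd_vel.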

Controls

Based on the control commands output by the "interaction" module, this module performs the mechanical control of the robot.

Robot body

Of course, a robot body with motion capability is also needed to receive the control commands and drive the motors, finally achieving the effect of controlling the robot's movement by voice.

TogetherROS, the robot development platform software stack released by Horizon, comes with rich, easy-to-use robot development components that cover all the functional modules involved in building an intelligent robot application (such as robot voice control). It is completely open source and free, and supports secondary development by developers. So let's get started.

2. Preparation

Prepare the hardware devices and software packages for building the robot voice control application.

Hardware

The hardware includes:

① X3 Pi

② Microphone array and adapter board

③ Robot

The X3 and TogetherROS have been adapted to the Diablo wheel-legged robot and Little R Technology's Mecanum-wheel car. This article uses the Diablo wheel-legged robot as the example; students who do not have one need not worry, as the article will also explain how to adapt your own robot.

④ Other accessories

a. USB Type-C power cable, with at least a 5V@2A adapter to power the X3 Pi.

b. Serial cable. The connection method is as follows:

c. TF memory card and card reader. The X3 Pi development board uses a TF card as the system boot medium; a TF card with at least 8 GB capacity and a speed class of C10 or above is recommended, to meet the storage requirements of the Ubuntu system and additional application packages.

Installation system

Refer to the system installation section of the Sunrise X3 Pi user manual.

After the installation is complete (or has already been installed), you need to update the system.

System configuration

To configure the wireless network on the X3 Pi, refer to the wireless network section of the X3 Pi user manual.

After the wireless network is configured successfully, query the IP address:

As shown, the IP address assigned to the X3 Pi's wireless network is 192.168.1.147. Below, this address and the root account (password: root) are used to connect to the X3 Pi remotely over ssh. After a successful login, the status is as follows:

3. Install the X3 Pi on the robot

Install the X3 Pi on the robot and verify that publishing to the /cmd_vel topic from the X3 Pi can control the robot's movement.

First, the audio adapter board is installed on the X3 Pi; then the X3 Pi, with the linear four-mic microphone array attached, is fixed directly onto the robot, and the robot's USB control interface is plugged into the X3 Pi.

The installation effect is as follows:

After installation, control the robot's movement to check whether the installation succeeded.

A ROS-enabled robot typically provides a ROS-based motion control Node that subscribes to control messages on the /cmd_vel topic (the robot control message defined in ROS2, of message type geometry_msgs/msg/Twist) and sends motion control commands to the robot over the USB interface, thereby controlling the robot's movement. The Diablo wheel-legged robot used in this article has a USB interface and provides a motion control package that runs on the X3 Pi; after subscribing to /cmd_vel control messages, the package sends control commands to the robot over USB.
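As a rough sketch of what such a motion control package does internally, the following converts an incoming Twist into per-wheel speeds before they would be sent over USB. The Diablo's actual USB protocol is not shown in this article, so this uses a generic differential-drive model with an assumed wheel separation:

```python
# Illustration only: translating a /cmd_vel Twist into per-wheel speeds.
# The wheel separation value is an assumption for the sketch, not a
# measurement of any particular robot.
WHEEL_SEPARATION = 0.4  # meters, assumed

def twist_to_wheel_speeds(linear_x: float, angular_z: float):
    """Differential-drive kinematics: Twist -> (left, right) wheel speeds in m/s."""
    left = linear_x - angular_z * WHEEL_SEPARATION / 2.0
    right = linear_x + angular_z * WHEEL_SEPARATION / 2.0
    return left, right
```

A real motion control Node would then encode these speeds into the robot's own USB control protocol.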

Start the Diablo wheel-legged robot's motion control Node on the X3 Pi. Open a terminal and run the following commands:

source /opt/tros/setup.bash

ros2 run diablo_sdk ros_bridge_example

After the command is successfully executed, the following information is displayed:

Open another terminal on the X3 Pi and control the robot to rotate at 0.3 rad/s by publishing /cmd_vel topic messages:

source /opt/tros/setup.bash
ros2 topic pub -r 10 /cmd_vel geometry_msgs/Twist '{linear: {x: 0, y: 0, z: 0}, angular: {x: 0, y: 0, z: 0.3}}'

After the command is successfully executed, the following information is displayed:

The effect of robot rotation after receiving the control command is as follows:

This indicates that the robot moves correctly according to the published control messages.
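As a quick sanity check of that command: in ROS2 Twist messages, angular.z is in rad/s, so 0.3 means one full in-place revolution takes about 21 seconds, an easy figure to verify with a stopwatch while the robot spins:

```python
import math

# The published Twist sets angular.z = 0.3, i.e. 0.3 rad/s in ROS2 conventions.
angular_z = 0.3  # rad/s

# Time for one full in-place rotation at this rate: 2*pi / 0.3.
seconds_per_turn = 2 * math.pi / angular_z
print(f"{seconds_per_turn:.1f} s per revolution")
```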

How to adapt other robots

If you have another mobile robot, such as one using a Raspberry Pi or Jetson Nano as the upper computer, you can also install the X3 Pi on the robot, replacing the Raspberry Pi or Jetson Nano to control the robot's movement. The method is as follows:

① Compile a motion control package that can run on the X3 Pi

a. Install the ROS2 package build and compilation tools on the X3 Pi:

apt update
apt-get install python3-catkin-pkg
pip3 install empy
pip3 install -U colcon-common-extensions

b. Copy the source code of the robot motion control ROS2 package originally running on the Raspberry Pi or Jetson Nano to the X3 Pi.

c. On the X3 Pi, in the directory of the package source project, run source /opt/tros/setup.bash and then colcon build to compile the package.

d. If the original motion control package was developed for ROS1, its source code needs to be adapted to ROS2. Only the subscription and processing of the "cmd_vel" topic messages need adapting; other features in the original ROS1 package can be ignored for now.

② Installation

a. Fix the X3 Pi to the robot. If space is limited, the original Raspberry Pi or Jetson Nano can be removed.

b. Power the X3 Pi over USB Type-C. If the robot has no Type-C power output, a power bank (output at least 5V DC, 2A) can also be used.

c. Plug the robot's USB control interface into the X3 Pi.

③ Test

a. Start the newly compiled robot motion control package on the X3 Pi.

b. Open another terminal on the X3 Pi and control the robot to rotate at 0.3 rad/s by publishing /cmd_vel topic messages:

source /opt/tros/setup.bash
ros2 topic pub -r 10 /cmd_vel geometry_msgs/Twist '{linear: {x: 0, y: 0, z: 0}, angular: {x: 0, y: 0, z: 0.3}}'

If the robot rotates normally, the X3 Pi has been adapted successfully.

4. Complete robot voice control demo

Now let's test the full robot voice control function.

Open a terminal on the X3 Pi and start the intelligent voice recognition and voice control script:

source /opt/tros/setup.bash
cp -r /opt/tros/lib/hobot_audio/config/ .
bash config/audio.sh
ros2 launch audio_control hobot_audio_control.launch.py

Open another terminal on the X3 Pi and start the robot motion control Node:

source /opt/tros/setup.bash

ros2 run diablo_sdk ros_bridge_example

Control the robot's movement by voice (see the attachment at the end of the article for the full video):

5. Principle analysis

In Chapter 4, the hobot_audio_control.launch.py script and the motion control Node were started in two terminals on the X3 Pi to achieve voice control of the robot. This chapter analyzes how it works.

System design of the App

A complex robot system is generally composed of an upper computer and a lower computer. The upper computer has strong computing power and runs the robot's complex upper-layer applications, while shielding, as much as possible, the underlying differences between robot types. The lower computer generally uses a low-cost MCU to collect data from, and control, the sensors and hardware on the robot body.

The voice-controlled robot likewise consists of these two parts. The upper computer is the X3 Pi, running multiple ROS2 Nodes covering the perception, interaction, and control functions. The lower computer is part of the robot body (not covered in detail here). The detailed composition of each part is as follows:

As the App's system design diagram shows, with the X3 and TogetherROS you can use the chip's AI acceleration and TogetherROS's rich algorithms and robot development components to quickly develop intelligent robot applications.

The following sections describe each module in detail.

App Node introduction

Chapter 1 and the system design section analyzed the functional modules required by the robot voice control App. The Nodes are introduced below according to these functional modules.

① Perception

We use the intelligent speech recognition Node from the Boxs algorithm repository in TogetherROS (corresponding to hobot_audio in the system design diagram) to capture audio, process it with the intelligent speech algorithm, and publish intelligent voice messages. Its main functions include:

a. The microphone collects audio

The intelligent speech recognition Node collects raw audio for the speech algorithm module to process.

b. Intelligent analysis by the speech algorithm

The intelligent speech recognition Node feeds the collected raw audio into the speech algorithm SDK, which uses the BPU processor for AI inference to denoise and recognize the speech, and returns the intelligent results to the Node via callback.

c. Intelligent voice message publishing

After the intelligent speech recognition Node receives the intelligent results from the speech algorithm SDK's callback, it converts them into SmartAudioData form and publishes messages on the /audio_smart topic, to which subscribers can react to control the robot's movement.

The Node's package name is 'hobot_audio', and the Node's parameters are:

"ai_msg_pub_topic_name": the topic on which the AI-perceived intelligent voice messages are published. The default topic name is "/audio_smart".

"config_path": the path to the configuration files required by the intelligent voice recognition module, including the wake word and the command words the user expects to be recognized.

The Node's design and processing logic are as follows:

② Interaction

The voice control Node subscribes to the intelligent voice messages published by the intelligent speech recognition Node and, according to the voice command words "go forward", "step back", "turn left", and "turn right", publishes control messages to control the robot's movement.

The motion control message published by the Node is the one defined in ROS2: the topic is "/cmd_vel" and the message type is "geometry_msgs/msg/Twist". The Node's package name is 'audio_control', and the Node's parameters are:

"ai_msg_sub_topic_name": the topic name subscribed to for intelligent voice messages, "/audio_smart".

"twist_pub_topic_name": the topic name on which motion control messages are published, "/cmd_vel".

"move_step": the step size (speed) of translational movement; 0.5 means a moving speed of 0.5 m/s. The larger the value, the faster the movement.

"rotate_step": the step size (speed) of rotational movement; 0.5 means a rotation speed of 0.5 rad/s. The larger the value, the faster the rotation.

If you need to control the robot with other command words, modify the relevant code and configuration of both the intelligent voice recognition Node and the voice control Node. For the specific code, see the code repository.

③ Controls

The robot motion control Node subscribes to the control messages published by the voice control Node on the "/cmd_vel" topic and issues motion control commands to the robot's lower computer over the USB bus according to the control protocol.

Different types of robots use different control protocols and therefore different motion control Nodes. This article uses the Diablo wheel-legged robot.

Node and Topic information when the App is running

Having introduced the App's design and the Nodes it contains, what is the relationship between the Nodes at runtime? For complex applications containing multiple Nodes, ROS2 provides the ability to launch Nodes in batches using a launch script.

Chapter 4 introduced how to run the robot voice control App. Let's look at the App's launch script hobot_audio_control.launch.py, which reads as follows:

import os

from launch import LaunchDescription
from launch_ros.actions import Node
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from ament_index_python import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='hobot_audio',
            executable='hobot_audio',
            output='screen',
            parameters=[
                {"config_path": "./config"},
                {"audio_pub_topic_name": "/audio_smart"}
            ],
            arguments=['--ros-args', '--log-level', 'error']
        ),
        Node(
            package='audio_control',
            executable='audio_control',
            output='screen',
            parameters=[
                {"ai_msg_sub_topic_name": "/audio_smart"},
                {"twist_pub_topic_name": "/cmd_vel"}
            ],
            arguments=['--ros-args', '--log-level', 'error']
        )
    ])

The script specifies two Nodes, whose package configuration items give the package names: these are the intelligent speech recognition Node and the voice control Node described earlier. In addition, the Diablo wheel-legged robot's motion control Node is started with ros2 run diablo_sdk ros_bridge_example (this Node is started separately, not in the launch script). After successfully running the launch file and ros_bridge_example, use the ros2 command-line tools on the X3 Pi to query the Nodes and Topics running on the device:

root@ubuntu:~# source /opt/tros/setup.bash
root@ubuntu:~# ros2 node list
/audio_capture
/audio_control
/audio_control_parameter_node
/ros_bridge_example
/transform_listener_impl_558896ba50
root@ubuntu:~#
root@ubuntu:~#
root@ubuntu:~# ros2 topic list
/audio_smart
/cmd_vel
/imu/data_raw
/odom
/parameter_events
/quat_odom
/raw_odom
/rosout
/tf
/tf_static

Multiple Nodes (the ones started above) are running on the X3 Pi. Communication between these ROS2 Nodes is based on the pub/sub mechanism, and the Nodes are chained together through topics to form a pipeline.

Because there is a lot of Node and Topic information when the App is running, a plain ros2 node list cannot show the associations between Nodes. The Node Graph function of rqt (ROS2 Foxy and rqt need to be installed on a PC, and the PC must be on the same network segment as the X3 Pi) can visually display the Nodes running on the X3 Pi, the topics each Node publishes and subscribes to, and the connections between Nodes, as in the following figure (Node names are in oval boxes, topic names in rectangular boxes):

As can be seen, the whole graph (pipeline) starts at the audio_capture Node (audio capture and intelligent speech recognition, package name hobot_audio), passes through the audio_control Node (which receives the intelligent voice results and publishes motion control messages), and ends at the ros_bridge_example Node (robot motion control), clearly showing the connections between the Nodes.

6. Advanced voice control

Can the voice control functionality be redefined? The answer is yes! Chapter 1 introduced the correspondence between voice recognition and control functions in the App and briefly showed several command words. But what if the user needs to recognize other speech or add other control commands?

You can modify the definition and configuration of the command words in the intelligent speech recognition Node and the control policy for the user-defined command words in the voice control Node, or directly add a new voice control App.

In the intelligent speech recognition Node, the configuration file for the device wake word and command words is /opt/tros/lib/hobot_audio/config/hrsc/cmd_word.json (of course, if the user has copied the config folder to another path, the actual configuration path applies). The default configuration is as follows:

The first item in the configuration file is the wake word, and the rest are command words. Configure them as required.
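As an illustration of that structure (wake word first, command words after), a sketch of what such a configuration could look like. The key name and JSON schema here are assumptions, and "your-wake-word" is a placeholder; check the actual cmd_word.json on the device before editing:

```json
{
    "cmd_word": [
        "your-wake-word",
        "go forward",
        "step back",
        "turn left",
        "turn right",
        "stop motion"
    ]
}
```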

7. FAQs

How do I reproduce the App's effect?

Reproducing the App's effect involves two parts:

① Robot voice control App

Refer to the preparations in Chapter 2 to install TogetherROS on the X3 Pi.

② The Diablo wheel-legged robot and its motion control package

For how to obtain them, see the product information: https://developer.horizon.ai/forumDetail/94246984227025410

In addition to the Diablo robot, the X3 and TogetherROS are also adapted to Little R Technology's Mecanum-wheel car, which can likewise be used to experience the App directly.

Can I experience the App without a robot?

Yes. Without a robot, you can use this App to control a virtual robot's movement in the Gazebo simulation environment.

How do I adapt the App to my robot?

This article demonstrates the voice control App using the Diablo wheel-legged robot as an example, but the App does not depend on any particular form of robot: the motion control message it publishes is the one defined in ROS2 (topic "/cmd_vel", message type "geometry_msgs/msg/Twist"; for details see the principle analysis in Chapter 5). As shown below, the App can be divided into two parts: the robot's upper computer, framed in red, and the lower computer, framed in blue:

① Red dashed box

This part does not depend on the robot and can be ported directly to any form of robot. To port it, install the X3 Pi on the robot, with the audio microphone board and TogetherROS installed as described in Chapter 2.

② Blue dashed box

This part depends on the robot and needs targeted adaptation, which differs according to the robot's situation. Case 1: the original robot has both an upper and a lower computer, for example a Raspberry Pi or Jetson Nano as the upper computer running a robot motion control Node; here the motion control Node needs to be recompiled on the X3 Pi. Case 2: the original robot has only a lower computer; here a robot motion control Node must first be developed (refer to components/xrrobot · develop · HHP/app/xr_robot · GitLab (horizon.ai)) and then compiled on the X3 Pi.

Which microphone arrays does the App support? Can I use my own microphone to pick up sound?

The App itself places no requirements on the microphone, but the audio acquisition part of the Horizon robot platform's intelligent speech recognition Node is only adapted to the linear four-mic microphone array matched to the X3. To use another microphone array, system-level adaptation is needed, including the audio driver; because this involves changes to the system image and drivers, it is not covered here. Users can also obtain audio by other means, such as over the network or from audio files, but these approaches require code changes by the user.

How to adapt my own speech algorithm?

The speech algorithm of the intelligent speech recognition module is integrated through an SDK. For details, see the Perception section of 5.2.

Take a look at the code directory structure:

The horizon_speech_sdk under the code's include folder is the Horizon speech algorithm SDK, including the speech algorithm inference module. The folders in the code are as follows:

audio_capture: code for the audio capture function

audio_engine: code interacting with the speech algorithm SDK

horizon_speech_sdk: the speech algorithm SDK's external headers and .so libraries

config: configuration files required at runtime

Users who want to experience the effect with their own speech algorithm can modify the audio_engine code directly and remove horizon_speech_sdk.

How to adjust the speed of the robot?

The launch script of the voice-controlled robot is hobot_audio_control.launch.py. The App currently controls the robot's motion at default speeds (translation 0.3, rotation 0.5). To change them, add the two parameters "move_step" and "rotate_step" to the script, which control the robot's translation and rotation speeds respectively:

Node(
    package='audio_control',
    executable='audio_control',
    output='screen',
    parameters=[
        {"ai_msg_sub_topic_name": "/audio_smart"},
        {"twist_pub_topic_name": "/cmd_vel"},
        {"move_step": 0.3},
        {"rotate_step": 0.5}
    ],
    arguments=['--ros-args', '--log-level', 'error']
)

For details about the parameters, see Section 5.2.

How to develop an application Node to extend the voice App functionality?

You can refer to the code of the audio_control Node in components/xrrobot · develop · HHP/app/xr_robot · GitLab (horizon.ai).

TogetherROS provides an intelligent speech recognition Node that supports user-defined wake words and command words, as well as voice DOA output. Users can define different command words by modifying the Node's configuration, and the Node publishes the intelligent results as messages, helping users quickly develop various applications on the X3 Pi.

Can I develop a Python Node to extend the App's functionality?

Yes. ROS2 is cross-device, cross-platform, and cross-language, and TogetherROS is fully compatible with ROS2 Foxy, so it supports these features too. For example, suppose an LED is wired to the X3 Pi's 40-pin header and should light up while the robot is in motion. A user can develop a ROS Node in Python that subscribes to the "/cmd_vel" messages published by the voice interaction Node and checks whether the message contains a start-motion command (i.e. any non-zero value). If there is a non-zero value, the robot is starting to move and the LED is turned on; otherwise the LED is turned off.

After developing the Node in Python, it can be built and run directly on the X3 Pi. A Python code sample that subscribes to the cmd_vel topic:

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist


class MinimalSubscriber(Node):

    def __init__(self):
        super().__init__('minimal_subscriber')
        self.subscription = self.create_subscription(
            Twist,
            'cmd_vel',
            self.listener_callback,
            10)
        self.subscription  # prevent unused variable warning

    def listener_callback(self, msg):
        self.get_logger().info('I heard: "%s"\n' % msg)


def main(args=None):
    rclpy.init(args=args)
    minimal_subscriber = MinimalSubscriber()
    rclpy.spin(minimal_subscriber)
    # Destroy the node explicitly
    # (optional - otherwise it will be done automatically
    # when the garbage collector destroys the node object)
    minimal_subscriber.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
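Building on the subscriber above, the "LED on while moving" idea boils down to a non-zero test on the Twist fields. A minimal, hedged sketch follows; the actual GPIO call for driving an LED on the 40-pin header is deliberately omitted, since the GPIO library is not covered in this article:

```python
# Decide whether a received Twist means "robot is moving": any non-zero
# linear or angular component counts. Inside listener_callback one would
# call is_moving(...) and then drive the LED GPIO accordingly (omitted).
def is_moving(linear, angular) -> bool:
    """linear/angular are (x, y, z) tuples taken from a Twist message."""
    return any(v != 0.0 for v in (*linear, *angular))
```

For example, the "turn left" command produces a non-zero angular.z, so the LED would stay on until "stop motion" zeroes all fields.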

Attachment

语音控制_20220815192138.mp4