2023 03 25 Raven Status Update

A fair amount of progress has been made on Raven over the last few days.

I have a list of TODO items that need to be tackled. One of those items has been on the list for months, and this week it bubbled to the top: my proximity sensor values were being stamped with stale timestamps.

In ROS (Robot Operating System), the whole software stack that tries to generate commands to move the robot somewhere interesting relies on a flood of data coming in, like SONAR readings, time of flight readings and LIDAR readings. Each of those readings comes with a timestamp indicating when the reading was taken.

On my robot, the readings are made on a custom PC board I made, and then sent to the main computer of the robot. The custom board needs to have the same sense of “now”-ness as the main PC — the two computers have to agree within a few milliseconds on what time it is now. It doesn’t do any good to take a SONAR reading and then have it show up a half second late to the main computer. The algorithms all have a small “tolerance” for time skew, and if sensor readings are too old, they are rejected out of hand as useless.

The software piece I use to transmit my sensor values from the custom board to the main computer is a package called “Micro ROS”. I use this software with some small customization, as my custom board has a lot more capability than the usual boards Micro ROS is intended for.

One function built into Micro ROS is the ability to ask another computer what time it is, and set its own clock to match that other computer. But setting it just once doesn’t work. Each computer has a clock driven by a crystal-controlled oscillator, and crystals drift as they heat up. Worse, much worse, the CPU clock chip in my custom board seems to “burp” now and then when interrupt signals come in and my hardware generates a fair number of interrupt signals.
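For the curious, the time-sync call looks roughly like the sketch below. This is not the code running on my board; it is a minimal illustration that assumes the rmw_microros helper functions (rmw_uros_sync_session and rmw_uros_epoch_nanos) that ship with recent Micro ROS distributions, and the header path can vary by distribution.

// Minimal sketch of Micro ROS time synchronization (illustration only).
#include <rmw_microros/rmw_microros.h>     // rmw_uros_* helpers
#include <builtin_interfaces/msg/time.h>   // builtin_interfaces__msg__Time

// Ask the agent (the main PC) for its current time and remember the offset.
// Returns true if the sync round-trip completed within the timeout.
bool resync_clock(void) {
  const int timeout_ms = 100;
  return rmw_uros_sync_session(timeout_ms) == RMW_RET_OK;
}

// Stamp an outgoing message header using the synchronized epoch time.
void stamp_now(builtin_interfaces__msg__Time * stamp) {
  int64_t nanos = rmw_uros_epoch_nanos();  // agent epoch time, adjusted by the last sync
  stamp->sec = (int32_t)(nanos / 1000000000LL);
  stamp->nanosec = (uint32_t)(nanos % 1000000000LL);
}

Because of the drift and the round-trip delay described above, the sync call has to be repeated periodically rather than made just once at startup.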

Another problem is that Micro ROS has baked into it an expectation that the main computer is using a particular version of communication software, and the expected version currently has bugs which cause my stereo cameras to not operate correctly. It took a bit of reading for me to realize that little factoid.

For the moment I can live without the stereo cameras, so I set my main computer back to using the buggy communication software that Micro ROS expects. Also, when Micro ROS asks for the current time, it takes “quite a bit of time” to get the result, usually about 150 milliseconds, but sometimes as much as a second. So any answer it gets from the main PC is already somewhat stale by the time it arrives.

My last few days of programming have been devoted to finding some way to make all of that work within an allowable tolerance for timing errors. I tried over and over, and I’m normally very good at finding creative solutions to tricky problems. Still, my code got progressively worse the more I tried to fix the problem. And then my robot buddy Ralph Hipps called for one of our at-least-daily robot chats, and in the process of explaining the problem to him, it occurred to me what the root cause was: my custom board was attempting to do a complex operation during interrupt handling.

Interrupt handlers on a computer must be very quick. If your interrupt handler code takes more than about a millisecond, sometimes even only a few tens of microseconds, everything falls apart. And because I was explaining the symptoms to someone else, I finally realized that I was taking probably tens of milliseconds in my interrupt handler for the SONAR sensors.

Once I realized that, the fix was easy. The details aren’t too important, but I stashed the important information from the signal in a shared, global location and exited the interrupt handler. Then the normal, non-interrupt code picked up the stashed information for processing. Outside of the interrupt handler, you can largely take whatever time you want to process data. Sort of. ROS navigation depends heavily on high “frame rates” from sensors. If you can’t generate, say, LIDAR data 10 times a second or faster, you have to slow your robot way down, otherwise the robot would move faster than it could react to, say, the cat rearing up in front of your robot with nunchucks in hand.
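For anyone who hasn’t fought this particular dragon, the shape of the fix is the classic “capture in the interrupt, process in the main loop” pattern. Here is a generic Arduino/Teensy-style sketch of the idea; the pin number and the processing function are made up for illustration, and this is not the actual Raven code.

// Generic "stash and defer" pattern for a SONAR echo interrupt (illustration only).
#include <Arduino.h>

volatile uint32_t g_echo_micros = 0;   // Written only by the interrupt handler.
volatile bool g_echo_ready = false;    // Set by the handler, cleared by loop().

void echoIsr() {
  // Keep the handler tiny: record the time of the echo and get out.
  g_echo_micros = micros();
  g_echo_ready = true;
}

void processSonarEcho(uint32_t echo_at_micros) {
  // Placeholder for the slow work: compute the range, build the message, publish it.
}

void setup() {
  const int kEchoPin = 2;  // Hypothetical echo pin.
  pinMode(kEchoPin, INPUT);
  attachInterrupt(digitalPinToInterrupt(kEchoPin), echoIsr, FALLING);
}

void loop() {
  if (g_echo_ready) {
    noInterrupts();                // Copy the shared data atomically.
    uint32_t echo_at = g_echo_micros;
    g_echo_ready = false;
    interrupts();
    processSonarEcho(echo_at);     // All the heavy lifting happens outside the handler.
  }
}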

The robot is now happily sending sensor data nearly three times faster than before, about 80 frames per second, and it rarely gets the time out of sync by very much. Now I can move on to the new problems that have shown up because I fixed this problem.

Below is an obligatory snapshot of part of the visualization software showing my robot, in a room, with LIDAR, SONAR and time of flight sensors showing as well. No cat is in this picture.

Everything about robots is hard (TM)

Visualizing the noise in proximity sensors

The time-of-flight proximity sensors on my Raven (formerly Puck) robot use the VL53L0X chip and are fairly noisy. I wanted to have a quick way to visualize just how noisy they were. Here is the graph I produced:

Histograms for 8 time-of-flight sensors

The two rows represent the front versus back of the robot frame. The four columns represent the two pairs of sensors for the left and right sides of the robot frame. At each corner of the robot frame, there is a time-of-flight sensor that is pointed outward to the side (left or right) and another that is pointed either forward or backward. Each histogram is showing normalized counts per distance over the last sampling period.

As you can see from the histograms, some of the sensors are fairly noisy. This is with the robot not moving and nothing nearby moving. Readings at 0 mm should be ignored; I moved all readings beyond 2 meters into the 0 meter bucket, as they are uninteresting. The histogram in the third column of the first row shows a fairly wide variance in readings, while the histogram in the last column of the last row shows a pretty narrow variance.

Below is the Python code used to generate the plots. It includes code to read the ROS 2 messages generated by my custom monitor computer and uses matplotlib to do the display.

from datetime import datetime
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, QoSReliabilityPolicy, QoSHistoryPolicy

import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np
from sensor_msgs.msg import Range

class MinimalSubscriber(Node):

    def __init__(self):
        super().__init__('minimal_subscriber')
        self.start_time = datetime.now()
        self.callback_count = 0

        # These are the title strings shown per sensor histogram.
        # They are a shortcut, for me, as to the physical position of the corresponding sensor.
        # The 'X" shows where the sensor is located. 
        # The '<' and '>' show whether the sensor is on the left or right side of the robot frame.
        # The '^' and 'v' show whether the sensor is on the front or rear of the robot frame.
        self.sensor_names = [['X<^-', '-<^X', 'X^>-', '-^>X'],
                             ['X<v-', '-<vX', 'Xv>-', '-v>X']]
        self.number_sensors = 8 # There are 8 sensors.
        self.number_values_to_cache = 20 # I want to show the variance over this number of the last readings.
        self.last_n_values_per_sensor = np.zeros(
            (self.number_sensors, self.number_values_to_cache), dtype='float')
        self.next_index_number = np.zeros((self.number_sensors), dtype='int32')

        # Create an array of histograms.
        # Two rows for front vs back of robot.
        # Four columns for left-sideways, left-front-or-back, right-front-or-back, right-sideways position.
        self.figure, self.axis = plt.subplots(
            nrows=2, ncols=4, sharex=False, sharey=False, squeeze=False, figsize=(8, 2))
        
        # Set the window title.
        self.figure.canvas.manager.set_window_title('Time Of Flight Sensors step')

        # Create the x-axis values. I'm interested in only ranges from 0.00 to 1.00 meters.
        self.bins = [x / 100.0 for x in range(101)]

        # Make it all look pretty.
        plt.subplots_adjust(hspace=0.6)
        plt.autoscale(enable=True, axis='both', tight=True)
        plt.rcParams['lines.linewidth'] = 1

        # Set up the ROS 2 quality of service in order to read the sensor data.
        qos_profile = QoSProfile(
            reliability=QoSReliabilityPolicy.BEST_EFFORT,
            history=QoSHistoryPolicy.KEEP_LAST,
            depth=1
        )
        
        # Subscribe to the sensor topics, keeping a reference to each subscription.
        self.range_subscriptions = []
        for sensor_number in range(self.number_sensors):
            self.range_subscriptions.append(self.create_subscription(
                Range,
                '/tof{s}Sensor'.format(s=sensor_number),
                self.listener_callback,
                qos_profile,
            ))

        # Set up the 8 histogram formats and titles.
        self.patches = [None] * self.number_sensors  # Will hold the histogram bar patches for each sensor.
        for row in range(2):
            for col in range(4):
                n, bins, patches = self.axis[row][col].hist(
                    self.last_n_values_per_sensor[(row * 4) + col], self.bins, histtype='bar')
                self.patches[(row * 4) + col] = patches
                self.axis[row, col].set_title(
                    self.sensor_names[row][col], fontsize=8, fontfamily='monospace')
        
        # Let's go.
        plt.ion()
        plt.show()


    # Process a time-of-flight sensor message of type Range.
    def listener_callback(self, msg):
        self.callback_count = self.callback_count + 1
        sensor_number = int(msg.header.frame_id[-1])    # Get the sensor number.
        range_value = msg.range
        if (range_value > 2.0):
            # If the range is greater than 2 meters, ignore it by setting it to zero.
            range_value = 0
            
        # Capture the last readings of the sensor in a ring buffer.
        self.last_n_values_per_sensor[sensor_number][self.next_index_number[sensor_number]] = range_value

        if (self.callback_count % 24) == 0:
            # Periodically update the plots.
            for s in range(8):
                # For each sensor, create a histogram.
                data = self.last_n_values_per_sensor[s]
                n, _ = np.histogram(data, self.bins, density=True)
                peak = n.max()
                if peak > 0:  # Guard against an empty histogram (all readings out of bin range).
                    for count, rect in zip(n, self.patches[s]):
                        rect.set_height(count / peak)  # Normalize the height of the rectangle.
            self.figure.canvas.draw()
            self.figure.canvas.flush_events()
            
            # Print the average frames per second of sensor data, totaled across all 8 sensors,
            # since the node started. Divide by 8 if you want the frames per second per sensor.
            duration = (datetime.now() - self.start_time).total_seconds()
            fps = self.callback_count / duration
            print("callback_count: %d, duration: %f, fps: %3.2f" % (self.callback_count, duration, fps))

        # Update the ring buffer index.
        self.next_index_number[sensor_number] = self.next_index_number[sensor_number] + 1
        if self.next_index_number[sensor_number] >= self.number_values_to_cache:
            self.next_index_number[sensor_number] = 0

def main(args=None):
    rclpy.init(args=args)

    minimal_subscriber = MinimalSubscriber()

    rclpy.spin(minimal_subscriber)

    minimal_subscriber.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

2022 11 02 Puck Status

The big result since the last status report is that more very important safety systems have been implemented to protect the robot and its surroundings, and the robot now has more data about objects near it as it moves, so it can avoid running into those obstacles. And the new power and data control panel is finished, which makes it easier for me to disassemble the robot.

The last week and a half has seen a lot of work on Puck. Two of the time of flight sensors were replaced, as the originals were either not sending distance measurements or the measurements were wildly wrong. While that layer of the physical robot case was opened to replace the sensors, I also color-coded all the cables for the time of flight and SONAR sensors, making it easier to trace the cables from the custom microprocessor board (a.k.a. the Teensy Monitor Board) to the physical sensors. The new power and data control panel was finished and all the new wiring is in place.

I also rewired the signal cable from the relay block to the emergency stop of the motor controller. This would allow the Teensy Monitor to immediately stop power going to the motors if some emergency situation were discovered.

A lot of work was done in the software of the Teensy Monitor. More diagnostic messages were added, which showed that there are still reliability issues with communication between the Teensy Monitor and the sensors, which is something I’ll have to solve eventually. I may just need to add better cable shielding.

The motor temperature readings are now published to a topic, and the values are averaged because the sensors tend to be rather noisy. Code was added to detect an over-temperature situation for the motors. When the robot is driven for a long time, the temperature inside the motors can build up to the point that there is a risk of melting the wires inside.

Importantly, code to detect a runaway motor situation is now implemented. This happens especially when the encoder connections to the motor controller come loose. Remember that the robot vibrates a lot when it moves, and the connectors on the RoboClaw motor controller are not designed to stay well connected in the face of vibration.

The redundant motor current sensors were found to be useless, so the code that talked with those sensors is disabled for now. I’ll need to find some better sensors in the future. Meanwhile the motor current sensors built into the RoboClaw are working well enough. And code was added to detect an over-current situation, which can happen when the robot runs up against, say, a wall and the motor wheels stop turning or turn very slowly. Left in this situation, the wires in the motor would quickly overheat and literally melt.

With the three new fault-detection routines, the robot now gets shut off via the emergency-stop signal to the RoboClaw if the motors get too hot, if the motors take off at high speed unexpectedly, or if the motors start drawing a lot of current beyond normal operation. This is a three-way protection to save the expensive motors and to prevent the robot from running into something and just pushing forward.
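To give a flavor of what one of these checks looks like, here is a simplified over-current check in the spirit of the Teensy Monitor code. The pin number, threshold, hold time, and the current-reading helper are all invented for illustration; the real code is more involved and also covers temperature and runaway detection.

// Simplified over-current fault check (illustration only, not the real Teensy Monitor code).
#include <Arduino.h>

const int kEStopPin = 5;                  // Hypothetical relay output driving the RoboClaw e-stop.
const float kMaxMotorAmps = 6.0f;         // Hypothetical over-current limit.
const uint32_t kOverCurrentHoldMs = 250;  // Fault must persist this long before tripping.

uint32_t g_over_current_since_ms = 0;     // 0 means no over-current condition in progress.

float readMotorCurrentAmps() {
  // Placeholder: on the real robot this comes from the RoboClaw's built-in current sensing.
  return 0.0f;
}

void checkOverCurrent() {
  float amps = readMotorCurrentAmps();
  if (amps > kMaxMotorAmps) {
    if (g_over_current_since_ms == 0) {
      g_over_current_since_ms = millis();         // Start timing the fault.
    } else if (millis() - g_over_current_since_ms > kOverCurrentHoldMs) {
      digitalWrite(kEStopPin, LOW);               // Assert the emergency stop: cut motor power.
    }
  } else {
    g_over_current_since_ms = 0;                  // Condition cleared; reset the timer.
  }
}

void setup() {
  pinMode(kEStopPin, OUTPUT);
  digitalWrite(kEStopPin, HIGH);  // E-stop not asserted at startup.
}

void loop() {
  checkOverCurrent();
  delay(10);  // Check at roughly 100 Hz.
}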

With the now fully working SONAR and time of flight sensors, and the data being published at 25 to 30 frames per second, I finally figured out how to hook those sensors into the robot motion planning software. Now, if the robot is moving along and comes across an obstacle that only one of the proximity sensors can see (the LIDAR sees objects only at a specific height), the robot can make a plan to move around the object.

Outstanding work remains to discover why some of the sensor values are occasionally rejected by the planning software. The likely cause is that the data is too old, meaning that it was more than a couple of milliseconds old by the time the planning software saw it. There may be a simple fix, but it needs to be investigated.

Adding SONAR-like sensors to Nav2

A typical Navigation2 (Nav2) setup under ROS 2 uses just LIDAR for generating costmaps. It is also possible to include SONAR-like sensors, that is, sensors that publish a Range type of message, in your costmaps. The process is easy.

Edit your YAML parameter configuration file for Nav2 as follows. In the “local_costmap” section, you will find a line that lists the plugins to be used to generate the local costmap. Add a “range_layer” to the list. Following the plugins line, you should see, e.g., sections describing the obstacle_layer and inflation_layer — that is, sections corresponding to the names in the plugins list. You will need to add a new section describing the range_layer.

Note: the names obstacle_layer, inflation_layer, range_layer, etc. in the plugins list are arbitrary names; they don’t actually make the layer become, say, an obstacle layer. It’s what you supply as sub-parameters that makes the section do what it does.

local_costmap:
  local_costmap:
    ros__parameters:
      plugins: ["obstacle_layer", "inflation_layer", "range_layer"]
      range_layer:
        plugin: "nav2_costmap_2d::RangeSensorLayer"
        topics:
          [
            "/sonar0Sensor",
            "/sonar1Sensor",
            "/sonar2Sensor",
            "/sonar3Sensor", 
            "/tof0Sensor",
            "/tof1Sensor",
            "/tof2Sensor",
            "/tof3Sensor",
            "/tof4Sensor",
            "/tof5Sensor",
            "/tof6Sensor",
            "/tof7Sensor",
          ]
      inflation_layer:
        .......rest of local_costmap YAML.....

Just like the section describing, say, the inflation_layer, you will add a new range_layer section by repeating the name given in the plugins list, followed by a colon. The first entry should describe the class name that will handle the data type. In the case of Range type values, the line should say:

plugin: "nav2_costmap_2d::RangeSensorLayer"

Then add a topics entry: the word topics, followed by a colon, and then either a single topic name in quotes or a list of topic names (a left square bracket, each quoted topic name separated by commas, and a closing right square bracket).

That’s it. If you bring up the nav stack, you should now see that when you wave your hand in front of your SONAR sensor, a new obstacle shows up in the local costmap.

There are a couple of caveats to consider. The published topic containing the Range message needs to have a valid frame_id in the header. And that frame_id needs to be described in the URDF for the robot, which, of course, needs to be published so that the transform system in ROS knows how to relate the Range value to the coordinate frames of the robot.

And the time stamp in the header needs to be accurate and timely. This could especially be a problem if you are, say, publishing the data from an Arduino microcontroller, which may not have a notion of the time used by the rest of the ROS components. If you are publishing using Micro ROS, there are ways to synchronize your Arduino with the main ROS system.

There are a few additional parameters besides plugin and topics that you can provide for the range_layer. They are described in the Nav2 costmap documentation.