FAQ#
Installation#
Setup fails#
- In case you encounter errors during setup or build process, please make sure to use an appropriate branch for your environment
- ROS 2 Humble, Ubuntu 22.04: main branch
- ROS 2 Iron, Ubuntu 22.04: main branch
- It's also important to delete
./build
./install
and./src
directory before rebuilding CARET if you have built CARET using wrong settings
CLI tool doesn't work#
In case CLI tool execution fails, please make sure to perform CARET environment settings
source /opt/ros/humble/setup.bash
source ~/ros2_caret_ws/install/local_setup.bash
ros2 caret check_caret_rclcpp <path-to-workspace>
Warning
CARET CLI tool doesn't work properly in Anaconda environment. Please use pure Python.
Recording#
LTTng session doesn't start after ros2 caret record
#
- Check detailed status for recording sequence by adding
-verbose
option (e.g.ros2 caret record -v
)- You will see
N/M process started recording
, whereN
is the number of processes which have been started recording andM
is the number of total processes to be started recording
- You will see
- If
N
increases very slowly, add--recording-frequency
option with integer greater than 100 (e.g.ros2 caret record -f 500
)- Please be careful that it increases the possibility of recording failure
- If
N
remains 0, remove~/.lttng
and start recording again- Please be careful that the first recorded trace data after removing
~/.lttng
tends to lack some events. So, please ignore the data
- Please be careful that the first recorded trace data after removing
- It's also important to make sure you don't have another LTTng session running
So many nodes named /caret_trace_ooooooo
created#
- As described in design section, a node to store CARET events is created for each process. Therefore, if a target application is huge and has a lot of processes, the number of CARET nodes also becomes huge
Visualization#
Result (plot, message_flow, etc.) is not outputted, or there seems something wrong with the result#
- Please use the following commands for verification
ros2 caret check_caret_rclcpp
to check if a target application is built with CARET/rclcppros2 caret check_ctf
to check if tracing data is recorded properly
- Please make sure the followings:
- A target application is built with CARET/rclcpp
- CARET environment is set properly before running a target application
export LD_PRELOAD=$(readlink -f ~/ros2_caret_ws/install/caret_trace/lib/libcaret.so)
source ~/ros2_caret_ws/install/local_setup.bash
- LTTng trace is started before running a target application
ros2 trace -s e2e_sample -k -u "ros2*"
- or consider to use launch file
- Trace data is not discarded
- In case trace data is discarded, use Trace filter
- The size of trace data is proper
- If the size of trace data is extremely small (e.g. only few KByte) and a target application has lots of nodes, the maximum number of file descriptors may not be enough. It can be increased by
ulimit -n 65536
- If the size of trace data is extremely small (e.g. only few KByte) and a target application has lots of nodes, the maximum number of file descriptors may not be enough. It can be increased by
- See Recording for more details
Parts of results are not outputted#
- If certain nodes are not traced but some nodes are traced, some packages may be built without CARET/rclcpp. Please make sure
<depend>rclcpp</depend>
is described inpackage.xml
- Another possibility is that some nodes can't be analyzed due to CARET's limitations:
- CARET cannot analyze a node which has two or more timer callbacks with the same period time setting
- CARET cannot analyze a node which has two or more subscription callbacks with the same topic name
- Callback information in such nodes are not outputted. Also, message flow will be discontinued at such nodes
TraceResultAnalyzeError: Failed to find
error occurs#
- The error occurs if information in an architecture file and trace data are inconsistent with each other
- Please modify the architecture file or check recording process
- e.g.
TraceResultAnalyzeError: Failed to find callback_object.node_name: /localization/pose_twist_fusion_filter/ekf_localizer, callback_name: timer_callback_0, period_ns: 19999999, symbol: void (EKFLocalizer::?)()
Visualization (callback)#
Callback frequency is smaller than expected value#
Plot.create_callback_frequency_plot
calculates frequency from one second to one second. It counts how many times a callback function is called for a second, and just uses the count as frequency. Therefore, the frequency on the last term tends to small because the last term is usually shorter than one second- Another possibility is that the frequency of a subscription callback will be small if it receives topics not periodically but infrequently. Also, the frequency of a timer callback will be small if the timer dynamically stops/starts
Callback latency is bigger than expected value#
- Some nodes may run initialization process. In this case, the latency time calculated by
Plot.create_callback_latency_plot
is huge on the first execution
Visualization (message flow)#
Message flow is discontinued#
- If parts of nodes/communications don't run at all during recording, message flow stops on the way and such nodes/communications are not displayed on y-axis
- Please make sure that all nodes/communications in a target path run during recording, or modify a target path to analyze actually working path
- Another possibility is that a target path includes a node which CARET cannot analyze due to its limitations as explained above
What is a gray color rectangle in a message flow diagram?#
- A rectangle in a message flow diagram indicates a period from the entry to the exit of a callback function, while a line shows a flow of topics
- Note: a rectangle is not always illustrated
Huge delay between topic publication and callback start#
- In a message flow diagram, elapsed time from
ooo/rclcpp_publish
toooo/callback_start
means latency from when a topic is published to when the following callback starts - It includes the following time:
- Communication (topic) latency
- Wait by ROS scheduler
- Wait by OS scheduler
- In most cases, it doesn't take so much time. In case the time is huge, the followings are possible causes:
- There is a problem in communication
- An executor cannot wake up because other processes occupy CPU
- A callback cannot wake up because other callbacks in the same callback group occupy the executor
- Processing time of a callback is longer than topic subscription period
Message flow looks split#
- Take the following system for example;
Node_C
publishes a topic when it receives a topic fromNode_A
Node_A
publishes a topic with a rate of 50 Hz, whileNode_B
publishes a topic at 10 Hz
- Message flow (
Node_B
->Node_C
->Node_D
) looks split atNode_C
. It's becauseNode_C
publishes 5 topics while receiving 1 topic fromNode_B
- Note: A similar phenomenon will happen even if
Callback c0
is a timer callback
Message flow looks dropped#
- Take the following system for example;
Node_C
publishes a topic when it receives a topic fromNode_A
Node_A
publishes a topic with a rate of 10 Hz, whileNode_B
publishes a topic at 50 Hz
- Message flow (
Node_B
->Node_C
->Node_D
) looks disconnected atNode_C
four times every five messages. It's becauseNode_C
publishes 1 topic while receiving 5 topics fromNode_B
. So 4 topics don't have a corresponding topic to be published toNode_D
- Note: A similar phenomenon will happen even if
Callback c0
is a timer callback
How response time is calculated?#
- In general, response time is the time a system or functional unit takes to react to a given input (reference). Response time calculated by CARET is the time it takes for input data to arrive at the last node. It doesn't include processing time at the first/last node nor latency of an actuator. It's calculated as the sum of communication latency time (from the time when a node publishes a topic to the time when the following node subscribes the topic) and node latency time (from the time when a node subscribes a topic to the time when it publishes another topic) in a path
- In the following diagram, input data at point A is first reflected with output at point X (
ResponseTime_Best
) ResponseTime_Best
can be considered as a path (dataflow) latency timeResponseTime_Best
can be considered as a happy case, which is contrary to the following worst case scenario
- In the following diagram, input data at point A is first reflected with output at point X (
- Assuming that input information is created by a sensor such as an object detection sensor, delay in a sensor should be considered. For instance, if a new object appears at point B, the time from point B to point A should added to the response time. The worst case scenario is that a new object appears just after the previous flow (point C). Response time for the worst case is shown as
ResponseTime_Worst
- CARET can calculate both
ResponseTime_Best
andResponseTime_Worst
using the following APIs:response_time.to_best_case_timeseries()
,response_time.to_best_case_histogram()
response_time.to_worst_case_timeseries()
,response_time.to_best_worst_histogram()
- CARET also provides
response_time.to_histogram()
API. It creates histogram assuming a new object appears from point C to point A at intervals of histogram bin size
Message flow is broken when using RelayNode#
- Message_flow is sometimes broken when using RelayNode. RelayNode uses GenericPublisher and GenericSubscription instead of the usual Publisher and Subscription. These classes do not have the trace points needed for analysis by CARET.
- If you want to record nodes that use GenericPublisher or GenericSubscription, you need to rebuild them with caret-rclcpp and record them with
--light
option. Here are the steps to rebuild a RelayNode.
-
Clone topic_tools into your workspace. You can choose ros2_caret_ws for this workspace.
cd /path/to/workspace mkdir src cd src git clone https://github.com/ros-tooling/topic_tools -b humble
-
Build topic_tools with caret-rclcpp.
cd /path/to/workspace source /opt/ros/humble/setup.bash source ~/ros2_caret_ws/install/local_setup.bash colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
Please make sure to source local_setup.bash of this workspace before you run RelayNode.