Runtime recording#

As explained in the Tracepoint section, CARET records meta-information at initialization and keeps runtime tracepoint data as small as possible. This keeps runtime overhead low, but recording the meta-information requires an LTTng session to be started before the application is launched.

CARET requires a set of recorded data to contain both meta-information and timestamped events.
To let users start a session whenever they want, CARET writes the meta-information to disk when a recording session starts. To allow a recording session to be stopped and restarted, CARET keeps the meta-information in memory until the target application terminates.

This section explains the runtime recording feature in detail.

Basic idea#

Runtime recording is a feature that holds initialization information in memory and writes it to the trace data after a recording session starts. It lets users start a recording session at any time.

For this feature, each tracepoint behaves according to one of the following three states.

WAIT state
- Obtain information on running applications and store trace data in memory.
PREPARE state
- Record stored trace data as LTTng tracepoints (delayed recording).
RECORD state
- Record runtime trace data as LTTng tracepoints (synchronous recording).

A dedicated node, called a trace node, runs in each ROS 2 process to manage the state. The trace node runs on a dedicated thread alongside the threads for ordinary nodes. It is created when the application is launched.

Notice

A trace node runs on a thread created via function hooking. This thread is created even if the ROS 2 process is not implemented with rclcpp. In a process implemented with rclpy, a trace node thread is also created and controls the states in the same way. However, although the trace node runs in a Python-based process, events for Python nodes are not recorded correctly; only initialization tracepoints are recorded. Python has the Global Interpreter Lock (GIL) mechanism, but the trace node runs on an asynchronous thread that is not blocked by the GIL.

A typical use case is shown below.

# Run a node at Terminal 0 first.
ros2 run pkg node

# Execute the "record" command in Terminal 1 after node startup.
ros2 caret record

The state transition is shown below.

uml diagram

Refer to the sequence diagram in Sequence for further details.

Like an ordinary ROS 2 node, a trace node has a topic-based interface. Topic messages are used to get the state of a trace node or to change it.

In addition, to stay compatible with conventional usage, CARET can also record meta-information and runtime events when the session has been started in advance.

uml diagram

Note that meta-information is recorded in each LTTng session.

The following state diagram shows the state machine of the three states.

uml diagram

Refer to Status for further details of the state machine.

Multi-host system#

On a multi-host system, a typical use case is as follows.

# Run a node at Terminal 0-0 on Host 0.
ros2 run pkg0 node0

# Run a node at Terminal 1-0 on Host 1.
ros2 run pkg1 node1

# Execute the "record" command in Terminal 0-1 on Host 0.
ros2 caret record

# Execute the "record" command in Terminal 1-1 on Host 1.
ros2 caret record

The state transition is shown below.

uml diagram

Please note that "Start recording" and "Stop recording" are sent to all trace nodes regardless of host, because they are topic messages. To prevent state transitions caused by messages from other hosts, a trace node ignores messages in the following cases.

- Ignore "Start recording" when no active LTTng session exists.
- Ignore "Start recording" when its state is not WAIT.
- Ignore "End recording" when an active LTTng session exists.
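
The three rules above can be read as a small filter that each trace node applies before acting on a message. The following is only a sketch of that reading, not CARET's implementation: the function name `should_ignore`, the string labels `"start"` and `"end"`, and the host names are hypothetical.

```python
def should_ignore(message: str, state: str, session_active: bool) -> bool:
    """Return True if a trace node should ignore the received message.

    `session_active` means "an active LTTng session exists on this host".
    """
    if message == "start":
        # Ignore "Start recording" without a local session, or outside WAIT.
        return (not session_active) or state != "WAIT"
    if message == "end":
        # Ignore "End recording" while a local session is still active.
        return session_active
    raise ValueError(f"unknown message: {message}")


# Hypothetical two-host setup: only host0 has started an LTTng session.
hosts = {"host0": True, "host1": False}
reacting = [
    host for host, active in hosts.items()
    if not should_ignore("start", "WAIT", active)
]
```

In this setup only the trace node on `host0` reacts to the broadcast "Start recording" message, which is the behaviour the rules are meant to guarantee.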

Topic#

Runtime recording uses the following topic messages.

topic name	message type	role
`/caret/start_record`	Start.msg	Start recording. Transition to PREPARE state.
`/caret/end_record`	End.msg	End recording. Transition to WAIT state.
`/caret/status`	Status.msg	Sync current recording state.

Start.msg#

uint32 recording_frequency 100
string ignore_nodes  # reserved
string ignore_topics # reserved
string select_nodes  # reserved
string select_topics # reserved

CARET writes sets of meta-information to the LTTng ring buffer one by one rather than all at once. CARET provides a parameter, recording_frequency, to control the rate at which meta-information is recorded. recording_frequency is the frequency at which each process records meta-information; it determines how many sets of meta-information are written to the ring buffer per second. A higher frequency shortens the time needed to finish recording meta-information, but raises the risk that the tracer discards events.
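
To make the trade-off concrete, the sketch below paces a replay loop at a given frequency and estimates how long the replay takes. It is an illustration only: `replay_at_frequency`, `estimated_duration_s`, and the `emit` callback (which stands in for writing one LTTng tracepoint) are hypothetical names, and the real trace node may schedule its writes differently.

```python
import time
from typing import Callable, Iterable


def replay_at_frequency(
    records: Iterable[object],
    emit: Callable[[object], None],
    recording_frequency: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Emit stored records one by one, at most `recording_frequency` per second."""
    if recording_frequency <= 0:
        raise ValueError("recording_frequency must be positive")
    interval = 1.0 / recording_frequency
    count = 0
    for record in records:
        emit(record)     # stands in for writing one set of meta-information
        count += 1
        sleep(interval)  # pause so the ring buffer has time to drain
    return count


def estimated_duration_s(num_records: int, recording_frequency: int) -> float:
    """Rough time needed to replay `num_records` sets of meta-information."""
    return num_records / recording_frequency
```

Under these assumptions, a process holding 5,000 sets of meta-information (a made-up figure) would need about `estimated_duration_s(5000, 100)`, that is 50 seconds, at the default frequency of 100, and about 5 seconds at 1000.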

ignore_nodes, ignore_topics, select_nodes, and select_topics are currently unused. They are reserved for configuring tracepoint filtering from the CLI at the start of a measurement.

Info

Another way to avoid discarded events is to write meta-information in blocking mode. LTTng can apply blocking mode to selected events so that those events are reliably written to disk, which reduces data loss. For now, recording_frequency was introduced to mitigate data loss because it affects a smaller part of the implementation than blocking mode would.

Status.msg#

int8 UNINITIALIZED=0
int8 WAIT=1
int8 PREPARE=2
int8 RECORD=3

string caret_node_name
int8 status
string[] node_names # reserved
int64 pid # reserved

The caret_node_name field holds the name of the trace node.

The status field holds the current state: WAIT, PREPARE, or RECORD.

The node_names field is currently unused. It is reserved for a future feature and will represent the list of node names managed by the trace node.

The pid field is also unused. It is reserved for a feature that is not yet implemented and will represent the process ID.
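
As an illustration of how a client might consume these messages, the sketch below mirrors Status.msg as a plain Python dataclass and checks whether every reporting trace node has reached RECORD. The `Status` class here is a stand-in, not the generated ROS 2 message class, and `status_name` and `all_recording` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List

# Constants copied from Status.msg.
UNINITIALIZED, WAIT, PREPARE, RECORD = 0, 1, 2, 3
_STATUS_NAMES = {
    UNINITIALIZED: "UNINITIALIZED",
    WAIT: "WAIT",
    PREPARE: "PREPARE",
    RECORD: "RECORD",
}


@dataclass
class Status:
    """Plain-Python stand-in for Status.msg."""
    caret_node_name: str = ""
    status: int = UNINITIALIZED
    node_names: List[str] = field(default_factory=list)  # reserved
    pid: int = 0                                          # reserved


def status_name(value: int) -> str:
    """Translate the numeric status into its constant name."""
    return _STATUS_NAMES[value]


def all_recording(statuses: List[Status]) -> bool:
    """True when at least one trace node reported and all are in RECORD."""
    return bool(statuses) and all(s.status == RECORD for s in statuses)
```

A tool that starts a measurement could, for example, collect the latest `Status` per `caret_node_name` and wait until `all_recording` returns true before running the scenario of interest.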

End.msg#

(Empty)

The End topic is for notification only, so its message is empty.

State definition#

A detailed state transition is shown below.

uml diagram

WAIT#

item	description
Transition conditions for entering	- Start application with no active LTTng session. - Receive messages from `/caret/end_record` topic when no active LTTng session exists.
Transition conditions for exiting	- Receive messages from `/caret/start_record` topic when an active LTTng session exists.
Initialization trace data	- Store in memory. - Record as LTTng tracepoint (synchronous recording).
Runtime trace data	- Discard.

PREPARE#

item	description
Transition conditions for entering	- Receive messages from `/caret/start_record` topic when the current state is WAIT and an active LTTng session exists.
Transition conditions for exiting	- Receive messages from `/caret/end_record` topic when no active LTTng session exists. - Finish recording stored initialization trace data.
Initialization trace data	- Record as LTTng tracepoint (synchronous recording). - Record stored data as LTTng tracepoint at fixed frequency from trace nodes (delayed recording).
Runtime trace data	- Discard, so that initialization trace data is not discarded.

The rate at which initialization trace data is written to the LTTng ring buffer is adjusted with recording_frequency in Start.msg.

Info

Initialization trace data is recorded synchronously in all states. In the PREPARE state, the same data is also recorded from the trace node with a delay. In this way, as much initialization trace data as possible is recorded, even if the LTTng session and the application are started in the opposite order. In the PREPARE state in particular, both synchronous recording and delayed recording from the trace node take place, so the same data may be stored twice. Duplicate data is handled on the caret_analyze side.
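
How caret_analyze removes duplicates is not described here. One possible approach, sketched below purely as an assumption, is to treat two initialization events as the same when every field except the recording time matches. The dictionary layout and the `drop_duplicates` helper are hypothetical; the field names `timer_handle`, `period`, and `init_timestamp` are borrowed from the Tracepoint section.

```python
from typing import Iterable, List, Tuple


def drop_duplicates(
    events: Iterable[dict],
    ignore: Tuple[str, ...] = ("time",),
) -> List[dict]:
    """Keep the first occurrence of events that match on all fields except `ignore`."""
    seen = set()
    unique = []
    for event in events:
        # Build a hashable key from every field that is not ignored.
        key = tuple(sorted((k, v) for k, v in event.items() if k not in ignore))
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


events = [
    # Synchronous record, written when the tracepoint was called.
    {"name": "ros2_caret:rcl_timer_init", "time": 10,
     "timer_handle": 1, "period": 100, "init_timestamp": 10},
    # Delayed record of the same call, written later by the trace node.
    {"name": "ros2_caret:rcl_timer_init", "time": 900,
     "timer_handle": 1, "period": 100, "init_timestamp": 10},
    # A different timer.
    {"name": "ros2_caret:rcl_timer_init", "time": 12,
     "timer_handle": 2, "period": 50, "init_timestamp": 12},
]
unique_events = drop_duplicates(events)
```

The sketch relies on init_timestamp being identical in the synchronous and delayed copies, so only the recording time has to be excluded from the comparison.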

RECORD#

item	description
Transition conditions for entering	- Start application with active LTTng session. - Finish recording stored initialization trace data.
Transition conditions for exiting	- Receive messages from `/caret/end_record` topic when no active LTTng session exists.
Initialization trace data	- Record as LTTng tracepoint (synchronous recording).
Runtime trace data	- Record as LTTng tracepoint (synchronous recording).
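
The three tables can be condensed into a small transition function. The following is a minimal sketch of the state machine as described in this section, not the actual trace node code: `Event.REPLAY_FINISHED` is a hypothetical name for "finish recording stored initialization trace data", and `session_active` stands for "an active LTTng session exists".

```python
from enum import Enum


class State(Enum):
    WAIT = 1
    PREPARE = 2
    RECORD = 3


class Event(Enum):
    START_RECORD = "/caret/start_record"
    END_RECORD = "/caret/end_record"
    REPLAY_FINISHED = "replay_finished"  # hypothetical internal event


def initial_state(session_active: bool) -> State:
    """State entered when the application starts."""
    return State.RECORD if session_active else State.WAIT


def next_state(state: State, event: Event, session_active: bool) -> State:
    """Return the state after `event`; events that do not apply leave it unchanged."""
    if event is Event.START_RECORD and state is State.WAIT and session_active:
        return State.PREPARE
    if event is Event.END_RECORD and not session_active:
        return State.WAIT
    if event is Event.REPLAY_FINISHED and state is State.PREPARE:
        return State.RECORD
    return state


def records_runtime_data(state: State) -> bool:
    """Runtime trace data is recorded only in RECORD and discarded otherwise."""
    return state is State.RECORD
```

For example, an application started without a session begins in WAIT, moves to PREPARE on a start message once a session exists, and reaches RECORD when the stored initialization data has been replayed.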

Sequence#

Details of the sequence diagram are shown below.

# Run a node at Terminal 0 first.
ros2 run pkg node

# Execute the "record" command in Terminal 1 after node startup.
ros2 caret record

uml diagram

Sequence (multi-host system)#

Details of the sequence diagram on multi-host system are shown below.

# Run a node at Terminal 0-0 on Host 0.
ros2 run pkg0 node0

# Run a node at Terminal 1-0 on Host 1.
ros2 run pkg1 node1

# Execute the "record" command in Terminal 0-1 on Host 0.
ros2 caret record

# Execute the "record" command in Terminal 1-1 on Host 1.
ros2 caret record

uml diagram

Tracepoint#

The runtime recording feature uses delayed recording, which allows recording to be activated at any time after the target application launches. Because a timestamp is assigned when an event is recorded, the timestamp of a delayed initialization tracepoint differs from the time at which the tracepoint was actually called. This is inconvenient for the analysis scripts provided by caret_analyze, which use the invocation time of the initialization tracepoint. For example, the expected time at which a timer callback is invoked is calculated from the initialization time and the given period. If only the recording time is available, the expected time cannot be calculated correctly.

To address this, every initialization tracepoint carries a timestamp taken when it is called during the launch of the target application.

[ros2:rcl_timer_init] (-> [ros2_caret:rcl_timer_init])

(context)
time (time at which the LTTng tracepoint is recorded)
...

(tracepoint data)
void * timer_handle
int64_t period
int64_t init_timestamp (timestamp given when the tracepoint is originally called)

init_timestamp is an added argument that holds the original time at which the initialization tracepoint was invoked. While the ros2: prefix denotes tracepoints from ros2_tracing, the ros2_caret: prefix denotes tracepoints for CARET.
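
The timer example can be sketched as follows. This is not the caret_analyze implementation; it assumes that init_timestamp and period share the same time unit and that the first callback is due one period after initialization. The function name `expected_timer_invocations` and the numbers are hypothetical.

```python
from typing import List


def expected_timer_invocations(init_timestamp: int, period: int, count: int) -> List[int]:
    """Expected invocation times of the first `count` timer callbacks."""
    return [init_timestamp + period * i for i in range(1, count + 1)]


# The timer was initialized at t=1000, but the delayed record was written at t=9000.
init_timestamp = 1_000
recording_time = 9_000
period = 100

from_init = expected_timer_invocations(init_timestamp, period, 3)
from_recording = expected_timer_invocations(recording_time, period, 3)
```

Here `from_init` is `[1100, 1200, 1300]`, whereas `from_recording` is shifted by the recording delay to `[9100, 9200, 9300]`, which is why the original invocation time has to be carried in the tracepoint data.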