# Evaluate Annotationless Perception
Evaluate Autoware's recognition features (perception) without annotations using the `perception_online_evaluator`.

Requires Autoware built with the features of the following PR: https://github.com/autowarefoundation/autoware.universe/pull/6556
## Evaluation method
The annotationless_perception evaluation is executed by launching the `annotationless_perception.launch.py` file.
Launching the file executes the following steps:

- Launch the evaluation node (`annotationless_perception_evaluator_node`), the `logging_simulator.launch` file, and the `ros2 bag play` command.
- Autoware receives the sensor data output from the input rosbag, and the perception module performs recognition.
- The `perception_online_evaluator` publishes a diagnostic topic to `/perception/perception_online_evaluator/metrics`.
- The evaluation node subscribes to the topic and evaluates the data. The result is dumped into a file.
- When the playback of the rosbag is finished, Autoware's launch is automatically terminated, and the evaluation is completed.
## Evaluation results
The output topic of `perception_online_evaluator` is in the form of the following sample: topic sample.
For each subscription, the judgment results below are output for each recognition class.
If all classes are normal, the test is successful.
### Normal
The following two values, specified in the scenario or as launch arguments, are used for the judgment:

- Threshold
- PassRange (coefficient to correct the threshold)

Success or failure is determined for each `status.name` in `/perception/perception_online_evaluator/metrics` according to the following rules.
Items for which no threshold is set (min, max, mean) are always judged as normal. Only items for which a threshold is specified are subject to evaluation.
#### min

If `threshold * lower_limit <= minimum value of min <= threshold * upper_limit`, the item is judged as normal.
#### max

If `threshold * lower_limit <= maximum value of max <= threshold * upper_limit`, the item is judged as normal.
The lower limit is recommended to be 0.0.
#### mean

If `threshold * lower_limit <= average value of mean <= threshold * upper_limit`, the item is judged as normal.
#### metric_value

If `threshold * lower_limit <= value of metric_value <= threshold * upper_limit`, the item is judged as normal.
`metric_value` is judged against the current topic value only; it does not update the min, max, and mean metrics.
An illustration is shown below.
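In addition to the illustration, the following is a minimal sketch of the judgment rule in Python. It is illustrative only (the function names are not the evaluator's actual API) and assumes a PassRange given as a `lower-upper` string, as in the scenario sample later in this document.

```python
def parse_pass_range(pass_range: str) -> tuple[float, float]:
    """Parse a PassRange string such as "0.5-2.0" into (lower, upper) coefficients."""
    lower, upper = pass_range.split("-")
    return float(lower), float(upper)


def is_normal(value: float, threshold: float, pass_range: str) -> bool:
    """Judge one item: threshold * lower <= value <= threshold * upper."""
    lower, upper = parse_pass_range(pass_range)
    return threshold * lower <= value <= threshold * upper


# Example: mean of lateral_deviation with threshold 0.019 and PassRange "0.5-2.0"
print(is_normal(0.021, 0.019, "0.5-2.0"))  # True: 0.0095 <= 0.021 <= 0.038
```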
### Error

When the normal condition is not met.
## Topic name and data type used by evaluation node

Subscribed topics:

| Topic name                                      | Data type                             |
| ----------------------------------------------- | ------------------------------------- |
| /perception/perception_online_evaluator/metrics | diagnostic_msgs::msg::DiagnosticArray |

Published topics:

| Topic name | Data type |
| ---------- | --------- |
| N/A        | N/A       |
## Method of specifying conditions

The conditions can be given in two ways.
### Describe in scenario
```yaml
Evaluation:
  UseCaseName: annotationless_perception
  UseCaseFormatVersion: 0.3.0
  Conditions:
    ClassConditions:
      # Describe the conditions for each class. If a class with no conditions is output, only the metrics are calculated. It does not affect the evaluation.
      # In the sample data, the class of TRUCK is also output, but the condition is not described, so TRUCK is always Success.
      # When specifying conditions from result.jsonl, only keys described here will be updated.
      # Even though TRUCK metrics appear in result.jsonl, they are not added to the evaluation condition because the TRUCK key is not specified in this example.
      CAR: # classification key
        Threshold:
          # Keys not described will not be evaluated (will always be a success)
          lateral_deviation: { max: 0.4, mean: 0.019 }
          yaw_deviation: { max: 3.1411, mean: 0.05 }
          predicted_path_deviation_5.00: { max: 16.464, mean: 1.8 }
          total_objects_count_r60.00_h10.00: { metric_value: 10 }
        PassRange:
          min: 0.0-2.0 # lower[<=1.0]-upper[>=1.0]
          max: 0.0-2.0 # lower[<=1.0]-upper[>=1.0]
          mean: 0.5-2.0 # lower[<=1.0]-upper[>=1.0]
          metric_value: 0.9-1.1
      BUS: # classification key
        Threshold:
          # Only yaw_rate is evaluated.
          yaw_rate: { max: 0.05 } # Only max is evaluated.
        PassRange:
          min: 0.0-2.0 # lower[<=1.0]-upper[>=1.0]
          max: 0.0-2.0 # lower[<=1.0]-upper[>=1.0]
          mean: 0.5-2.0 # lower[<=1.0]-upper[>=1.0]
          metric_value: 0.9-1.1
```
### Specify by launch argument

This is assumed to be the main method of use.

If the file path of a result.jsonl output by a past test is specified, the metric values from that test are used as threshold values. Only the thresholds already listed in the scenario are updated from result.jsonl.
The pass range can also be specified as an argument.

An image of its use is shown below.
#### driving-log-replayer-cli

```shell
dlr simulation run -p annotationless_perception -l annotationless_threshold_file:=${previous_test_result.jsonl_path} -l 'annotationless_pass_range:={"KEY1":"VALUE1"[,"KEY2":"VALUE2"...]}'

# example
dlr simulation run -p annotationless_perception -l annotationless_threshold_file:=$HOME/out/annotationless/2024-0314-155106/sample/result.jsonl -l 'annotationless_pass_range:={"CAR":{"min":"0.0-1.1","max":"0.0-1.2","mean":"0.5-1.3"},"BUS":{"min":"0.0-1.1","max":"0.0-1.2","mean":"0.5-1.3"}}'
```
#### WebAutoCLI

```shell
webauto ci scenario run --project-id ${project-id} --scenario-id ${scenario-id} --scenario-version-id ${scenario-version-id} --simulator-parameter-overrides 'annotationless_threshold_file=${previous_test_result.jsonl_path},annotationless_pass_range={"KEY1":"VALUE1"[,"KEY2":"VALUE2"...]}'
```
#### Autoware Evaluator

Add to the parameters in the simulator configuration in `.webauto-ci.yml`.

```yaml
simulations:
  - name: annotationless_perception
    type: annotationless_perception
    simulator:
      deployment:
        type: container
        artifact: main
      runtime:
        type: simulator/standard1/amd64/medium
    parameters:
      annotationless_threshold_file: ${previous_test_result.jsonl_path}
      annotationless_pass_range:
        KEY1: VALUE1
        KEY2: VALUE2
```
## How to update scenario conditions
The driving-log-replayer-cli can run multiple scenarios in succession under the data_directory of a profile. However, when evaluation conditions are given as arguments, the same conditions are applied to every scenario, which is inconvenient.
For local testing with driving-log-replayer-cli, the following commands are therefore provided so that scenario conditions can be updated without specifying arguments:

- the `update-condition` command, to manually update scenario conditions
- the `-u` option of `run`, to automatically update scenario conditions after a simulation run

There are two update methods:

- `existing`: update only the items that already appear in the scenario
- `all`: update all values in the metrics
```shell
# manual update
dlr simulation update-condition -s ${scenario_path} -r ${result.jsonl_path} -u ${existing|all}

# automatically update the scenario after the simulation run
dlr simulation run -p annotationless_perception -u ${existing|all}
```
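The difference between the two modes can be pictured as a dictionary update over the Threshold section. The sketch below is illustrative only, not the CLI's implementation; it assumes thresholds are plain nested dicts keyed by metric name.

```python
def update_existing(scenario: dict, metrics: dict) -> dict:
    """existing: refresh only the threshold keys already present in the scenario."""
    return {name: metrics.get(name, value) for name, value in scenario.items()}


def update_all(scenario: dict, metrics: dict) -> dict:
    """all: take every metric as a new threshold, adding keys the scenario lacked."""
    return dict(metrics)


scenario = {"lateral_deviation": {"max": 0.4, "mean": 0.019}}
metrics = {
    "lateral_deviation": {"max": 0.35, "mean": 0.017},
    "yaw_deviation": {"max": 3.0, "mean": 0.04},
}
print(update_existing(scenario, metrics))  # only lateral_deviation is refreshed
print(update_all(scenario, metrics))       # yaw_deviation is added as well
```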
## Arguments passed to logging_simulator.launch

To reduce Autoware's resource consumption, modules that are not relevant to the evaluation are disabled by passing `false` as a launch argument.
The following settings are applied:

- perception: true
- planning: false
- control: false
- sensing: false / true (default false, set by the launch argument)
### How to specify the sensing argument

#### driving-log-replayer-cli

```shell
dlr simulation run -p annotationless_perception -l sensing:=true
```
#### WebAutoCLI

```shell
webauto ci scenario run --project-id ${project-id} --scenario-id ${scenario-id} --scenario-version-id ${scenario-version-id} --simulator-parameter-overrides 'sensing=true'
```
#### Autoware Evaluator

Add to the parameters in the simulator configuration in `.webauto-ci.yml`.

```yaml
simulations:
  - name: annotationless_perception
    type: annotationless_perception
    simulator:
      deployment:
        type: container
        artifact: main
      runtime:
        type: simulator/standard1/amd64/medium
    parameters:
      sensing: "true"
```
## simulation
State the information required to run the simulation.
### Topic to be included in the input rosbag

| Topic name                             | Data type                                    |
| -------------------------------------- | -------------------------------------------- |
| /pacmod/from_can_bus                   | can_msgs/msg/Frame                           |
| /localization/kinematic_state          | nav_msgs/msg/Odometry                        |
| /sensing/gnss/ublox/fix_velocity       | geometry_msgs/msg/TwistWithCovarianceStamped |
| /sensing/gnss/ublox/nav_sat_fix        | sensor_msgs/msg/NavSatFix                    |
| /sensing/gnss/ublox/navpvt             | ublox_msgs/msg/NavPVT                        |
| /sensing/imu/tamagawa/imu_raw          | sensor_msgs/msg/Imu                          |
| /sensing/lidar/concatenated/pointcloud | sensor_msgs/msg/PointCloud2                  |
| /sensing/lidar/*/velodyne_packets      | velodyne_msgs/VelodyneScan                   |
| /tf                                    | tf2_msgs/msg/TFMessage                       |
The vehicle topics can be included instead of CAN.
| Topic name                             | Data type                                           |
| -------------------------------------- | --------------------------------------------------- |
| /localization/kinematic_state          | nav_msgs/msg/Odometry                               |
| /sensing/gnss/ublox/fix_velocity       | geometry_msgs/msg/TwistWithCovarianceStamped        |
| /sensing/gnss/ublox/nav_sat_fix        | sensor_msgs/msg/NavSatFix                           |
| /sensing/gnss/ublox/navpvt             | ublox_msgs/msg/NavPVT                               |
| /sensing/imu/tamagawa/imu_raw          | sensor_msgs/msg/Imu                                 |
| /sensing/lidar/concatenated/pointcloud | sensor_msgs/msg/PointCloud2                         |
| /sensing/lidar/*/velodyne_packets      | velodyne_msgs/VelodyneScan                          |
| /tf                                    | tf2_msgs/msg/TFMessage                              |
| /vehicle/status/control_mode           | autoware_auto_vehicle_msgs/msg/ControlModeReport    |
| /vehicle/status/gear_status            | autoware_auto_vehicle_msgs/msg/GearReport           |
| /vehicle/status/steering_status        | autoware_auto_vehicle_msgs/msg/SteeringReport       |
| /vehicle/status/turn_indicators_status | autoware_auto_vehicle_msgs/msg/TurnIndicatorsReport |
| /vehicle/status/velocity_status        | autoware_auto_vehicle_msgs/msg/VelocityReport       |
### Topics that must not be included in the input rosbag

| Topic name | Data type               |
| ---------- | ----------------------- |
| /clock     | rosgraph_msgs/msg/Clock |
The clock is output by the `--clock` option of `ros2 bag play`. If /clock is also recorded in the bag itself, the topic would be published twice, so it must not be included in the bag.
## evaluation
State the information necessary for the evaluation.
### Scenario Format

See sample.
### Evaluation Result Format

See sample.

The format of each frame and the metrics format are shown below.
NOTE: the common parts of the result file format, which have already been explained, are omitted.
```json
{
  "Frame": {
    "Ego": {},
    "OBJECT_CLASSIFICATION": {
      // Recognized class
      "Result": { "Total": "Success or Fail", "Frame": "Success or Fail" }, // The results for Total and Frame are the same. The same values are output to make the data structure the same as other evaluations.
      "Info": {
        "name_min_max_mean": { "min": "min value", "max": "max value", "mean": "average value" },
        "name_metric_value": { "metric_value": "value" },
        ...
      },
      "Metrics": {
        "name_min_max_mean": {
          "min": "Minimum value of min",
          "max": "Maximum value of max",
          "mean": "Average value of mean"
        },
        ...
      }
    }
  }
}
```
See the figure below for the meaning of each item.
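Because result.jsonl is a JSON Lines file whose lines carry a `Frame` of the shape above, the final metrics of a past run can be extracted programmatically, which is what the threshold-update features described earlier rely on. The sketch below is a hypothetical reader, not the tool's implementation; it assumes every non-empty line is valid JSON and that classes are the `Frame` entries containing a `Metrics` key.

```python
import json
from pathlib import Path


def last_metrics(result_jsonl: Path) -> dict:
    """Collect the final Metrics entry per recognized class from a result.jsonl."""
    metrics_by_class: dict = {}
    for line in result_jsonl.read_text().splitlines():
        if not line.strip():
            continue
        frame = json.loads(line).get("Frame", {})
        for class_name, data in frame.items():
            if isinstance(data, dict) and "Metrics" in data:
                # Later lines overwrite earlier ones, keeping the final accumulated values
                metrics_by_class[class_name] = data["Metrics"]
    return metrics_by_class


if __name__ == "__main__":
    print(last_metrics(Path("result.jsonl")))
```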