Perception#

Overview#

Perception stack recognizes the surrounding of the vehicle to achieve safe and efficient autonomous driving. The output of Sensing describes environment "as is", and is usually too primitive to be used directly for high-level planning. Perception stack will extract key and organize it into more meaningful data for Planning stack.

Perception_overview

Role#

Perception stack has 2 main roles.

Object Recognition
Traffic Light Recognition

Use Cases#

Perception must provide enough information to support following use cases:

Use case	Required output from `Perception`	How the output is used
1. Changing lane	Object Recognition-Prediction: - Predicted paths of objects on target lane	To decide `when` and `where` changing lane depending on objects' predicted paths. `when`: which timing to trigger lane change depending on obstacles position and velocity. `where`: where to go depending on objects' position and shape.
2. Turning at intersection	Object Recognition- Prediction: - Predicted paths of objects at an intersection	To decide `when` turning at an intersection depending on objects' predicted path. `when`: which timing to turning depending on objects' future paths.
3. Avoiding parked vehicles	Object Recognition- Detection: - Objects' class, shape and, position Object Recognition- Tracking: - Objects' velocity	To decide `where` to avoid objects depending on objects' properties. `where`: where to avoid objects in given area depending on objects' class, velocity, shape and position.
4. Stopping at a crosswalk when pedestrians are walking	Object Recognition- Prediction: - Predicted paths of objects at a crosswalk	To decide where stopping based on pedestrians' position and velocity.
5. Passing intersection without traffic lights	Object Recognition- Detection: - Objects' shape. Object Recognition- Prediction: - Predicted paths of objects at an intersection	To decide `when` passing intersection depending on objects' properties. `when`: which timing to pass intersection while negotiating with other objects based on objects' properties like, predicted paths and shape.
Merging into another lane	Object Recognition- Prediction: - Predicted paths of objects at merging area	To decide when merging into another lane depending objects' predicted paths.
6. Taking over Pedestrian/Cyclists	Object Recognition- Detection: - Objects' shape, position and orientation. Object Recognition- Tracking: - Objects' velocity	To decide `when` and `where` taking over depending on objects' predicted paths `when`: which timing to taking over depending on obstacles position and velocity. `where`: where to go depending on objects' position and shape.
7. Stopping/yielding to an obstacle	Object Recognition- Detection: - Objects' shape, position, and orientation Object Recognition- Tracking: - Objects' velocity	To decide where to stop/yield based on pedestrians' position and velocity.
8. Passing intersection with traffic lights	Traffic Light Recognition- Classification: - Traffic signal status	To decide whether to go or stop based on traffic signal status.

Requirements#

From above table, high-level requirements of Perception stack are summarized as below:

Perception stack should recognize following objects: (Use Case 3, 5, 6)
1. vehicle
2. pedestrian
3. cyclists
4. other objects that is on road or parking lot
For each object, Perception stack should provide following information: (Use Case 1-7)
1. Pose (done in ObjectDetection)
2. Shape (done in ObjectDetection)
3. Predicted future path (done in Object Tracking+Prediction)
Perception stack should provide traffic light information: (Use Case 😎
1. The status of traffic light
2. Unique id of traffic light from map

Input#

Input	Topic (Data Type)	Explanation
LiDAR	`/sensing/lidar/pointcloud` (`sensor_msgs::PointCloud2`)	Pointcloud data captured by LiDAR comes from Sensing stack. Perception stack is allowed to choose subscribing to pointcloud with/without background depending on algorithms.
Camera	`/sensing/{camera_name}/image` (`sensor_msgs::Image`)	Image data captured by Camera comes from Sensing stack. CameraInfo contains intrinsic parameters for the image.
Map	`/map/vector_map` (`autoware_lanelet2_msgs::MapBin`)	This is Map data in lanelet2 format. Map stack has utility packages for processing map data.
Drive Route (optional)	`/planning/route` (`autoware_planning_msgs::Route`)	This is route information for reaching a destination. In Perception stack, it is used for detecting the traffic lights associated with route information.

Output#

Output	Topic Name (Data Type)	Explanation
Dynamic Object	`/perception/object_recognition/objects` (`autoware_perception_msgs::DynamicObjectArray`)	This includes obstacles' information. An obstacle is described by 3 major properties; State, Shape, Semantic. Detail design for these properties is in below Object Recognition section.
Traffic Light State	`/perception/traffic_light_recognition/traffic_light_states` (`autoware_perception_msgs::TrafficLightStateArray`)	This includes the status of traffic light signals in array format.

Design#

This Perception stack consists of 2 separated modules and each module can be subdivided into following components:

Object Recognition (satisfies Requirement 1 and 2)
- Detection
- Tracking
- Prediction
Traffic Light Recognition (satisfies requirement 3)
- Detection
- Classification

Perception_component

Key points of the structure#

Interfaces are separated according to the current algorithm level.
Enable complex autonomous driving use cases by including information like objects' future movement.
Depends on technology development in the future, this structure might be changed (e.g. E2E).

Object Recognition#

Role#

Recognize obstacles that could potentially move. Provide detail information for obstacles required in the Planning stack.

The motivation behind recognizing obstacles comes from a requirement for balancing safety and efficiency in autonomous driving. If emphasizing safety too much, it needs to consider every possible movement of obstacles. Autonomous vehicles could end up freezing. If emphasizing efficiency too much, recognize every object as static obstacles. A car could hit a pedestrian in an intersection because of the efficient drive to a destination. Balanced autonomous driving is achieved by recognizing obstacles.

Requirement#

Perception_object_if

Detection#

Detection component detects objects from sensor data.

Detection component is responsible for clarifying the following objects' property.

Property	Definition	Data Type	Parent Data Type
type	Class information	`uint8`	`autoware_perception_msgs::Semantic`
confidence	Class's confidence 0.0~1.0.	`float64`	`autoware_perception_msgs::Semantic`
pose	Position and orientation	`geometry_msgs::PoseWithCovariance`	`autoware_perception_msgs::State`
orientation_reliable	Boolean for stable orientation or not	`bool`	`autoware_perception_msgs::State`
shape	Shape in 3D bounding box, cylinder or polygon	`autoware_perception_msgs::Shape`	`autoware_perception_msgs::DynamicObject`

Tracking#

Tracking component deals with time-series processing.

Tracking component is responsible for clarifying the following objects' property.

Property	Definition	Data Type	Parent Data Type
id	Unique object id over frames	`uuid_msgs::UniqueID`	`autoware_perception_msgs::DynamicObject`
twist	Velocity in ROS twist format.	`geometry_msgs::TwistWithCovariance`	`autoware_perception_msgs::State`
twist_reliable	Boolean for stable twist or not.	`bool`	`autoware_perception_msgs::State`
acceleration	Acceleration in ROS accel format.	`geometry_msgs::AccelWithCovariance`	`autoware_perception_msgs::State`
acceleration_reliable	Boolean for stable acceleration or not.	`bool`	`autoware_perception_msgs::State`

Prediction#

Prediction component is responsible for clarifying the following objects' property.

Property	Definition	Data Type	Parent Data Type
predicted_path	Predicted future paths for an object.	`autoware_perception_msgs::PredictedPath[]`	`autoware_perception_msgs::State`

Necessary information is defined in autoware_perception_msg::DynamicObjectArray.msg with layered msg structure.

Perception_object_msg

Input#

LiDAR#

sensor_msgs::PointCloud2

LiDAR input is an essential input for recognizing objects. Its ability to describe the 3D world is utilized in detecting obstacles surrounding the vehicle.

Camera (optional)#

sensor_msgs::Image

Camera input is used to obtain details about obstacles. Fine resolution camera enables to detect objects at far distance with high accuracy.

Map#

autoware_lanelet2_msgs::MapBin

Assuming the obstacles which follow rules in a map, Perception stack can infer objects' property such as orientation or future movements by using map information.

Output#

autoware_perception_msgs::DynamicObjectArray

Recognized objects with predicted paths are used in situations like intersection, crosswalk and lane change. Planning stack uses objects' information for avoiding objects or following a vehicle ahead.

Traffic light recognition#

Make sense of traffic light's signal.

Definition#

Not only classifying its color but also understanding unique signals like arrow signals.

It needs to recognize traffic light signals in order to ensure safe autonomous driving.

Requirement#

Need to fill lamp_states in autoware_traffic_light_msg::TrafficLightState.msg

Perception_trafficlight_msg_msg

Property	Definition	Data Type	Parent Data Type
lamp_states	Sequence of traffic light result	`autoware_perception_msgs::LampState[]`	`autoware_perception_msgs::TrafficLightState`
id	This id corresponds to the traffic light id defined in a map	`int32`	`autoware_perception_msgs::TrafficLightState`

Input#

Camera#

sensor_msgs::Image

Mainly using camera data to make sense of traffic light's color.

Map#

autoware_lanelet2_msgs::MapBin

By using a map with traffic light's location, clarify which part of an image needs to be paid attention.

Drive Route (optional)#

autoware_planning_msgs::Route

With the route associated with the traffic light, improve the accuracy of traffic light recognition.

Output#

autoware_perception_msgs::TrafficLightStateArray

Planning stack receives data from this module. It changes the vehicle maneuver based on the result of traffic light recognition.