
T4Dataset

This module implements the database layer for the T4 annotation format. It builds on the abstract base classes in the databases package (autoware_ml/databases), such as BaseDatabase and Scenarios.

Summary

| Property | Value |
|---|---|
| Format | JSON |
| Annotations | 3D bounding boxes |
| Modality | Multiple LiDAR + multiple cameras |
| Dependencies | t4-devkit, polars |
| Input | Scenario YAML files and T4 annotation directories |
| Output | Sequence[DatasetRecord] saved as Parquet via Polars |

Module relationships

| Module | Role | Depends on |
|---|---|---|
| t4scenarios.py | T4Scenarios extends Scenarios: reads scenario YAML files and builds per-split scenario data | scenarios |
| t4records_generator.py | T4RecordsGenerator reads T4 annotations via t4-devkit and emits DatasetRecord objects | scenarios, schemas, t4-devkit |
| t4dataset.py | T4Dataset extends BaseDatabase: orchestrates parallel record generation across scenarios | base_database, t4scenarios, t4records_generator, scenarios, schemas, polars |
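To make the T4Scenarios row more concrete, the following is a minimal sketch of turning a parsed scenario YAML into per-split scenario lists. It assumes the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and uses a hypothetical layout that maps each split name to a list of scenario IDs. ScenarioData here is a simplified, hypothetical stand-in for the real class in scenarios.py, and build_scenario_splits is an illustrative helper, not the signature of T4Scenarios._build_scenario_splits().

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ScenarioData:
    """Hypothetical, simplified stand-in for scenarios.ScenarioData."""

    scenario_id: str
    split: str
    annotation_dir: Path  # hypothetical: where this scenario's T4 annotations live


def build_scenario_splits(
    config: dict[str, list[str]], dataset_root: Path
) -> dict[str, list[ScenarioData]]:
    """Group scenario IDs from a parsed scenario YAML into per-split lists.

    Assumes a hypothetical layout {split_name: [scenario_id, ...]} and that each
    scenario's annotation directory is dataset_root / scenario_id.
    """
    splits: dict[str, list[ScenarioData]] = {}
    for split, scenario_ids in config.items():
        splits[split] = [
            ScenarioData(scenario_id=sid, split=split, annotation_dir=dataset_root / sid)
            for sid in scenario_ids
        ]
    return splits


# Equivalent of a YAML file with "train" and "val" keys (placeholder IDs and root path)
config = {"train": ["scene_001", "scene_002"], "val": ["scene_003"]}
splits = build_scenario_splits(config, Path("/data/t4"))
print({split: [s.scenario_id for s in items] for split, items in splits.items()})
# {'train': ['scene_001', 'scene_002'], 'val': ['scene_003']}
```

Keeping each entry as a small frozen object means later stages (record generation, deduplication across splits) can pass scenarios around and use them as dict keys without worrying about mutation.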
```mermaid
classDiagram
    direction TB

    class polars {
        <<external>>
        DataFrame
        Schema
    }

    class t4_devkit {
        <<external>>
        Tier4
        Sample
        SampleData
        CalibratedSensor
    }

    class scenarios {
        <<databases>>
        Scenarios
        ScenarioData
        DatasetParams
    }

    class schemas {
        <<databases>>
        DatasetRecord
        DatasetTableSchema
    }

    class BaseDatabase {
        <<databases>>
        get_polars_schema()
        get_unique_scenario_data()
        process_scenario_records()
    }

    class T4Scenarios {
        build_scenarios()
        _build_scenario_data()
        _build_scenario_splits()
    }

    class T4RecordsGenerator {
        generate_dataset_records()
        extract_t4_sample_record()
    }

    class T4Dataset {
        process_scenario_records()
        _run_t4records_generator()
    }

    T4Scenarios --|> scenarios : extends Scenarios

    T4Dataset --|> BaseDatabase : extends
    T4Dataset --> T4Scenarios : scenario groups
    T4Dataset --> T4RecordsGenerator : creates per scenario
    T4Dataset --> polars : writes Parquet via DataFrame

    T4RecordsGenerator --> T4Scenarios : reads ScenarioData
    T4RecordsGenerator --> schemas : emits Sequence[DatasetRecord]
    T4RecordsGenerator --> t4_devkit : reads T4 annotations

    schemas --> polars: Sequence[DatasetRecord] to Parquet
```
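The diagram shows T4Dataset creating one T4RecordsGenerator per scenario and collecting the emitted records. A minimal sketch of that fan-out/flatten pattern follows. ScenarioStub, RecordStub and this generate_dataset_records function are hypothetical stand-ins (the real generator reads T4 annotations through t4-devkit), and the sketch uses a thread pool for brevity; the parallel backend actually used by T4Dataset is not specified on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScenarioStub:
    """Hypothetical stand-in for a ScenarioData entry."""

    scenario_id: str
    num_samples: int  # hypothetical: the real data points at a T4 annotation directory


@dataclass(frozen=True)
class RecordStub:
    """Hypothetical stand-in for a DatasetRecord."""

    scenario_id: str
    sample_index: int


def generate_dataset_records(scenario: ScenarioStub) -> list[RecordStub]:
    """Stand-in for T4RecordsGenerator: one record per sample in the scenario."""
    return [RecordStub(scenario.scenario_id, i) for i in range(scenario.num_samples)]


def process_scenario_records(
    scenarios: Sequence[ScenarioStub], max_workers: int = 4
) -> list[RecordStub]:
    """Generate records for each scenario in parallel and flatten them into one list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so records stay grouped by scenario
        per_scenario = pool.map(generate_dataset_records, scenarios)
        return [record for records in per_scenario for record in records]


records = process_scenario_records([ScenarioStub("scene_a", 2), ScenarioStub("scene_b", 1)])
print([(r.scenario_id, r.sample_index) for r in records])
# [('scene_a', 0), ('scene_a', 1), ('scene_b', 0)]
```

Because each scenario is processed independently, the work parallelizes cleanly; the only shared step is the final flatten, which keeps the output order deterministic regardless of which worker finishes first.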

Output table schema

T4Dataset.process_scenario_records() produces a list of DatasetRecord objects and persists them as a Polars DataFrame written to Parquet. The table schema is defined in schemas.py via DatasetTableSchema:

| Column | Polars type | Description |
|---|---|---|
| scenario_id | String | Unique identifier of the driving scenario |
| sample_id | String | Unique identifier of the individual sample/frame |
| sample_index | Int32 | Zero-based index of the sample within the scenario |
| location | String | Geographic location where the data was captured |
| vehicle_type | String | Type of vehicle used for data collection |

Each row corresponds to one DatasetRecord (a frozen Pydantic model). The Parquet file is cached under the database's cache_path with a filename derived from the database hash for reproducibility.
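As an illustration of hash-derived cache filenames, the sketch below hashes a JSON-serializable configuration deterministically and uses a prefix of the digest as the Parquet filename under a cache directory. The helpers database_hash and cached_parquet_path, the choice of SHA-256, and the 16-character prefix are all assumptions for illustration; the page does not say which fields feed the real database hash or how the filename is formed.

```python
import hashlib
import json
from pathlib import Path


def database_hash(config: dict) -> str:
    """Hypothetical: hash a JSON-serializable database config deterministically."""
    # sort_keys makes the digest independent of dict insertion order
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_parquet_path(cache_path: Path, config: dict) -> Path:
    """Return a Parquet path under cache_path whose name depends only on the config."""
    return cache_path / f"{database_hash(config)[:16]}.parquet"


path = cached_parquet_path(Path("cache"), {"version": 1, "splits": ["train", "val"]})
print(path.suffix)  # .parquet
```

With this scheme, a caller can check path.exists() and load the table with pl.read_parquet(path) instead of regenerating it, while any change to the configuration produces a different filename.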

Implementation

| Path | Description |
|---|---|
| autoware_ml/databases/t4dataset/t4scenarios.py | T4 scenario YAML parsing and split construction |
| autoware_ml/databases/t4dataset/t4records_generator.py | T4 annotation reading and DatasetRecord generation |
| autoware_ml/databases/t4dataset/t4dataset.py | T4 database orchestration with parallel processing |
| autoware_ml/databases/scenarios.py | Base scenario models (Scenarios, ScenarioData) |
| autoware_ml/databases/schemas.py | DatasetRecord and DatasetTableSchema definitions |
| autoware_ml/databases/base_database.py | Shared BaseDatabase implementation |
| autoware_ml/scripts/generate_dataset.py | Hydra entrypoint for dataset generation |

Acknowledgment

T4Dataset is based on the nuScenes dataset schema.

  • Repository: https://github.com/nutonomy/nuscenes-devkit
  • License: Apache 2.0
  • Paper: Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. "nuScenes: A Multimodal Dataset for Autonomous Driving." CVPR, 2020.