T4Dataset

This module implements the database layer for the T4 annotation format, built on top of the abstract base classes in the databases module.

Summary

| Property | Value |
| --- | --- |
| Format | JSON (T4 annotation tables via t4-devkit) |
| Annotations | 3D bounding boxes |
| Modality | Multiple LiDAR (+ cameras in source data, not yet exported) |
| Dependencies | t4-devkit, polars, numpy |
| Input | Scenario YAML files and T4 annotation directories |
| Output | Sequence of dataset rows saved as Parquet via Polars |

Module relationships

| Module | Role | Depends on |
| --- | --- | --- |
| t4scenarios.py | T4Scenarios extends Scenarios: reads scenario YAML files and builds per-split scenario data | scenarios |
| t4records_generator.py | T4RecordsGenerator reads T4 annotations via t4-devkit and builds T4SampleRecord per sample | scenarios, schemas, t4-devkit |
| t4sample_records.py | T4SampleRecord holds intermediate per-sample data and converts to the unified dataset row model | schemas |
| t4dataset.py | T4Dataset extends BaseDatabase: orchestrates parallel record generation across scenarios | base_database, t4scenarios, t4records_generator, scenarios, schemas, polars |

classDiagram
    direction TB

    class polars {
        <<external>>
        DataFrame
        Schema
    }

    class t4_devkit {
        <<external>>
        Tier4
        Sample
        SampleData
        CalibratedSensor
    }

    class scenarios {
        <<databases>>
        Scenarios
        ScenarioData
        DatasetParams
    }

    class schemas {
        <<databases>>
        Dataset row model
        DatasetTableSchema
        LidarFrameDataModel
        LidarSourceDataModel
        CategoryMappingDataModel
        Box3DDataModel
        Box3DDatasetSchema
        FrameBasicMetadata
    }

    class BaseDatabase {
        <<databases>>
        get_polars_schema()
        get_unique_scenario_data()
        process_scenario_records()
    }

    class T4Scenarios {
        build_scenarios()
        _build_scenario_data()
        _build_scenario_splits()
    }

    class T4RecordsGenerator {
        generate_dataset_records()
        extract_t4_sample_record()
        _extract_lidar_frame()
        _extract_lidar_sweeps()
        _extract_lidar_sources()
        _extract_category_mapping()
    }

    class T4SampleRecord {
        to_dataset_record()
    }

    class T4Dataset {
        process_scenario_records()
        _run_t4records_generator()
    }

    T4Scenarios --|> scenarios : extends Scenarios

    T4Dataset --|> BaseDatabase : extends
    T4Dataset --> T4Scenarios : scenario groups
    T4Dataset --> T4RecordsGenerator : creates per scenario
    T4Dataset --> polars : writes Parquet via DataFrame

    T4RecordsGenerator --> T4Scenarios : reads ScenarioData
    T4RecordsGenerator --> T4SampleRecord : builds per sample
    T4RecordsGenerator --> schemas : uses FrameBasicMetadata, LidarFrameDataModel, ...
    T4RecordsGenerator --> t4_devkit : reads T4 annotations

    T4SampleRecord --> schemas : converts to dataset row model

    T4Dataset --> schemas : record.to_dictionary() to Parquet
    schemas --> polars : DataFrame with DatasetTableSchema
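
The flow in the diagram (one generator per scenario, one intermediate record per sample, each converted to a dataset row) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ScenarioStub`, `SampleRecordStub`, `generate_records`, and `process_scenarios` are hypothetical stand-ins rather than the module's real classes, and a thread pool is assumed purely for demonstration because the source does not say which parallel mechanism T4Dataset uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioStub:
    """Hypothetical stand-in for ScenarioData: an ID plus its sample tokens."""

    scenario_id: str
    sample_tokens: tuple[str, ...]


@dataclass(frozen=True)
class SampleRecordStub:
    """Hypothetical stand-in for T4SampleRecord (intermediate per-sample data)."""

    scenario_id: str
    sample_token: str

    def to_dataset_record(self) -> dict:
        # The real method returns the unified dataset row model;
        # a plain dict is used here to keep the sketch self-contained.
        return {"scenario_id": self.scenario_id, "sample_token": self.sample_token}


def generate_records(scenario: ScenarioStub) -> list[dict]:
    """Plays the role of one records generator created for one scenario."""
    return [
        SampleRecordStub(scenario.scenario_id, token).to_dataset_record()
        for token in scenario.sample_tokens
    ]


def process_scenarios(scenarios: list[ScenarioStub], max_workers: int = 4) -> list[dict]:
    """Run one generator per scenario in parallel and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so rows stay grouped by scenario.
        per_scenario = list(pool.map(generate_records, scenarios))
    return [row for rows in per_scenario for row in rows]


scenarios = [
    ScenarioStub("scene_a", ("s0", "s1")),
    ScenarioStub("scene_b", ("s0",)),
]
rows = process_scenarios(scenarios)
```

Keeping each scenario's work inside a single function call means scenarios share no state, which is what makes this kind of per-scenario parallelism straightforward.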

Output table schema

T4Dataset.process_scenario_records() produces a list of DatasetRecord objects, collects them into a Polars DataFrame, and writes that DataFrame to Parquet. For the complete table layout and nested struct definitions, see Dataset Schema.

Each row corresponds to one DatasetRecord (a frozen Pydantic model). The Parquet file is cached under the database's cache_path with a filename derived from the database hash for reproducibility.
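
A hash-derived cache filename of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the module's actual scheme: the source does not describe which inputs feed the database hash or how the filename is formatted, so `database_hash`, `cache_file`, the parameter dictionary, and the 16-character truncation are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def database_hash(params: dict) -> str:
    """Hash a parameter dict deterministically (hypothetical helper)."""
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cache_file(cache_path: Path, params: dict) -> Path:
    """Build the Parquet cache path for a given set of database parameters."""
    return cache_path / f"{database_hash(params)}.parquet"


# Same parameters in a different key order map to the same file.
params_a = {"split": "train", "scenarios": ["scene_a", "scene_b"]}
params_b = {"scenarios": ["scene_a", "scene_b"], "split": "train"}
# Changing any parameter yields a different file.
params_c = {"split": "val", "scenarios": ["scene_a", "scene_b"]}

path_a = cache_file(Path("cache"), params_a)
path_b = cache_file(Path("cache"), params_b)
path_c = cache_file(Path("cache"), params_c)
```

With a scheme like this, rerunning with identical parameters finds the existing Parquet file, while any parameter change produces a new one instead of silently reusing stale data.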

Implementation

| Path | Description |
| --- | --- |
| autoware_ml/databases/t4dataset/t4scenarios.py | T4 scenario YAML parsing and split construction |
| autoware_ml/databases/t4dataset/t4records_generator.py | T4 annotation reading and per-sample extraction |
| autoware_ml/databases/t4dataset/t4sample_records.py | Intermediate T4SampleRecord to unified dataset row conversion |
| autoware_ml/databases/t4dataset/t4dataset.py | T4 database orchestration with parallel processing |

Acknowledgment

T4Dataset is based on the nuScenes dataset schema.

  • Repository: https://github.com/nutonomy/nuscenes-devkit
  • License: Apache 2.0
  • Paper: Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. "nuScenes: A Multimodal Dataset for Autonomous Driving." CVPR, 2020.