MLflow¶
Autoware-ML uses MLflow for experiment tracking. Training, testing, and deployment all log to a shared local MLflow backend.
Launching the UI¶
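The UI is started with the `autoware-ml mlflow ui` subcommand (the same subcommand used with `--db-path` later on this page); a minimal invocation against the shared local store:

```shell
# Launch the MLflow UI against the shared local tracking store.
autoware-ml mlflow ui
```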
Open http://localhost:5000 in your browser.
Options:
- `--port`: Custom port (default: `5000`)
- `--db-path`: SQLite database path (default: `mlruns/mlflow.db`)
- `--experiment-name`: Export only the specified experiment into an isolated store before launching the UI
- `--config-name`: Shorthand for `--experiment-name` using a task config path
- `--export-dir`: Directory for the extracted experiment store
By default, the UI opens the shared tracking DB directly. To isolate a single experiment for sharing or inspection, export it first:
```shell
autoware-ml mlflow export --config-name calibration_status/calibration_status_classifier/resnet18_t4dataset_j6gen2 --export-dir /tmp/calibration_status_mlflow
autoware-ml mlflow ui --db-path /tmp/calibration_status_mlflow/mlflow.db
```
This creates an extracted MLflow store for that experiment and lets you open the UI against the exported DB.
For remote access, run MLflow directly with `--host 0.0.0.0`.
If MLflow is running on a remote machine, 0.0.0.0 only makes it listen on that machine. You still need to forward the port to your local machine. A common option is SSH port forwarding:
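A minimal sketch of such a forward (the user and host names are placeholders):

```shell
# Forward local port 5000 to port 5000 on the remote machine running MLflow.
ssh -L 5000:localhost:5000 user@remote-host
```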
Then open http://localhost:5000 in your local browser.
What Gets Logged¶
Training runs automatically log:
- Metrics: Loss curves, task-specific metrics, learning rate
- Hyperparameters: Complete Hydra configuration
- Artifacts: Resolved config snapshots, run metadata, saved checkpoints, and deploy exports
Testing and deployment create separate runs in the same experiment and keep lineage to the source training run. Deployment runs also log exported ONNX and TensorRT artifacts.
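The exported files can be retrieved from a deploy run afterwards; a hedged sketch using MLflow's artifact API (the run ID and destination directory are placeholders):

```python
import mlflow

# Point at the shared local backend store.
mlflow.set_tracking_uri("sqlite:///mlruns/mlflow.db")

# Download all artifacts of a run (e.g., ONNX/TensorRT exports) to a local dir.
local_dir = mlflow.artifacts.download_artifacts(
    run_id="<run_id>",
    dst_path="exported_models/",
)
```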
Using the UI¶
Experiments are derived from config paths with / replaced by _
(e.g., <task>_<model>_<config>). Runs are named with stage and timestamp,
and tagged with metadata such as:
- `config_name`
- `task`
- `model`
- `config_variant`
- `stage`
- `hydra_dir`
- `artifact_dir`
- `checkpoint_path` (for test/deploy)
- `git_sha`
Click a run to view:
- Parameters: All hyperparameters
- Metrics: Interactive training curves
Comparing Runs¶
Select multiple runs and click Compare to view:
- Parallel coordinates plots for hyperparameter relationships
- Scatter plots comparing metrics
- Overlaid training curves
Organizing Experiments¶
Experiments are derived from config paths. Use meaningful config names for clarity.
Add custom tags for organization:
Tags can also be set via the CLI, or added and edited directly from the run page in the MLflow UI.
Programmatic Access¶
```python
import mlflow

# Point at the shared SQLite backend store.
mlflow.set_tracking_uri("sqlite:///mlruns/mlflow.db")

# Experiment names follow the <task>_<model>_<config> convention.
experiment = mlflow.get_experiment_by_name("<task>_<model>_<config>")

# Metric keys containing "/" must be backtick-quoted in filter strings.
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="metrics.`val/loss` < 0.5",
    order_by=["metrics.`val/loss` ASC"],
)
```
Storage Location¶
Autoware-ML now keeps both Hydra scratch outputs and MLflow-owned artifacts
under the same mlruns/ root:
- `mlruns/mlflow.db`: Shared SQLite backend store
- `mlruns/<task>/<model>/<config>/<run_id>/hydra/`: Hydra scratch directory and local command outputs
- `mlruns/<task>/<model>/<config>/<run_id>/artifacts/`: Run-owned MLflow artifacts
- `mlruns/<task>/<model>/<config>/<run_id>/artifacts/run_metadata.json`: Run metadata used to preserve lineage across train, test, and deploy
To back up experiments completely, copy mlruns/.
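For example (the destination name is arbitrary; avoid copying while a training job or the UI is writing to the store):

```shell
# Copy the SQLite backend plus every run's hydra/ and artifacts/ directories.
cp -r mlruns/ mlruns-backup/
```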