t4sanity
t4sanity performs sanity checks on T4 datasets, reporting any issues regarding the dataset requirements.
$ t4sanity -h
Usage: t4sanity [OPTIONS] DB_PARENT
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * data_root TEXT Path to root directory of a dataset. [default: None] [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --version -v Show the application version and exit. │
│ --output -o TEXT Path to output JSON file. [default: None] │
│ --revision -rv TEXT Specify if you want to check the specific version. [default: None] │
│ --exclude -e TEXT Exclude specific rules or rule groups. [default: None] │
│ --strict -s Indicates whether warnings are treated as failures. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help -h Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Shell Completion
Run t4sanity --install-completion to install completion for the current shell, then reload the shell.
Usages
As an example, suppose we have the following dataset structure:
Then, you can run sanity checks with t4sanity <DATA_ROOT>:
$ t4sanity <DATA_ROOT>
>>>Sanity checking...: 1it [00:00, 9.70it/s]
=== DatasetID: dataset1 ===
STR001: ✅
STR002: ✅
STR003: ✅
STR004: ✅
STR005: ✅
STR006: ✅
STR007: ✅
STR008: ✅
...
+-----------+---------+--------+--------+---------+----------+
| DatasetID | Version | Passed | Failed | Skipped | Warnings |
+-----------+---------+--------+--------+---------+----------+
| dataset1 | 0 | 49 | 0 | 2 | 3 |
+-----------+---------+--------+--------+---------+----------+
Strict Mode
By default, a failed rule whose severity is WARNING is still treated as a success.
The -s or --strict option treats warnings as failures:
$ t4sanity -s <DATA_ROOT>
Exclude Checks
The -e or --exclude option excludes specific checks by rule ID or rule group:
$ t4sanity -e STR001 <DATA_ROOT>
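When t4sanity is driven from a script or CI job, the strict and exclude options can be assembled programmatically. The following is a minimal sketch, not part of t4sanity itself: build_t4sanity_command and run_t4sanity are hypothetical helper names, and only flags listed in the help output above are used. It assumes the t4sanity executable is available on PATH.

```python
import subprocess
from typing import List, Optional


def build_t4sanity_command(
    data_root: str,
    strict: bool = False,
    exclude: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a t4sanity run (hypothetical helper)."""
    command = ["t4sanity"]
    if strict:
        # Treat warnings as failures.
        command.append("--strict")
    if exclude is not None:
        # A single rule ID or rule group, e.g. "STR001".
        command.extend(["--exclude", exclude])
    # The dataset root is the positional argument.
    command.append(data_root)
    return command


def run_t4sanity(data_root: str, strict: bool = False, exclude: Optional[str] = None) -> int:
    """Run t4sanity and return its exit code; assumes t4sanity is on PATH."""
    command = build_t4sanity_command(data_root, strict=strict, exclude=exclude)
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

Passing check=False keeps a non-zero exit code from raising an exception, so the caller can decide how to react to it.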
Exit Status Logic
t4sanity CLI returns the exit code based on the following conditions:
| Condition | --strict | Exit Code | Notes |
|---|---|---|---|
| At least one Severity.ERROR rule failed | N/A | 1 | Always fails the run |
| At least one Severity.WARNING rule failed, no Severity.ERROR failed | False (default) | 0 | Run is considered successful, warnings are reported |
| At least one Severity.WARNING rule failed, no Severity.ERROR failed | True | 1 | Treat warnings as failures; exit with failure |
| All rules passed or skipped | N/A | 0 | Run is considered successful |
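The table can be read as a small decision function. The sketch below restates it in Python for illustration; it is not the tool's actual implementation. The function name exit_code is hypothetical, and each report is assumed to be a mapping with the "severity" and "status" fields used in the JSON output format.

```python
from typing import Iterable, Mapping


def exit_code(reports: Iterable[Mapping[str, str]], strict: bool = False) -> int:
    """Return the exit code implied by the table for a set of rule reports."""
    # Only failed rules can affect the exit code; passed and skipped rules never do.
    failed = [report for report in reports if report["status"] == "FAILED"]

    # Any failed ERROR rule fails the run, regardless of --strict.
    if any(report["severity"] == "ERROR" for report in failed):
        return 1

    # Failed WARNING rules fail the run only in strict mode.
    if strict and any(report["severity"] == "WARNING" for report in failed):
        return 1

    return 0
```

For example, a run whose only failure is a WARNING rule yields 0 by default and 1 when strict is True.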
Dump Results as JSON
To dump the results as JSON, use the -o or --output option:
$ t4sanity -o result.json <DATA_ROOT>
Then a JSON file named result.json will be generated as follows:
{
  "dataset_id": "<DatasetID: str>",
  "version": <Version: int>,
  "reports": [
    {
      "id": "<RuleID: str>",
      "name": "<RuleName: str>",
      "severity": "<WARNING/ERROR: str>",
      "description": "<Description: str>",
      "status": "<PASSED/FAILED/SKIPPED: str>",
      "reasons": <[<Reason1>, <Reason2>, ...]: [str; N] | null> // Failed or skipped reasons, null if passed
    },
  ]
}
Here is the description of the JSON format:
- dataset_id: The ID of the dataset.
- version: The version of the dataset.
- reports: An array of rule reports.
  - id: The ID of the rule.
  - name: The name of the rule.
  - severity: How important the rule is (WARNING or ERROR).
  - description: A description of the rule.
  - status: The outcome of the rule (PASSED, FAILED, or SKIPPED).
  - reasons: An array of reasons why the rule failed or was skipped; null if it passed.
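A dumped result can be post-processed with a few lines of Python. The following sketch assumes the file holds a single top-level object shaped like the template above; load_result and summarize are hypothetical helper names, not part of t4sanity.

```python
import json
from collections import Counter
from typing import Any, Dict


def load_result(path: str) -> Dict[str, Any]:
    """Read a JSON result file written with the --output option."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Count rule statuses and collect every rule that did not pass."""
    reports = result["reports"]
    counts = Counter(report["status"] for report in reports)
    problems = [
        {
            "id": report["id"],
            "severity": report["severity"],
            "status": report["status"],
            # "reasons" is null (None) for passed rules, so default to an empty list.
            "reasons": report.get("reasons") or [],
        }
        for report in reports
        if report["status"] != "PASSED"
    ]
    return {
        "dataset_id": result["dataset_id"],
        "version": result["version"],
        "counts": dict(counts),
        "problems": problems,
    }
```

Calling summarize(load_result("result.json")) returns the per-status counts together with the failed and skipped rules and their reasons.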