Quality Checks
Quality checks validate data before it is written to a sink. Rivet runs assertions on the computed result set and halts execution if any check fails, preventing bad data from reaching your target catalog.
Assertions vs Audits vs Tests
Assertions run pre-write on computed data in memory. Audits run post-write by reading back from the target catalog. Tests run offline against fixture data. This guide covers assertions. See Assertions, Audits & Tests for the full picture.
How Quality Checks Work
Checks are attached to a joint (typically a sink). When the executor reaches that joint, it evaluates every check against the materialized result before writing. If any check fails, execution stops with an RVT-6xx error.
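The evaluate-then-write gate described above can be sketched in plain Python. This illustrates the control flow only and is not Rivet's executor: `CheckFailed`, `run_joint`, and the check callables are hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]
# A check returns an error message on failure, or None when it passes.
Check = Callable[[List[Row]], Optional[str]]


class CheckFailed(Exception):
    """Hypothetical stand-in for an RVT-6xx error."""


def run_joint(
    rows: List[Row],
    checks: Iterable[Check],
    write: Callable[[List[Row]], None],
) -> None:
    # Evaluate every check against the materialized rows before writing.
    failures = [msg for check in checks if (msg := check(rows)) is not None]
    if failures:
        # Abort here: the write callable is never invoked.
        raise CheckFailed("; ".join(failures))
    write(rows)
```

Because the write only happens after every check has returned, a single failing check is enough to keep the rows out of the target.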
Checks can be declared in any of four places:
- Inline in SQL via -- rivet:assert: annotations
- Inline in YAML under the quality: block
- Co-located YAML file with the same stem as the joint
- Dedicated YAML in the quality/ directory
Assertion Types
not_null
Fails if any row has a NULL in the specified column.
from rivet_core.checks import Assertion
from rivet_core.models import Joint
orders_clean = Joint(
    name="orders_clean",
    joint_type="sql",
    sql="SELECT order_id, customer_id, amount FROM raw_orders WHERE status = 'completed'",
    assertions=[
        Assertion(type="not_null", config={"column": "order_id"}),
        Assertion(type="not_null", config={"column": "customer_id"}),
    ],
)
unique
Fails if any value (or combination) appears more than once.
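To make the "value or combination" wording concrete, here is a minimal sketch of the uniqueness rule over rows held as dictionaries. It is not Rivet's implementation; `duplicate_keys` is a hypothetical helper, and treating `None` as an ordinary value is an assumption that may differ from how Rivet handles NULLs.

```python
from collections import Counter


def duplicate_keys(rows, columns):
    """Return the key tuples that occur more than once across `columns`."""
    # Build one tuple per row from the selected columns, then count them.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

With a single column the key is a one-element tuple; with several columns only the combination has to be unique, so repeated values in one column are allowed as long as the full tuple differs.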
row_count
Fails if the number of rows falls outside the specified bounds.
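The bounds test can be sketched as follows. `row_count_ok` is a hypothetical helper whose `minimum` and `maximum` parameters stand in for the `min` and `max` config keys; treating both bounds as inclusive and optional is an assumption, not something this guide states.

```python
from typing import Optional


def row_count_ok(
    n_rows: int,
    minimum: Optional[int] = None,
    maximum: Optional[int] = None,
) -> bool:
    # Assumed semantics: each bound is inclusive and is skipped when omitted.
    if minimum is not None and n_rows < minimum:
        return False
    if maximum is not None and n_rows > maximum:
        return False
    return True
```

Under these assumptions, a check declared with only `min: 1` rejects an empty result set and accepts any non-empty one.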
accepted_values
Fails if any row contains a value not in the allowed set.
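As a rough illustration of the rule, the sketch below collects the rows whose value falls outside the allowed set. `unaccepted_rows` is a hypothetical helper, not part of Rivet, and it assumes a NULL (`None`) counts as a violation unless `None` is listed explicitly.

```python
def unaccepted_rows(rows, column, values):
    """Return the rows whose `column` value is not in the allowed `values`."""
    allowed = set(values)
    # Assumption: None is rejected unless it appears in `values`.
    return [row for row in rows if row[column] not in allowed]
```

The check passes when the returned list is empty.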
expression
Fails if the SQL expression evaluates to FALSE for any row.
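One way to picture the row-level rule is to select the rows for which the expression is FALSE. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine; the `orders` table and the `expression_failures` helper are assumptions for illustration, and Rivet's own engine may evaluate expressions differently.

```python
import sqlite3


def expression_failures(conn, table, expression):
    """Return the rows of `table` for which `expression` is FALSE."""
    # NOT (NULL) is NULL in SQL, so rows where the expression is NULL are
    # not returned; only rows that are definitely FALSE count as failures.
    # String interpolation is acceptable here only because the inputs are trusted.
    query = f"SELECT * FROM {table} WHERE NOT ({expression})"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, -5.0), (3, None)],
)
failing = expression_failures(conn, "orders", "amount >= 0")
```

Here only order 2 is reported: order 3 has a NULL amount, so the expression is neither TRUE nor FALSE for it. Pair the expression with a not_null check if NULLs should also be rejected.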
custom
Runs a SQL query that must return zero rows to pass.
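The zero-rows convention means the query should select violations. The sketch below shows the idea with Python's built-in `sqlite3` module; the `orders` table, the sample data, and `custom_check_passes` are hypothetical, and how a custom query refers to the joint's result inside Rivet is not covered here.

```python
import sqlite3


def custom_check_passes(conn, sql):
    """Pass only when the violation query returns no rows."""
    return conn.execute(sql).fetchone() is None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 42, 19.99), (2, 42, -30.0), (3, 17, 5.0)],
)

# Violation query: customers whose order amounts sum to a negative total.
negative_totals = (
    "SELECT customer_id FROM orders "
    "GROUP BY customer_id HAVING SUM(amount) < 0"
)
passed = custom_check_passes(conn, negative_totals)
```

In this data set customer 42 has a negative total, so the query returns a row and the check does not pass. Unlike an expression check, a custom query can aggregate or join before deciding what counts as a violation.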
schema
Fails if the result set doesn't match declared column names and types.
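A schema comparison can be sketched as a diff between two column-to-type mappings. `schema_mismatches` is a hypothetical helper, the type names are illustrative, and reporting undeclared extra columns as mismatches is an assumption about strictness that this guide does not confirm.

```python
def schema_mismatches(actual, declared):
    """Compare two {column: type} mappings and list the differences."""
    problems = []
    for name, expected in declared.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != expected:
            problems.append(
                f"type mismatch on {name}: expected {expected}, got {actual[name]}"
            )
    # Assumption: columns present in the result but not declared also fail.
    for name in actual:
        if name not in declared:
            problems.append(f"unexpected column: {name}")
    return problems
```

An empty list means the result set matches the declaration exactly.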
freshness
Fails if the most recent value in a timestamp column is older than the threshold.
Supported units: m (minutes), h (hours), d (days).
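The threshold comparison can be sketched as below. The helper names are hypothetical, and the `<number><unit>` string format (for example `24h`) is an assumption: this guide lists the units but not the exact syntax of `max_age`.

```python
from datetime import datetime, timedelta

# Map the documented unit suffixes to timedelta keyword arguments.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_max_age(text: str) -> timedelta:
    """Parse a threshold such as '90m', '24h', or '7d' (assumed format)."""
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})


def is_fresh(latest: datetime, max_age: str, now: datetime) -> bool:
    # Stale means the newest timestamp is older than the threshold.
    return now - latest <= parse_max_age(max_age)
```

`latest` would be the maximum of the timestamp column. Passing `now` explicitly keeps the function deterministic and easy to test; both datetimes must be either timezone-aware or naive for the subtraction to work.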
relationship
Fails if any value in a column doesn't exist in a referenced joint's column.
Note
The relationship check is recognized and can be declared, but is currently skipped at execution time. It will be fully implemented in a future release.
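Since the check is skipped at execution time, the rule it describes can be approximated by hand. The sketch below is a hypothetical helper, not Rivet code; it assumes both row sets fit in memory and ignores any special treatment of NULLs.

```python
def orphaned_values(child_rows, column, parent_rows, parent_column):
    """Return child values that have no match in the parent column."""
    parent_keys = {row[parent_column] for row in parent_rows}
    return {row[column] for row in child_rows} - parent_keys
```

An empty set means every value in the child column exists in the referenced column.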
YAML Configuration
Inline quality block
# joints/orders_clean.yaml
name: orders_clean
type: sql
sql: |
  SELECT order_id, customer_id, amount, status
  FROM raw_orders WHERE status = 'completed'
quality:
  assertions:
    - type: not_null
      columns: [order_id]
    - type: unique
      columns: [order_id]
    - type: row_count
      min: 1
Co-located quality file
Place a YAML file in the same directory as the joint, with the same stem. It must not contain name or type keys:
# joints/orders_clean.yaml (co-located quality file)
assertions:
  - type: not_null
    columns: [order_id]
  - type: unique
    columns: [order_id]
Dedicated quality directory
# quality/orders_clean.yaml
joint: orders_clean
assertions:
  - type: not_null
    columns: [order_id]
  - type: expression
    sql: "amount >= 0"
Sink Integration
Quality checks are most commonly attached to sink joints:
from rivet_core.checks import Assertion
from rivet_core.models import Joint
write_orders = Joint(
    name="write_orders",
    joint_type="sink",
    upstream=["orders_clean"],
    catalog="warehouse",
    table="orders",
    assertions=[
        Assertion(type="not_null", config={"column": "order_id"}),
        Assertion(type="unique", config={"column": "order_id"}),
        Assertion(type="row_count", config={"min": 1}),
    ],
)
Failure behavior
When a check fails:
- Raises an RVT-6xx error with the check type, joint name, and a sample of failing rows
- Aborts the write, so no data reaches the target
- Exits with a non-zero status code
$ rivet run --joint write_orders
✗ Quality check failed: not_null on column 'order_id' in joint 'write_orders'
3 rows with NULL order_id (showing first 5):
┌──────────┬─────────────┬────────┐
│ order_id │ customer_id │ amount │
├──────────┼─────────────┼────────┤
│ NULL │ 42 │ 19.99 │
│ NULL │ 17 │ 5.00 │
│ NULL │ 88 │ 99.00 │
└──────────┴─────────────┴────────┘
Error: RVT-601 assertion failed — write aborted
Quick Reference
| Type | Key config | Fails when |
|---|---|---|
| not_null | column / columns | Any NULL in column |
| unique | column / columns | Duplicate values |
| row_count | min, max | Count outside bounds |
| accepted_values | column, values | Value not in set |
| expression | sql | Expression is FALSE |
| custom | sql | Query returns rows |
| schema | columns | Schema mismatch |
| freshness | column, max_age | Timestamp too old |
| relationship | column, to | Missing FK (currently skipped) |