DataForge Platform

Ember

A prescriptive data catalog for scalable platforms

Ember is DataForge's prescriptive data catalog — the system of record for how data is shaped, validated, and interpreted across the platform.

Unlike traditional catalogs that observe data after the fact, Ember is prescriptive by design. It captures intent as declarative logic and applies it consistently across pipelines so meaning does not fragment as the platform scales.

psql (ember_prod)

ember_prod=# SELECT
    s.source_name, s.connection_type,
    COUNT(DISTINCT e.enrichment_id) AS rules,
    MAX(h.end_datetime) AS last_run,
    COUNT(DISTINCT h.process_id) FILTER (WHERE h.status_code = 'C') AS runs_30d
FROM meta.source s
JOIN meta.enrichment e ON e.source_id = s.source_id AND e.active_flag
LEFT JOIN history.process h ON h.source_id = s.source_id
    AND h.start_datetime > now() - interval '30 days'
WHERE s.project_id = 496 AND s.active_flag
GROUP BY s.source_name, s.connection_type
ORDER BY rules DESC LIMIT 5;

 source_name         | connection_type | rules | last_run         | runs_30d
---------------------+-----------------+-------+------------------+----------
 Coins - JCCosttran  | table           |   138 | 2025-05-11 06:12 |       31
 Linc - vSMAgreement | file            |   132 | 2025-05-11 06:14 |       31
 Linc - vSMWorkOrder | file            |   110 | 2025-05-11 06:15 |       31
 RieckC - SEOrderVal | table           |    96 | 2025-05-11 06:13 |       28
 Linc - vSMDetailTxn | file            |    69 | 2025-05-11 06:16 |       31
(5 rows)

Catalogs That Observe Do Not Scale

Catalogs That Prescribe Do

Traditional data catalogs are built to document what already exists. They scan schemas, collect descriptions, and surface metadata after pipelines are deployed.

This works early on. It breaks down as pipelines multiply and logic diverges.

Ember was built for the opposite problem.

Traditional Catalogs

  • Observe data after the fact
  • Document outcomes rather than intent
  • Track tables and rows, not logic
  • Drift as pipelines evolve independently
  • Require manual governance to stay accurate

Ember

  • Defines intent before pipelines run
  • Prescribes how data should behave
  • Applies rules consistently across pipelines
  • Keeps meaning stable as complexity grows
  • Stays accurate without manual governance

Because Ember defines behavior rather than recording outcomes, it becomes part of how pipelines are built and operated, not a system consulted after the fact.

With Ember, the data platform has a true system of record — an explicit source of truth for how data is defined and processed.

Where Definition Meets Execution

Every declarative rule defined in Ember is linked to the pipelines that run it and the outcomes those pipelines produce. Execution details are not inferred or stitched together from external systems.

01

Define

Rules, sources, and validations are declared explicitly in Ember — not embedded in pipeline code.

02

Execute

Pipelines run automatically from Ember definitions. Dependencies are implicit in those definitions, and orchestration is never defined by hand.

03

Observe

Execution metadata, timing, lineage, and outcomes are captured automatically and tied back to the definitions that produced them.
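To make "dependencies are implicit" concrete, here is a minimal sketch, not Ember's implementation, of how an execution order can be derived from declarations alone. It assumes rules are stored as name-to-expression pairs that reference columns with the `[This].<column>` syntax used in the Ember example on this page; `RULES`, `dependencies`, and `run_rules` are hypothetical names. Each step also emits a small record tied back to the rule that produced it, standing in for captured execution metadata.

```python
import re
import time
from graphlib import TopologicalSorter

# Hypothetical rule catalog: rule name -> expression. Only the [This].<column>
# reference style comes from this page; everything else is illustrative.
RULES = {
    "total": "[This].revenue + [This].tax",
    "tax": "[This].revenue * 0.08",
    "revenue": "[This].price * [This].qty",
}

REF = re.compile(r"\[This\]\.(\w+)")


def dependencies(rules):
    """Map each rule to the other rules its expression references.

    References to plain source columns (price, qty) are not rules, so they
    add no edges. Nobody declares these edges; they fall out of the text.
    """
    return {
        name: {ref for ref in REF.findall(expr) if ref in rules}
        for name, expr in rules.items()
    }


def run_rules(rules, row):
    """Evaluate every rule in dependency order and record one entry per step."""
    log = []
    for name in TopologicalSorter(dependencies(rules)).static_order():
        start = time.perf_counter()
        # Rewrite [This].col as row['col']. eval() keeps the sketch short;
        # a real engine would compile expressions, never eval untrusted text.
        py_expr = REF.sub(lambda m: f"row[{m.group(1)!r}]", rules[name])
        row[name] = eval(py_expr, {"__builtins__": {}}, {"row": row})
        log.append({"rule": name, "seconds": time.perf_counter() - start})
    return row, log
```

Declaring `total` first does not matter: `static_order()` yields `revenue`, `tax`, `total` because that is the only order the references allow. A circular reference raises `graphlib.CycleError`, which is exactly the kind of definition a prescriptive catalog would reject before anything runs.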

What Ember captures

  • Declarative definitions for data shape, rules, and validation
  • How declarative logic executes within the single enforced pipeline architecture defined by Alloy
  • Execution metadata including timing, dependencies, and outcomes
  • Lineage that reflects actual transformations, not inferred guesses

Because definition and execution are stored together, Ember becomes the authoritative source for understanding how data behaves across the entire platform.

Column lineage, live from DataForge

[Figure: DataForge column lineage diagram showing three sources converging into a single output enrichment.]

Lineage is captured automatically, with no manual mapping required.

Understand Pipelines Without Reverse Engineering

In most data platforms, understanding pipeline behavior requires custom logging, external observability tools, and manual investigation. In Ember, visibility is automatic and always accurate.

Because logic is declarative and execution is standardized, Ember captures operational detail as pipelines run. There is no need to instrument pipelines or reconstruct behavior from scattered logs.

What teams can see by default

  • Which rules and transformations ran
  • Where logic succeeded or failed
  • How long each step took
  • What downstream data was affected
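A minimal sketch of why this visibility needs no instrumentation: when one standardized runner executes every step, it can record status, timing, and errors as a side effect, and can mark the downstream steps a failure affects. The `Runner` and `StepRecord` classes, the `downstream` map, and every status code other than `'C'` (which the queries on this page use for completed runs) are hypothetical; steps are assumed to arrive already in dependency order.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class StepRecord:
    step: str
    status: str               # "C" completed; "F" failed and "S" skipped are assumed codes
    seconds: float
    error: Optional[str] = None


def affected(step: str, downstream: Dict[str, List[str]]) -> Set[str]:
    """Every step reachable downstream of `step` (transitive closure)."""
    seen: Set[str] = set()
    stack = list(downstream.get(step, []))
    while stack:
        nxt = stack.pop()
        if nxt not in seen:
            seen.add(nxt)
            stack.extend(downstream.get(nxt, []))
    return seen


@dataclass
class Runner:
    records: List[StepRecord] = field(default_factory=list)

    def run(self, steps: List[Tuple[str, Callable[[], None]]],
            downstream: Dict[str, List[str]]) -> List[StepRecord]:
        """Run steps in the given (dependency) order, recording every outcome.

        Step functions contain no logging; the runner captures all of it.
        """
        blocked: Set[str] = set()
        for name, fn in steps:
            if name in blocked:
                self.records.append(StepRecord(name, "S", 0.0, "upstream failure"))
                continue
            start = time.perf_counter()
            try:
                fn()
                status, error = "C", None
            except Exception as exc:  # record the failure instead of crashing the run
                status, error = "F", repr(exc)
                blocked |= affected(name, downstream)
            self.records.append(
                StepRecord(name, status, time.perf_counter() - start, error))
        return self.records
```

Because the runner, not the step author, owns the bookkeeping, every step is observed the same way, and a failure automatically answers "what downstream data was affected" through the skipped records.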

Pipeline process log, live from DataForge

[Figure: Process Timeline, 4/9/2024. DataForge pipeline process log showing step-by-step timing, scope, and status for each operation.]

Every step (timing, scope, operation, and status) is captured without instrumentation.

Why this matters

When teams can see what happened and why, they debug faster, onboard new developers more easily, and make changes with confidence.

Operational clarity is not a feature — it is a consequence of a well-defined system.

The Queryable Catalog

Everything is SQL. Everything is Queryable.

Every configuration, rule, and execution record in Ember lives in a Postgres database. The meta schema holds all pipeline configuration. The history schema holds all execution records. Both sit in the same database — directly cross-queryable. No separate APIs. No dashboards to export. No log pipelines to build.

  • meta.source
  • meta.enrichment
  • meta.enrichment_history
  • meta.output
  • meta.output_column
  • history.process
  • lineage.destination_query()
  • lineage.origin_query()
Audit Trail (meta schema)
-- Built-in change history. No log tooling.
SELECT
    update_datetime,
    name       AS rule,
    expression AS new_expr,
    updated_by
FROM meta.enrichment_history
WHERE source_id = 1047
ORDER BY update_datetime DESC;

Every rule change is automatically recorded with who changed it and when. No instrumentation required.
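The page does not say how Ember records these changes. One common way to get a change history with no application code is a database trigger, sketched here with Python's built-in `sqlite3` so it runs anywhere; in Postgres the same idea would be a trigger function written in PL/pgSQL. The table layouts below are simplified assumptions that reuse the column names from the query above.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """Create a toy rule table whose every UPDATE is copied to a history table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE enrichment (
            enrichment_id INTEGER PRIMARY KEY,
            source_id     INTEGER NOT NULL,
            name          TEXT    NOT NULL,
            expression    TEXT    NOT NULL,
            updated_by    TEXT    NOT NULL
        );
        CREATE TABLE enrichment_history (
            history_id      INTEGER PRIMARY KEY AUTOINCREMENT,
            enrichment_id   INTEGER NOT NULL,
            source_id       INTEGER NOT NULL,
            name            TEXT    NOT NULL,
            expression      TEXT    NOT NULL,
            updated_by      TEXT    NOT NULL,
            update_datetime TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- The trigger, not the caller, writes the audit row.
        CREATE TRIGGER enrichment_audit AFTER UPDATE ON enrichment
        BEGIN
            INSERT INTO enrichment_history
                (enrichment_id, source_id, name, expression, updated_by)
            VALUES
                (NEW.enrichment_id, NEW.source_id, NEW.name,
                 NEW.expression, NEW.updated_by);
        END;
    """)
    return conn
```

Any `UPDATE enrichment SET expression = ..., updated_by = ...` now leaves a row in `enrichment_history` stamped with the change time, so the audit query above has something to read without anyone remembering to log.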

Operations Debugging (cross-schema)
-- Config + history. One query, no tools.
SELECT
    s.source_name,
    h.status_code, h.rows_processed,
    h.start_datetime, h.end_datetime
FROM history.process h
JOIN meta.source s
    ON s.source_id = h.source_id
WHERE h.status_code != 'C'
  AND h.start_datetime > now() - interval '24h';

Config and execution history in a single cross-schema join. Replaces external log analytics and monitoring pipelines.
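To try the cross-schema pattern without a Postgres instance, SQLite's `ATTACH DATABASE` gives schema-qualified names such as `meta.source` and `history.process` on a single connection. This sketch uses Python's built-in `sqlite3`; the tables are cut down to the columns the query above touches, the interval arithmetic becomes SQLite's `datetime('now', '-24 hours')`, and the `'F'` status code used in testing is an assumption.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """In-memory stand-in: two attached databases play the meta and history schemas."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS meta")
    conn.execute("ATTACH DATABASE ':memory:' AS history")
    conn.executescript("""
        CREATE TABLE meta.source (
            source_id   INTEGER PRIMARY KEY,
            source_name TEXT
        );
        CREATE TABLE history.process (
            process_id     INTEGER PRIMARY KEY,
            source_id      INTEGER,
            status_code    TEXT,
            rows_processed INTEGER,
            start_datetime TEXT,
            end_datetime   TEXT
        );
    """)
    return conn


# Same shape as the query above: config joined to execution history.
FAILED_LAST_24H = """
    SELECT s.source_name, h.status_code, h.rows_processed,
           h.start_datetime, h.end_datetime
    FROM history.process h
    JOIN meta.source s ON s.source_id = h.source_id
    WHERE h.status_code != 'C'
      AND h.start_datetime > datetime('now', '-24 hours')
"""
```

SQLite compares these timestamps as text, so the sketch relies on storing them in the `YYYY-MM-DD HH:MM:SS` form that `datetime()` produces; Postgres avoids that caveat with native timestamp types.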

Full Lineage (lineage schema)
-- Downstream impact for any rule.
SELECT *
FROM lineage.destination_query(
    p_enrichment_id := 4821
);
-- origin_query() traverses upstream
-- destination_query() traverses downstream
-- Accepts source, rule, output, mapping

Set-returning functions traverse the full dependency graph and return the complete lineage chain as a tabular result set.
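How `lineage.destination_query()` and `lineage.origin_query()` work internally is not described here. A minimal sketch of the same idea, a graph walk returned as rows, is a recursive common table expression over an edge table. The `dependency` table, its columns, and the `traverse` function are hypothetical; the sketch runs on Python's built-in `sqlite3`, and Postgres supports the same `WITH RECURSIVE` syntax.

```python
import sqlite3

# Hypothetical edge table: one row per "upstream feeds downstream" link.
SCHEMA = "CREATE TABLE dependency (upstream_id INTEGER, downstream_id INTEGER)"


def traverse(conn, start_id, direction="down", max_depth=50):
    """Return (node_id, depth) pairs reachable from start_id.

    direction="down" walks toward consumers (like destination_query);
    direction="up" walks toward origins (like origin_query).
    max_depth stops the walk if the graph ever contains a cycle.
    """
    if direction == "down":
        frm, to = "upstream_id", "downstream_id"
    elif direction == "up":
        frm, to = "downstream_id", "upstream_id"
    else:
        raise ValueError("direction must be 'down' or 'up'")
    # Only the two fixed column names above are interpolated; values are bound.
    sql = f"""
        WITH RECURSIVE walk(node_id, depth) AS (
            SELECT {to}, 1 FROM dependency WHERE {frm} = ?
            UNION
            SELECT d.{to}, w.depth + 1
            FROM dependency d JOIN walk w ON d.{frm} = w.node_id
            WHERE w.depth < ?
        )
        SELECT node_id, MIN(depth) AS min_depth
        FROM walk
        GROUP BY node_id
        ORDER BY min_depth, node_id
    """
    return conn.execute(sql, (start_id, max_depth)).fetchall()
```

Returning rows rather than a nested structure is what lets lineage results be joined, filtered, and exported like any other query result.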

Logic Has a Shape

Ember does not allow logic to be defined arbitrarily. Every rule, transformation, and validation must conform to a small set of explicit, structured patterns — defined declaratively, attached to known entities, scoped to specific stages.

What this structure enforces

  • Logic must be defined declaratively, not embedded ad hoc in pipelines
  • Transformations attach to known entities and lifecycle stages
  • The unit of work is fixed and consistent across all pipelines
  • Custom code is allowed only in controlled, explicit locations
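A minimal sketch of what "logic has a shape" can mean in practice: a definition is accepted only if it has exactly the expected fields, attaches to a known source, and references only columns that source exposes. The field names come from the Ember rule example on this page; `KNOWN_SOURCES`, the accepted type list, and `validate` are hypothetical.

```python
import re

# Fields taken from the Ember rule example on this page.
REQUIRED = {"source", "rule", "expression", "type", "nullable"}
# Hypothetical catalog of sources and the columns each one exposes.
KNOWN_SOURCES = {"orders": {"order_id", "customer_id", "price", "qty"}}
# Hypothetical whitelist of column types.
TYPE_RE = re.compile(r"DECIMAL\(\d+,\d+\)|INT|BIGINT|STRING|BOOLEAN|DATE|TIMESTAMP")
REF = re.compile(r"\[This\]\.(\w+)")


def validate(defn: dict) -> list:
    """Return a list of problems; an empty list means the definition conforms."""
    missing = REQUIRED - set(defn)
    extra = set(defn) - REQUIRED
    errors = []
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    if missing:
        return errors  # the remaining checks need every field present
    columns = KNOWN_SOURCES.get(defn["source"])
    if columns is None:
        errors.append(f"unknown source: {defn['source']}")
    else:
        unknown = set(REF.findall(defn["expression"])) - columns
        if unknown:
            errors.append(f"unknown columns: {sorted(unknown)}")
    if not TYPE_RE.fullmatch(str(defn["type"])):
        errors.append(f"unsupported type: {defn['type']}")
    if not isinstance(defn["nullable"], bool):
        errors.append("nullable must be true or false")
    return errors
```

`validate` takes no author argument, so the same check applies whether a developer or an AI tool wrote the definition; a nonconforming definition is rejected either way.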

Without Ember vs. With Ember

Three pipelines, three definitions
# Pipeline A — Engineer 1
df['revenue'] = df['price'] * df['qty']
df = df.dropna(subset=['revenue'])
-- Pipeline B — Engineer 2
SELECT price * quantity AS revenue
FROM orders WHERE price IS NOT NULL
// Pipeline C — Engineer 3
revenue = coalesce(price, 0) * qty
// TODO: add null check
One definition — applied everywhere
source: orders
rule: revenue
expression: [This].price * [This].qty
type: DECIMAL(18,2)
nullable: false
→ Applied consistently across all pipelines and engineers

This constraint is intentional.

Custom code is supported, but only in clearly defined extension points. It operates within the same structural boundaries rather than redefining how the pipeline works.

The architecture is predetermined.

Logic simply slots into a known structure. Meaning remains consistent. Execution remains predictable.

The result

  • Dependencies are always implicit
  • Orchestration is never manually defined
  • Pipeline design disappears as a concern
  • Complexity grows without multiplying workflows

This is a subtle advantage in small systems. At scale, it is a fundamental shift.

Built for Automation at Scale

How Talos and Ember work together

Input: "Calculate revenue per customer" (natural language)
  → Talos (AI control plane) translates the request
  → Ember enforces the same rules it applies to human developers
  → The pipeline runs, fully automated

Ember is not just a catalog that documents what happened.

It is a prescriptive definition layer that replaces pipeline design itself.

The same constraints that allow large engineering teams to scale without chaos are what make AI-driven automation possible.

"Ember works because it removes choice where choice creates fragmentation."
  • Logic has a fixed shape.
  • Execution follows a single model.
  • Definitions are explicit and shared.

Talos is DataForge's AI control plane that translates natural language into Ember's structured definitions. When something does not conform, Ember responds exactly as it would for a human developer — by rejecting it.

There is no special path for AI. Ember is built to support both developers and AI from the ground up.

Ember is the missing bridge between large language models and fully automated data pipelines.