DataForge Platform

Ember

A prescriptive data catalog for scalable platforms

Ember is DataForge's prescriptive data catalog — the system of record for how data is shaped, validated, and interpreted across the platform.

Unlike traditional catalogs that observe data after the fact, Ember is prescriptive by design. It captures intent as declarative logic and applies it consistently across pipelines so meaning does not fragment as the platform scales.

Get started free Talk to us

psql — ember_prod

ember_prod=#

SELECT

s.source_name, s.connection_type,

COUNT(DISTINCT e.enrichment_id) AS rules,

MAX(h.end_datetime) AS last_run,

COUNT(h.process_id) FILTER (WHERE h.status_code = 'C') AS runs_30d

FROM meta.source s

JOIN meta.enrichment e ON e.source_id = s.source_id AND e.active_flag

LEFT JOIN history.process h ON h.source_id = s.source_id

WHERE s.project_id = 496 AND s.active_flag

GROUP BY s.source_name, s.connection_type

ORDER BY rules DESC LIMIT 5;

source_name	conn_type	rules	last_run	runs_30d
Coins - JCCosttran	table	138	2025-05-11 06:12	31
Linc - vSMAgreement	file	132	2025-05-11 06:14	31
Linc - vSMWorkOrder	file	110	2025-05-11 06:15	31
RieckC - SEOrderVal	table	96	2025-05-11 06:13	28
Linc - vSMDetailTxn	file	69	2025-05-11 06:16	31

(5 rows)

Catalogs That Observe Do Not Scale

Catalogs That Prescribe Do

Traditional data catalogs are built to document what already exists. They scan schemas, collect descriptions, and surface metadata after pipelines are deployed.

This works early. It breaks as pipelines multiply and logic diverges.

Ember was built for the opposite problem.

Traditional Catalogs

Observe data after the fact
Document outcomes rather than intent
Track tables and rows, not logic
Drift as pipelines evolve independently
Require manual governance to stay accurate

Ember

Defines intent before pipelines run
Prescribes how data should behave
Applies rules consistently across pipelines
Keeps meaning stable as complexity grows
Built-in governance — no manual effort required

Because Ember defines behavior rather than recording outcomes, it becomes part of how pipelines are built and operated, not a system consulted after the fact.

With Ember, the data platform has a true system of record — an explicit source of truth for how data is defined and processed.

Where Definition Meets Execution

Every declarative rule defined in Ember is linked to the pipelines that run it and the outcomes those pipelines produce. Execution details are not inferred or stitched together from external systems.

Define

Rules, sources, and validations are declared explicitly in Ember — not embedded in pipeline code.

Execute

Pipelines run automatically from Ember definitions. Dependencies are implicit. Orchestration is never manually defined.

Observe

Execution metadata, timing, lineage, and outcomes are captured automatically and tied back to the definitions that produced them.

What Ember captures

Declarative definitions for data shape, rules, and validation
How declarative logic executes within the single enforced pipeline architecture defined by Alloy
Execution metadata including timing, dependencies, and outcomes
Lineage that reflects actual transformations, not inferred guesses

Because definition and execution are stored together, Ember becomes the authoritative source for understanding how data behaves across the entire platform.

Column lineage — live from DataForge

Lineage is captured automatically — no manual mapping required.

Understand Pipelines Without Reverse Engineering

In most data platforms, understanding pipeline behavior requires custom logging, external observability tools, and manual investigation. In Ember, visibility is automatic and always accurate.

Because logic is declarative and execution is standardized, Ember captures operational detail as pipelines run. There is no need to instrument pipelines or reconstruct behavior from scattered logs.

What teams can see by default

Which rules and transformations ran
Where logic succeeded or failed
How long each step took
What downstream data was affected

Pipeline process log — live from DataForge

Process Timeline 4/9/2024

Every step — timing, scope, operation, and status — captured without instrumentation.

Why this matters

When teams can see what happened and why, they debug faster, onboard new developers more easily, and make changes with confidence.

Operational clarity is not a feature — it is a consequence of a well-defined system.

The Queryable Catalog

Everything is SQL. Everything is Queryable.

Every configuration, rule, and execution record in Ember lives in a Postgres database. The meta schema holds all pipeline configuration. The history schema holds all execution records. Both sit in the same database — directly cross-queryable. No separate APIs. No dashboards to export. No log pipelines to build.

meta.sourcemeta.enrichmentmeta.enrichment_historymeta.outputmeta.output_columnhistory.processlineage.destination_query()lineage.origin_query()

Audit Trail meta schema

-- Built-in change history. No log tooling.

SELECT

update_datetime,

name AS rule,

expression AS new_expr,

updated_by

FROM meta.enrichment_history

WHERE source_id = 1047

ORDER BY update_datetime DESC;

Every rule change is automatically recorded with who changed it and when. No instrumentation required.

Operations Debugging cross-schema

-- Config + history. One query, no tools.

SELECT

s.source_name,

h.status_code, h.rows_processed,

h.start_datetime, h.end_datetime

FROM history.process h

JOIN meta.source s

ON s.source_id = h.source_id

WHERE h.status_code != 'C'

AND h.start_datetime

> now() - interval '24h';

Config and execution history in a single cross-schema join. Replaces external log analytics and monitoring pipelines.

Full Lineage lineage schema

-- Downstream impact for any rule.

SELECT *

FROM lineage.destination_query(

p_enrichment_id := 4821

);

-- origin_query() traverses upstream

-- destination_query() traverses downstream

-- Accepts source, rule, output, mapping

Stored procedures traverse the full dependency graph and return the complete lineage chain as a tabular result set.

Logic Has a Shape

Ember does not allow logic to be defined arbitrarily. Every rule, transformation, and validation must conform to a small set of explicit, structured patterns — defined declaratively, attached to known entities, scoped to specific stages.

What this structure enforces

Logic must be defined declaratively, not embedded ad hoc in pipelines
Transformations attach to known entities and lifecycle stages
The unit of work is fixed and consistent across all pipelines
Custom code is allowed only in controlled, explicit locations

Without Ember vs. With Ember

Three pipelines, three definitions

# Pipeline A — Engineer 1

df['revenue'] = df['price'] * df['qty']

df.dropna(subset=['revenue'])

-- Pipeline B — Engineer 2

SELECT price * quantity AS revenue

FROM orders WHERE price IS NOT NULL

// Pipeline C — Engineer 3

revenue = coalesce(price, 0) * qty

// TODO: add null check

One definition — applied everywhere

source: orders

rule: revenue

expression: SUM([This].price * [This].qty)

type: DECIMAL(18,2)

nullable: false

→ Applied consistently across all pipelines and engineers

This constraint is intentional.

Custom code is supported, but only in clearly defined extension points. It operates within the same structural boundaries rather than redefining how the pipeline works.

The architecture is predetermined.

Logic simply slots into a known structure. Meaning remains consistent. Execution remains predictable.

The result

Dependencies are always implicit
Orchestration is never manually defined
Pipeline design disappears as a concern
Complexity grows without multiplying workflows

This is a subtle advantage in small systems. At scale, it is a fundamental shift.

Built for Automation at Scale

How Talos and Ember work together

Input

"Calculate revenue per customer"

Natural language

Translates

Talos

AI control plane

Enforces

Ember

Same rules as humans

Runs

Pipeline

Fully automated

Ember is not just a catalog that documents what happened.

It is a prescriptive definition layer that replaces pipeline design itself.

The same constraints that allow large engineering teams to scale without chaos are what make AI-driven automation possible.

"Ember works because it removes choice where choice creates fragmentation."

Logic has a fixed shape.
Execution follows a single model.
Definitions are explicit and shared.

Talos is DataForge's AI control plane that translates natural language into Ember's structured definitions. When something does not conform, Ember responds exactly as it would for a human developer — by rejecting it.

There is no special path for AI. Ember is built to support both developers and AI from the ground up.

Ember is the missing bridge between large language models and fully automated data pipelines.

Get started for free Talk to us

Solution guides

Evaluate DataForge by platform goal

View DataForge facts

Enterprise data platform

Ember

A prescriptive data catalog for scalable platforms

Catalogs That Observe Do Not Scale

Catalogs That Prescribe Do

Traditional Catalogs

Ember

Where Definition Meets Execution

Define

Execute

Observe

What Ember captures

Understand Pipelines Without Reverse Engineering

What teams can see by default

Everything is SQL. Everything is Queryable.

Logic Has a Shape

What this structure enforces

This constraint is intentional.

The architecture is predetermined.

Built for Automation at Scale

Evaluate DataForge by platform goal

Enterprise data platform for governed analytics at scale

Data pipeline platform for complex enterprise source systems

Data engineering platform with architecture built in

Data orchestration platform without manually assembled DAG sprawl

Data observability platform with lineage, quality, audit, and cost context