DataForge Platform

Alloy

The enforced pipeline architecture that scales cleanly, stays transparent, and enables customization where it matters.

Instead of every team designing their own pipeline patterns, Alloy provides a single, named layer model that every data pipeline follows — making platforms easier to build, operate, and extend over time.

Pipeline layers

PRODUCT Outputs for analytics & applications
INGOT Durable, reusable representations
ALLOY Structured incremental transformations
MINERAL Isolated change detection
ORE Raw source data entry point

Data Platforms Don't Fail on Volume.

They Fail on Chaos.

Most data platforms struggle to scale not because of data volume, but because of architectural inconsistency. As new pipelines are added, teams introduce new patterns, exceptions, and one-off designs — each one reasonable in isolation, collectively impossible to operate.

Without a defined and enforced architecture, every new requirement becomes a design exercise. Alloy was built for the opposite problem.

Custom Architecture

  • Every pipeline designed differently
  • Patterns multiply over time
  • Onboarding requires tribal knowledge
  • Rebuilds are expensive and disruptive
  • Operational complexity grows faster than data volume

Alloy

  • Single enforced architecture across every pipeline
  • New pipelines inherit the model automatically
  • Transparent, operable, and documented by default
  • Rebuilds become incremental changes, not rewrites
  • Complexity grows without multiplying workflows

What happens without enforced architecture

This is a real production data platform after years of custom pipeline development without a standardized architecture.

A real production data platform showing the chaos of unconstrained custom pipeline development

Each point represents a manually built code package.
Over time, systems like this become difficult to change, operate, or trust.

A Clear, Explicit Layered Architecture

Alloy enforces a single, named layer model that every data pipeline follows. Each layer has a clear purpose, and the boundaries between layers are explicit, so teams can understand and operate the platform without reverse-engineering hidden behavior.

The Alloy refinement process

The Alloy refinement process: ORE → MINERAL → ALLOY → INGOT → PRODUCT

No hidden layers. No implicit stages. No shortcuts.
Every pipeline moves through the same explicit architecture.

ORE

Raw source data enters the platform in its original form, providing a consistent and observable starting point.

MINERAL

Changes are detected and isolated so incremental processing is the default, not an optimization added later.

ALLOY

Data is enriched incrementally within a structured, repeatable transformation layer.

INGOT

Data is refined and consolidated into durable, reusable representations.

PRODUCT

Final outputs are delivered in a form optimized for analytics, applications, and downstream systems.

Incremental by Design

Most platforms treat incremental processing as a performance optimization added later. In Alloy, it is a foundational behavior — built into the architecture from the first layer so pipelines never waste compute reprocessing data that hasn't changed.

01

Ingest

Raw data lands in ORE exactly as received from the source — no transformation, no assumptions, no loss of fidelity.

02

Detect

Mineral isolates only the records that are new or changed — so the downstream enrichment step only ever processes the incremental batch.

03

Enrich

Alloy applies business logic — rules, relations, and validations — only to the changed records. The result merges into a hub table that is always current.

Predictable data movement

Data moves through the platform using consistent execution patterns, eliminating one-off workflows and hidden dependencies.

Incremental by default

Change detection runs at the Mineral layer — only new and changed records flow through enrichment. No reprocessing, no waste.

Transparent operations

Because execution follows a known structure, teams can reason about pipeline behavior, troubleshoot issues, and make changes without reverse-engineering custom logic.

Designed to scale without redesign

As new pipelines are added, they inherit the same execution model, preventing complexity from compounding over time.

Declarative Logic, Embedded in the Platform

Ember is the declarative knowledge layer of DataForge. It defines how data should be shaped, validated, and interpreted in a way that is reusable, consistent, and independent of any single pipeline.

By separating logic from execution, Ember allows organizations to scale their data platforms without duplicating code or re-implementing the same transformations across pipelines.

  • Logic defined once: Transformations and rules are expressed declaratively, allowing the same logic to be reused across many pipelines without duplication or drift.
  • Column-level precision: Logic is defined at the level of individual data attributes rather than entire tables, enabling fine-grained control while keeping pipelines simple and predictable.
  • Organizational knowledge, captured: Ember encodes shared understanding about data meaning and behavior, turning business logic into a durable asset instead of tribal knowledge embedded in code.
  • Guardrails for automation: Because logic is explicit and structured, Ember provides a reliable foundation for automation and AI-assisted workflows without introducing unpredictable behavior.

How Ember drives the Alloy pipeline

Ember
Declarative Logic

Pipeline Execution

ORE
MINERAL
ALLOY
INGOT
PRODUCT

Customization, Where It Belongs

DataForge handles the vast majority of data integration and transformation needs using its enforced architecture and declarative logic. For the cases that fall outside those patterns, DataForge provides carefully defined extension points where custom logic can be introduced without breaking the surrounding architecture.

Customization is explicit, not implicit

Custom logic is only introduced in designated parts of the pipeline, preventing hidden behavior and one-off patterns from spreading across the platform.

The architecture remains intact

Even when custom processes are required, they operate within Alloy's enforced layer model and execution structure.

Declarative first, code when necessary

Automated and declarative tools are the default path. Custom code is available for exceptional cases, not as a starting point.

No framework lock-in

Extension points accept any function returning a Spark DataFrame. Existing code fits with minimal changes — no base classes, no custom decorators required.

The vast majority of DataForge implementations require no custom code at all.

Ad-hoc custom code vs. DataForge SDK

Custom logic spread across pipelines
# Pipeline A — custom pre-processing
def transform_orders(df):
df['revenue'] = df['price'] * df['qty']
return df.dropna(subset=['revenue'])
-- Pipeline B — same logic, different author
SELECT price * quantity AS revenue
FROM orders WHERE price IS NOT NULL
DataForge SDK — explicit, scoped, within the architecture
# pip install dataforge-sdk
from dataforge import IngestionSession
# Initialize for the configured source
session = IngestionSession()
# Pass a function returning a Spark DataFrame
session.ingest(
lambda: spark.read.json("s3://bucket/data/*.json")
)
→ DataForge handles CDC, enrichment, and merge

Built for Automation, Not Guesswork

Alloy's enforced architecture and Ember's declarative logic create a data platform with explicit structure and governed behavior. Pipelines follow known patterns. Logic is reusable and governed. Execution is consistent and observable.

This foundation is what makes AI-assisted automation practical. Because the platform behaves consistently, a conversational AI interface can reason about change, understand intent, and safely propose or apply updates — without introducing hallucinations or fragile workflows.

AI works when the system underneath it is designed to be understood.

Talos is DataForge's AI-powered conversational interface that translates natural language into action — operating within the guardrails defined by Alloy and Ember, not bypassing them.