Refresh Strategies in DataForge
Vadim Orlov

Discover the power of DataForge Cloud's refresh patterns to streamline your data pipelines. In this video, you'll learn about key refresh methods: full refresh for initial dataset ingestion, append-only for incremental data updates, and advanced options like timestamp, sequence, and custom patterns for handling time-series data or unique scenarios. Watch as we demonstrate configurations, simulate dataset changes, and explore features like watermarks for tracking updates, historical data preservation, and atomic processing. Whether you're managing small datasets or complex time-series data, DataForge Cloud empowers you to optimize data transformations with precision and flexibility.
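As a rough sketch of the timestamp pattern mentioned above (a hypothetical illustration, not DataForge's actual implementation or API), an incremental refresh can track a watermark and ingest only rows newer than it:

```python
from datetime import datetime

def incremental_refresh(target, source_rows, watermark):
    """Append only rows newer than the watermark (timestamp pattern).

    target: list of already-ingested rows; source_rows: dicts with an
    'updated_at' timestamp; watermark: last processed timestamp.
    Returns the new watermark so the next run can resume from it.
    """
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    target.extend(new_rows)  # append-only: earlier history is preserved
    if new_rows:
        watermark = max(r["updated_at"] for r in new_rows)
    return watermark

# Simulate two refresh cycles over a growing source dataset.
target = []
wm = datetime.min
source = [{"id": 1, "updated_at": datetime(2024, 1, 1)},
          {"id": 2, "updated_at": datetime(2024, 1, 2)}]
wm = incremental_refresh(target, source, wm)   # first run ingests both rows
source.append({"id": 3, "updated_at": datetime(2024, 1, 3)})
wm = incremental_refresh(target, source, wm)   # second run ingests only row 3
```

A full refresh would instead replace `target` wholesale on every run; the watermark is what makes the incremental variants cheap on large or time-series datasets.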

Engineering Choices and Stage Design with Traditional ETL
Joe Swanson

In this demo, Joe Swanson, Co-founder and Lead Developer at DataForge, guides viewers through building a BI data model using the Coalesce ETL platform. He explains key stages of the process, such as defining data types, grouping customer data, and unpivoting item data for better reporting. Joe discusses crucial decision points, like when to use typed staging tables, group stages, or CTEs to optimize data transformations. He concludes by hinting at Part 2, where he will show how DataForge simplifies and automates these steps, making data modeling more efficient and reusable.

Data Transformation at Scale: Rule Templates & Cloning
Vadim Orlov

Vadim Orlov, CTO of DataForge, tackles common data transformation challenges like repetitive coding and platform complexity in this video. He introduces DataForge Cloud’s rule templates and cloning features to streamline data management through a DRY (Don’t Repeat Yourself) approach.

Vadim walks through setting up data connections, creating reusable rule templates across datasets, and calculating metrics like sale prices and totals. He then demonstrates configuring an output table for reporting and, when the company adds a subsidiary, shows how the cloning feature replicates configurations for new platforms effortlessly.

This demonstration reveals how DataForge Cloud’s tools save time and centralize code management, enabling efficient, scalable, and reusable data engineering without constant rewrites.
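The DRY idea behind rule templates can be sketched in a few lines of plain Python (a loose analogy only; DataForge's actual rule templates are configured in the platform, and `make_rule` here is a hypothetical helper):

```python
def make_rule(expr):
    """Wrap a derived-column computation as a reusable, named rule."""
    return lambda row: expr(row)

# One template, applied to any dataset that has 'price' and 'discount' columns.
sale_price = make_rule(lambda r: round(r["price"] * (1 - r["discount"]), 2))

us_sales = [{"price": 100.0, "discount": 0.10}]
eu_sales = [{"price": 80.0, "discount": 0.25}]

# "Cloning" the logic onto a new dataset means reusing the rule, not copying code.
for dataset in (us_sales, eu_sales):
    for row in dataset:
        row["sale_price"] = sale_price(row)
```

When the business adds a new subsidiary (a new dataset), the same rule is attached rather than rewritten, so a fix to the formula lands everywhere at once.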

Mastering Schema Evolution & Type Safety with DataForge
Vadim Orlov

Schema changes are a common cause of pipeline failures. DataForge addresses this by focusing on type safety and schema evolution.

Type safety ensures reliable transformations through compile-time validation, preventing unexpected errors. Schema evolution automates handling of changes like new columns, data type updates, and nested structures.

With DataForge’s configurable strategies, such as upcasting and cloning, pipelines adapt smoothly to schema changes, reducing manual effort and improving reliability.
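The upcasting strategy can be illustrated with a minimal schema-merge sketch (an assumption-laden toy, not DataForge's engine; the three-type rank order is invented for the example):

```python
# Rank of safe upcasts: a value can widen to any type at or above its own rank.
UPCAST_ORDER = {"int": 0, "float": 1, "string": 2}

def evolve_schema(current, incoming):
    """Merge an incoming schema into the current one.

    New columns are added; for shared columns the wider type wins
    (upcasting), so existing data never needs a lossy conversion.
    """
    merged = dict(current)
    for col, typ in incoming.items():
        if col not in merged:
            merged[col] = typ                            # new column: add it
        elif UPCAST_ORDER[typ] > UPCAST_ORDER[merged[col]]:
            merged[col] = typ                            # widen, e.g. int -> float
    return merged

schema = {"id": "int", "amount": "int"}
# A new batch arrives with a widened column and a brand-new one.
schema = evolve_schema(schema, {"amount": "float", "note": "string"})
```

The point is that the pipeline adapts without manual DDL: narrowing (which could lose data) is simply never chosen.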

Introducing Stream Processing in DataForge: Real-Time Data Integration and Enrichment
Joe Swanson

DataForge introduces Stream Processing, enabling seamless integration of real-time and batch data for dynamic, scalable pipelines. Leveraging Lambda Architecture, users can enrich streaming data with historical insights, facilitating comprehensive real-time analytics. Key features include Kafka integration, batch enrichment, and downstream processing. This advancement simplifies real-time data management, enhances analytics capabilities, and accelerates AI/ML applications, all within a fully managed, automated platform.
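The batch-enrichment idea at the heart of a Lambda-style design can be sketched as a lookup join between streaming events and a precomputed historical table (a simplified stand-in; the real feature works against Kafka topics and managed batch layers, and the field names here are made up):

```python
# Batch layer: precomputed historical profiles, keyed by customer id.
history = {101: {"lifetime_value": 2500.0},
           102: {"lifetime_value": 340.0}}

def enrich(event, batch_index):
    """Join one streaming event with historical batch data."""
    profile = batch_index.get(event["customer_id"], {})
    return {**event, **profile}

# Speed layer: events arriving in real time.
stream = [{"customer_id": 101, "amount": 49.99},
          {"customer_id": 999, "amount": 5.00}]   # 999 has no history yet
enriched = [enrich(e, history) for e in stream]
```

Downstream consumers then see each event already carrying its historical context, which is what enables real-time analytics over the combined view.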

Sub-Sources: Simplifying Complex Data Structures with DataForge
Vadim Orlov

In DataForge Cloud 8.1, we introduced Sub-Sources, simplifying the handling of nested complex arrays (NCAs) like ARRAY<STRUCT<..>>. This feature allows you to use standard SQL syntax on NCAs without needing to normalize or modify the underlying data. Sub-Sources act as "virtual" tables, enabling easy transformations while preserving the original structure. This innovation saves time and effort for data engineers working with complex, semi-structured data.
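The "virtual table" idea can be sketched in plain Python: expose each nested struct as a flat row carrying its parent's key, without touching the original data (an illustrative analogy only; `sub_source` is a hypothetical helper, not the DataForge Cloud feature):

```python
# A record with a nested complex array, akin to ARRAY<STRUCT<sku, qty>>.
orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "A", "qty": 5}]},
]

def sub_source(rows, array_col, parent_key):
    """Expose a nested array column as a flat 'virtual' table.

    Each nested struct becomes its own row, tagged with the parent key,
    so ordinary row-wise logic (or SQL) can query it. The nested source
    data itself is never modified.
    """
    for row in rows:
        for item in row[array_col]:
            yield {parent_key: row[parent_key], **item}

items = list(sub_source(orders, "items", "order_id"))
total_a = sum(r["qty"] for r in items if r["sku"] == "A")  # query the virtual table
```

Because the flattening is a view rather than a rewrite, the original nested structure stays available for consumers that want it as-is.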

DataForge vs. Databricks Delta Live Tables for Change Data Capture
Vadim Orlov

Check out our latest video where Vadim Orlov, CTO of DataForge, compares automating Change Data Capture (CDC) in DataForge Cloud versus Databricks Delta Live Tables. Discover how DataForge simplifies CDC processes, saving time and effort with automation, and watch a live demo showcasing its efficiency in real-world use cases.
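At its core, CDC is the diff the video's demo automates: compare two snapshots by key and classify rows as inserts, updates, or deletes (a bare-bones sketch, not how either DataForge or Delta Live Tables implements it):

```python
def capture_changes(previous, current, key="id"):
    """Diff two table snapshots into insert/update/delete change sets."""
    prev = {r[key]: r for r in previous}
    curr = {r[key]: r for r in current}
    inserts = [r for k, r in curr.items() if k not in prev]
    updates = [r for k, r in curr.items() if k in prev and r != prev[k]]
    deletes = [r for k, r in prev.items() if k not in curr]
    return inserts, updates, deletes

before = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]
after = [{"id": 1, "name": "Ada L."}, {"id": 3, "name": "Cy"}]
inserts, updates, deletes = capture_changes(before, after)
```

The engineering effort in real systems goes into doing this incrementally and atomically at scale, which is exactly the part the two platforms automate to different degrees.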

Introducing Our New Plus Subscription Plan: Elevate Your Data Engineering Capabilities
Matthew Kosovec

We’re excited to unveil our new Plus plan, tailored for startups and small enterprises. At just $400 per month, this plan offers a comprehensive suite of features including a dedicated DataForge workspace, up to 50 data sources, automated orchestration, and a browser-based IDE. Enjoy a 30-day free trial to experience its benefits firsthand. The Plus plan provides an excellent balance of functionality and affordability to support your data engineering needs and drive growth. Start your trial today and see how Plus can elevate your data operations!

Introduction to the DataForge Framework Object Model
Matthew Kosovec

Part 2 of the DataForge blog series explores the implementation of the DataForge Core framework, which enhances data transformation through the use of column-pure and row-pure functions. It introduces the core components, such as Raw Attributes, Rules, Sources, and Relations, that streamline data engineering workflows and ensure code purity, extensibility, and easier management compared to traditional SQL-based approaches.
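One loose reading of the column-pure/row-pure distinction (the article gives the precise definitions; this is only an illustrative sketch) is scalar-per-row versus whole-column computation:

```python
# Column-pure (one reading): output depends only on other columns of the
# same row, like a SQL scalar expression.
def sale_total(row):
    return row["qty"] * row["unit_price"]

# Row-pure (one reading): output depends on a whole column across rows,
# like an aggregate or window computation.
def column_share(rows, col):
    total = sum(r[col] for r in rows)
    return [r[col] / total for r in rows]

rows = [{"qty": 2, "unit_price": 5.0}, {"qty": 1, "unit_price": 10.0}]
for r in rows:
    r["total"] = sale_total(r)          # evaluated independently per row
shares = column_share(rows, "total")    # needs every row to compute
```

Keeping the two kinds of functions separate is what lets a framework reason about purity: column-pure logic can be reordered and composed freely, while row-level dependencies are made explicit.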

Introduction to the DataForge Declarative Transformation Framework
Matthew Kosovec

Discover how to build better data pipelines with DataForge. Our latest article explores breaking down monolithic data engineering solutions with modular, declarative programming, and shows how column-pure and row-pure functions enable more manageable and scalable data transformations.

Introducing DataForge Core: The first functional code framework for data engineering
Paula David

In the fast-paced world of data engineering, agility and efficiency are paramount. However, traditional approaches often fall short, leading to convoluted pipelines, skyrocketing costs, and endless headaches for data engineers. Enter DataForge Core – a game-changing open-source framework designed to streamline data transformations while adhering to modern software engineering best practices.

DataForge Releases Version 7.1.0
Paula David

DataForge is happy to announce the release of our latest version!

It is now easier than ever to transform, process, and store your data to create your dream Data Intelligence Platform.
