Data Transformation at Scale: Rule Templates & Cloning
Vadim Orlov, CTO of DataForge, tackles common data transformation challenges like repetitive coding and platform complexity in this video. He introduces DataForge Cloud’s rule templates and cloning features to streamline data management through a DRY (Don’t Repeat Yourself) approach.
Vadim walks through setting up data connections, creating reusable rule templates across datasets, and calculating metrics like sale prices and totals. He then demonstrates configuring an output table for reporting and, when the company adds a subsidiary, shows how the cloning feature replicates configurations for new platforms effortlessly.
This demonstration reveals how DataForge Cloud’s tools save time and centralize code management, enabling efficient, scalable, and reusable data engineering without constant rewrites.
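To make the DRY idea concrete, here is a minimal PySpark sketch (not DataForge's actual rule syntax; the column names list_price, discount_pct, and quantity are assumptions): the sale-price calculation is defined once and applied to any dataset with matching columns.

```python
from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

def sale_price_rule(df: DataFrame) -> DataFrame:
    # Reusable "rule": derive sale_price and line_total once,
    # instead of rewriting the expressions for every dataset.
    return (df
        .withColumn("sale_price", F.col("list_price") * (1 - F.col("discount_pct")))
        .withColumn("line_total", F.col("sale_price") * F.col("quantity")))

orders = spark.createDataFrame(
    [(100.0, 0.10, 3)], ["list_price", "discount_pct", "quantity"])
subsidiary_orders = spark.createDataFrame(
    [(80.0, 0.05, 2)], ["list_price", "discount_pct", "quantity"])

# The same template serves the parent company and the new subsidiary,
# so the calculation is written (and maintained) exactly once.
sale_price_rule(orders).show()
sale_price_rule(subsidiary_orders).show()
```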
Mastering Schema Evolution & Type Safety with DataForge
Schema changes are a common cause of pipeline failures. DataForge addresses this by focusing on type safety and schema evolution.
Type safety ensures reliable transformations through compile-time validation, preventing unexpected errors. Schema evolution automates handling of changes like new columns, data type updates, and nested structures.
With DataForge’s configurable strategies, such as upcasting and cloning, pipelines adapt smoothly to schema changes, reducing manual effort and improving reliability.
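As a rough sketch of what an upcasting strategy does, independent of DataForge's implementation (align_to_schema is a hypothetical helper): an incoming batch is aligned to a target schema, with numeric types widened where safe and new columns backfilled.

```python
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
from pyspark.sql.types import StructType

def align_to_schema(df: DataFrame, target: StructType) -> DataFrame:
    cols = []
    for field in target.fields:
        if field.name in df.columns:
            # Safe widening such as INT -> BIGINT or FLOAT -> DOUBLE
            # is handled by the cast; this is the "upcasting" strategy.
            cols.append(F.col(field.name).cast(field.dataType).alias(field.name))
        else:
            # A column added by schema evolution: backfill with NULLs.
            cols.append(F.lit(None).cast(field.dataType).alias(field.name))
    return df.select(cols)
```

A production-grade strategy would also refuse lossy casts rather than apply them silently; that validation is where compile-time type checking pays off.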
Introducing Stream Processing in DataForge: Real-Time Data Integration and Enrichment
DataForge introduces Stream Processing, enabling seamless integration of real-time and batch data for dynamic, scalable pipelines. Leveraging Lambda Architecture, users can enrich streaming data with historical insights, facilitating comprehensive real-time analytics. Key features include Kafka integration, batch enrichment, and downstream processing. This advancement simplifies real-time data management, enhances analytics capabilities, and accelerates AI/ML applications, all within a fully managed, automated platform.
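Under the hood, the enrichment pattern is a stream-static join. Here is a hedged sketch at the Spark Structured Streaming level (DataForge automates this; the broker, topic, and table names are hypothetical):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Streaming side: events arriving on a Kafka topic.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .selectExpr("CAST(value AS STRING) AS raw"))

# Batch side: historical reference data already in the lakehouse.
customers = spark.read.table("customers")

# Stream-static join: every micro-batch is enriched with the
# historical table before continuing downstream.
enriched = (events
    .select(
        F.get_json_object("raw", "$.customer_id").alias("customer_id"),
        F.get_json_object("raw", "$.amount").cast("double").alias("amount"))
    .join(customers, "customer_id", "left"))
```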
Sub-Sources: Simplifying Complex Data Structures with DataForge
In DataForge Cloud 8.1, we introduced Sub-Sources, simplifying the handling of nested complex arrays (NCAs) like ARRAY<STRUCT<..>>. This feature allows you to use standard SQL syntax on NCAs without needing to normalize or modify the underlying data. Sub-Sources act as "virtual" tables, enabling easy transformations while preserving the original structure. This innovation saves time and effort for data engineers working with complex, semi-structured data.
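For contrast, here is what querying an NCA looks like in plain Spark SQL without Sub-Sources (table and column names are illustrative): the array must be exploded by hand before standard SQL applies, which is exactly the boilerplate Sub-Sources remove.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# orders.line_items is assumed to be ARRAY<STRUCT<sku, qty, unit_price>>.
line_totals = spark.sql("""
    SELECT
        o.order_id,
        item.sku,
        item.qty * item.unit_price AS line_total
    FROM orders o
    LATERAL VIEW explode(o.line_items) items AS item
""")
```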
DataForge vs. Databricks Delta Live Tables for Change Data Capture
Check out our latest video where Vadim Orlov, CTO of DataForge, compares automating Change Data Capture (CDC) in DataForge Cloud versus Databricks Delta Live Tables. Discover how DataForge simplifies CDC processes, saving time and effort with automation, and watch a live demo showcasing its efficiency in real-world use cases.
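For a sense of what is being automated, this is the kind of statement a hand-rolled CDC pipeline has to generate and keep in sync (a generic Delta Lake MERGE, not the code either product produces; table and column names are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Apply a batch of changes (op = 'I'/'U'/'D') to the target table.
spark.sql("""
    MERGE INTO customers AS t
    USING customer_changes AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED AND s.op != 'D' THEN INSERT *
""")
```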
Introducing Our New Plus Subscription Plan: Elevate Your Data Engineering Capabilities
We’re excited to unveil our new Plus plan, tailored for startups and small enterprises. At just $400 per month, this plan offers a comprehensive suite of features including a dedicated DataForge workspace, up to 50 data sources, automated orchestration, and a browser-based IDE. Enjoy a 30-day free trial to experience its benefits firsthand. The Plus plan provides an excellent balance of functionality and affordability to support your data engineering needs and drive growth. Start your trial today and see how Plus can elevate your data operations!
Introduction to the DataForge Framework Object Model
Part 2 of the DataForge blog series explores the implementation of the DataForge Core framework, which enhances data transformation through column-pure and row-pure functions. It introduces the core components (Raw Attributes, Rules, Sources, and Relations) that streamline data engineering workflows and deliver code purity, extensibility, and easier management than traditional SQL-based approaches.
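As a toy illustration of how those pieces fit together (hypothetical classes, not DataForge Core's actual implementation), think of a Source as raw attributes plus a list of column-pure Rules that compile down to a single projection:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    expression: str           # e.g. "list_price * (1 - discount_pct)"

@dataclass
class Source:
    name: str
    raw_attributes: list[str]
    rules: list[Rule] = field(default_factory=list)

    def to_sql(self) -> str:
        # Compile raw attributes plus rules into one SELECT; because
        # rules are column-pure, each one can always be expressed as
        # an additional projected column.
        derived = [f"{r.expression} AS {r.name}" for r in self.rules]
        cols = ", ".join(self.raw_attributes + derived)
        return f"SELECT {cols} FROM {self.name}"

orders = Source(
    name="orders",
    raw_attributes=["order_id", "list_price", "discount_pct"],
    rules=[Rule("sale_price", "list_price * (1 - discount_pct)")],
)
print(orders.to_sql())
```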
Introduction to the DataForge Declarative Transformation Framework
Discover how to build better data pipelines with DataForge. Our latest article explores breaking down monolithic data engineering solutions with modular, declarative programming, and shows how column-pure and row-pure functions make data transformations more manageable and scalable.
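A loose sketch of the two ideas in PySpark, assuming "column-pure" means deriving a column purely from the same row's values and "row-pure" means handling each row independently (the article defines the terms precisely):

```python
import pyspark.sql.functions as F
from pyspark.sql import DataFrame

def with_margin(df: DataFrame) -> DataFrame:
    # Column-pure: the new column depends only on other columns
    # of the same row, with no side effects or hidden state.
    return df.withColumn("margin", F.col("revenue") - F.col("cost"))

def usd_only(df: DataFrame) -> DataFrame:
    # Row-pure: each row's fate depends only on that row's own
    # values, so the function composes and parallelizes freely.
    return df.filter(F.col("currency") == "USD")
```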
Introducing Event Data Processing Using Kafka in DataForge Cloud 8.0
DataForge Cloud 8.0 now supports event integration, enabling batch reads from and writes to any Kafka topic.
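At the Spark level, a one-shot batch read and write look like this (DataForge configures this for you; the broker and topic names are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Batch read: .read (not .readStream) with explicit offset bounds.
batch = (spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load())

# Batch write to another topic: the Kafka sink expects key/value columns.
(batch.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "events_processed")
    .save())
```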
How Modern LLMs Are Redefining User Interfaces
Meet Talos: DataForge's virtual AI assistant, powered by advanced large language models (LLMs). Talos brings a new level of efficiency and intuitiveness to data management.
Introducing Complex Types with Extended Schema Evolution in DataForge Cloud 8.0
DataForge Cloud 8.0 adds full support for struct and array complex types, with schema evolution extended to cover them.
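Concretely, these are the shapes of column this unlocks, written with PySpark types (illustrative, not DataForge's configuration format):

```python
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, ArrayType)

schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer", StructType([             # STRUCT<...>
        StructField("id", StringType()),
        StructField("tier", StringType()),
    ])),
    StructField("line_items", ArrayType(StructType([  # ARRAY<STRUCT<...>>
        StructField("sku", StringType()),
        StructField("unit_price", DoubleType()),
    ]))),
])

# Extended schema evolution means a new nested field (say,
# customer.region) can appear in incoming data without breaking
# the pipeline: the struct is widened rather than rejected.
```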
DataForge Unveils Version 8.0: Transforming Data Management
DataForge releases Version 8.0, continuing its mission to make data management, integration, and analysis faster and easier than ever.
Introducing DataForge Core: The first functional code framework for data engineering
In the fast-paced world of data engineering, agility and efficiency are paramount. However, traditional approaches often fall short, leading to convoluted pipelines, skyrocketing costs, and endless headaches for data engineers. Enter DataForge Core – a game-changing open-source framework designed to streamline data transformations while adhering to modern software engineering best practices.
DataForge Releases Version 7.1.0
DataForge is happy to announce the release of its latest version!
It is now easier than ever to transform, process, and store your data to create your dream Data Intelligence Platform.