Data Prep overview

From Data Prep, you can create and run sequences of transformations (pipelines) and create functional transformations (mapping groups).

Pipelines are series of technical transformations that you apply to tabular outputs in chains. For example, you can set up pipelines of common transformations to prepare data for or from your various systems of record.

Tip: If your chains use a common sequence of Tabular Transformation and File Utilities commands to update data from your systems of record, create pipelines to perform those transformations with a single Run pipeline command.

A pipeline can apply transformations to:

Modify the layout of data, such as to add or remove columns or adjust their values
Apply filters to remove rows from the data based on specific criteria
Sort the data or apply summations based on specific columns
Map relationships between data models based on defined rules
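
The kinds of transformations listed above can be pictured as ordered steps applied to rows of tabular data. The following is a minimal Python sketch of that idea only, not Data Prep's implementation; the helper names (drop_column, filter_rows, sort_by, run_pipeline) and the sample columns are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]
Step = Callable[[Table], Table]

def drop_column(name: str) -> Step:
    """Layout change: remove one column from every row."""
    return lambda table: [{k: v for k, v in row.items() if k != name} for row in table]

def filter_rows(keep: Callable[[Row], bool]) -> Step:
    """Filter: keep only the rows that satisfy the criterion."""
    return lambda table: [row for row in table if keep(row)]

def sort_by(column: str) -> Step:
    """Sort: order rows by the values in one column."""
    return lambda table: sorted(table, key=lambda row: row[column])

def run_pipeline(table: Table, steps: List[Step]) -> Table:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        table = step(table)
    return table

# Hypothetical sample data standing in for a tabular output.
sample = [
    {"account": "4000", "region": "EU", "amount": 120.0, "note": "x"},
    {"account": "1000", "region": "US", "amount": 0.0, "note": "y"},
    {"account": "2000", "region": "US", "amount": 75.5, "note": "z"},
]

result = run_pipeline(sample, [
    drop_column("note"),
    filter_rows(lambda row: row["amount"] != 0),
    sort_by("account"),
])
print(result)
```

Because each step receives the previous step's output, the order of the steps matters: filtering before sorting, for example, means fewer rows to sort.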

Mapping groups build the relationships between the data models of different enterprise systems by defining how to transform codes or values from one system to another. They are applied within a pipeline as a transformation step.

Mapping groups support a range of mapping techniques that are simple to use yet address both common and complex requirements. Users define and update mapping groups to transform and harmonize data, and a mapping group can be shared across multiple pipelines. The interface resembles Excel, so it feels familiar.

Requirements

Data Prep is controlled entirely at the org level and does not recognize individual workspaces or their permissions.

This means:

Data Prep is shared among all authorized users in your org.
Any user with access to Chain Builder also has access to Data Prep.
All users who can create or edit chains will have the ability to manage pipelines in Data Prep.
A single Data Prep pipeline can be used across multiple chains and workspaces within an organization.

Step 1. Set up a Data Prep connection

To apply a pipeline's transformations to data in a chain, you include the Data Prep connector's Run pipeline command. If you haven't already, set up a Data Prep connection. With the Data Prep connection set up, you can then open Data Prep from Wdata Chains.

Step 2. Upload sample files

From Sample files in Data Prep, upload sample files that represent the tabular data to transform. Sample files make pipeline creation easier.

Once uploaded, you can use a sample file to:

Quickly define the columns and types of data a pipeline interacts with
Preview how a pipeline or mapping transformation impacts the data
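
To illustrate what deriving columns and types from a sample file can involve, here is a minimal Python sketch that reads a delimited sample and guesses a type for each column. It is an assumption-laden illustration, not how Data Prep does it: the function names and the type labels ("integer", "decimal", "text") are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

def infer_type(values: List[str]) -> str:
    """Guess a column type from its sample values (hypothetical type labels)."""
    def all_parse(cast: Callable[[str], object]) -> bool:
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "text"

def infer_columns(sample_text: str, delimiter: str = ",") -> Dict[str, str]:
    """Read a delimited sample and return {column name: guessed type}."""
    reader = csv.DictReader(io.StringIO(sample_text), delimiter=delimiter)
    rows = list(reader)
    return {name: infer_type([row[name] for row in rows])
            for name in (reader.fieldnames or [])}

# Hypothetical sample file contents.
sample = "account,amount,region\n1000,12.50,US\n2000,7,EU\n"
columns = infer_columns(sample)
print(columns)
```

A guess like this is only as good as the sample: account codes that happen to look numeric would be typed as integers, so a representative sample file and a manual review of the resulting column definition both help.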

Step 3. Create groups for mapping transformations

To map relationships between data models within a pipeline, you can include Mapping transformations. From Mapping groups in Data Prep, create mapping groups to define the relationships between values and how to transform values from one system to another within a mapping transformation.

Tip: To set values for a mapping transformation when the pipeline runs, set up runtime variables for the mapping group.

When you create a mapping group, you can define its rules to transform values based on an exact match, a simple pattern, or a regular expression.
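
The three rule types can be sketched in Python as follows. This is an illustration under stated assumptions, not Data Prep's rule engine: the rule tuples, the apply_rules helper, the use of shell-style wildcards for "simple pattern", and the first-match-wins ordering are all hypothetical.

```python
import fnmatch
import re
from typing import List, Optional, Tuple

# (rule kind, source pattern, target value); a hypothetical representation.
Rule = Tuple[str, str, str]

def apply_rules(value: str, rules: List[Rule]) -> Optional[str]:
    """Return the target of the first rule that matches, or None if none do."""
    for kind, pattern, target in rules:
        if kind == "exact" and value == pattern:
            return target
        # Assumption: a "simple pattern" behaves like a shell-style wildcard.
        if kind == "pattern" and fnmatch.fnmatchcase(value, pattern):
            return target
        # The regular expression must match the whole value.
        if kind == "regex" and re.fullmatch(pattern, value):
            return target
    return None

rules = [
    ("exact", "1000", "CASH"),
    ("pattern", "4*", "REVENUE"),
    ("regex", r"5\d{3}", "EXPENSE"),
]

for code in ["1000", "4010", "5123", "9999"]:
    print(code, "->", apply_rules(code, rules))
```

Exact matches suit stable one-to-one code lists, while patterns and regular expressions cover whole ranges of source values with a single rule.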

Step 4. Set up pipelines

A Pipeline is the collection of technical and functional transformations that are applied to data processed by Data Prep.

The technical transformations defined in a pipeline modify the data layout. Adding, removing, and reordering columns are all examples of technical transformations.
Functional transformation is the process of building a relationship between the data models of the systems being integrated. It is often referred to as mapping and is managed by Data Prep mapping groups, which are applied within a pipeline as a transformation step.

To define the sequence of transformations to apply to tabular data, create pipelines from Pipelines in Data Prep.

When you create a pipeline, you:

Define the columns and types of data it interacts with, either manually or based on a sample file or uploaded delimited file
Set up the transformations to apply—in order—when the pipeline runs

Tip: To set values for a transformation when the pipeline runs, set up runtime variables for the pipeline.

Step 5. Run pipelines in chains

To apply the transformations to tabular data from an output earlier in a chain, use the Data Prep connector's Run pipeline command. When you set up the command, you:

Select the pipeline to run and the tabular output to transform
Map the tabular file's columns to the pipeline's column definition
Set any runtime variable values for the pipeline
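
Conceptually, the three parts of this setup fit together as in the following Python sketch. It is not the Run pipeline command or the connector's API; the functions, column names, and the "period" runtime variable are hypothetical stand-ins.

```python
from typing import Dict, List

Row = Dict[str, str]

def map_columns(rows: List[Row], column_map: Dict[str, str],
                pipeline_columns: List[str]) -> List[Row]:
    """Rename the file's columns to the names in the pipeline's column definition."""
    mapped_targets = set(column_map.values())
    missing = [c for c in pipeline_columns if c not in mapped_targets]
    if missing:
        raise ValueError(f"Unmapped pipeline columns: {missing}")
    return [{target: row[source] for source, target in column_map.items()}
            for row in rows]

def run_pipeline(rows: List[Row], variables: Dict[str, str]) -> List[Row]:
    """Hypothetical pipeline: keep only rows for the period supplied at run time."""
    return [row for row in rows if row["period"] == variables["period"]]

# Tabular output from an earlier step in the chain (hypothetical data).
file_rows = [
    {"Acct": "1000", "Per": "2024-01", "Amt": "10"},
    {"Acct": "2000", "Per": "2024-02", "Amt": "20"},
]
column_map = {"Acct": "account", "Per": "period", "Amt": "amount"}
pipeline_columns = ["account", "period", "amount"]

prepared = map_columns(file_rows, column_map, pipeline_columns)
output = run_pipeline(prepared, {"period": "2024-02"})
print(output)
```

Keeping the column mapping separate from the pipeline lets the same pipeline run against files whose headers differ, and the runtime variable lets one pipeline serve different periods without being edited.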
