A pipeline can apply different types of transformations, based on how you want to impact the tabular data. Every Pipeline requires at least one (1) transformation and may be assigned a virtually unlimited number of transformations.
Note: When you set up a transformation that uses a comparison operator, such as contains
, greater than
, equals (=
), or does not equal (!=
), press ENTER after its value.
These transformations are grouped into four (4) categories:
- Transform (Technical Transformations) - modify the data layout of the payload.
- Filter - Remove rows from the data payload based on a criterion/criteria.
- Data - Sort data by specific fields and apply summation to derive data points.
- Mapping (Functional Transformations) - build relationships between data fields.
Adding transformations to a Pipeline is a very guided, but not prescriptive process. This ensures the ultimate flexibility to address your transformation needs
Note: Prior to adding transformations to your Pipeline, be sure to pin a sample file.
Requirements
Data Prep is controlled entirely at the org level and does not recognize individual workspaces or their permissions.
This means:
- Data Prep is shared among all authorized users in your org.
- Any user with access to Chain Builder also has access to Data Prep.
- All users who can create or edit chains will have the ability to manage pipelines in Data Prep.
- A single Data Prep pipeline can be used across multiple chains and workspaces within an organization.
Transform
To modify the layout of the tabular data, such as to add or remove columns or adjust their values, add a Transform transformation.
Note: When you add a column, you specify its name, the type and format of its values, and where it should appear in the tabular data.
Transformation | Description |
---|---|
Insert column | Adds a field to the dataset with a value in each row. When you set up this transformation, specify the columns's static or variable value. |
Insert row numbers | Adds a column with row numbering to the dataset. When you set up this transformation, specify the rows' starting number. |
Excel formula | Adds a column with the result of an operation performed on specified columns, based on a Microsoft Excel® formula. When you set up this transformation, specify the formula to perform. When you write the formula, refer to the columns by the their headers, such as SUM(amount1, amount 2) .
Note: Unlike in Excel, do not start with an
|
Remove columns | Removes specified columns from the dataset |
Merge columns | Adds a column with the merged values of specified columns, separated by a specified delimiter |
Rename | Renames specified columns in the dataset |
Shift column | Repositions a specified column within the dataset |
Split field value | Splits a column's values into multiple columns, based on a specified delimiter |
Keep columns | Removes all columns from the dataset except those specified |
Flip sign | Flips the sign of a specified column's values, such as from positive to negative, or vice versa |
Uppercase | Converts a specified column's values to all UPPERCASE |
Lowercase | Converts a specified column's values to all lowercase |
Find and replace | Replaces all instances of a specified text value in the dataset with a specified new value |
Group by | Aggregates the dataset into unique records based on specified columns, similar to a GROUP BY SQL statement
|
Copy column | Adds a duplicate of a specified column |
String distance | Calculates the number of edits (i.e., "distance") needed to turn one text string into the other. Also known as a Levenshtein distance transformation. Each string is limited to a maximum of 1024 characters; a maximum distance of 100 is calculated. |
Tip: To remove rows from a dataset, use a keyword—such as Ignore
—to indicate the rows to remove. Then, use the keyword as a To value in a Mapping transformation, and set up a Smart filter transformation with an OR condition to remove any mapped field with that keyword.
Filter
To remove rows from the data based on specific criteria, add a Filter transformation.
Transformation | Description |
---|---|
Exact match | Removes all rows that don't match the specified text exactly |
RegExp match | Removes all rows that don't match the specified regular expression (RegExp) |
Take rows | Removes all rows at and following the specified number. For example, 10: removes the tenth row and all subsequent rows. |
Smart filter | Removes or keeps a column's records based on specified conditions |
Data
To sort tabular data or apply summations based on specific columns, add a Data transformation.
Transformation | Description |
---|---|
Sort | Sorts the dataset based on specified columns |
Sum if | Creates an additional pipeline output with the sum of all values in a specified column |
Mapping
To transform a column's values based on rules defined by a mapping group, add a Mapping transformation:
- Select Mapping, and click Next.
- Select the mapping group with the rules to apply.
- Under Match columns, select which of the pipeline's columns map to the mapping group's.
- Click Save.
If you apply multiple Mapping transformations to a column, ensure they're in the correct order within the pipeline.
Tip: When a pipeline applies a Mapping transformation, it transforms the column's values inline. To view a column's values both before and after the transformation at the same time, apply a Copy column transformation to add a duplicate of the column before the Mapping transformation. To indicate the "before" status of the copy's values, append a prefix to its name, such as Src_
or UM_
, for Source or Unmapped, respectively. To later remove this column from the dataset, apply a Group by or Remove columns transformation.
Additional Transformation Functions
Add a Transformation
Additional transformations can be added to a Pipeline by clicking the green plus signs to the right or left of an existing transformation. Clicking the plus-sign to the left or right of an existing transformation adds a new transformation before or after it, respectively.
Copy a Transformation
Any existing transformation in the Pipeline can be duplicated by clicking the copy icon that is under the transformation. A copy of the transformation is created and the transformation form is opened. Update the transformation inputs as needed and save the copied transformation.
Delete a Transformation
Any transformation in a Pipeline can be deleted by clicking the trash can icon under the transformation. When prompted to confirm the delete, type the required text and then click the Delete button. Click the Back button on the confirmation form to cancel the deletion.
When deleting a transformation that creates a new column in the Pipeline, any subsequent transformation that utilizes the column that was created by the now-deleted transformation will be marked as invalid and the transformation icon (blue circle) will turn red to indicate that the transformation is no longer valid.
Note: Deleted transformations cannot be restored.
Reorder Transformations
The order of the transformations can be changed by clicking the left and right arrows under a specific transformation. By clicking the left arrow, the transformation is shifted to occur immediately before the transformation that previously proceeded it. The right arrow shifts the transformation to occur after the transformation that previously followed.