To transform tabular data with the Data Prep connector, you first set up the sequence—or pipeline—of transformations to apply. A pipeline provides a graphical representation of its defined transformations and enables a preview of each transformation's impact.
Tip: If your chains use a common sequence of Tabular Transformation and File Utilities commands to update data from your systems of record, create pipelines to perform those transformations with a single Run pipeline command.
Requirements
Data Prep is controlled entirely at the org level and does not recognize individual workspaces or their permissions.
This means:
- Data Prep is shared among all authorized users in your org.
- Any user with access to Chain Builder also has access to Data Prep.
- All users who can create or edit chains can manage pipelines in Data Prep.
- A single Data Prep pipeline can be used across multiple chains and workspaces within an organization.
Step 1. Create the pipeline
Tip: Before you create the pipeline, upload a sample file from Sample files that represents the columns and data the pipeline will transform. A sample file makes it easier to define the pipeline's columns and enables a preview of the transformations applied.
- In Wdata, click Chains and Data Prep.
Note: To access Data Prep from Wdata Chains, first set up a Data Prep connector.
- From Pipelines, under Active pipelines, create the pipeline:
- For the first pipeline, click Create a pipeline.
- Otherwise, click New pipeline (+) next to the search bar.
- Enter a name and description to help identify the pipeline.
- Click Create.
Step 2. Define the columns
To specify the fields the pipeline will interact with, define the columns of the data it transforms. When you define a column, you specify its name and the type and format of its data. For example, for a column with a Number data type, specify its decimal places and the characters used for its decimal and thousands separators.
Note: The column names defined for the pipeline can differ from the columns in the data it transforms.
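To illustrate what a Number column's format captures, here is a minimal Python sketch that parses text using a declared decimal separator and thousands separator. The `NumberFormat` class and `parse_number` function are hypothetical names invented for this example. They are not part of Data Prep and only show why the separators and decimal places need to be specified.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class NumberFormat:
    """Hypothetical stand-in for a Number column's format settings."""
    decimal_places: int = 2
    decimal_separator: str = "."
    thousands_separator: str = ","


def parse_number(text: str, fmt: NumberFormat) -> Decimal:
    """Parse text written with the given separators into a Decimal."""
    cleaned = text.strip()
    # Remove the thousands separator first, so it cannot be confused
    # with the decimal separator in formats such as "1.234,56".
    if fmt.thousands_separator:
        cleaned = cleaned.replace(fmt.thousands_separator, "")
    cleaned = cleaned.replace(fmt.decimal_separator, ".")
    # Round to the declared number of decimal places.
    return Decimal(cleaned).quantize(Decimal(10) ** -fmt.decimal_places)


# The same digits mean different values under different formats.
us_style = parse_number("1,234.56", NumberFormat())
eu_style = parse_number("1.234,56", NumberFormat(2, ",", "."))
```

Both calls yield the same value, 1234.56, because each string is read with the separators declared for it.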
To define the pipeline's columns, you can use the column definition from an uploaded sample file or a delimited file saved locally or on your network. You can also manually define columns.
To ease pipeline creation, we recommend you use a sample file to define its columns:
Note: To use a sample file, first upload it to Sample files.
- Under Define columns, click Pick from list.
- Select the sample file with the column definition to use, and click OK.
Note: The sample file's column definition will replace any columns defined for the pipeline.
- Review the column definition, and edit the columns' names as necessary.
- Click Save.
Alternatively, to define the pipeline's columns from a file saved locally or on your network, upload a file with the same column definition:
Note: The file must be delimited and contain a header row.
- Under Define columns, click Create from file.
- Browse to and select the file with the column definition to use, and click OK.
Note: The file's column definition will replace any columns defined for the pipeline.
- Review the column definition, and edit the columns' names and data types as necessary.
Note: Be sure to review and update the column definition. The pipeline uses column names from the file's header row and guesses data types based on the data.
- Click Save.
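To show why guessed data types deserve a review, the following Python sketch reads column names from a header row and guesses each type from the data. The heuristic and the `guess_type` and `guess_columns` names are assumptions made for this illustration, not Data Prep's actual inference logic.

```python
import csv
import io


def guess_type(values):
    """Guess a column type from sample values (a simplistic heuristic)."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "String"

    def all_parse(convert):
        # True only if every sample value converts without error.
        try:
            for value in non_empty:
                convert(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Number"
    return "String"


def guess_columns(delimited_text, delimiter=","):
    """Map each header-row name to a guessed type."""
    reader = csv.reader(io.StringIO(delimited_text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    return {
        name: guess_type([row[i] for row in rows if i < len(row)])
        for i, name in enumerate(header)
    }


sample = "id,amount,zip\n1,10.5,02134\n2,7,10001\n"
guessed = guess_columns(sample)
```

In this sketch, the `zip` column is guessed as Integer because every value happens to be numeric, even though postal codes with leading zeros are better kept as String. Guesses like this are the kind of result worth correcting before you save.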
To manually define a column:
- Under Define columns, click Add columns.
- Select the column's data type.
- Enter a name and description to help identify the column.
- Specify the format of the column's data, based on its type:
- For a String column, select any special format, such as for universally unique identifiers (UUIDs), binary strings, email addresses, or uniform resource identifier (URI) web addresses.
- For an Integer column, select the thousands separator.
- For a Number column, enter the number of decimal places, and select the decimal and thousands separators.
- For a Date, Time, or DateTime column, select its string-from-time (strftime) format.
Note: A Binary column contains values such as True or False, or 1 or 0.
- After you define all columns, click Save.
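As an illustration of why a Date, Time, or DateTime column needs an explicit strftime format, this short Python sketch reads the same text under two different formats. It uses the directives of Python's standard `datetime` module, which are assumed here to resemble the ones Data Prep accepts and may differ in detail.

```python
from datetime import datetime

raw = "03/04/2024"

# Month-first reading: March 4, 2024.
month_first = datetime.strptime(raw, "%m/%d/%Y")

# Day-first reading: April 3, 2024.
day_first = datetime.strptime(raw, "%d/%m/%Y")

# Write both back in an unambiguous year-month-day format.
month_first_iso = month_first.strftime("%Y-%m-%d")  # "2024-03-04"
day_first_iso = day_first.strftime("%Y-%m-%d")      # "2024-04-03"
```

Without a declared format, the value is ambiguous. The format selected for the column determines which date the text represents.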
Step 3. Set up the transformations
- To preview the transformations' impact, pin a sample file indicative of the columns and data to be transformed by the pipeline.
- Click Create transformation.
- Select the transformation to apply, and click Next.
- Set up the transformation, and click Save.
- To set up any additional transformations, click Add transformation before or after the existing transformation, based on when it should occur.
Tip: To add another instance of a transformation already in the pipeline, click its Copy, and set up the new instance as necessary.
- Adjust the transformations as necessary:
- To reposition a transformation within the pipeline, click its Move forward or Move back.
- To remove a transformation from the pipeline, click its Delete.
Note: If you move or delete a transformation, adjust any transformations that depend on its result as necessary.
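The dependency between transformations can be sketched in Python by modeling a pipeline as an ordered list of steps, each of which receives the result of the previous one. The `rename_column`, `uppercase_column`, and `run_pipeline` helpers are hypothetical and serve only to illustrate the ordering concern. They do not represent Data Prep's transformations or internals.

```python
def rename_column(old, new):
    """Return a step that renames column `old` to `new` in every row."""
    def step(rows):
        return [
            {(new if key == old else key): value for key, value in row.items()}
            for row in rows
        ]
    return step


def uppercase_column(name):
    """Return a step that uppercases the values of column `name`."""
    def step(rows):
        return [{**row, name: row[name].upper()} for row in rows]
    return step


def run_pipeline(steps, rows):
    """Apply each step in order, feeding it the previous step's result."""
    for step in steps:
        rows = step(rows)
    return rows


rows = [{"acct": "cash"}]
pipeline = [rename_column("acct", "account"), uppercase_column("account")]
result = run_pipeline(pipeline, rows)  # [{"account": "CASH"}]

# Moving the second step ahead of the first breaks it: the "account"
# column does not exist until the rename has run.
try:
    run_pipeline([pipeline[1], pipeline[0]], rows)
    reorder_failed = False
except KeyError:
    reorder_failed = True
```

Here the uppercase step depends on the column produced by the rename step, so reordering the two requires updating the second step to refer to `acct` instead.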
Step 4. Publish the pipeline
When the pipeline is ready for use, click Publish.
After you publish the pipeline, you can use it with the Run pipeline command of the Data Prep connector to apply its transformations to tabular data within a chain.