In this Connected Learning Path, we will take a data extract from a system and perform a number of transformations to prepare the data file for loading to another system.
| Primary Learning Objective | To highlight some of the simple, yet powerful Commands of the Tabular Transformation Connector that can be used to address common transformation needs |
| Prerequisites | Configure HTTP, File Utilities, and Tabular Transformation Connector Connections |
| Supporting Chain Template | CLP | Tabular Transformations |
Step 1: Create a Chain
- Add a new Chain
- Name the Chain: CLP | Tabular Transformations
- Create a chain variable
- Name: cv-TT-SampleData
- Value: https://cs-sftp-training-bucket.s3.amazonaws.com/cs-training/transformation-qs/sample.txt
- Save the Chain
Step 2: Retrieve Data from a Source System
Use the HTTP Connector to illustrate how to retrieve data from a data source for which a Workiva Connector does not exist.
- Add a GET Command from the HTTP Connector to the Start node
- Configure the Command with the following:
Step 3: Update the Headers in the Data File
Update the headers in the file to match the field names of the system to which the data file will be loaded after the transformations are complete. The Map Headers Command allows us to update the field name based on the index allowing the Command to be more flexible in the event that a field name changes in the future.
- Add a Map Headers Command from the Tabular Transformation Connector to the Chain
- Connect the Start Node (HTTP - Get Source System Data) to the Map Headers Command
- Edit the Command:
- Name the Command: Map Headers
- Click the Input file parameter. The Variable selection pane opens to the left. Under the Command section, expand the Get - Source System Data Command and select the Response Output
- Leave the Output file parameter blank
- Check the box to Preview results
- Leave the Delimiter as Comma since the source system data file is comma-delimited
- In the Input Headers parameter, enter numeric zero (0) through seven (7) since the file has eight fields and lists are zero-based. Be sure to press the Enter key after typing each number!
- Check the box to Use Indexes. This tells the Command that the Input Headers are based on position in the file rather than the actual column name.
- In the output headers, specify the name of the new fields to be created. Order matters as the first entry in the list will replace the header name of the first field in the source system file and so on for each of the remaining entries. Type the following headers and press the Enter key after each:
- Product
- Market
- Month
- Scenario
- Sales
- COGS
- Expenses
- Operating Income
- Save the Command
Step 4: Move Data Columns to Rows
Next, update the data file to transpose the multiple columns of data into rows to create a single data column. This operation is called an unpivot. The unpivot operation aligns the data into the expected format, a single column containing all data, of the downstream system.
- Add an Unpivot Command from the Tabular Transformation Connector to the Chain
- Connect the Map Headers Command to the Unpivot Command
- Edit the Command
- Name the Command: Unpivot - Move Measures to Rows
- Select the Map Headers Output for the Input file parameter
- The Map Headers Output is still comma-delimited so we leave the Delimiter as Comma
- In the Aggregation parameter, select Sum. When choosing Sum, any resulting rows that are created by the unpivot that are the same across all columns will aggregate the data values to a single data point.
- In the New Column Label parameter, enter Measures. This is the header for the new column that will be created and populated with the header values from the columns being unpivoted.
- In the Data Column Label parameter, enter Amount
- The Data Headers, Starting Pivot Column Name, Ending Pivot Column Name, Starting Pivot Column Index, and Ending Pivot Column Index fields are used to perform the pivot operation. Not all of these parameters need to be specified.
- If using the Data Headers parameter, then the remaining parameters should be left blank. The Data Headers parameter allows the name of the columns to be specified. This parameter is useful for fields that are not concurrent/side by side.
- The Starting/Ending Column Name parameters allow the name of the first and last column (for a concurrent range) to be specified. If using these parameters, the Data Headers parameter as well as the Starting/Ending Column Index must be left blank. If the Ending Column Name is left blank, the unpivot will be performed for all fields after (and including) the Starting Column Name.
- The Starting/Ending Column Index parameters allow the position of the first and last column (for a concurrent range) to be specified. If using these parameters, the Data Headers parameter as well as the Starting/Ending Column Name must be left blank. If the Ending Column Index is left blank, the unpivot will be performed for all fields after (and including) the Starting Column Index. We use these parameters in this exercise.
- In the Starting Pivot Column Index, enter 4. This tells the Command to pivot starting at the fifth column since the index is zero-based.
- Leave the Ending Pivot Column Index blank.
- Check the Preview results option
- Save the Command
Step 5: Update Scenario
Next, change the scenario name. This illustrates a simple, one-off transformation that lends itself well to Integration Studio. More extensive or complex renaming transformations are significantly streamlined by Data Prep.
- Add a Find and Replace Command from the Tabular Transformation Connector to the Chain
- Connect the Unpivot Command to the Find and Replace Command
- Edit the Command:
- Name the Command: Find and Replace - Budget with Plan
- In the Input file parameter, select the Pivoted Result Output from the Unpivot Command
- Leave the Output file parameter blank
- Check the box to Preview results
- The unpivoted file is still comma-delimited so specify the Delimiter as Comma
- We are going to be finding the text Budget and replacing it with the text Plan. In the Find pattern parameter, enter Budget.
- In the Match Pattern Value parameter, select Exact
- Exact means that we are looking for the entire text specified in the Find pattern parameters. To search for substrings or wildcards, we would need to use the Regular Expression (regex) pattern type.
- In the Replacement Value parameter, enter Plan. Any instance of the text Budget will be replaced with the text Plan.
- The Replace matches only parameter is not applicable to exact matches and can be left unchecked as it's the default.
- Leave the Case Insensitive option unchecked. This makes the Find and Replace operation case sensitive so that Budget will be replaced by Plan but budget will not.
- The Scenario column where the Find Pattern (Budget) is located in the fourth column. In the Columns parameter, enter a numeric three (3) since the parameter is zero-based.
- Save the Command
Step 6: Keep New York Plan Data Only
Next, keep only New York's Plan using the Smart Filter Command. This Smart Filter allows us to specify conditions based on the data that will be used to keep or delete certain rows from the data set.
- Add a Smart Filter Rows Command from the Tabular Transformation Connector to the Chain
- Connect the Smart Filter Rows Command to the Find and Replace Command
- Edit the Command:
- Name the Command: Smart Filter Rows - New York Plan Only
- In the Input file parameter, select the Find & Replace Output from the Find and Replace Command
- Leave the Output file parameter blank
- Specify Comma as the Delimiter
- In the Filters section, there is an ability to Text, Number, and Date filters. Multiple filter conditions can be specified. If applying filters to different types (Text, Number, Date), then the Operator in the upper left needs to be specified accordingly. If applying multiple conditions within a given type (e.g., Text), then the Operator for that filter type needs to be specified.
- In this exercise, we will apply Text-based filters so the global Operator can be either And or Or.
- If necessary, set the General Operator to AND
- Click the Add button on the Text filters
- In the Column name parameter, enter Scenario
- Leave Case Insensitive unchecked
- From the Condition dropdown, select Equals
- In the Compare Text parameter, enter Actual
- Leave the Not and Trim checkboxes unchecked
- This filter will be used to remove any record where the Scenario field contains a value of Actual
- Click the Add button again on the Text filters section
- Set the Text filters Operator to OR
- Select the OR combination to filter out both Actual and New York
- In the second condition, specify Market as the Column Name
- Set the Condition to Equals
- Specify the Compare Text as New York
- Finally, and check the Not option
- Set the Text filters Operator to OR
- Leave the Preview Result option Checked
- If necessary, set the General Operator to AND
- In this exercise, we will apply Text-based filters so the global Operator can be either And or Or.
- The Inverse option allows the Smart Filter to either keep (checked) or remove (unchecked) all rows that match the filter criteria. In this exercise, we want to remove all of the rows where the Scenario is equal to Actual or the Market does not equal New York. To achieve this, we uncheck the Inverse option.
- Save the Command
Step 7: Add a Year Column
Finally, add an additional column to the data set to denote the Year of the data.
- Add an Insert Column Command from the Tabular Transformation Connector to the Chain
- Connect the Insert Column Command to the Smart Filter Rows Command
- Edit the Command and configure with the following:
| Name | Insert Column - Plan Year |
| Input File | Smart Filter Row Output |
| Output File | <leave blank> |
| Preview Results | Checked |
| Deliimiter | Comma |
| Header Text | Year |
| Data Value | 2022 |
| Insert Index | 3 |
- Save the Command
Step 8: Test the Exercise
Now that the Chain is complete, test the result.
- Publish the Chain
- Click Execute and then select Run Chain
- Once the Chain has completed, click the Insert Column - Plan Year node
- On the Outputs tab, confirm that 385 records were output
- Click the Logs tab and confirm the data per the image below
Output Tab
Logs Tab
To learn more about data transformation using Chains, check out the Connected Learning Paths - Transformation Introduction!