In this Connected Learning Path, we will create a Chain that illustrates how to convert an XML file to CSV. Also, we will explore how to compare datasets to isolate differences.
Primary Learning Objective | The XML Connector |
Secondary Learning Objectives | Tabular Transformation Smart Filters, Change Delimiter, and Advanced Query Commands |
Prerequisites | Configure XML Connector Connection |
Supporting Template | CLP | Accessing XML Data |
Step 1: Create a Chain
- Add a new Chain
- Name the Chain CLP | Accessing XML Data
- Create two Chain variables:
- Under Chain Variables, click the Plus Sign Twice
- XML Variable
- Name: cv-XML-Employee
- Value: https://cs-sftp-training-bucket.s3.amazonaws.com/cs-training/transformation-qs/employees.xml
- CSV Variable
- Name: cv-CSV-Employee
- Value: https://cs-sftp-training-bucket.s3.amazonaws.com/cs-training/transformation-qs/employee_detail.csv
- Save the Chain
Step 2: Retrieve Data in CSV Format
Use the HTTP Connector to retrieve headcount-related data in CSV format. In later steps, we will retrieve similar headcount information in XML format and then compare the datasets to identify differences.
- Add a GET Command from the HTTP Connector to the Start node.
- Configure the Command with the following:
Name | GET - Headcount in CSV Format |
User Name | <leave blank> |
Password | <leave blank> |
CA Certificate | <leave blank> |
Certificate | <leave blank> |
Certificate Private Key | <leave blank> |
Show Response | Checked |
URL | cv-CSV-Employee Chain Variable |
Query string | <leave blank> |
Content type | text/csv |
Response | <leave blank> |
- Save the Command
Step 3: Retrieve Data in XML Format
Use the HTTP Connector to retrieve headcount-related data in XML format. As noted in step 2, in a subsequent step, we will compare the XML dataset to the CSV dataset to identify differences.
- Add a GET Command from the HTTP Connector to the Chain.
- Connect the Start Node (GET - Headcount in CSV Format) to the GET Command.
- Configure the Command with the following:
Name GET - Headcount in XML Format User Name <leave blank> Password <leave blank> CA Certificate <leave blank> Certificate <leave blank> Certificate Private Key <leave blank> Show Response Checked URL cv-XML-Employee Chain Variable Query string <leave blank> Content type application/xml Response <leave blank> - Save the Command
Step 4: Convert the XML File to Delimited
Use the XML Connector to convert the XML data to a CSV format allowing us to leverage commands later in the Chain that expect data in a delimited format.
- Add an Element List to CSV Command from the XML Connector to the Chain.
- Connect the GET - Headcount in XML Format Command to the Element List to CSV Command.
- In the Input File parameter, select the Response Output from the GET - Headcount in XML Format Command.
- Leave the Input Text parameter blank.
- In the Path to root parameter, enter the below envelope information:
/employee_data/employee
- Check the Preview Result parameter.
- In the Columns section, click the Add button 12 times.
- For each of the columns added, enter the name of the field from the data record previewed in the previous step in both the Column Name and XPath fields.
- Example: emp_id
- ⚠️ The XPath field is case-sensitive and must match the casing of the envelope.
- In the Multi-value Delimiter parameter, enter a pipe (|).
- Save the Command
- For each of the columns added, enter the name of the field from the data record previewed in the previous step in both the Column Name and XPath fields.
- Publish, Execute, and Run the Chain.
- Review the Log tab of the Element List to CSV - All Employees Command node and confirm a list of employees is displayed.
⚠️ Do not skip the execution and results review of this step as a Copy of this Command will be created in the Chain and any errors would subsequently need to be corrected across multiple instances of the Command.
Step 5: Change the Converted XML from Tab-delimited to Comma-delimited
Use the Tabular Transformation Connector to convert the output generated from the XML conversion from a tab-delimited format to a comma-delimited format.
- Add a Change Delimiter Command from the Tabular Transformation Connector to the Chain.
- Connect the Element List to CSV - All Employees Command to the Change Delimiter Command.
- Configure the Command with the following:
Name | Change Delimiter - All Employees |
Input File | Converted File Output from the Element List to CSV - All Employees Command |
Input Delimiter | \t |
Output Delimiter | , |
- Save the Command.
Step 6: Compare the CSV and XML Datasets to Identify Differences
Use Advanced Query to systematically identify differences between the XML and native CSV data sets. In this example, identify any records in the XML dataset that are different or missing from the CSV dataset. Any additional records found in the CSV dataset that are not in the XML dataset will not be identified by this exercise.
- Add an Advanced Query Command from the Tabular Transformation Connector to the Chain.
- Connect the Change Delimiter - All Employees Command to the Advanced Query Command.
- Name the Command: Advanced Query - All Employees
- In the Tables section, click the Add button to add a second tab.
- For the first table, specify the CSV Result Output from the Change Delimiter - All Employees Command.
- Specify XML as the Table Name.
- For the second table, specify the Response Output from the GET - Headcount in CSV Format Command.
- Specify CSV as the Table Name.
- In the Query parameter, enter the following:
- For the first table, specify the CSV Result Output from the Change Delimiter - All Employees Command.
Select * from XML EXCEPT Select * from CSV
-
- In the Input Delimiter and Output Delimiter fields, specify Comma.
- Check the Preview Results option.
- Save Command
- Publish, Execute, and Run the Chain.
- Review the Log tab of the Advanced Query - All Employees Command node and confirm the employee IDs match the below for the records returned by the query.
Employee ID
01-6875791
10-4199621
38-3932553
63-0417180
78-3600453
85-6818050
97-2828467
Step 7: Add a Command Group
Add a Command Group to the Chain to better organize Commands. While this step is required in terms of this exercise, this is not a technical requirement for the Chain. This step is merely to illustrate the ability to organize nodes of a Chain.
- Add a Command Group to the Chain.
- Connect the GET - Headcount in XML Format Command to the In section of the Command Group.
- Click the Command Group and then select the dropper icon to change the Command color to red.
- Name the Group: Active Employees
Step 8: Filter CSV Data for Active Employees Only
Use a Smart Filter to keep only the employee records where the employee is active. When defining the filter, we assume that the terminated (termed) field being blank indicates that an employee is active.
- Add a Smart Filter Rows Command from the Tabular Transformation Connector to the Chain.
- Connect the Group Start from the Command Group to the Smart Filter Command.
- Edit the Command:
- Name the Command: Smart Filter Rows - CSV Data - Active Employees.
- In the Input file parameter, select the Response Output from the GET - Headcount in CSV Format Command.
- Leave the Output file parameter blank.
- Specify Comma as the Delimiter.
- In the Filters section,
- Click the Add button on the Text filters.
- In the Column name parameter, enter termed.
- Check the Case Insensitive option.
- From the Condition dropdown, select Equals.
- In the Compare Text parameter, enter true.
- Check the Not option.
- Leave the Trim checkbox unchecked. This filter will identify any record where the termed field does not contain a value of true (i.e., false or blank).
- Leave the Preview Result option Checked.
- The Inverse option allows the Smart Filter to either keep (checked) or remove (unchecked) all rows that match the filter criteria. In this exercise, we want to keep all of the rows where the termed field has a value of true or blank. To achieve this, we Check the Inverse option.
- Save the Command.
Step 9: Filter XML Data While Converting to Delimited
In this step, we filter the XML data while converting to a delimited format to identify only active employees. This dataset will subsequently be compared with the filtered CSV dataset to identify differences.
- Copy the Element List to CSV - All Employees Command.
- Connect the Group Start from the Command Group to the Element List to CSV - All Employees (Copy) Command.
- Edit the Command.
- Name the Command: Element List to CSV - Active Employees.
- In the Path to root parameter, enter the below. The format below filters based on the termed field having a value not equal to true.
/employee_data/employee[termed!='true']
- Save the Command.
Step 10: Change the Filtered XML Data to Comma-delimited
As with the full employee roster, the filtered active employee roster that has been converted from XML to tab-delimited needs to be converted to comma-delimited.
- Copy the Change Delimiter - All Employees Command.
- Connect the Element List to CSV - Active Employees Command to the Element Change Delimiter - All Employees (Copy) Command.
- Edit the Command.
- Name the Command: Change Delimiter - Active Employees.
- Modify the Input file parameter to use the Converted File Output from the Element List to CSV - Active Employees Command.
- Save the Command.
Step 11: Check for Differences with Active Employees
As with the full employee roster, the filtered active employee roster that has been converted from XML to tab-delimited needs to be converted to comma-delimited.
- Copy the Advanced Query - All Employees Command.
- Connect the Out of the Command Group to the Advanced Query - All Employees (Copy) Command. Connecting the Advanced Query - All Employees command to the Group Out ensures that everything connected to the Group Start finishes before the chain progresses to the commands in the Group Out.
- Edit the Command
- Name the Command: Advanced Query - Active Employees.
- In the Tables section, modify the File parameter for both tables:
Table | File |
XML | CSV Result Output from the Change Delimiter - Active Employees Command |
CSV | Smart Filter Row Output from the Smart Filter Rows - Active Only, CSV Data Command |
- Save the Command.
Step 12: Test the Exercise
Now that the Chain is complete, test the result.
- Publish the Chain.
- Click Execute and then select Run Chain.
- Once the Chain has completed, click the Advanced Query - Active Employees node.
- On the Outputs tab, confirm that 4 records were output.
- Click the Logs tab and confirm the employee IDs match the below for the records returned by the query.
Employee ID
01-6875791
63-0417180
78-3600453
85-6818050
To learn more about data transformation using Chains, check out the Connected Learning Paths - Transformation Introduction!