In this Connected Learning Path, we will create a Chain that illustrates how to convert a JSON dataset with nested objects to CSV. We will also explore how to use a Cartesian join to flatten this nested structure.
Primary Learning Objective | JSON Connector capability for nested JSON objects |
Secondary Learning Objectives | Tabular Transformation Advanced Query Command |
Prerequisites | Configure JSON Connector Connection; Configure HTTP Connector Connection |
Supporting Template | CLP | Accessing JSON Nested Objects |
Step 1: Create a Chain
- Add a new Chain
- Name the Chain: CLP | Accessing JSON Nested Objects
- Create a Chain variable:
  - Name: cv-JSON-Donut
  - Value: https://cs-sftp-training-bucket.s3.amazonaws.com/cs-training/transformation-qs/donut.json
- Save the Chain
Step 2: Retrieve JSON data
Use the HTTP Connector to retrieve the donut dataset in JSON format from a web location.
- Add a GET Command from the HTTP Connector to the Start node
- Configure the Command with the following:
Parameter | Value |
Name | GET - JSON Data |
User Name | <leave blank> |
Password | <leave blank> |
CA Certificate | <leave blank> |
Certificate | <leave blank> |
Certificate Private Key | <leave blank> |
Show Response | Checked |
URL | cv-JSON-Donut (Chain Variable) |
Query string | <leave blank> |
Content type | application/json |
Response | <leave blank> |
- Save the Command
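The sketch below shows the same request outside of Chains, built with Python's standard urllib. It rests on several assumptions and does not describe how the HTTP Connector is implemented. The helper names `build_request` and `get_json_data` are hypothetical. The Content type parameter is assumed to map to a Content-Type header. The blank credential and certificate parameters are taken to mean that no authentication and no client certificate are attached.

```python
import json
import urllib.request

# Value of the cv-JSON-Donut Chain variable from Step 1.
DONUT_URL = (
    "https://cs-sftp-training-bucket.s3.amazonaws.com/"
    "cs-training/transformation-qs/donut.json"
)


def build_request(url):
    """Hypothetical equivalent of the GET - JSON Data Command settings.

    User Name, Password and the certificate parameters are blank, so no
    authentication or client certificate is attached. The Content type
    parameter is assumed to become a Content-Type header.
    """
    return urllib.request.Request(
        url, headers={"Content-Type": "application/json"}, method="GET"
    )


def get_json_data(url, timeout=30):
    """Fetch the URL and return the body as text plus its parsed JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    # The text plays the role of the Command's Response output.
    return text, json.loads(text)
```

Calling `get_json_data(DONUT_URL)` requires network access to the S3 location. The returned text corresponds to the Response output that the later steps consume.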
Step 3: Get Un-Nested JSON Data
Use the Object to CSV Command from the JSON Connector to extract the name and type keys, which are not nested, from the JSON object.
It is important to understand the structure of the JSON dataset. A List File Content Command from the File Utilities Connector can be used to view it. For reference, the donut JSON below illustrates its schema:
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
- Add an Object to CSV Command from the JSON Connector to the Chain.
- Connect the Start Node (GET - JSON Data) to the Object to CSV Command.
- Name the Command: Object to CSV - Name & Type.
- In the JSON Data parameter, select the Response Output from the GET - JSON Data Command.
- Leave the Input Text and Path to root parameters blank.
- Leave the Multi-value Delimiter parameter as a comma (,).
- Check the Preview Result option.
- Select Pipe for the Delimiter parameter.
- The Columns section specifies which elements of the JSON object are extracted to a columnar (CSV) dataset. Click the Add button once so that there are two columns.
- On the first column, enter name for the Column Name and .name for the JSONPath parameters.
- On the second column, enter type for the Column Name and .type for the JSONPath parameters.
- Save the Command.
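The Object to CSV settings above can be approximated in plain Python. This is a sketch, not the Connector's implementation. `object_to_csv` is a hypothetical helper that handles only top-level JSONPaths such as .name. The output is assumed to begin with a header row built from the Column Names, which the query in Step 6 relies on.

```python
import csv
import io

# Top-level fields of the donut record (nested arrays omitted).
DONUT = {"id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55}


def object_to_csv(obj, columns, delimiter="|"):
    """Write one header row and one data row from a single JSON object.

    columns maps each Column Name to a top-level JSONPath such as .name;
    nested paths are deliberately not supported in this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(list(columns))
    writer.writerow([obj.get(path.lstrip(".")) for path in columns.values()])
    return out.getvalue()


name_and_type = object_to_csv(DONUT, {"name": ".name", "type": ".type"})
print(name_and_type, end="")
# name|type
# Cake|donut
```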
Step 4: Get the List of Toppings
Use the Array to CSV Command of the JSON Connector to get the list of toppings in the nested JSON array.
- Add an Array to CSV Command from the JSON Connector to the Chain.
- Connect the GET - JSON Data Command to the Array to CSV Command.
- Edit the Command
- Name the Command: Array to CSV - Toppings.
- In the JSON Data parameter, select the Response Output from the GET - JSON Data Command.
- Leave the Input Text parameter blank.
- In the Path to root parameter, type topping (in lower case) and press the Enter key. The value topping should appear in a grey bubble.
- Leave the Filter parameter blank.
- Leave the default value, comma (,), for the Multi-value Delimiter parameter.
- Check the Preview Result option.
- In the Columns section, we specify the key(s) in the JSON array for which to extract the value(s) to a column in the resulting CSV.
- In the Column name parameter, enter ToppingID and in the JSONPath parameter, enter .id.
- Click the Add button to add a second column. In its Column name parameter, enter ToppingType and in the JSONPath parameter, enter .type.
- In the Delimiter parameter, select Pipe.
- Save the Command.
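The sketch below models how Path to root and the Columns section interact. `array_to_csv` is a hypothetical helper, and the header row is an assumption. Each Path to root entry (one grey bubble) is treated as a key to follow in order until an array is reached. Each array element then becomes one pipe-delimited row.

```python
import csv
import io

# The topping array from the donut record.
DONUT = {
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ]
}


def array_to_csv(doc, path_to_root, columns, delimiter="|"):
    """Follow path_to_root (one key per grey bubble) to an array, then write
    one CSV row per array element. columns maps Column Name -> JSONPath."""
    node = doc
    for key in path_to_root:
        node = node[key]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(list(columns))
    for element in node:
        writer.writerow([element.get(p.lstrip(".")) for p in columns.values()])
    return out.getvalue()


toppings_csv = array_to_csv(
    DONUT, ["topping"], {"ToppingID": ".id", "ToppingType": ".type"}
)
print(toppings_csv, end="")
```

The result has a header row plus seven topping rows, one per array element.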
Step 5: Get a List of the Batters
Use the Array to CSV Command of the JSON Connector to get the list of batters in the nested JSON array.
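Two Array to CSV Commands are used, one for the batters and one for the toppings, because the two arrays sit at different nesting levels. The topping array is at the root of the object, while the batter array is nested inside the batters object. Using an Object to CSV Command with nested JSONPaths (e.g., .topping[*].type) would instead have produced multi-part values in a single cell, which are more difficult to use in a CSV dataset.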
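The difference between the two approaches can be seen with three of the toppings. This is a hypothetical illustration rather than the Connectors' exact output. It assumes that the multiple values matched by .topping[*].type are packed into one cell using the Multi-value Delimiter (a comma by default).

```python
# Trimmed donut record: three of the seven toppings.
DONUT = {
    "name": "Cake",
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
    ],
}

# Object to CSV style (assumed): .topping[*].type matches several values for
# the single object row, so they share one cell, joined with a comma.
topping_types = [t["type"] for t in DONUT["topping"]]
object_row = f"{DONUT['name']}|{','.join(topping_types)}"
print(object_row)  # Cake|None,Glazed,Sugar

# Array to CSV style: Path to root = topping, so each element becomes a row.
array_rows = [f"{t['id']}|{t['type']}" for t in DONUT["topping"]]
print("\n".join(array_rows))
```

The packed cell would have to be split again before it could be joined or filtered. The one-row-per-element output can be used in a query directly.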
- Add an Array to CSV Command from the JSON Connector to the Chain.
- Connect the GET - JSON Data Command to the Array to CSV Command.
- Name the Command: Array to CSV - Batters.
- In the JSON Data parameter, select the Response Output from the GET - JSON Data Command.
- Leave the Input Text parameter blank.
- In the Path to root parameter, type batters (in lower case) and press the Enter key. Next, type batter (in lower case) and press the Enter key. The values batters and batter should appear, in that order, as two grey bubbles.
- Leave the Filter parameter blank.
- Leave the default value, comma (,), for the Multi-value Delimiter parameter.
- Check the Preview Result option.
- In the Columns section, we specify the key(s) in the JSON array for which to extract the value(s) to a column in the resulting CSV.
- In the Column name parameter, enter BatterID and in the JSONPath parameter, enter .id.
- Click the Add button to add a second column. In its Column name parameter, enter BatterType and in the JSONPath parameter, enter .type.
- In the Delimiter parameter, select Pipe.
- Save the Command.
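The sketch below shows why the batters need two Path to root bubbles. `resolve_root` is a hypothetical helper that follows each key in order. Stopping at batters yields an object rather than an array. Only after the second key, batter, is there a list whose elements can become rows.

```python
# The batters object from the donut record: batter is an array one level down.
DONUT = {
    "batters": {
        "batter": [
            {"id": "1001", "type": "Regular"},
            {"id": "1002", "type": "Chocolate"},
            {"id": "1003", "type": "Blueberry"},
            {"id": "1004", "type": "Devil's Food"},
        ]
    }
}


def resolve_root(doc, path_to_root):
    """Follow each Path to root key in order, like the grey bubbles."""
    node = doc
    for key in path_to_root:
        node = node[key]
    return node


# One bubble is not enough: batters is an object, not an array.
assert isinstance(resolve_root(DONUT, ["batters"]), dict)
batters = resolve_root(DONUT, ["batters", "batter"])

# Values contain no pipes, so a plain join is enough for this sketch.
batters_csv = "\n".join(
    ["BatterID|BatterType"] + [f"{b['id']}|{b['type']}" for b in batters]
)
print(batters_csv)
```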
Step 6: Flatten the Data
Use a Cartesian join in an Advanced Query Command from the Tabular Transformation Connector to flatten the dataset. A Cartesian join has no join condition, so it pairs every row of each table with every row of the others. The result is every possible combination of the elements extracted by the Object to CSV and Array to CSV Commands.
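As a quick worked example, the row count of a Cartesian join is the product of the input row counts. The sketch below uses Python's `itertools.product` on the values extracted in Steps 3 to 5. It shows that one name/type row, four batters and seven toppings combine into 28 rows. The variable names are illustrative only.

```python
from itertools import product

# Rows extracted in Steps 3 to 5 (values from the donut record).
name_rows = [("donut", "Cake")]  # (type, name)
batter_types = ["Regular", "Chocolate", "Blueberry", "Devil's Food"]
topping_types = [
    "None", "Glazed", "Sugar", "Powdered Sugar",
    "Chocolate with Sprinkles", "Chocolate", "Maple",
]

# Cartesian product: every row of each input paired with every row of the others.
flattened = [
    (dessert_type, variety, batter, topping)
    for (dessert_type, variety), batter, topping
    in product(name_rows, batter_types, topping_types)
]
print(len(flattened))  # 28 = 1 * 4 * 7
print(flattened[0])    # ('donut', 'Cake', 'Regular', 'None')
```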
- Add an Advanced Query Command from the Tabular Transformation Connector to the Chain.
- Connect each of the Object to CSV - Name & Type, Array to CSV - Batters, and Array to CSV - Toppings Commands to the Advanced Query Command.
- Name the Command: Advanced Query - Flatten JSON Object.
- In the Tables section, click the Add button twice so that there are three available tables. Complete the Tables section as follows:
File | Table Name |
Select converted file Output from the Object to CSV - Name & Type Command | Name |
Select converted file Output from the Array to CSV - Batters Command | Batter |
Select converted file Output from the Array to CSV - Toppings Command | Topping |
- In the Query parameter, enter the following query:
Select Type as dessert_type, Name as variety, BatterType, ToppingType
from Name, Batter, Topping
- Specify Pipe for the Input Delimiter and Output Delimiter parameters.
- Check the Preview results option.
- Save the Command.
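To experiment with the query outside of Chains, the sketch below runs it against Python's built-in sqlite3 as a stand-in engine. The Tabular Transformation Connector's SQL dialect may differ. The table and column names are assumed to come from the header rows of the three converted files. The query lists the tables with commas and has no WHERE clause, which makes it an implicit Cartesian (cross) join.

```python
import sqlite3

# In-memory stand-ins for the three converted files (Tables section above).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE Name (name TEXT, type TEXT);
    CREATE TABLE Batter (BatterID TEXT, BatterType TEXT);
    CREATE TABLE Topping (ToppingID TEXT, ToppingType TEXT);
    """
)
con.execute("INSERT INTO Name VALUES (?, ?)", ("Cake", "donut"))
con.executemany(
    "INSERT INTO Batter VALUES (?, ?)",
    [("1001", "Regular"), ("1002", "Chocolate"),
     ("1003", "Blueberry"), ("1004", "Devil's Food")],
)
con.executemany(
    "INSERT INTO Topping VALUES (?, ?)",
    [("5001", "None"), ("5002", "Glazed"), ("5005", "Sugar"),
     ("5007", "Powdered Sugar"), ("5006", "Chocolate with Sprinkles"),
     ("5003", "Chocolate"), ("5004", "Maple")],
)

# The query from this step; no join condition means a Cartesian join.
query = """
Select Type as dessert_type, Name as variety, BatterType, ToppingType
from Name, Batter, Topping
"""
rows = con.execute(query).fetchall()

print("dessert_type|variety|BatterType|ToppingType")
for row in rows:  # row order is not guaranteed without ORDER BY
    print("|".join(row))
print(len(rows), "records")
```

The 28 records printed here match the Record Count checked in Step 7. Add an ORDER BY clause if a stable row order matters.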
Step 7: Test the Chain and Review Results
- Publish the Chain.
- Click Execute and then select Run Chain.
- Once the Chain has completed:
  - Click the Advanced Query - Flatten JSON Object node and select the Outputs tab.
  - Confirm the Record Count is 28 (1 name/type row × 4 batters × 7 toppings).
  - Select the Logs tab.
  - Confirm the data preview matches the screenshot below.
To learn more about data transformation using Chains, check out the Connected Learning Paths - Transformation Introduction!