Asked 2 months ago by EclipseSatellite151
How can I unroll nested array values (Tags) from an API response in Azure Data Factory?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 2 months ago by EclipseSatellite151
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I am working with an API response where each individual record is returned as an array.
I first applied a Flatten transformation on the 'rows' column and then used a derived column activity (e.g., rows[0], rows[2]) to extract each record's fields. However, the 'Tags' field (extracted as rows[19]) still contains an entire array. My goal is to separate this array so that each tag becomes an individual record linked to its parent array record, as shown in my target table.
Below is my Dataflow JSON for context:
{
"name": "df_finops",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "ds_rest_finops_dev",
"type": "DatasetReference"
},
"name": "apiAzureCostManagement"
}
],
"sinks": [
{
"linkedService": {
"referenceName": "ls_adls_gen2",
"type": "LinkedServiceReference"
},
"name": "FinopsADLS"
}
],
"transformations": [
{
"name": "flattenRowColumn"
},
{
"name": "derivedColumnExtractingRowData"
},
{
"name": "selectRequiredColumns"
},
{
"name": "derivedColumnFolderMapping"
}
],
"scriptLines": [
"parameters{",
" OauthToken as string,",
" FromTime as string,",
" ToTime as string",
"}",
"source(output(",
" body as (eTag as string, id as string, location as string, name as string, properties as (columns as (name as string, type as string)[], nextLink as string, rows as string[][]), sku as string, type as string),",
" headers as [string,string]",
") ,",
" allowSchemaDrift: true,",
" validateSchema: false,",
" httpMethod: 'POST',",
" body: (concat('{"type":"Usage","timeframe":"Custom","timePeriod":{"from":"',$FromTime,'","to":"',$ToTime,'"},"dataset":{"granularity":"Daily","aggregation":{"totalCost":{"name":"Cost","function":"Sum"}},"grouping":[{"type":"Dimension","name":"ResourceGroupName"},{"type":"Dimension","name":"ResourceType"},{"type":"Dimension","name":"MeterCategory"},{"type":"Dimension","name":"ServiceName"},{"type":"Dimension","name":"ChargeType"},{"type":"Dimension","name":"InvoiceId"},{"type":"Dimension","name":"Frequency"},{"type":"Dimension","name":"Product"},{"type":"Dimension","name":"ProductOrderId"},{"type":"Dimension","name":"Provider"},{"type":"Dimension","name":"PublisherType"},{"type":"Dimension","name":"ResourceId"},{"type":"Dimension","name":"ResourceGuid"},{"type":"Dimension","name":"ServiceFamily"},{"type":"Dimension","name":"BillingMonth"}],"include": ["Tags"]}}')),
" timeout: 30,",
" requestInterval: 0,",
" headers: ['Authorization' -> (concat('Bearer ',$OauthToken)), 'Content-Type' -> ('application/json')],",
" paginationRules: ['absoluteUrl' -> 'body.properties.nextLink'],",
" responseFormat: ['type' -> 'json', 'documentForm' -> 'arrayOfDocuments']) ~> apiAzureCostManagement",
"apiAzureCostManagement foldDown(unroll(body.properties.rows),",
" mapColumn(,",
" columns = body.properties.columns,",
" rows = body.properties.rows",
" ),",
" skipDuplicateMapInputs: false,",
" skipDuplicateMapOutputs: false) ~> flattenRowColumn",
"flattenRowColumn derive(Cost = rows1,",
" UsageDate = rows2,",
" ResourceGroupName = rows3,",
" ResourceType = rows4,",
" MeterCategory = rows[5],",
" ServiceName = rows[6],",
" ChargeType = rows[7],",
" InvoiceId = rows[8],",
" Frequency = rows[9],",
" Product = rows[10],",
" ProductOrderId = rows[11],",
" Provider = rows[12],",
" PublisherType = rows[13],",
" ResourceId = rows[14],",
" ResourceGuid = rows[15],",
" ServiceFamily = rows[16],",
" BillingMonth = rows[17],",
" Tags = rows[18],",
" Currency = rows[19]) ~> derivedColumnExtractingRowData",
"derivedColumnFolderMapping select(mapColumn(,",
" Cost,",
" UsageDate,",
" ResourceGroupName,",
" ResourceType,",
" MeterCategory,",
" ServiceName,",
" ChargeType,",
" InvoiceId,",
" Frequency,",
" Product,",
" ProductOrderId,",
" Provider,",
" PublisherType,",
" ResourceId,",
" ResourceGuid,",
" ServiceFamily,",
" BillingMonth,",
" Tags,",
" Currency,",
" year,",
" month,",
" day",
" ),",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true) ~> selectRequiredColumns",
"derivedColumnExtractingRowData derive(year = year(toDate(BillingMonth)),",
" month = month(toDate(BillingMonth)),",
" day = dayOfMonth(toDate(BillingMonth))) ~> derivedColumnFolderMapping",
"selectRequiredColumns sink(allowSchemaDrift: true,",
" validateSchema: false,",
" format: 'parquet',",
" fileSystem: 'finops',",
" folderPath: 'costmanagement',",
" compressionCodec: 'snappy',",
" filePattern:'azurecost',",
" umask: 0007,",
" preCommands: [],",
" postCommands: [],",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true,",
" partitionBy('key',",
" 0,",
" year,",
" month,",
" day",
" )) ~> FinopsADLS"
]
}
}
Please note that sensitive details have been removed for privacy. Any advice on using a Flatten transformation to expand the Tags array into individual records would be appreciated.
Comments 0
•Answers 1
•Views 63
Answer by TychoAI Agent • 2 months ago
Below is one approach to achieve your goal:
Ensure "Tags" Is an Array:
After using the derived column to extract Tags as rows[18], confirm that the Tags column is actually an array type. If it’s a string, you might need to parse it to an array first.
Use a Flatten Transformation:
Instead of extracting elements via index functions in your derived column, add a new Flatten transformation to unroll (or "explode") the Tags array. In the Flatten transformation’s settings, set the "Unroll by" property to the Tags column. This will create a new row for each tag in the array while keeping the parent record data intact.
Configure the Transformation Chain:
Place the new Flatten transformation between your derivedColumnExtractingRowData and selectRequiredColumns transformations in your dataflow. This ensures that the Tags column (now as an array) is expanded into individual records (rows).
Example (Conceptual) Snippet:
In your Dataflow JSON, it may look similar to this snippet:
JSON{ "name": "flattenTags", "transformation": "Flatten", "unrollBy": "Tags" }
Then in your scriptLines or transformation chain, add a reference to the new step. For example:
JSON"scriptLines": [ // ... previous steps "flattenRowColumn derive(..., Tags = rows[18], ... ) ~> derivedColumnExtractingRowData", "derivedColumnExtractingRowData foldDown(unroll(Tags)) ~> flattenTags", "flattenTags select(mapColumn( ... all required columns ... )) ~> selectRequiredColumns", // ... remaining steps ]
Result:
Each element in the Tags array will be output as a separate row along with its corresponding parent record fields. This meets your requirement of having individual records for each tag value from the original nested array.
By using the Flatten transformation this way, you effectively separate the nested array (Tags) into individual records that can be further processed or loaded into your target sink.
Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?
No comments yet.
No comments yet.