What is the most straightforward way to detect changes/differences in datasets within Workiva?
Con rispostaI am trying to design a solution for automatically retrieving ESG metric data from a source system (using the Data Management Suite) that can identify adjustment/restatement situations. I.e., historical source data may change without notice, and I want to be able to identify this during ingestion so that I can take action on it (e.g., pausing the chain, creating a task—that sort of thing).
This problem is magnified by the fact that, per the current model, restated data is not ingested directly into the program. But if we automate the ingestion of ESG metrics, any updates to these values in the source will overwrite the original values in the program unless we can identify when they are happening. Obviously we could set up a business process to try to get around this—but then there goes part of the goal for automation 😆
What I would like to be able to do is essentially a diff between two datasets, so that I could have a node in a chain based on the Template: Load Data to Wdata | Replace Dataset child chain that would compare the existing and a newly pulled version of a historical data set and simply exit if there are no changes, but if there are, it could take action accordingly (again, likely continuing as normal albeit with a notification/task triggered).
Has anyone run into any flavor of this problem, perhaps with financial data? The only solution I can think of at this time would be to have some sort of approach involving one or more staging tables and having custom compare logic in SQL—which may work fine, but I wanted to see if there was a better solution.
-
Hi Andrew. There isn't a built in diff within a Wdata Table. You could build this logic into your Chain though. Along the lines you mentioned in your post, here is a potential solution (I have not tested this so please thoroughly test if you choose a route similar to this).
Your data could be loaded into a Wdata Table and the datasets saved with names indicating the time period of the data within such as March2024.csv, April2024.csv, etc. Your Chain would then run a Query to retrieve all data from the Wdata Table that was previously loaded in with the filename = April2024.csv. It would also pull current data from the source system for April2024.csv. This would mean that the Chain has access to both the current data from the source system and historical data previously loaded into the Wdata Table for April2024 period. Next, using a Tabular Transformation Advanced Query Command, bring in both datasets as tables A and B then run a query to find the rows that exist in table A but not in table B, and vice versa. That query may look something like the following (this is only an example query):
SELECT * FROM (SELECT * FROM A EXCEPT SELECT * FROM B) UNION ALL (SELECT * FROM B EXCEPT SELECT * FROM A)
You could use a Conditional node and branch off your Chain logic as needed based on the results of the query. If the result is empty, it means that the data in table A and table B is the same and you could exit. If there are rows in the result set, it means that there are differences between the two tables. In this case, handling your flagging of the differences and replace the old dataset with the new dataset in the Wdata Table.
1Great and thanks for the input Jeff Hickey! That both answers my question and gives me some additional ideas of how to approach the problem.
0Accedi per aggiungere un commento.
Commenti
2 commenti