Advantages of sharing a query vs. using a connected data set
I am trying to make sure I understand the key relative benefits of using a shared query to share data between workspaces vs. using a connected data set. Here's what I came up with, but any feedback, input, corrections, or additions would be appreciated.
- Fully automatable – queries can be refreshed automatically, so the data stays up to date without human intervention. (With Connected Data Sets, users must manually publish and then update the data each time a change is made, or as often as updated data needs to be shared.)
- No need for a separate “publish” step – even without automating the query refresh (see the point above), one only needs to hit “refresh connection” to pull fresh data in; there is generally no need to publish it first from the source. So even with a manual approach, this cuts the number of clicks / manual effort roughly in half.
- Better performance / more scalable (potentially) – Connected Data Sets have a limited number of cells / data points that they can share (currently 2 million cells), whereas queries can return much larger datasets. (That said, if a query is connected to a spreadsheet, this same limit will apply.)
- Power of SQL queries – Queries work on datasets rather than spreadsheet sections. Therefore the data that queries pull is generally more structured, tabular data; queries can apply complex logic to filter/transform/restructure records in the dataset, and parameters can be used to selectively retrieve certain types of data over others.
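To make the last point about SQL and parameters a bit more concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; it is not the platform's own query editor or syntax. The `ledger` table, its columns, and the `fetch_region_totals` helper are all hypothetical names made up for illustration.

```python
import sqlite3


def fetch_region_totals(conn, region, min_amount):
    """Return (account, total) rows for one region, driven by two parameters."""
    sql = """
        SELECT account, SUM(amount) AS total
        FROM ledger
        WHERE region = ? AND amount >= ?
        GROUP BY account
        ORDER BY account
    """
    # The parameters are bound separately from the SQL text.
    return conn.execute(sql, (region, min_amount)).fetchall()


# Hypothetical sample data standing in for a shared dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (region TEXT, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        ("EMEA", "Travel", 120.0),
        ("EMEA", "Travel", 80.0),
        ("EMEA", "Software", 300.0),
        ("APAC", "Travel", 50.0),
    ],
)

rows = fetch_region_totals(conn, "EMEA", 100.0)
print(rows)
```

The same query text can be reused with different parameter values (for example a different region), which is the kind of selective retrieval a cell-range connection cannot easily do.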
Thank you for putting this together, Andrew McKenzie! These are all fair and good points. One thing I will clarify is:
"Better performance / more scalable – Connected Data Sets have a limited number of cells / data points that they can share (currently 2 million cells), whereas queries can return much larger datasets."
A connected query will have the same limitation, as we are bound by how much a spreadsheet can handle. The underlying data that the query runs against could be well over 2 million cells, but ideally the output is going to be something more manageable.
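As a rough way to sanity-check a query's output size before connecting it to a spreadsheet, the arithmetic is just rows times columns. The sketch below assumes the 2 million figure quoted in this thread and assumes the limit is inclusive; both the constant and the helper names are illustrative, not part of any product API.

```python
CELL_LIMIT = 2_000_000  # figure quoted in this thread; assumed, and may change


def cell_count(rows, columns):
    """Number of cells a rows-by-columns result would occupy."""
    return rows * columns


def fits_in_spreadsheet(rows, columns, limit=CELL_LIMIT):
    """True if the result is at or under the assumed cell limit."""
    return cell_count(rows, columns) <= limit


print(cell_count(150_000, 12), fits_in_spreadsheet(150_000, 12))
print(cell_count(150_000, 20), fits_in_spreadsheet(150_000, 20))
```

So a 150,000-row result with 12 columns (1.8 million cells) would fit under that assumed limit, while the same row count with 20 columns (3 million cells) would not, even though the row count is unchanged.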
Thank you, Isabel Messore. I made a correction/clarification in my original post. And agreed that we won't plan on throwing around 2 million values willy-nilly regardless.