Data Cloud DX (September 2023)

As I said in my Dreamforce 2023 post, I’ve been working on Data Cloud producing a demo. This has been my first real “in anger” exposure to Data Cloud (I’ve been on a couple of Data Cloud workshops previously). So how has Data Cloud been to develop on? Read on to find out.

Before I describe the DX, a (very) short overview of Data Cloud might be in order. This is still pretty new to the Salesforce ecosystem and most developers will not yet have played with it. Data Cloud is a Data Lake/Data Warehouse product (Salesforce likes the term Data Lakehouse) that enables the ingestion of data from multiple sources on and off platform, the harmonisation of this data, and actions to be taken based on insights pulled from the data. All at massive scale (trillions of records) and near-real-time performance.

Raw data ingested from any source is stored in a Data Lake Object (DLO). These can be mapped to Data Model Objects (multiple DLOs can map to one DMO), and Data Transforms can manipulate the DLOs to map in additional fields, aggregate, filter, etc. DLOs represent the Data Lake; DMOs are more like a Data Warehouse layer that is the business-facing model of the data. Calculated Insights can run on the DMOs to produce windowed insights with dimensions and measures. Data Actions allow integration back to Salesforce orgs, posting Platform Events whenever a DMO or CI record is written.
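To make the DLO-to-DMO step concrete, here’s a minimal sketch in plain Python (not a Data Cloud API — the field and object names are hypothetical) of how two raw DLOs from different sources get harmonised onto one shared DMO shape:

```python
def map_to_dmo(record: dict, field_map: dict) -> dict:
    """Rename raw DLO fields to the harmonised DMO field names.
    Fields without a mapping are simply dropped from the DMO view."""
    return {dmo_field: record[dlo_field] for dlo_field, dmo_field in field_map.items()}

# Raw records as they might land in two separate DLOs (hypothetical shapes).
crm_dlo = [{"Email__c": "pat@example.com", "FullName__c": "Pat Smith"}]
web_dlo = [{"email_address": "pat@example.com", "page_views": 42}]

# Each DLO carries its own mapping onto the same business-facing DMO.
individual_dmo = (
    [map_to_dmo(r, {"Email__c": "Email", "FullName__c": "Name"}) for r in crm_dlo]
    + [map_to_dmo(r, {"email_address": "Email"}) for r in web_dlo]
)
# Both sources now share the harmonised "Email" field, ready for
# matching/harmonisation and for Calculated Insights downstream.
```

The point is just the shape of the flow: many raw schemas in, one business model out.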

Now that we understand Data Cloud at a very high level, how is it for a developer? Well, the first thing to note is that it’s currently pretty hard for your average developer to get a Data Cloud enabled org. Certainly, as of today, you can’t just spin up a scratch org with a feature enabled.

The second general thing to note is that for iterative development Data Cloud is not great. Developers are used to writing a bit of code, pushing it to an org, trying it out, changing the code and repeating this loop multiple times. In a normal Salesforce org this works reasonably well (although it could be faster here too). In Data Cloud this cycle is slow. Really slow. There is no real way to do something right now. Pretty much everything is actually scheduled and run in batches. So when you click a “Refresh Now” button what you are really saying is “schedule this for a refresh soon”. So if you need to ingest some data, run a Data Transform, then a Calculated Insight, you could be waiting around for half an hour.

This is all made worse by the lack of dependency-based scheduling. There is no way of saying “when any of these data streams are completed, run this data transform”, for instance. Most processes can be scheduled to run at specific times, which might suffice for production (although I have my doubts), but in development this means you start some ingests, refresh the screen until done, then run the transform, refresh that screen until done, and so on. This is a very frustrating way to work!
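What you end up doing by hand is essentially a poll-until-done loop. A generic sketch of the dependency scheduling I’d like (plain Python, with a stubbed status check standing in for whatever ingest status you can actually observe — Data Cloud doesn’t offer this today):

```python
import time

def run_when_complete(check_status, run_next, poll_seconds=30, timeout_seconds=3600):
    """Poll check_status() until the upstream jobs (e.g. data stream ingests)
    all report done, then kick off the downstream step (e.g. a Data Transform)."""
    waited = 0
    while not check_status():
        if waited >= timeout_seconds:
            raise TimeoutError("upstream jobs did not complete in time")
        time.sleep(poll_seconds)
        waited += poll_seconds
    return run_next()

# Stubbed demonstration: pretend the ingest finishes on the third status check.
_polls = {"count": 0}

def ingest_done():
    _polls["count"] += 1
    return _polls["count"] >= 3

result = run_when_complete(ingest_done, lambda: "transform started", poll_seconds=0)
# result == "transform started"
```

In real use, `check_status` would wrap whatever status you can see in the UI or via an API, and `run_next` would trigger the transform. Having the platform do this “when X completes, run Y” chaining itself is exactly what’s missing.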

In the next post I’ll talk about the specific frustrations with the Data Transform editor/process…
