Self-service Business Intelligence,
Powered by dbt
Self-service Business Intelligence (BI) is just around the corner, right?
25 years ago, we needed the right BI tool. 10 years ago, we needed a better tool. No, we need a great tool AND data literacy! Forget it, the business users will just never get it…
From the business user’s perspective: Can I trust this data? How is this field calculated? Is the data fresh? Where does this field come from? What does this field mean? Why is this metric different on these two dashboards? We need to consolidate to a single BI tool! We need a data catalog! One year later: two new BI tools, and a data catalog gathering dust…
And so the story goes…
Two Big Problems
Yes, early BI tools were difficult to use and required too much training. This has improved with modern BI, but end-user adoption of BI tools remains abysmal. There is no single reason or solution, but there are a couple of top contenders:
- Data literacy
- Clean data that people can trust
This is where dbt (data build tool) comes into play. For those who are not yet familiar, dbt is a data tool that handles the “T” (transformation) in the extract, load, transform (ELT) process. You can read more about it in the founders’ article from years ago (it’s grown exponentially since then).
In short, dbt brings everything together under one roof, where each part creates value for the next and feeds back into the whole. We’ll get to the details of how, but first let’s break down the problems that get in the way of self-service analytics.
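To make the “T” concrete, here is a minimal sketch of what a dbt model looks like. The model and column names are illustrative, not from a real project. A model is just a SQL SELECT statement in a .sql file; dbt’s ref() function resolves to the actual warehouse object and builds the dependency graph automatically:

```sql
-- models/dim_customers.sql (hypothetical model for illustration)
-- dbt materializes this SELECT as a table or view, and infers lineage
-- from the ref() calls below.
select
    c.customer_id,
    c.first_name,
    c.last_name,
    min(o.order_date)  as first_order_date,
    count(o.order_id)  as lifetime_orders
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
group by 1, 2, 3
```

Running `dbt run` compiles the Jinja, executes the SQL against your warehouse, and records stg_customers and stg_orders as upstream dependencies in the lineage graph.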
Problem 1: Data Literacy
OK, the elephant in the room is the simultaneously beloved and dreaded data catalog. Raise your hand if, at some point in the past, you created a data catalog in a spreadsheet. Wow, that’s a lot of hands. Now raise your hand if you have no idea where that spreadsheet is, or haven’t opened it in months. Wow, are there more hands up? How’s that even possible?
Joking aside, I think that some form of data catalog can add value in defining complex business models. Many have graduated past the spreadsheet and use metadata management software. Prukalpa Sankar has an article, Active Metadata & Third-Gen Data Catalogs, but the “Data Catalog 3.0” starts to sound a lot like a full-fledged BI tool. I think that BI tools need a thin (or even no) semantic layer, which I cover in detail in this article. Active metadata is necessary, but it cannot mean introducing more tools or complexity into the stack, which is why I argue that dbt is the place for it.
One of the more common places to manage the metadata (semantic layer) is inside your BI tool. You see this from old (Business Objects, Cognos) to new (Looker, Tableau). Thick semantic layers in BI tools open up a host of issues:
- BI semantic layers are a “black box” of complex business logic that is difficult to unravel
- They lock you in to the BI vendor because migrating is a herculean effort
- If you have multiple BI products, you have to repeat everything in each
- Data engineers don’t want to touch these low-code tools
- Much of the logic should be pushed down, closer to the database layer
- There’s little or no data lineage
- Limited support for data literacy, such as Markdown/HTML tooltips
- Limited support for source control
- No support for tests, or CI/CD
Problem 2: Trust
The next big problem: can people trust the data? If the answer is no, then there is NO WAY that a user will adopt a BI tool. Trusting the data, however, is a hugely complex task with many points of failure. Let’s look at some:
- Where does this field come from? A person may be familiar with a field from a source system (e.g. Salesforce) or transactional database table/column, but is this measure the same as the one that they’re seeing in the BI tool?
- Is the data fresh? This rears its head constantly when a transformation process fails without end users being notified of stale data.
- What does this field mean? A person sees a number and can generally infer what the field should be based on its business name, but a detailed explanation (maybe even the calculation expression) sure would help.
- Why is a metric different on these two dashboards? Well, BI tool 1 has a calculated expression that’s different than BI tool 2’s. Let’s get the teams together for some discovery to figure out which one is right.
dbt to the Rescue
Data teams are excited about dbt for many reasons (modularity, portability, CI/CD, documentation), and I love it because at its core everything is in the universal language of data. SQL, of course. But I digress, as this is not the topic of my article. I’m here to talk about how and why dbt changes the game in BI. Let’s tie everything back to data literacy and trust.
Improving Data Literacy
- Shine light on the BI “black box” — with dbt, everything is in easy-to-read SQL and text. It’s all in one location, with source control for history and change management. And documentation is automatic!
- Eliminate vendor lock-in — you are not locked into dbt, because everything compiles down to plain SQL that you could lift out and run in other pipelines. It’s free and open source, so you don’t have to worry about prices going up. Even better, you’re not locked into a single BI tool, because you’ve moved the complex logic out.
- Don’t Repeat Yourself (DRY) — your transformations, tests, data catalog, and everything are in dbt. Everything is defined once, in a single source of the truth that any number of BI products can simply consume.
- No longer a chore, but a joy — data teams love dbt, period.
- The right layer — with dbt, business logic is where it should be, pushed down closer to the database layer
- Data lineage — dbt has robust data lineage. If you question the definition of a metric, you can trace the logic all the way back to the original source.
- Documentation — dbt has excellent support for data literacy with documentation features and markdown capabilities
- Source control, support for tests, and CI/CD are at the heart of dbt
Building Trust
- Where does this field come from? dbt’s lineage solves this by allowing you to trace and validate logic all the way back to the original source.
- Is the data fresh? dbt has robust capabilities to track and display both source data freshness and the freshness of transformed models.
- What does this field mean? dbt has robust built-in documentation, so your efforts to give meaningful descriptions are not wasted. You can use the dbt generated docs, but even better, BI tools can consume them.
- Why is this metric different on these two dashboards? Define it in one transformation, in dbt! All content from all BI tools now feeds from a single source.
OK, so dbt addresses data literacy and trust by bringing data transformation and data cataloging under one roof and allowing 3rd party tools (e.g. BI) to integrate. Let’s take a look at how they apply to Business Intelligence, with real examples from FlexIt Analytics’ BI integration with dbt.
Data Lineage
One of the coolest, and most important, features is dbt’s lineage, especially since it’s automatically created from your models.
In the example below, FlexIt uses dbt lineage, allowing users to see upstream dependencies. It also surfaces detail such as tooltip descriptions, refresh status and time for models and sources, and even the ability to view/run SQL on any dependency.
Following the “headless-BI” concepts, you can define your metrics in one place (dbt) and use them everywhere, knowing that your definitions will be a single source of the truth.
Business (Friendly) Names
One of the simplest, but most necessary, parts of any data catalog is giving entities and columns “friendly” names that mean something to the business. These are then displayed to end users in the following way:
Friendly business names are often not enough to help end users understand how a field is defined. In this case, it often helps for BI tools to leverage dbt’s documentation features and surface column descriptions, as such:
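As a sketch, descriptions live in a model’s YAML properties file. The model and column names below are illustrative; `description` is a built-in dbt property, while the `label` key under `meta` is only a convention that a BI tool would have to understand:

```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: dim_order
    description: "One row per order, with customer and date attributes."
    meta:
      label: "Orders"                # friendly name for BI display (convention)
    columns:
      - name: order_total
        description: "Sum of line-item amounts, net of discounts."
        meta:
          label: "Order Total"       # friendly column name (convention)
```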
Exposures
Report and dashboard dependencies are called exposures in dbt. Exposures allow you to bring together all the downstream usage of a model (BI, ML, or the many other tools an organization has).
In the example below, the BI tool uses dbt lineage to show reports that depend on the dim_order model, along with a status that tells you whether an exposure is OK, broken, needs documentation, or is an external tool.
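A minimal exposure definition might look like this. The dashboard name, URL, and owner are made up for illustration; `depends_on` is what ties the report back to the dim_order model:

```yaml
# models/exposures.yml (illustrative)
version: 2

exposures:
  - name: monthly_sales_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/monthly-sales
    description: "Executive sales dashboard built on dim_order."
    depends_on:
      - ref('dim_order')
    owner:
      name: Analytics Team
      email: analytics@example.com
```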
Joins
This is where things get really cool! The preferred method uses dbt’s relationships test, and serves two purposes:
- It tests referential integrity, ensuring your data is correct
- Allows BI tools to know relationships to joining entities
dbt’s relationships test works great for well-modeled data (e.g. a dimensional star schema), but there may be use cases for more complex joins (multi-column, operators other than equals, etc.). In this case, we can leverage dbt’s meta property. Here’s an example of both ways to handle joins:
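A sketch of both approaches in a model’s YAML file. Model and column names are illustrative; the relationships test is built into dbt, while the `joins` block under `meta` is a convention that the BI tool, not dbt itself, interprets:

```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          # Built-in dbt test: every customer_id in fct_orders must
          # exist in dim_customers (referential integrity). A BI tool
          # can also read this to infer the join between the models.
          - relationships:
              to: ref('dim_customers')
              field: customer_id
    meta:
      # Convention only: a more complex join for the BI tool to read.
      joins:
        - to: dim_dates
          type: left
          sql_on: >
            fct_orders.order_date >= dim_dates.period_start
            and fct_orders.order_date < dim_dates.period_end
```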
Now, BI tools can simply leverage these built-in definitions from dbt to use for joins and visualizing metadata, like this:
Data Freshness
There’s no quicker way to lose trust than stale data. dbt gives you the ability to see very granular detail, such as the freshness of a model or source, as shown here:
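For sources, freshness is declared in YAML against a loaded-at timestamp column and checked with `dbt source freshness`. The source and column names below are illustrative:

```yaml
# models/sources.yml (illustrative)
version: 2

sources:
  - name: shop
    database: raw
    loaded_at_field: _etl_loaded_at      # timestamp written by the loader
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

The results of the freshness check are what a BI tool can surface to end users as warn/error status.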
You can also use dbt Dashboard Tiles to place freshness “tiles” into dashboards, giving users the ability to see high-level freshness and drill into detail, if necessary.
Table Statistics
Information such as Number of rows or Size of a table/model can be very informative and build trust in the data. If the underlying data source supports these statistics, then the BI tool can make them available.
Synonyms
Synonyms (aliases) allow you to assign multiple potential names to a single column. For example, sales might be called revenue, receipts, proceeds, or other names depending on who you ask. Synonyms can be used for search and Natural Language Query (NLQ).
This is not a built-in dbt feature, so we use the meta tag, as shown here:
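A sketch of synonyms via `meta`. The `synonyms` key is a convention for the BI tool to read, not something dbt acts on, and the names are illustrative:

```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: fct_orders
    columns:
      - name: sales_amount
        description: "Total sale amount for the order."
        meta:
          synonyms: ["revenue", "receipts", "proceeds"]
```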
With synonyms defined in your model, you can then search and use Natural Language Query, like this:
Further uses of dbt’s meta tag can allow BI tools to standardize on things like formatting, model/column order, hide/show, date grain, security, and more.
You can get a much more in-depth explanation of all the concepts from the dbt Integration with FlexIt learning page:
There’s also a great dbt example project that has all the code from above:
Metrics Layer (headless BI)
There’s been a lot of talk about “headless BI”, much of it confusing or written by vendors selling headless BI products. The Modern Data Stack article sums it up neatly:
The metrics layer (headless BI) sits between data models and BI tools, allowing data teams to declaratively define metrics across different dimensions. It provides an API that converts metric computation requests into SQL queries and runs them against the data warehouse.
There are two things that are clear:
- Not another tool in the stack — dbt is the perfect place for this. dbt already has metrics, and is actively building out more fully featured metrics layer functionality.
- Not in the BI tool — the BI tool should have a thin semantic layer and simply consume these metrics. We covered all the reasons against jamming lots of complex business logic inside BI tools (e.g. thick semantic layer), so start at the top of this article if you need a refresher.
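As a sketch, a metric defined once in dbt might look like the following. The model, column, and dimension names are illustrative, and dbt’s metrics spec has evolved across versions, so the exact field names may differ in your release:

```yaml
# models/metrics.yml (illustrative; check your dbt version's metrics spec)
version: 2

metrics:
  - name: total_revenue
    label: Total Revenue
    model: ref('fct_orders')
    description: "Sum of sales_amount across all orders."
    calculation_method: sum
    expression: sales_amount
    timestamp: order_date
    time_grains: [day, week, month]
    dimensions: [region, customer_segment]
```

Any BI tool that reads dbt metadata can then compute “Total Revenue” from this one definition instead of re-implementing it per dashboard.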
If you skipped right to the end, then here’s the main point:
With dbt’s powerful data transformation, metrics, lineage, and data cataloging capabilities, you can use a single tool to create a single source of the truth that drives self-service analytics.
Cool, one place for everything! But what’s even more incredible is that tools can integrate with dbt and expose this single source of the truth in seconds.
Wait, we don’t have to consolidate to one BI tool? Yep! Think about it: if dbt creates the single source of truth, and BI tools can simply plug in to the dbt metadata, then which BI tool a data analyst uses matters much less. That’s good news, because now nobody has to meet their fate in a knife fight pitting Tableau vs. Power BI (is that Looker in the back there?).
Sure, there are other legitimate reasons for organizations to consolidate around a single BI tool. One is administration of the BI environment, but this is less of a concern with cloud offerings. Another is having a single data portal, but embedding and better API integrations are making this easier. Also, it’s good to limit the number of BI tools so that you can improve internal knowledge transfer. These are all legitimate reasons, but none are as critical as having a single source of the truth. And let’s be honest, we’ve been talking about consolidating to a single BI tool for 20 years, but the opposite has happened — more tools than ever.
Join Me in This Journey!
There is still a lot of work ahead of us to achieve real data democracy in BI, but this is a huge step. Perhaps it’s not in BI vendors’ interests, since it allows organizations to easily jump between tools. But I’m optimistic that many will integrate anyway, and I think we can make the incentives too great to pass up.
With a set of BI standards for integrating with dbt, we can enable BI tools to speak the same language. Here is a dbt Github example project that takes a first crack at setting up the framework:
I do despise hype terms that are overused for marketing purposes, but do we need a name for this? If “headless BI” is the metrics layer, then BI is the head. So the BI tool is metadata-less, metaless, semantic-less, semantic-free? Yeah, let’s go with that.
Semantic-free BI: [si-man-tik free bee-eye] (noun) a business intelligence tool that requires no built-in semantic layer
Read more about why “semantic-free” is the future of BI.
Let me know your thoughts! Reach out to Andrew Taft