Analyzing an unset Custom Dimension in Google Analytics

So this was an interesting problem to run into. Here’s the premise: I had a website set up with a handful of Custom Dimensions in Google Analytics for a variety of purposes, one of which was categorizing pages by the client’s business units. Since their business units did not map one-to-one to the website’s product taxonomy, a hit-scoped Custom Dimension on pageviews and events made reporting life much easier.

So, we went ahead and populated that data from tags set in the CMS and exposed it to the data layer on the front end. So far, so good. However, there were two issues: 1.) not all pages had the tags set in the CMS yet, and 2.) we were implementing this after the site launched, so plenty of data had already been collected without this Custom Dimension.

For issue #1, since those tags were not set, our GTM implementation simply wasn’t sending any data for the Custom Dimension, which makes sense. For issue #2, obviously no value for that Custom Dimension would exist on any data collected before it was set up.

Why leaving Custom Dimensions unset matters

However, the thing to note about Custom Dimensions is that if no value is sent, the hit includes no data for that dimension at all, not even the (not set) value we’re used to seeing in standard dimensions. Again, that’s usually not an issue. It becomes a problem when you need to report on pages or events that have a value set alongside those that don’t.

To get around the fact that we were missing that Custom Dimension for some pages and for part of the reporting period, we created an “OR” filter in Google Data Studio that looked like this:

Page RegExp Contains .*(page1|page2|page3).*
OR
Business Unit (Custom Dimension 1) Equal to (=) Business Unit 1

The assumption was that the RegEx page rule would capture the pages we knew should be included but weren’t properly tagged.
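To make the intended logic concrete, here is a small Python sketch of what we expected the filter to do for each row. It is only an illustration, not how Data Studio evaluates filters internally; the page paths and the `intended_or_filter` helper are hypothetical, and `re.search` stands in for a “RegExp Contains” match.

```python
import re
from typing import Optional

# Same pattern as the Data Studio filter; re.search mirrors a "contains" match.
PAGE_PATTERN = re.compile(r".*(page1|page2|page3).*")

def intended_or_filter(page: str, business_unit: Optional[str]) -> bool:
    """Hypothetical helper: keep a row if EITHER condition holds."""
    page_matches = PAGE_PATTERN.search(page) is not None
    unit_matches = business_unit == "Business Unit 1"
    return page_matches or unit_matches

# An untagged page that matches the RegEx should still be kept...
print(intended_or_filter("/products/page2/", None))      # True
# ...as should a tagged page that doesn't match it.
print(intended_or_filter("/about/", "Business Unit 1"))  # True
print(intended_or_filter("/about/", None))               # False
```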

However… this didn’t work. It turns out that because the Custom Dimension was part of the filter, Data Studio would ONLY look at pages that had some value set for that Custom Dimension, even though the filter used an OR condition.

For example, if we had 1,000 unique pages in our reporting, 500 of which matched the RegEx filter, but only 100 pages had a value for the Custom Dimension, the filter would only look within those 100 pages, ignoring the other 900 pages on the site without even evaluating whether they matched the RegEx.
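The behavior we saw can be modeled as rows with no value for the Custom Dimension being dropped before the OR is evaluated. The Python sketch below reproduces the numbers from that example with made-up page paths. It models what showed up in the reports, not Data Studio’s actual implementation, and which 100 pages carry a value (here, pages 450 to 549) is an arbitrary assumption.

```python
import re
from typing import List, Optional, Tuple

PAGE_PATTERN = re.compile(r"(page1|page2|page3)")
Row = Tuple[str, Optional[str]]  # (page path, business unit or None when unset)

def matches_or(row: Row) -> bool:
    page, unit = row
    return PAGE_PATTERN.search(page) is not None or unit == "Business Unit 1"

def expected_rows(rows: List[Row]) -> List[Row]:
    # What an OR filter should return: every row is evaluated.
    return [row for row in rows if matches_or(row)]

def observed_rows(rows: List[Row]) -> List[Row]:
    # What we actually saw: rows without a Custom Dimension value are
    # discarded first, so the RegEx half of the OR never sees them.
    tagged = [row for row in rows if row[1] is not None]
    return [row for row in tagged if matches_or(row)]

# Hypothetical site: 1,000 pages, the first 500 match the RegEx,
# and only pages 450-549 have the Custom Dimension set.
rows: List[Row] = []
for i in range(1000):
    page = f"/page1/item-{i}" if i < 500 else f"/other/item-{i}"
    unit = "Business Unit 1" if 450 <= i < 550 else None
    rows.append((page, unit))

print(len(expected_rows(rows)))  # 550: 500 RegEx matches + 50 tagged non-matches
print(len(observed_rows(rows)))  # 100: only the tagged pages were considered
```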

You can demonstrate this in practice by creating three tables: one with a Page filter using RegEx, one filtering just on the Custom Dimension, and a third that filters on Page (via RegEx) OR the Custom Dimension. Then compare the number of rows returned:

An example in Data Studio.

The first table simply filters all Pages by a RegEx filter, the second looks only for the Custom Dimension, and the third filters by the same RegEx filter OR the Custom Dimension. While some of those pages could, of course, also be tagged with the Custom Dimension, we’d expect the far right table (the OR filter) to have at least as many pages and pageviews as the far left table (the Page RegEx).

Notice the Grand total is the same, but the number of rows (pages) is not.

It just so happens that if I create another table with both Page and the Custom Dimension as dimensions, and filter it by the Page RegEx filter alone, we get an odd match-up: the exact same pageview total as the Page-only filter, but with rows only for pages where the Custom Dimension was set.

Anyway, the moral of the story is that this approach produces unreliable results, and if you’re not careful, you may be reporting on metrics that aren’t what you expect.

So, what do I do about it?

Well, first off, for Custom Dimensions that will be used to filter all pages, events, etc., consider setting a default value for when there is no information to pass, such as “(not set)” or “Unknown”. This only helps going forward, but it’s good to implement and keep in mind.
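As a sketch of what “always send a value” means at the hit level, the Python below builds Universal Analytics Measurement Protocol-style parameters, where `cd1` carries Custom Dimension index 1. The property ID, client ID, and the assumption that the business unit lives in index 1 are all hypothetical. In a GTM setup the fallback would typically be configured in GTM itself (for example, as a default value on the variable that reads the data layer) rather than in code like this.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

DEFAULT_BUSINESS_UNIT = "(not set)"  # or "Unknown"; any explicit placeholder works

def build_pageview_hit(page: str, business_unit: Optional[str], use_default: bool) -> Dict[str, str]:
    """Build Measurement Protocol-style pageview parameters (hypothetical IDs)."""
    hit = {
        "v": "1",             # protocol version
        "tid": "UA-XXXXX-Y",  # hypothetical property ID
        "cid": "555",         # hypothetical client ID
        "t": "pageview",
        "dp": page,           # document path
    }
    if business_unit:
        hit["cd1"] = business_unit
    elif use_default:
        hit["cd1"] = DEFAULT_BUSINESS_UNIT
    # Otherwise cd1 is left out entirely, which is what happened for
    # pages that had no tags in the CMS.
    return hit

print(urlencode(build_pageview_hit("/page1/", None, use_default=False)))  # no cd1 parameter
print(urlencode(build_pageview_hit("/page1/", None, use_default=True)))   # cd1 carries the default
```

With the default in place, every hit has some value for the dimension, so a filter that references it no longer silently excludes untagged pages.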

If you run into a case where you need to combine or report on Custom Dimensions that were never set, try creating a Blended Data set by blending your Google Analytics data source with itself.

Since the Blend Data feature in Google Data Studio uses a Left Outer Join, if we include the Dimensions and Metrics we want to report on in the left-most copy of our data source, and then only the Custom Dimension in question on the right side, it works as you’d expect!

Blending Google Analytics onto itself.
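To see why the blend behaves differently, here is a Python sketch of left outer join semantics on a handful of made-up rows. It assumes the two sides are joined on Page and that each page has at most one Business Unit value on the right side; Data Studio performs the join internally, so this only models the result. Unmatched pages keep their metrics and get `None` for the Custom Dimension (what Data Studio displays as “null”), and the OR filter then gets to evaluate every row.

```python
import re
from typing import Any, Dict, List

PAGE_PATTERN = re.compile(r"(page1|page2|page3)")

# Left side: every page with the metrics we report on (hypothetical data).
left = [
    {"page": "/page1/", "pageviews": 120},
    {"page": "/page2/", "pageviews": 80},
    {"page": "/about/", "pageviews": 40},
    {"page": "/careers/", "pageviews": 15},
]
# Right side: only the Custom Dimension, which exists for tagged pages only.
right = [
    {"page": "/page2/", "business_unit": "Business Unit 1"},
    {"page": "/about/", "business_unit": "Business Unit 1"},
]

def left_outer_join(left_rows: List[Dict[str, Any]], right_rows: List[Dict[str, Any]],
                    key: str, right_fields: List[str]) -> List[Dict[str, Any]]:
    lookup = {row[key]: row for row in right_rows}  # assumes one right row per key
    blended = []
    for row in left_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for field in right_fields:
            # Unmatched left rows are kept, with None (shown as "null") on the right.
            merged[field] = match[field] if match is not None else None
        blended.append(merged)
    return blended

def or_filter(row: Dict[str, Any]) -> bool:
    return PAGE_PATTERN.search(row["page"]) is not None or row["business_unit"] == "Business Unit 1"

blended = left_outer_join(left, right, key="page", right_fields=["business_unit"])
kept = [row for row in blended if or_filter(row)]
print([row["page"] for row in kept])  # ['/page1/', '/page2/', '/about/']
```

The untagged `/page1/` survives the OR filter here because its row still exists in the blended data, just with a null Business Unit.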

You can see the difference by recreating the same OR table we built earlier, but with the new Blended Data set. Also, if you add the Custom Dimension to the table to look at its values, you’ll suddenly see “null” values for pages that did not appear before.

Same dates, dimensions, metrics, and filters, but with the regular Google Analytics source compared to the blended source.
Notice the “null” values. Those are not present in the un-blended Google Analytics data.

So there you have it: the result of a frustrating multi-day stretch of trial and error to figure out how to report on all of this data in one place. It’s not perfect, though; for instance, you can only add so many Dimensions and Metrics to a Blended Data source.

Perhaps it’s an edge case, but I thought I’d write it up anyway in the hope of helping someone else who runs into the problem of trying to analyze unset Google Analytics Custom Dimensions!
