Data Visualization Foundations: Color By…

We investigate how you can color your data values by dimension!

Skittles

In one of my earlier blog posts I talked about color being one of the fundamentals of data visualization. For this blog post I thought I’d talk a little bit more about how to use it practically, especially now that Qlik Sense 3.1 adds some powerful new capabilities for using color to encode additional information, either as a measure or as a dimension.

So let’s start off with some practical rules, or at least design considerations.

  • Limit the number of colors used in a visualization. Preferably no more than 10-20, but fewer is better. I try to keep it lower than 10.
  • When you color a data point based on belonging, make sure that each data point can only belong to one group.
  • Persistent colors across visualizations are great for seeing relationships.

There are of course many more things to consider, like what type of colors to use or scales, but that may be a topic for a later blog post!

Limited Colors

When you get many different data points you are often interested in using many colors to represent them. But before you do that: stop and consider the limitations we have as humans.

First off, we’re not very good at holding many items in short-term memory. How many depends on which academic paper you read, but most likely it’s fewer than nine items. So the more colors you use, the harder it gets for a user to remember the colors in the chart and match them with the colors of a legend or of a different chart.

The second reason is that although we humans can distinguish millions of colors, it becomes hard to reliably tell colors apart in a visualization and to match them between a legend and the charts. This of course also depends on the saturation of the colors and on whether you’re using shades. So take the number limit with a grain of salt, but at least try to avoid hundreds of different colors (like what you see below). By the way, there are many great games out there to test your color perception by arranging shades of color in order.

Too many colors

In the chart above you can see that there are issues matching the color to the legend as many of these colors are very similar.
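One way to stay under a color budget is to keep the most common categories and fold everything else into a single bucket before coloring. The snippet below is only a minimal sketch of that idea in plain Python, not something Qlik Sense does for you in this form; the function name `limit_categories`, the `max_colors` default and the "Other" label are all assumptions made for illustration.

```python
from collections import Counter


def limit_categories(values, max_colors=10, other_label="Other"):
    """Keep the most frequent categories and fold the rest into one bucket.

    Hypothetical helper: the names and the default of 10 are illustrative.
    """
    counts = Counter(values)
    # Nothing to do if the data already fits within the color budget.
    if len(counts) <= max_colors:
        return list(values)
    # Reserve one color slot for the bucket that collects the remainder.
    keep = {name for name, _ in counts.most_common(max_colors - 1)}
    return [v if v in keep else other_label for v in values]


categories = ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]
reduced = limit_categories(categories, max_colors=3)
print(sorted(set(reduced)))
```

Here the five original categories collapse to three colors: the two most frequent ones plus the catch-all bucket. Whether hiding the small categories is acceptable depends on what the chart is supposed to show.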

Color based on belonging

One of the great use cases for color in a chart is to color by another dimension and show which group each data point belongs to. In my previous blog post I used products in a bar chart as an example, and then I colored the bars based on product category. This makes it easy for me to see which product category each product belongs to. I could also see which categories are our best sellers and which are doing worse.

Color by Dimension 

As you can see here, the two best selling products are in the men’s footwear category. But then it’s actually a steep drop down to the rest of the male clothes/footwear. For women, there are many more products near the top. Also of interest is the bath clothes category, which only has one product and it’s quite far down.

You can do this type of coloring using any dimension, but what you need to consider is that each bar can only belong to one value of the group you are coloring by. Let me show you an example:

Color by Dimension table

In this case, we see that each product can be sold in many different cities. Hence, if I look at sales by product and color by city, there is no single city value for each bar, so the bars would turn gray. But if I show sales by company, you can see that each company only exists in one city, so I can use city to color by.

Color by Dimension

So when attempting to color by a dimension, make sure each value shown in the chart belongs to only one value of the dimension you want to color by.
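If you want to check this on your own data before building the chart, a quick test is to collect, for every item on the axis, the set of values it has in the coloring dimension. The following is a rough sketch in plain Python; the helper `ambiguous_items` and the sample rows (products, companies, cities) are made up for illustration and are not the data behind the charts in this post.

```python
from collections import defaultdict


def ambiguous_items(rows, item_key, color_key):
    """Return items that map to more than one value of the color dimension.

    Hypothetical helper: an empty result means coloring by `color_key` is safe.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[item_key]].add(row[color_key])
    return {item: sorted(vals) for item, vals in seen.items() if len(vals) > 1}


# Made-up sample data: one product sold by companies in two different cities.
rows = [
    {"product": "Sneakers", "company": "Acme", "city": "Boston"},
    {"product": "Sneakers", "company": "Nordic", "city": "Lund"},
    {"product": "Sandals", "company": "Acme", "city": "Boston"},
]

print(ambiguous_items(rows, "product", "city"))  # products spanning several cities
print(ambiguous_items(rows, "company", "city"))  # empty: each company has one city
```

Any item that shows up in the result has more than one belonging, which is the situation where the bars cannot be given a single color.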

Persistent colors

The final best practice is about persisting the colors across and within charts. With persistent colors, the color is based on the data item and therefore all charts that have that data value will have the same color. It’s great when using the same dimension across many charts on the dashboard as you will easily pick out the relationship between these data points. Below is an example with and without persistent colors. As you can see, the bar chart on the right better matches the colors of the tree map.

Persistent colors

Specifically, the bar chart on the right uses the same color for the same values as the tree map: pink for the central region, for example. In the bar chart on the left, the central region is blue, which does not match the tree map. Having the same color for data values across charts makes it easier to relate these values to each other.
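The idea behind persistent colors can be sketched outside of any particular tool: build one value-to-color lookup and let every chart read from it, instead of letting each chart hand out colors in its own sort order. The snippet below is a minimal illustration in plain Python, not how Qlik Sense implements the feature; the palette, the region names and the function `build_color_map` are assumptions.

```python
# Illustrative palette; any list of distinct colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def build_color_map(values, palette=PALETTE):
    """Assign each distinct value a fixed color, independent of chart order."""
    distinct = sorted(set(values))
    # Colors repeat if there are more values than palette entries,
    # which is another reason to keep the number of categories low.
    return {v: palette[i % len(palette)] for i, v in enumerate(distinct)}


regions = ["West", "Central", "East", "South"]
color_map = build_color_map(regions)

# Two charts with different sort orders still agree on every color.
bar_chart_order = ["East", "West", "Central", "South"]
tree_map_order = ["Central", "South", "East", "West"]
bar_colors = {region: color_map[region] for region in bar_chart_order}
tree_colors = {region: color_map[region] for region in tree_map_order}
print(bar_colors == tree_colors)
```

Because the lookup is keyed on the data value rather than on its position in a chart, the central region gets the same color wherever it appears.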

That is all for now about color. There are still many topics within this area to cover, but hopefully you’ve learned a few new things!

Photo credit: greenzowie via Foter.com / CC BY-NC-ND
