In my first post, I talked about the process of choosing the right chart for your visualization. Reminder: if you need a refresher on that infographic, you can download it right here. Every chart selection begins with one common factor: the data. Every chart is trying to tell a story about your data, but people often run into problems trying to tell that story. Sometimes the story is incomplete, other times it's misleading, and often it's confusing or just plain wrong.
I am hoping with this blog post we can write/right a few of those wrongs and offer beginners in data viz a crash course in how to interpret data attributes as they are building their charts. Before we dive in, I have to admit that I was inspired by my colleague Henric Cronström’s blog post from last September over on the Design Blog. He offered a great take on scales of measurement and I hope providing a few additional examples will enhance what he has already written.
When it comes to data attributes, there are two categories: quantitative data and qualitative data. Quantitative data is exactly what it sounds like: a numerical value placed on an ascending scale (e.g. I am 32 years of age, I drank two bottles of water today). Qualitative data refers to values that cannot be measured numerically, but can be described through language (e.g. I came in 3rd place at the swim meet; since I'm always on the run, I prefer a laptop over a desktop). These two categories break down into four subcategories. Ratio and interval data are quantitative, while ordinal and nominal data are qualitative:
Ratio (cost $10, $20, $30 or age 10 years old, 20 years old, 30 years old)
- Data you can perform arithmetic operations on (add, divide, etc.)
- Example: How much do these clothes I want to buy cost if I add them all together?
Interval (temperature -5°, 10°, 25° or time 1am, 5am, 9am)
- Data on a scale with evenly spaced values but no true zero, so not all arithmetic operations are meaningful
- Example: Summing the temperatures over a week tells you nothing useful, but you can calculate the average temperature and the high/low for each day.
Ordinal (size small, medium, large or position 1st place, 2nd place, 3rd place)
- Data with a fixed ranking with indeterminate distance between the values
- Example: A large soda is bigger than a medium one, but the label alone doesn't tell me exactly how much bigger. A large soda in Sweden is very different from a large soda in the United States.
Nominal (sports NFL football vs. English football or computers laptop vs. desktop)
- Data where you can distinguish between values, but not order them
- Example: The term football can refer to NFL football or English football. There is no way to rank which one is better... I think I will leave that up to my colleagues in the US and UK!
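To make the four scales a little more concrete, here is a minimal Python sketch of which summaries make sense for each one. The sample values are hypothetical and only loosely based on the examples above, and the helper `ordinal_median` is a name I made up for illustration, not part of any library.

```python
from statistics import mean, mode

# Hypothetical sample data, one list per scale of measurement.
prices = [10, 20, 30]                                    # ratio
temps_c = [-5, 10, 25]                                   # interval
sizes = ["small", "large", "medium", "medium", "large"]  # ordinal
sports = ["NFL football", "English football", "NFL football"]  # nominal

# Ordinal data needs its ranking spelled out; alphabetical order would be wrong.
SIZE_ORDER = ["small", "medium", "large"]


def ordinal_median(values, order):
    """Return the (lower) median of ranked values, using the given order."""
    ranked = sorted(values, key=order.index)
    return ranked[(len(ranked) - 1) // 2]


# Ratio: sums are meaningful (how much do the clothes cost together?).
total_cost = sum(prices)

# Interval: averages and high/low are meaningful, a weekly "sum" is not.
avg_temp = mean(temps_c)
low_temp, high_temp = min(temps_c), max(temps_c)

# Ordinal: we can rank and take a median, but not a numeric average.
median_size = ordinal_median(sizes, SIZE_ORDER)

# Nominal: we can only count and find the most common value.
most_common_sport = mode(sports)

print(total_cost, avg_temp, (low_temp, high_temp), median_size, most_common_sport)
```

The point of the sketch is simply that each step down the list (ratio, interval, ordinal, nominal) takes away some operations you can safely apply.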
Based on these classifications, the methods for aggregating and visualizing the data need to adjust accordingly. For example, if you were to plot car manufacturing data like the image below, and your data set included year-to-year manufacturing figures, it makes more sense to stick to chronological order. If you sort by highest value instead, your readers will have trouble following the order of the years (1978, 1979, 1980, etc.). Ideally, ordinal data should be sorted by its inherent order rather than alphabetically by the names of the values (if you were plotting month by month, for example).
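The sorting advice above can be sketched in a few lines of Python. The monthly figures are invented for illustration, and `MONTH_ORDER` is simply a list I define here to encode the calendar order.

```python
# Calendar order of the months; this is the "inherent order" of the ordinal data.
MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical units manufactured per month (made-up numbers).
monthly_units = {"Apr": 120, "Feb": 95, "Jan": 110, "Mar": 130}

# Alphabetical sorting scrambles the timeline.
alphabetical = sorted(monthly_units)

# Sorting by value also hides the timeline.
by_value = sorted(monthly_units, key=monthly_units.get, reverse=True)

# Sorting by position in MONTH_ORDER keeps the months in calendar order.
chronological = sorted(monthly_units, key=MONTH_ORDER.index)

print(alphabetical)   # ['Apr', 'Feb', 'Jan', 'Mar']
print(by_value)       # ['Mar', 'Apr', 'Jan', 'Feb']
print(chronological)  # ['Jan', 'Feb', 'Mar', 'Apr']
```

Only the last ordering lets a reader follow the trend over time, which is usually the story a month-by-month or year-by-year chart is trying to tell.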
There is much more to cover but hopefully this post offers a basic guideline to help you determine what type of data you are trying to visualize. In my next installment of the Three Pillars of Data Series, I will address visual encoding and how to determine what markers to use in order to accurately display these data attributes.
This is just one way to classify data attributes, and there are more advanced schemes out there that may be even better to use. For example, it's hard to classify data that is expressed as a percentage. But I still believe this post is a good start and easy to remember. So now you can start to think about the data and what you can do with it, but also what you shouldn't do! Just following some of these guidelines will get rid of some basic mistakes in your visualizations.
In the next post I will also show how we can use the step of classifying the data to better select the appropriate method to represent the data.
For more detailed reading on data attributes, I would recommend:
Mosteller, F. & Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics, ch. 5. Addison-Wesley Series in Behavioral Science: Quantitative Methods. Reading, Mass.: Addison-Wesley.
Velleman, P. F. & Wilkinson, L. (1993). "Nominal, Ordinal, Interval, and Ratio Typologies are Misleading." The American Statistician, vol. 47, no. 1, pp. 65-72.