Data + Science

6/13/2017
How to Create a Wheat Plot in Tableau

Yesterday, Stephen Few published his quarterly newsletter and discussed issues around jittering dot plots. He proposed a new chart type or new version of jitter (whichever you prefer), which he referred to as a Wheat Plot or stripogram. Steve Wexler and I traded several emails with Stephen Few about this chart prior to the newsletter, and Steve Wexler created several variations with different data sets. In this post, I will outline how I built the Wheat Plot in Tableau.

Note: I will be using the World Indicators data set in Tableau, but this is for demonstration purposes only. The data covers 2000-2012, so each country appears multiple times in the dot plot. Therefore, this plot would not be all that useful for analysis purposes.

Building a Wheat Plot

Step 1: Build the Dot Plot

   Move Region to Columns
   Move Life Expectancy Female to Rows
   Select Circle on the Marks card
   Move Country to Detail
   Move Region to Color
   Change the Size of the dots to make them smaller

Step 2: Create a calculated field and bin

   Calculated Field Name: index
   Formula: index()

   Right-click on Life Expectancy Female and Create Bin
   Set the Bin size to 5

   Move index to Columns
   Move Life Exp Female (bin) to Detail

Step 3: Set calculation and sort order of index

   Right-click on index on Columns and select Edit Table Calculation
   Choose Specific Dimensions
   Move Life Exp Female (bin) up to the top of the list
   Move Country up to the second on the list
   Check the box for Life Exp Female (bin) and Country and uncheck Region

   Set Restarting every to Life Exp Female (bin)
   Set Sort Order to Custom Sort and select Life Expectancy Female and Ascending

You now have a Wheat Plot. Change the Size of the dots as needed. Remember, the bin size is set to 5, so the index (and with it the row of dots) restarts at every 5-year step of life expectancy.
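
For readers who want to see the logic outside Tableau, here is a minimal Python sketch of what the index() table calculation is doing in Steps 2 and 3: bin the values, sort within each bin, and number the dots starting at 1. This is only an illustration, not how Tableau computes it internally. The function name wheat_positions and the sample rows are made up, it assumes bins start at multiples of the bin size, and it covers a single region (in the Tableau view, the index also restarts for each Region).

```python
import math
from collections import defaultdict


def wheat_positions(rows, bin_size=5):
    """Return (label, value, bin_start, index) tuples for a wheat plot.

    rows is an iterable of (label, value) pairs for ONE region.
    The index restarts at 1 in every bin, like "Restarting every" in Tableau.
    """
    # Group the rows by the lower edge of the bin they fall into
    bins = defaultdict(list)
    for label, value in rows:
        bin_start = math.floor(value / bin_size) * bin_size
        bins[bin_start].append((label, value))

    result = []
    for bin_start in sorted(bins):
        # Custom sort: ascending by the measure inside each bin
        ordered = sorted(bins[bin_start], key=lambda r: r[1])
        for i, (label, value) in enumerate(ordered, start=1):
            result.append((label, value, bin_start, i))
    return result


# Hypothetical sample values, not taken from the World Indicators data
sample = [("A", 71.2), ("B", 73.9), ("C", 70.4), ("D", 76.1), ("E", 78.8)]
for row in wheat_positions(sample):
    print(row)
```

Plotting index on the x-axis and value on the y-axis gives the sloped rows of dots: within each bin, the lowest value sits at index 1 and each higher value moves one step to the right.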

You can adjust the bin size up or down. Changing the bin size to 2 brings the dots much closer, similar to a unit histogram.

If you don't want a fixed-width column, another option is to set the index to discrete (right-click on index on Columns and select Discrete). This will size the column width based on the number of dots.

Is a Wheat Plot useful?

Now onto the usefulness of a Wheat Plot. Here are my general thoughts.

I found the high slopes difficult to interpret. Based on the Twitter response to Stephen Few's newsletter, I'm guessing most people will have the same reaction. However, once it settled in for me and I interacted with the data, I did find the plots useful. For example, in random jitter, once you hover over or select a dot, you can't easily find the neighboring dots. Which dot is immediately above or below the value you are selecting? The Wheat Plot allows you to go in order, up and down the data, seeing all of the neighboring values. That said, I worry that people will struggle with the look of these charts and how to interpret the data.

When playing with the bin size on this data set, I preferred a bin size of 2, so I think the bin size will make a big difference in these plots. The right choice depends entirely on the data set, so it may require iterating through several bin sizes to find the best one.
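
One rough way to compare bin sizes before rebuilding the chart is to count how many bins each size produces and how many dots land in the fullest bin. The sketch below assumes the same multiples-of-bin-size binning as the example above; bin_summary is a made-up helper and the values are invented for illustration.

```python
import math
from collections import Counter


def bin_summary(values, bin_size):
    """Return (number of non-empty bins, largest dot count in any bin)."""
    counts = Counter(math.floor(v / bin_size) * bin_size for v in values)
    return len(counts), max(counts.values())


# Made-up values for illustration only
values = [70.4, 71.2, 73.9, 76.1, 78.8, 79.5, 81.0, 82.3]
for size in (1, 2, 5, 10):
    n_bins, widest = bin_summary(values, size)
    print(f"bin size {size}: {n_bins} bins, widest row has {widest} dots")
```

A smaller bin size gives more, shorter rows of dots, and a larger one gives fewer, longer rows. Which one reads best is still a judgment call made by looking at the chart.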

That brings me to the data set. The data set that Stephen Few used in his example is very specific. The values differ by very small amounts, and they are plotted precisely. This isn't always the case. For example, if I plot the exam grades of all of the students in my data visualization class, many students have the exact same value and the dots would plot directly on top of each other. There will be many students who have a 92% and none who have 91.8% or 92.3%.

Even when the values aren't exactly the same and the dots don't plot directly on top of each other, there will be times when two-decimal-place accuracy is not meaningful. As an example, we visualized session ratings at a conference in The Big Book of Dashboards (Chapter 3, page 59). The conference sessions were rated on a scale of 1 to 5. When these ratings are averaged per session, there is no meaningful difference between a session rating of 4.23 and 4.21. These ratings can be rounded and binned to one decimal place (example below). In both of these cases, I find that the random jitter works well. However, my preferred view of this type of data is often the unit histogram. In the case of the speaker rating, we can also encode the number of people attending the session as the size of the dot. This adds another level of detail, for example, distinguishing a session that has a 4.2 with a small number of attendees from one with the same rating and a very large number of attendees. Speaker 317 not only had great ratings, but it was also one of the largest sessions that was rated.
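
The rounding-and-stacking idea behind a unit histogram can be sketched in a few lines of Python. This is a simplified illustration, not the method used in the book: unit_histogram is a made-up helper, the session names and ratings are invented, and dot size (attendance) is left out.

```python
from collections import defaultdict


def unit_histogram(ratings, decimals=1):
    """Return (session, rounded_rating, stack_position) tuples.

    ratings is an iterable of (session, average_rating) pairs.
    Sessions that round to the same rating are stacked 1, 2, 3, ...
    """
    stacks = defaultdict(int)
    positions = []
    for session, rating in ratings:
        key = round(rating, decimals)  # e.g. 4.23 and 4.21 both become 4.2
        stacks[key] += 1
        positions.append((session, key, stacks[key]))
    return positions


# Hypothetical session ratings for illustration only
sample = [("S1", 4.23), ("S2", 4.21), ("S3", 3.87), ("S4", 4.18)]
for row in unit_histogram(sample):
    print(row)
```

Plotting the rounded rating on one axis and the stack position on the other gives one dot per session, with ties lined up in a column instead of landing on top of each other.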

I hope you find this information helpful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com.

Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility