As I was working on building a Sankey Diagram with Tableau, I found the excellent articles by Jeffrey Shaffer from Data + Science for developing a methodology to build a Sankey in Tableau.
Starting with Jeff's method, I wanted to add some interactivity on my diagram and minimize the data preparation. I just wanted to be able to build my viz using the model provided by Jeff and a dataset with Step 1, Step 2 and Step 3 along with 1 indicator (size) and use dates to perform filtering. I didn't want to use SQL or an Excel Macro to prepare the data.
I also had an ID field, but I didn't want to use it.
I managed to build my viz using Jeff's method but faced an issue regarding the performance of the viz. Showing several thousands calculated lines + colors in the same viz takes some resources, so I wanted to simplify my viz. Because I could easily get all the lines describing each flow on the diagram, I could define the upper line and the lower line of each flow. And if I could do that, I could link these lines to build a polygon. I would then have far less lines to calculate and show on my screen. So here is the challenge.
Please first download the Excel model file from Data + Science.
In the Excel sheet with the model data, we will double the whole rows and add 2 columns:
A "path" column: order in which we will link the points
A "Min or Max" column: we will tag the curve whether it is the min line or the max line.
NOTE - the order of the path should be ascending when Min and descending when Max.
Once it is done, just add the "Link" column in the dataset.
Step 2: Connect to the data (with Tableau 8.2).
I work on a Mac. Connect to the excel file containing the two sheets (data and model). Let Tableau automatically build the inner join and go to work in Tableau!
Step 3: Build the Viz.
We will first build the dynamic aggregated steps using parameters and sets. Basically, we want to show Top N Step 1 sorted by the SUM of Size:
Create a "Top N Step 1" parameter: it should be an Integer with a min and a max.
Create a "Top Step 1" set:
Right-click on "Step 1" pill.
Create a set.
Sort the set using Top-By Field-Top-Top N Step 1(parameter)-Size(indicator)-Sum
Create an "Agg Step 1" dimension:
Formula:
IF [Top Step 1] (ie. the set) THEN
[Step 1]
ELSE
'x Other'
END
I use the x to easily sort the flows alphabetically.
Repeat the operation with Step 2 in order to build a new dimension called "Agg Step 2".
These operations will enable us to choose the number of "nodes" to display in the bar charts and also to generate the positions of the flows in the Sankey.
Step 4: Build the "Sigmoid" calculated field.
Calculated Field Name: Sigmoid Function
Formula: 1/(1+EXP(1)^-[t])
Step 5: Build the "Flow Size" calculated field.
Here is where it gets interesting. We want to build flows that will start in a certain order and end divided into several lines and in another order. We will use a simple table calculation for this.
The "Flow Size" definition will be: SUM([Size])/TOTAL(SUM([Size]))
Calculated Field Name: Flow Size
Formula: SUM([Size])/TOTAL(SUM([Size]))
We then obtain the relative size of the flow compared the total sum of the indicator (in percentage). This will enable us to build a stable display of the viz.
Click on "Default Table Calculation" on the top right of the Calculated Field window as shown below.
In "Compute using" choose "Advanced." Here you can choose how you want to address the partitioning. Choose "Agg Step 1" and "Agg Step 2" in this order. Also, choose to sort the results by Fields using the sum of Size as follows:
We will use this same technique for almost all of the calculated fields.
Step 6: Build position calculated fields.
Here we want to define the starting points and the ending point of our Max Curves and our Min Curves.
Create some calculated field to define Max and Min Positions:
Calculated Field Name: Max Position 1
Formula: RUNNING_SUM([Flow Size])
Compute along Agg Step 1, Agg Step 2 (in the advanced option of the Table Calculation.)
Define the Min Position.
Calculated Field Name: Min Position 1
Formula: RUNNING_SUM([Flow Size])-[Flow Size]
Compute along Agg Step 1, Agg Step 2 (in the advanced option of the Table Calculation.)
Calculated Field Name: Max Position 2
Formula: RUNNING_SUM([Flow Size])
Compute along Agg Step 2 then Agg Step 1 (in the advanced option of the Table Calculation.)
Define the Min Position.
Calculated Field Name: Min Position 2
Formula: RUNNING_SUM([Flow Size])-[Flow Size]
Compute along Agg Step 2 then Agg Step 1 (in the advanced option of the Table Calculation.)
NOTE - The advanced options of the Table Calculation enable to define the order in which the curves will start and in which they will end.
Step 7: Build the curves calculated fields.
We will build a Max curve and a Min Curve based on Jeff's work.
Calculated Field Name: Max Curve
Formula: [Max Position 1]+(([Max Position 2]-[Max Position 1])*ATTR([Sigmoid]))
Calculated Field Name: Min Curve
Formula: [Min Position 1]+(([Min Position 2]-[Min Position 1])*ATTR([Sigmoid]))
We can now build our polygon curves.
Calculated Field Name: Curve Polygon
Formula:
CASE ATTR([Min or Max]) // remember the data we added to the model !
WHEN 'Min' THEN [Curve Min]
WHEN 'Max' THEN [Curve Max]
END
When building the viz, the "path" will allow us to join the points to draw the polygons.
Step 8: Build the Sankey.
Choose the "polygon" graph type and place the following pills in the marks:
Agg Step 1
Agg Step 2
Min or Max
Path (as a path)
Place the "T" pill in "columns" and show it as a dimension (right click-dimension).
Finally place the "Curve Polygon" in the "lines."
Some final touches:
Change transparency to 50%.
Clean the "T" axis to a fixed range of -6 to 6 and hide the axis.
Clean the "Curve Polygon" axis to a fixed range of 0 to 1.
Reverse and hide the axis.
Et voilĂ !
Step 9: Build the bar charts.
Create a calculated field:
Calculated Field Name: Step Size
Formula: SUM([Size])/TOTAL(SUM([Size]))
Do not specify table calculations.
On a new sheet, place "Step Size" into rows and "Agg Step 1" in the marks (colors or details). I chose to customize using borders.
Hide axis and repeat the operation for Step 2.
Now you can build the Sankey in a Dashboard ! After a bit of cosmetic and customization, here is the final result (click image below for Tableau Public Viz):
It is fully interactive and responsive!
I also built a long version that can be found on this link.
Sorry that it is in French as well as the Tableau file and data, but this is my native language!
Conclusion:
This method has 3 advantages:
1. Few data preparation
2. No SQL or Macro required
3. Very responsive since just showing polygons
Feel free to improve the method or email me for any questions!
Guest Blogger,
Olivier Catherin
NOTE FROM JEFF:
Thanks Olivier for this excellent work and for creating this guest post to add to our collection on building sankey diagrams in Tableau.
I hope you find this helpful. If you have any questions feel free to email me at Jeff@DataPlusScience.com