Let me begin by saying, wow! Participating in the 2014 Tableau IronViz competition was an incredible experience. The Tableau conference was amazing, and I was able to meet in person so many friends whom I only knew from social media. The IronViz stadium was equally amazing: five large screens, 1,000 people watching (full capacity for the room), and two amazing vizzers to compete against (and really nice guys too!). Lights, music, smoke, and blazingly fast viz action. I've included a commentary below if anyone is interested, but you're probably more interested in seeing my viz, so here it is (or at least what it would have been with my lost 3 minutes back - details below). Be sure to move the mouse around the roulette wheel!
First, let me describe the data. We were using data extracted from Yelp: over 1 million rows and around half a gigabyte of data. It took 20-30 seconds just to load the data into Tableau. The description given to us was "data from Las Vegas, NV and Phoenix, AZ from Yelp from 2004 to 2014". It was a bit more complicated than that because the data was very messy.
Problems with data
There were more states in the data set than just NV and AZ. There were actually 18 distinct values in the state field, some of which are clearly wrong: AZ, CA, EDH, ELN, FIF, GA, KHL, MA, MLN, MN, NC, NTH, NV, NY, ON, SCB, WI, XGl. The first step in my data cleaning was to filter out everything except NV.
Dates - we were given data from 2004-2014. However, there were only a few points in 2004 and no consistent volume in 2005. The reviews on Yelp don't really pick up until 2007. For the purposes of my analysis, I filtered to 2012-2014. This gave me the bulk of the reviews and a good-sized, more relevant set of data to work with (assuming reviews from recent years are more useful).
There were misspellings of the restaurant names that caused major analysis issues. For example, "Zeffirino's" was listed as the only 5 star restaurant on the Las Vegas Strip (with 5 reviews). However, there is another record for "Zeffirino" in the database. That record has the same address, but a different geocoding and neighborhood (welcome to the real world of data analysis). The bigger issue was that this second record had 191 reviews and only 3.5 stars. In other words, if you didn't examine the data closely, you would have concluded that Zeffirino's was the only 5 star restaurant on the Strip, and you would have been wrong. This was just one of many traps in the data. To resolve this, I filtered for the neighborhood equal to "The Strip" and added a filter for rating excluding 5, dropping the Zeffirino records completely.
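For anyone who wants to try the same kind of cleanup outside Tableau, here is a minimal Python sketch of the steps above: keep one state, keep a range of years, and flag addresses that appear under more than one business name. This is not what I did on stage (that was all Tableau filters). The field names (`name`, `address`, `state`, `date`, `stars`), the helper functions, and the sample rows are made up for illustration and are not the actual columns or values in the contest file.

```python
from collections import defaultdict
from datetime import date


def clean_reviews(rows, state="NV", start_year=2012, end_year=2014):
    """Keep rows for one state whose date falls within the given years (inclusive)."""
    return [
        r for r in rows
        if r["state"] == state and start_year <= r["date"].year <= end_year
    ]


def names_sharing_address(rows):
    """Return {address: set of names} for addresses listed under more than one name."""
    names = defaultdict(set)
    for r in rows:
        names[r["address"]].add(r["name"])
    return {addr: found for addr, found in names.items() if len(found) > 1}


# Made-up sample rows (hypothetical values, placeholder addresses).
rows = [
    {"name": "Zeffirino's", "address": "ADDR-1", "state": "NV", "date": date(2013, 5, 1), "stars": 5},
    {"name": "Zeffirino", "address": "ADDR-1", "state": "NV", "date": date(2012, 8, 15), "stars": 3},
    {"name": "Some Diner", "address": "ADDR-2", "state": "NY", "date": date(2013, 1, 1), "stars": 4},
    {"name": "Old Cafe", "address": "ADDR-3", "state": "NV", "date": date(2006, 3, 9), "stars": 4},
]

cleaned = clean_reviews(rows)               # drops the NY row and the 2006 row
variants = names_sharing_address(cleaned)   # addresses with conflicting names

for addr, found in variants.items():
    print(addr, sorted(found))
```

A check like `names_sharing_address` is the sort of thing that would surface the Zeffirino trap before it skews a "top rated" list. It only catches records that share an identical address string, so it is a starting point rather than a full deduplication.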
For those in attendance, you witnessed my Tableau crash, but the software didn't actually "crash". I was moving so fast at the beginning that I filtered for "NY" by mistake instead of "NV" on my very first filter. This was immediately apparent when I was working with the neighborhood field 2 minutes later, so I quickly went back to the filter to adjust it. That's where things went screwy. For some reason, I couldn't filter out NY and get NV back in. It should have been two clicks, but I tried a few times, clicking and unclicking, selecting all and deselecting, and it wasn't working correctly. Being 2 minutes into the viz, I was worried about trying to debug it (of course it worked perfectly in my hotel room later that night, so maybe I was just moving too fast or had made some other mistake along the way). I made a decision in the moment to just start over. In hindsight, I would have been better off using the wonderful unlimited "back" in Tableau because it would have saved me about 20 seconds loading the massive workbook again.
After the "crash", a few people in the audience were yelling "save" during the competition, and even my great sous-vizzer, Michael Kovner, came over and said "Control S". The problem was that, because of the size of the data, it took about 45 seconds to save anything. Each vizzer knew very well that there were a few things NOT to do, and trying to save the workbook during the competition was most certainly one of them. One option would have been to save the twb file without the data, but we weren't given the original data file to connect to; we were only given a twbx file to start with. At this point I was just trying to work as fast as possible. The other thing we all avoided was using the text field of the reviews. The text field was so big (it's the primary reason the file is half a gigabyte) that trying to do any calculations to parse that field, search for words, or use it without filtering it way down was just too costly on time.
Along those same lines, Ryan Sleeper (one of the judges) mentioned in his final comments that I could have used Tableau instead of Excel to create the data for the roulette wheel. He is completely correct. However, I chose Excel in this case because of speed. The extra calculations in Tableau for 60 records were taking too long. It was much easier to just copy and paste rows in Excel and then simply paste the finished data back into Tableau. A few people asked if the roulette wheel would spin. Michael and I actually discussed this and explored the idea of using a parameter to "spin the wheel" in some manner. I wasn't happy with anything I explored, so we decided to leave it as a hover option, over both the wheel and the board. When you only have 20 minutes, 17 in my case after the crash, every second counts. So we had to make some design decisions on that one to make sure I could finish in time.
Even with the 17 minutes I had, I was really close to where I wanted to be. There were some formatting things and a few dashboard actions left to do, but overall I was nearly done. In the end, John Mathis won the IronViz with a great visualization, "Reviewing the Reviewers". Congrats John!
Below is a side-by-side comparison. On the left side is where I was when the time clock ran out in the IronViz contest. The right side shows the final version of where I was heading with it in the last few minutes.
I would like to thank the community for all the support. There were a number of Tableau Zen Masters, many others from the terrific Tableau community, and a great group of family and friends who were cheering me on, tweeting, texting, calling, and emailing. I even had someone approach me out of the blue at the airport on the way home and say, "Are you the professor from the IronViz? I voted for you. You should have won." Thank you all for your kind words and encouragement! Hopefully I'll see you guys in Vegas next year! If you go to any of the restaurant picks, be sure to let me know how it was.
I hope you enjoy the viz. If you have any questions, feel free to email me at Jeff@DataPlusScience.com.