10/23/2014 (Updated with 2024 data)
Halloween Data Set Published for Data Visualization
I have lived in my house since 2002, and every Halloween I get inundated with trick-or-treaters. Year after year it's like Willy Wonka's Chocolate Factory at my house, with candy flying out the door. Being a data guy, what do I do? Simple: I count them. A friend and I keep a meticulous count of the trick-or-treaters in 30-minute intervals, and I have tracked this data every year since 2008. After a number of years of recording these numbers, I decided to use them as a data set for data visualization training. I've used this data set in my data visualization course at the University of Cincinnati, in training at KPMG (their Advisory University and Global Analytics training), and in other data visualization workshops and corporate training. At the time of updating this post, I estimate that over 10,000 people have created a data visualization using this data. There is even a hashtag on Twitter, #HalloweenViz, where people post their visualizations.
There are a number of reasons why I really like this data set, and I've had great success using it as a teaching tool. Large data sets are very common nowadays; this very small data set, by contrast, forces people to be creative with the data and the story they tell. It's a fun topic, something almost everyone can relate to (at least in the U.S.), and it's always interesting to see the wide range of data visualizations that are created during this exercise. I decided to make the data set public and to keep it updated.
The Data:
Yes, in 2011 I had 869 trick-or-treaters come to my house, and in 2020 during a global pandemic, I had 219 trick-or-treaters.
Instructions (typically 45-75 minutes in a workshop format, in small groups or individually, or as a single class assignment):
1.) Determine a story or goal for the visualization.
Examples:
Homeowner dashboard summarizing Halloween
Forecast future trick-or-treaters or estimate future candy need
Explore variation of the number of trick-or-treaters year by year
2.) This is a very simple data set. There are only a few years of data broken down into 4 half-hour time blocks with cumulative totals. Think broadly about the data.
Examples:
The data is time series data – any additional choices?
What comparisons can you make?
What table calculations can be made?
What additional data can be appended from other sources to help tell the story or complete an analysis?
NOTE - be very careful because there are many pitfalls at this step.
3.) Build a data visualization
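As one small illustration of the "table calculations" idea in step 2, here is a minimal Python sketch of a percent-change calculation. It uses only the two yearly totals quoted in this post (869 in 2011 and 219 in 2020). The function name `percent_change` is made up for this example, and the same calculation could just as easily be done in Excel or Tableau.

```python
def percent_change(baseline, value):
    """Return the change from baseline to value as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# The only two yearly totals quoted in this post; the full data set has more years.
totals = {2011: 869, 2020: 219}

change = percent_change(totals[2011], totals[2020])
print(f"2011 -> 2020: {change:.1f}%")  # prints "2011 -> 2020: -74.8%"
```

With the full data set loaded, the same function could be applied to each pair of consecutive years to get a year-over-year view.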
Download Data sets (xlsx format):
Halloween Data Set in Excel (cross tab) (2008-2023): Download Here.
Halloween Data Set in Excel for Tableau (2008-2023): Download Here.
Additional Notes:
Numbers in data file for Excel are cumulative.
Numbers in data file for Tableau have been unpivoted and are not cumulative.
City: Cincinnati
State: Ohio
Zip Code: 45207
Neighborhood: East Walnut Hills/Evanston (being on the border likely increases the number of trick-or-treaters)
Trick-or-treating has always been held on 10/31 (some neighborhoods change the night for trick-or-treating).
The type of candy has not varied from year to year. It is always a general mix of candy.
Official trick-or-treat hours are from 6pm-8pm, but at 8pm I do not kick the children off my porch and say "no candy for you". These "stragglers" taper off, and by 8:15pm there are no more.
In most of the recent years, a number of trick-or-treaters showed up before 6pm. The earliest was in 2020 at 4:45pm (neighbors who wanted to come before anyone else because of COVID-19).
For 2020, because of COVID-19, I set up two long candy chutes so that trick-or-treaters could get candy from a safe distance.
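The first two notes above describe the main difference between the two files: the Excel cross tab holds cumulative counts, while the Tableau file holds counts per time block. If you start from the cumulative version, a rough Python sketch of the conversion looks like the following. The numbers and time labels are made up for illustration and are NOT values from the data set, and `cumulative_to_interval` is a hypothetical helper name.

```python
from itertools import accumulate


def cumulative_to_interval(cumulative):
    """Convert running totals into per-interval counts."""
    counts = []
    previous = 0
    for total in cumulative:
        if total < previous:
            raise ValueError("cumulative totals must not decrease")
        counts.append(total - previous)
        previous = total
    return counts


# Made-up running totals for one evening (NOT values from the data set).
# The labels are assumed end times for four half-hour blocks.
labels = ["6:30pm", "7:00pm", "7:30pm", "8:00pm"]
example_cumulative = [10, 35, 70, 80]

per_interval = cumulative_to_interval(example_cumulative)
for label, count in zip(labels, per_interval):
    print(f"by {label}: {count} new trick-or-treaters")

# Sanity check: re-accumulating the interval counts restores the running totals.
restored = list(accumulate(per_interval))
```

Which form to use depends on the story: cumulative totals suit a "how fast did the night fill up" view, while per-block counts make it easier to compare the busiest half hour across years.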
Instructors who would like to use this data set for their class or workshop can contact me at Jeff@DataPlusScience.com for detailed instructor notes on this assignment/exercise.
I hope you find this data set useful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com.