11/1/2015
Using the Gender Package in R: Part 3 - Running a List of First Names from Tableau
In part 1 of this series, I discussed the basics of the Gender package in R. In part 2, I demonstrated how to leverage parallel processing to speed up the processing of names, for example when you need to determine the gender for a list of 10,000 names. In this post I am going to discuss how to use the Tableau integration with R to run gender on a list of names, with and without parallel processing.
Step 1: Install and load the gender package in R, start Rserve, and connect Tableau to R
Set up the R connection in Tableau.
Under "Help" and "Settings and Performance" select "Manage R Connection"
Choose Server "localhost" and Port "6311" and click OK
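The R side of this step can be sketched as follows. This is a minimal setup sketch, not the exact commands from the earlier posts: it assumes both packages are installed from CRAN and that Rserve runs on the same machine as Tableau with its default port. The gender package may also ask to install its companion data package the first time a lookup is run.

```r
# One-time install (uncomment on first use)
# install.packages(c("gender", "Rserve"))

library(gender)   # name-to-gender lookup
library(Rserve)   # lets Tableau send R code to this machine

# Start Rserve. It listens on port 6311 by default, which is the
# port entered in Tableau's connection dialog.
Rserve(args = "--no-save")
```

Once Rserve is running, the connection settings above ("localhost" and "6311") point Tableau at it.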
Step 2: Load the list of names into Tableau
Load a list of first names into Tableau using a field named "First Name". If you want to try this out, copy and paste this short list of first names into Tableau.
Step 3: Create a calculated field in Tableau
Calculated Field: Gender from R
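The body of the calculated field is not reproduced in the text here, so what follows is only a sketch of one way it could work, not necessarily the original formula. Tableau's SCRIPT_STR function sends the field to R as .arg1 and expects back one string per input value. Since gender() returns rows only for names it can match, the sketch realigns the results to the input with match(). The helper name align_gender and the "unknown" label are my own assumptions.

```r
# Hypothetical helper (not from the original post): map a gender lookup
# table back onto the input so the result has one value per input name.
align_gender <- function(first_names, lookup) {
  # lookup is a data frame with columns `name` and `gender`
  idx <- match(tolower(first_names), tolower(lookup$name))
  out <- as.character(lookup$gender[idx])
  out[is.na(out)] <- "unknown"   # names with no match
  out
}

# Stand-in for the vector Tableau passes to R as .arg1
.arg1 <- c("Mary", "John", "Mary", "Zzyzx")

# Only run the real lookup when gender and its data package are installed
have_pkgs <- requireNamespace("gender", quietly = TRUE) &&
  requireNamespace("genderdata", quietly = TRUE)
if (have_pkgs) {
  lookup <- gender::gender(unique(.arg1))   # default method is "ssa"
  print(align_gender(.arg1, lookup))
}

# In Tableau, the same logic could be inlined in a calculated field, roughly:
#   SCRIPT_STR("library(gender); g <- gender(unique(.arg1));
#               r <- g$gender[match(tolower(.arg1), tolower(g$name))];
#               r[is.na(r)] <- 'unknown'; r",
#              ATTR([First Name]))
```

Looking up only the unique names and then expanding back to the full list keeps the result the same length as the input, which Tableau requires, and avoids repeating work for duplicate names.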
Step 4: Build a Quick Viz in Tableau
Move "First Name" to Rows
Move "Gender from R" to Color
You should now have a list of names that are color coded by gender, either male or female. You could also change the shapes at this point, using the built-in Tableau shapes for male and female. Below is an example, simply adjusting the colors and using those Tableau default shapes for gender.
Using Parallel Processing
We can use the same parallel processing technique demonstrated in part 2 of this series from within Tableau. After you follow the steps above, create another calculated field and use it in place of the "Gender from R" field.
Calculated Field: Gender from R Parallel Processing
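The formula for this field is also not reproduced in the text, so here is a hedged sketch of the general idea rather than the exact code from part 2. It assumes R's built-in parallel package: the unique names are split into chunks, each chunk is looked up on a separate worker, and the results are realigned to the input. The names chunk_names, parallel_gender, and lookup_fun are hypothetical.

```r
library(parallel)

# Hypothetical helper: split a vector into up to n_chunks pieces
chunk_names <- function(x, n_chunks) {
  n_chunks <- max(1, min(n_chunks, length(x)))
  split(x, ceiling(seq_along(x) * n_chunks / length(x)))
}

# Hypothetical helper: look up names in parallel, one chunk per worker.
# lookup_fun takes a character vector and returns a data frame with
# columns `name` and `gender`.
parallel_gender <- function(first_names, lookup_fun, cores = 2) {
  chunks <- chunk_names(unique(first_names), cores)
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))                 # always shut the workers down
  parts <- parLapply(cl, chunks, lookup_fun)
  lookup <- do.call(rbind, parts)
  # Realign so there is exactly one result per input name
  idx <- match(tolower(first_names), tolower(lookup$name))
  out <- as.character(lookup$gender[idx])
  out[is.na(out)] <- "unknown"             # names with no match
  out
}

# With the gender package installed, the lookup function could be:
#   gender_lookup <- function(chunk) gender::gender(chunk)
#   parallel_gender(.arg1, gender_lookup, cores = detectCores())
```

As with the single-process field, the code would need to be passed to R as the script string of a SCRIPT_STR call, with ATTR([First Name]) as its argument.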
Note - Parallel processing will not be useful on short lists, where the overhead of starting the workers outweighs any savings, but if you have a multicore processor and a long list of names then this approach could be very useful.