From Messy Election Data to Clean Visualizations
Using Mapbox to Analyze Georgia’s 2020 Election Drop Box Distribution
If you are looking to create a visualization based on election data (that isn’t already packaged in a nice csv-esque format), you should first prepare yourself for messy data cleaning and potentially some internal screaming at how awful government websites are at displaying and formatting their data. (Just kidding on that second part. A little.)
I hope that in walking through the steps that I took during this project, you will be able to gain greater confidence in your data wrangling skills and draw some neat insights from your own clean visualizations. (To follow along, you can also open up the project’s repository to check out the code in further detail.)
Before getting further into the technical side though, here is some context for how this specific project came about.
One of the flashpoints during the 2020 general election cycle involved the accessibility of ballot drop boxes across the United States. Because of the COVID-19 pandemic, state officials expanded the use of such drop boxes to ensure that residents could quickly, safely, and securely drop off their mail-in ballots. However, some people (mostly Republicans) were concerned about the security of drop boxes, a talking point that President Trump had focused on throughout his campaign and during (and after) the election, despite there being no evidence of drop box misuse. Under the justification of security, many Republican-led states sought to restrict the drop box expansion, a move that many Democrats have argued is a thinly concealed voter suppression tactic. Texas’s Republican governor, Greg Abbott, for example, controversially instituted a policy to limit each county in the state to one drop box. Residents were forced to choose among traveling many miles to a single drop box that served up to 4.7 million people (as was the case in Harris County), relying on an overworked USPS to deliver their ballots on time, and risking exposure to COVID-19, which had already infected millions of Americans, by voting in person.
After the voter turnout totals and percentages were finalized for the 2020 general election, I wanted to explore the degree to which turnout might have been depressed by broad inaccessibility to safe and reliable drop boxes. Georgia, in particular, was the state that had captured the most attention following November 2020, given that the balance of power in the Senate now rested solely in the hands of Georgians voting in the state’s January runoff elections. Georgia had also been under heightened scrutiny for its voter suppression tactics in 2018, led by then-Secretary of State Brian Kemp, who was elected as governor in that same election.
In the lead-up to the January runoffs, this subject seemed like it deserved attention, as the pandemic continued to rage on and the drop box issue remained relevant. The goal of this project was to explore how turnout might have varied with election ballot drop box distribution. To do so, I wanted to visualize two data features: (1) the drop box distribution across the state and (2) the county-level turnout data.
Step 1: Assembling Data
Once we had the locations scraped and formatted on a Google Sheet (to be later converted into a csv file), we used one of Google’s geocoder extensions to convert the addresses into coordinates. There are also programmatic ways of doing this with the Mapbox Geocoding API if you would rather keep this process in your own code.
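If you go the programmatic route, the request side might look roughly like the sketch below. It is not what this project used (the addresses here were geocoded with a Google Sheets extension). It assumes the v5 forward geocoding endpoint of the Mapbox Geocoding API, and the function names, the example address, and the trimmed sample response are all hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: the v5 forward geocoding endpoint of the Mapbox Geocoding API.
GEOCODING_ENDPOINT = "https://api.mapbox.com/geocoding/v5/mapbox.places"


def build_geocoding_url(address, access_token):
    """Build the request URL for a single address (hypothetical helper)."""
    query = quote(address, safe="")  # percent-encode spaces, commas, slashes
    params = urlencode({"access_token": access_token, "country": "us", "limit": 1})
    return f"{GEOCODING_ENDPOINT}/{query}.json?{params}"


def first_coordinates(response):
    """Return (longitude, latitude) of the top match, or None if no match."""
    features = response.get("features", [])
    if not features:
        return None
    lon, lat = features[0]["center"]  # the API lists longitude first
    return lon, lat


# Made-up address and placeholder token, for illustration only.
url = build_geocoding_url("100 Main St, Atlanta, GA", "YOUR_MAPBOX_TOKEN")

# A trimmed, made-up response showing only the field this sketch reads.
sample_response = {"features": [{"center": [-84.39, 33.75]}]}
coords = first_coordinates(sample_response)
```

Fetching `url` with any HTTP client and passing the parsed JSON to `first_coordinates` would give one coordinate pair per address. Addresses that return `None` are worth checking by hand before they go into the csv.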
In a separate turnout csv file, I also wanted to gather all of the Georgia county-level turnout data, including turnout percentage, total ballots cast, total registered voters, and total non-voting registered voters, which involved a similar scraping process to the drop box scraping. Once I had this data, I computed one more key metric, which was the average number of registered voters per county drop box. This metric is the basis for the choropleth side of the visualization, which will come into play after the coordinates are plotted.
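As a rough illustration of that metric, here is a minimal sketch that derives it from a turnout csv. The column names (`registered_voters`, `ballots_cast`, `drop_boxes`) and the two sample counties are made up for the example, and treating a county with zero drop boxes as "no value" is an assumption rather than something the project specifies.

```python
import csv
import io


def add_drop_box_metric(rows):
    """Add derived turnout columns to each county row (hypothetical column names)."""
    enriched = []
    for row in rows:
        registered = int(row["registered_voters"])
        ballots = int(row["ballots_cast"])
        boxes = int(row["drop_boxes"])
        enriched.append({
            **row,
            "non_voters": registered - ballots,
            "turnout_pct": round(100 * ballots / registered, 1),
            # Assumption: a county with no drop boxes gets no value instead of a
            # division by zero; how to shade such counties is a separate choice.
            "voters_per_box": round(registered / boxes) if boxes else None,
        })
    return enriched


# Made-up sample rows, not real Georgia figures.
SAMPLE_CSV = """county,registered_voters,ballots_cast,drop_boxes
Example A,120000,84000,4
Example B,9000,6300,0
"""

rows = add_drop_box_metric(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Each row keeps its original columns and gains `non_voters`, `turnout_pct`, and `voters_per_box`, so the result can be written straight back out with `csv.DictWriter`.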
Step 2: Plotting Coordinates
Once the drop box data was cleaned up and geocoded into a csv, the next step was to convert the data into a geoJSON format, which is just a file type that represents simple geographic features along with their non-spatial attributes. (In this case, the non-spatial attributes that I included were the drop box location name, address, and hours to indicate any time restrictions that voters may have faced.) There are a few online tools that will do this conversion, like this easy-to-use csv-to-geoJSON converter, but you can also do this programmatically, as outlined here.
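For the programmatic route, a minimal sketch of the conversion using only the standard library could look like this. The column names (`name`, `address`, `hours`, `latitude`, `longitude`) and the sample row are assumptions for illustration, so rename them to match your own csv.

```python
import csv
import io
import json


def drop_boxes_to_geojson(csv_text):
    """Convert drop box csv rows into a geoJSON FeatureCollection of points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # geoJSON positions are ordered [longitude, latitude].
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            # Non-spatial attributes shown when a marker is clicked.
            "properties": {
                "name": row["name"],
                "address": row["address"],
                "hours": row["hours"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample row, not a real drop box location.
SAMPLE_CSV = '''name,address,hours,latitude,longitude
Example Library,"100 Main St, Atlanta, GA",24/7,33.75,-84.39
'''

collection = drop_boxes_to_geojson(SAMPLE_CSV)
geojson_text = json.dumps(collection, indent=2)
```

The one detail that is easy to get wrong is coordinate order: spreadsheets usually list latitude first, while geoJSON expects longitude first.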
Time to overlay the coordinates on a Mapbox basemap! If you don’t already have one, you should sign up for a free Mapbox account. Once you sign up, you will be able to render your favorite basemap style using one of your API access tokens. (As a computer science student, I was naturally drawn to the dark mode basemap.)
Then, using the geoJSON file of drop box coordinates and attributes, I used this guide as a base to set up custom markers for each drop box location so that they pop and are easier to see (especially since these markers overlay a choropleth map, which is added in the next step). When you click on a marker, you should also be able to see the attributes of that drop box, which in this case are the details that voters were given about the drop box on the original Georgia drop box site.
Step 3: The Choropleth
This part shades each county in proportion to the average number of registered voters per county drop box, and it also supplies the county-level data that is displayed when you hover over a county. It is a bit involved because it requires generating a Mapbox tileset and customizing it in Mapbox Studio, but rest assured that when you break the steps down, they primarily just involve a bunch of file and type conversions.
To associate each physical county with the turnout data, I had to join the turnout data csv with the most recent Georgia county-level shapefile. In addition to performing the join, I also needed to convert the file into a geoJSON format to act as the source for the custom Mapbox tileset. I opted to use QGIS, a free and open-source cross-platform desktop geographic information system application that supports viewing, editing, and analysis of geospatial data. QGIS makes it easy to add a shapefile and import your data, and the two can then be joined on a common attribute (such as county FIPS codes in this case). Once joined, all you have to do is save the file as a geoJSON.
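The join itself was done in QGIS, but the underlying idea is simple enough to sketch in code. The example below assumes the county boundaries have already been exported to geoJSON, that each feature carries its FIPS code in a property named `GEOID`, and that the turnout rows have a `fips` column. Both key names and the sample values are assumptions for illustration.

```python
def join_turnout(counties, turnout_rows, geo_key="GEOID", csv_key="fips"):
    """Copy turnout columns onto county features that share a FIPS code.

    Modifies the features in place and returns the collection together
    with the FIPS codes that had no matching turnout row.
    """
    # Normalize codes to 5-character strings so "13001" and 13001 match.
    by_fips = {str(row[csv_key]).zfill(5): row for row in turnout_rows}
    unmatched = []
    for feature in counties["features"]:
        fips = str(feature["properties"][geo_key]).zfill(5)
        row = by_fips.get(fips)
        if row is None:
            unmatched.append(fips)
            continue
        feature["properties"].update(
            {key: value for key, value in row.items() if key != csv_key}
        )
    return counties, unmatched


# Tiny made-up example: geometry omitted and turnout numbers invented.
counties = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"GEOID": "13121"}},
        {"type": "Feature", "geometry": None, "properties": {"GEOID": "13001"}},
    ],
}
turnout_rows = [{"fips": 13121, "turnout_pct": 70.0, "voters_per_box": 30000}]

joined, unmatched = join_turnout(counties, turnout_rows)
```

Returning the unmatched codes makes it easy to spot counties that would otherwise render with no data because of a typo or a formatting mismatch in the csv.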
To convert the geoJSON into a Mapbox tileset, I used the Mapbox Tiling Service (MTS), which, using a set of configuration specifications, takes geospatial data and converts it into vector tiles that can be hosted on the Mapbox application. The MTS is really easy to use, as laid out here. After the tileset is processed, a window should open up with the new tileset.
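In MTS, those configuration specifications are written as a JSON "recipe". A minimal sketch of generating one is shown below. The username, tileset source id, layer name, and zoom range are placeholders, and the exact recipe this project used is not shown in the article.

```python
import json


def build_recipe(username, source_id, layer_name, minzoom=0, maxzoom=10):
    """Build a minimal single-layer MTS recipe (all arguments are placeholders)."""
    return {
        "version": 1,
        "layers": {
            layer_name: {
                # Points at a tileset source previously uploaded to MTS.
                "source": f"mapbox://tileset-source/{username}/{source_id}",
                "minzoom": minzoom,
                "maxzoom": maxzoom,
            }
        },
    }


# Hypothetical names, for illustration only.
recipe = build_recipe("your-username", "ga-county-turnout", "ga_counties")
recipe_text = json.dumps(recipe, indent=2)
```

The layer name chosen here is the name you later pick as the source layer when adding the tileset to a style in Mapbox Studio. The zoom range is a trade-off between detail and processing cost, so treat the default above as a starting guess rather than a recommendation.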
After generating the tileset, I fired up Mapbox Studio, which allows for complete customization of the data once the custom tileset layer is added over one of the default basemaps. I then specified a color palette to show the differences in the average number of registered voters per county drop box, settling on six intervals. Deciding on the appropriate interval classification was quite difficult because the values vary so widely across counties. If you are looking to accurately represent your data without sacrificing aesthetic quality, you should look at your data distribution, consider different classification methods, and choose the most appropriate one. (QGIS actually has a neat way of visually experimenting with the popular interval classifications, if you want to use your joined shapefile and turnout data file from before. This is outlined in this guide’s step 12, which is what I ended up using to find my data ranges with the Jenks natural breaks classification.)
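To give a sense of what a natural breaks classification does, here is a small pure-Python sketch of the idea: sort the values and pick the class boundaries that minimize the total squared deviation within each class. It is an illustration only, not the QGIS implementation, so its breaks may differ slightly from what QGIS reports. The function name and sample values are made up.

```python
def natural_breaks(values, n_classes):
    """Return the upper bound of each class under an optimal 1-D split.

    Uses dynamic programming to minimize the summed squared deviation
    from the class mean, which is the goal of Jenks natural breaks.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute any slice's squared deviation in O(1).
    sums, squares = [0.0], [0.0]
    for value in data:
        sums.append(sums[-1] + value)
        squares.append(squares[-1] + value * value)

    def deviation(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = sums[j] - sums[i]
        return (squares[j] - squares[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total deviation for the first j values in k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + deviation(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk the recorded split points backwards to recover class upper bounds.
    uppers = []
    j = n
    for k in range(n_classes, 0, -1):
        uppers.append(data[j - 1])
        j = split[k][j]
    return uppers[::-1]


# Made-up values with three obvious clusters.
breaks = natural_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)
```

Running the same function with equal-interval or quantile boundaries alongside it is a quick way to see how much a skewed distribution changes the map, which is the comparison QGIS lets you do visually.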
After saving my new custom map style, I switched out the original default basemap style in my code with the url to the new style, which now showed Georgia in different shades of purple, with darker shades representing a larger average number of registered voters per county drop box and lighter shades a smaller one. I also added a legend to break down the intervals that the shades represented.
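The shading was configured through the Mapbox Studio interface, but the same interval-to-color mapping can be written as a Mapbox style `step` expression for a layer's fill color. The sketch below builds such an expression. The property name `voters_per_box`, the break values, and the purple hex codes are all placeholders, not the values used in this project.

```python
import json


def build_step_expression(property_name, breaks, colors):
    """Build a Mapbox `step` expression mapping value ranges to colors.

    `breaks` holds the lowest value of every class after the first, so
    there must be exactly one more color than there are breaks.
    """
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly one more color than breaks")
    if list(breaks) != sorted(breaks):
        raise ValueError("breaks must be in ascending order")
    expression = ["step", ["get", property_name], colors[0]]
    for stop, color in zip(breaks, colors[1:]):
        expression += [stop, color]
    return expression


# Placeholder breaks and palette for six intervals (light to dark purple).
PURPLES = ["#f2f0f7", "#dadaeb", "#bcbddc", "#9e9ac8", "#756bb1", "#54278f"]
fill_color = build_step_expression(
    "voters_per_box", [5000, 10000, 20000, 40000, 80000], PURPLES
)
fill_color_json = json.dumps(fill_color)
```

Values below the first break get the first color, and each later color applies from its break upward, so six colors need five breaks. The same break values can be reused for the legend labels to keep the map and the legend consistent.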
Final Step: Adding Hover Interactivity
The final step is to add an interactive element when hovering over the different counties to show the full turnout data. To do so, I created an information window in the upper-right-hand corner as a container for the turnout information. To actually retrieve the information based on which county the mouse is hovering over, all I needed to do was add a listener to the mousemove event. The listener identifies the county under the cursor using the custom layer (that was generated from the geoJSON-based tileset) from our style. I then displayed the corresponding turnout information in the information window.
(One additional bonus with Mapbox on an interactivity note is that there is no sacrifice in quality when you zoom in or out of a map--the level of detail is beautiful regardless of the level of granularity at which you want to explore the map. It adds to the interactive experience without having to do anything else on your end.)
And with that, here is the final live visualization!
About the Author
Riyam Zaman joined Bluebonnet Data as a data fellow, where she analyzed data to inform campaign strategy for a state-level political campaign in the November 2020 election. She enjoys investigating new ways to leverage data-driven solutions to fight against modern social issues. Riyam is currently in her final year at Rutgers University, studying computer science and political science.