
Analyzing Political Candidates’ Public Statements in the Age of Social Media


Our Organization: The Black Clergy Collaborative of Memphis (TN)

From April to July of 2023, my Bluebonnet Data team and I worked with the Black Clergy Collaborative of Memphis (BCCM), an organization “committed to expressing the voice of the Black Church and supporting its resurgence as a catalyst for social justice” in the city of Memphis and Shelby County, Tennessee. BCCM “helps their members to work at the intersection of theology and social justice, taking action on an agenda to address inequality in their city and county,” according to their website.


Grassroots organizations like BCCM are made up of some of the most hard-working, passionate, and inspiring activists and leaders organizing in their local communities. However, these kinds of local organizations often lack the resources to leverage advanced data and technical methods in their work. Bluebonnet Data teams like ours are able to fill that gap and provide a new set of tools (using data and technology) to these organizations that they otherwise would struggle to use or not even be aware exist. At the end of the day, however, the inspiring leaders and organizers on the ground are the ones using these tools to advocate for and make change for their communities.


The Task

One of the contributions we made to BCCM’s mission was an in-depth, multi-source analysis of the 2023 Memphis mayoral candidates for a voter guide, which wouldn’t have been achievable without the use of technology. We broke this task into three smaller workstreams: voting records (for candidates who had held office), public statements, and campaign finance. The public statements workstream broke down further into newspapers and social media. While there was significant work on the other components of the candidate analyses, my work on the project centered on candidates’ social media activity, which is the focus of this article.


Political Candidates and Social Media

In the age of social media, more and more candidates are taking advantage of platforms like Instagram and Twitter to mobilize voters and express their stances on various issues. When assessing a new candidate, their social media presence can often reveal more about their ideology and positions than any other public source. Even though this was a local election, I found that many of the candidates were very active on social media and frequently expressed their stances on Memphis issues and policies, so their feeds were ripe for analysis as the starting point for a comprehensive profile of each candidate.


In the sections below, I break down methods I used to collect, aggregate, analyze, and visualize data about these mayoral candidates from Instagram and Twitter. After researching each candidate’s social media presence, I chose these two platforms due to their widespread use among the candidates.


Instagram

The challenging first step in analyzing raw social media data is collecting it, sometimes referred to as “scraping,” since social media companies often keep their data behind barriers. Fortunately, I entered the project with previous experience scraping Instagram in particular. On a county executive campaign I interned on last year, I built a social media analytics dashboard for the campaign’s “student supporter” account using the Instagram Graph API, so I thought a similar approach would work for retrieving candidate Instagram posts in this project. However, a key difference limited its usefulness: instead of looking at internal analytics for an account I had access to, I was trying to retrieve information about external accounts I couldn’t log in to. This is possible with the Instagram API, but it wasn’t well documented and would have slowed the project down too much.


I also tried using the snscrape library to fetch data from Instagram; unfortunately, its Instagram module did not work and was poorly documented.


Instagram’s GraphQL endpoint powers its own front-end applications, and after finding some initial code online, I was able to make GET requests to it without API authentication. This method proved fairly intuitive, aside from all the keys there were to untangle in the JSON object returned by the request.
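
A rough sketch of that request is below; the query hash, user id, and headers are placeholder assumptions for illustration, not the exact values I used. The real query hash identifies one of Instagram’s stored GraphQL queries and has to be pulled from the front end’s own network traffic.


import json
import requests

# Placeholder: the real query_hash comes from inspecting Instagram's
# front-end network traffic and has changed over time.
QUERY_HASH = '...'

def fetch_timeline_page(user_id, first=12):
    params = {
        'query_hash': QUERY_HASH,
        'variables': json.dumps({'id': user_id, 'first': first}),
    }
    # a browser-like User-Agent makes the request resemble front-end traffic
    headers = {'User-Agent': 'Mozilla/5.0'}
    return requests.get('https://www.instagram.com/graphql/query/',
                        params=params, headers=headers)

r = fetch_timeline_page('12345')  # hypothetical numeric account id


With the response r in hand, pulling each post’s caption out of the nested JSON looked like this: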



# walk the posts in the account's timeline returned by the endpoint
for e in r.json()['data']['user']['edge_owner_to_timeline_media']['edges']:
    # skip posts that have no caption
    if len(e['node']['edge_media_to_caption']['edges']) > 0:
        caption = e['node']['edge_media_to_caption']['edges'][0]['node']['text']

Additionally, I ran into rate limiting from Instagram’s GraphQL endpoint when I sent too many requests in a short period of time. As a result, gathering all of the data I wanted took a long time, and I had to run the scraper manually multiple times, usually once per candidate rather than all at once. Were I running this on many more candidates or deeper historical data, I would use a package that handles rate limits automatically, or develop a queue system to wait out these limits as needed.
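
Here is a minimal sketch of that wait-and-retry idea; the 429 status check and the starting delay are assumptions about how the endpoint signals rate limiting, so both would need tuning against real responses.


import time
import requests

def get_with_backoff(url, params=None, max_retries=5):
    # start with a one-minute wait, doubling after each rate-limited attempt
    delay = 60
    for _ in range(max_retries):
        resp = requests.get(url, params=params,
                            headers={'User-Agent': 'Mozilla/5.0'})
        if resp.status_code != 429:  # 429 = Too Many Requests
            return resp
        time.sleep(delay)
        delay *= 2
    raise RuntimeError('still rate-limited after {} retries'.format(max_retries))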


Twitter

Several of the candidates were also active on Twitter, so I wanted to scrape candidates’ Tweets in order to provide a more complete picture of their online rhetoric. Because the Twitter API is now paid, I used the snscrape Python library, an open-source scraper for social networking services. Below is a basic look at that code:

import pandas as pd
import snscrape.modules.twitter as sntwitter

# df holds one row per candidate, with 'Name' and 'Twitter' handle columns;
# scraped tweets accumulate here
tweets = pd.DataFrame(columns=['Name', 'Tweet', 'Date', 'URL'])

for n in range(len(df)):
    try:
        username = df.iloc[n]['Twitter']
        cand = df.iloc[n]['Name']
        # only scrape candidates with a Twitter handle on file
        if '@' in username:
            for tweet in sntwitter.TwitterSearchScraper('since:2012-06-23 from:' + username).get_items():
                # omit short tweets (likely irrelevant to issues of interest)
                if len(tweet.rawContent) > 10:
                    # one row per tweet: candidate, text, date, and url
                    tweets.loc[len(tweets)] = [cand, tweet.rawContent, tweet.date, tweet.url]
    except Exception as e:
        print(e)

Analyzing the Data

Now that I had successfully scraped the relevant Instagram and Twitter data on candidate social media rhetoric, I had to analyze all of this natural-language data. BCCM outlined their policy areas of interest; for each area, I researched relevant topics and created a list of related “keywords.” Here is an example of the list I created for the Memphis city budget policy area:


Policy issues:

  • Funding for public schools

  • Including community members in the budgeting process

  • Funding for healthcare

  • Equitable awarding/distribution of public contracts for minority and women business owners

  • Providing adequate social programming and youth engagement opportunities

Keywords: budget, funding, healthcare, health care, business owner, public contract, youth, social program, health


I searched for each keyword in candidates’ posts, compiling a spreadsheet of posts of interest and the policy issues they appeared to relate to (a sketch of this search appears below). This strategy was mostly effective, but it required me to manually filter out posts where the keyword appeared in an unrelated context. With more time, I would have used a language model to judge whether each piece of text related to the policy categories of interest: a large language model like ChatGPT, or a question-answering model like deepset/roberta-base-squad2 instructed to categorize the text of the social media post (supplied to the model as “context”), would likely have been effective.
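
Here is a minimal sketch of that keyword search, assuming the tweets dataframe built earlier and the budget keyword list above; the whole-word regex matching is my choice for illustration, and a plain substring check would behave similarly with more false positives.


import re

budget_keywords = ['budget', 'funding', 'healthcare', 'health care',
                   'business owner', 'public contract', 'youth',
                   'social program', 'health']

def match_keywords(text, keywords):
    # return the keywords appearing in a post (case-insensitive, whole words)
    return [kw for kw in keywords
            if re.search(r'\b' + re.escape(kw) + r'\b', text, re.IGNORECASE)]

# flag tweets that touch the city budget policy area
tweets['budget_matches'] = tweets['Tweet'].apply(
    lambda t: match_keywords(t, budget_keywords))
budget_posts = tweets[tweets['budget_matches'].str.len() > 0]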


Visualizing the Data

One of the most important aspects of many Bluebonnet projects is presenting technical findings accessibly to the client, who often has a less technical background. Even after creating a spreadsheet of relevant social media posts, I wanted to give BCCM and voters an easier way to understand each candidate, without having to spend time going through the spreadsheet and reading candidates’ statements one by one. I decided to create “word cloud” visualizations using the Python wordcloud library, which give a visual summary of the words that come up often on each candidate’s social media, with the largest words being the most frequent.


Our speech and writing often include words that add little meaning, known as “stop words”: articles, pronouns, and the like (the, it, a, …). I skipped these when generating the word clouds, because they would otherwise dominate the visualizations; I combined two established lists of stop words and filtered them out of the text data before creating my visualizations.
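
A minimal sketch of the word cloud generation is below. The post doesn’t name the two stop-word lists I combined, so the wordcloud library’s built-in STOPWORDS and NLTK’s English list stand in here, and the candidate filter is illustrative.


import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud, STOPWORDS

nltk.download('stopwords')

# combine two established stop-word lists
all_stopwords = set(STOPWORDS) | set(stopwords.words('english'))

# join all of one candidate's scraped posts into a single string
text = ' '.join(tweets.loc[tweets['Name'] == 'Candidate Name', 'Tweet'])

wc = WordCloud(stopwords=all_stopwords, width=800, height=400,
               background_color='white').generate(text)
wc.to_file('candidate_wordcloud.png')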


“Word cloud” of mayoral race winner Paul Young’s Instagram posts.


“Word cloud” of another candidate’s Tweets.


The second candidate’s rhetoric is much more controversial, often invoking Trump, race, and gender. The prominence of “https” fragments shows that they use Twitter as a platform to mass-post controversial opinions and links to outside media. Paul Young’s rhetoric, on the other hand, is much more focused on Memphis and his campaign. His word cloud also surfaces the campaign’s hashtag “#youngformemphis,” speaking to the role of social media as a deliberate tool and strategy in his campaign.


The client expressed that the word clouds were a great addition and could be used as a cover or lead-in to social media comments in the voter guide.


Reflection

Working with the Black Clergy Collaborative of Memphis on this project was incredibly fulfilling. I’m so grateful for this opportunity to have used my technical skills to empower voters in the Memphis mayoral election.


I would like to thank my fellow team members Paul Maschoff and Will Levinson for their support and work on this project. A huge thank you also goes to Beverly Popoola, our team lead, for her project management skills and for keeping us moving through this fast-paced cohort.


I look forward to continuing as a Bluebonnet fellow and working on more exciting projects!



