July 12, 2023

By Sarah Gilbert, J. Nathan Matias, Ethan Zuckerman, Yellowmix, and James Mickens

Reddit’s recent API changes have led to a wave of protests and a great deal of consternation in the popular site’s user community. How have Reddit’s changes affected community moderators and researchers? And what could the company do differently to help its platform survive?

Last month, the Coalition for Independent Technology Research organized an open letter and survey of researchers and community moderators—with 357 individuals and organizations signing the letter and 118 participating in the survey. In this post, we report back what we learned, and how it will affect our ongoing negotiations with the company.

Third-party developers use Reddit’s API to build apps for accessing the site, but data access is also key for online safety. The site’s volunteer moderators rely on the API for information and to build tools that help protect users from spam, disinformation, hate, and harassment, and researchers use it for work on issues like social media privacy, mental health treatments, child protection, COVID-19 response, and democratic discourse.

Reddit faces real challenges from free access to its API. Reddit data has been used to train large language models that underpin AI technologies such as ChatGPT and Bard, technologies that could erode trust and make the site harder to regulate. While Reddit’s leaders say that charging for access to Reddit’s rich data resources may help make the platform more profitable, limiting access also risks harming the important work of two key stakeholders who rely on it, and whose work improves the quality of the site.

In our survey, we learned the following about moderators’ and researchers’ use of the API:

    • API access is fundamentally about safety on the platform, through the direct work of moderators and the support for safety provided by researchers
    • Safety, accessibility, and spam management on Reddit rely on software created by moderators and researchers in the face of over a decade of under-investment by the company in content moderation, and that software depends on API access
    • API disruptions are putting the careers of students and junior scholars at risk, as well as millions of dollars in grant-funded research
    • Reddit has made vague promises to provide free API access to researchers and moderators. But the company’s promises fail to meet the full needs of researchers and communities, and give Reddit the ability to block research uses
    • To date, negotiations with Reddit are stalled, and the company needs to be more responsive to community needs

What do researchers rely on API access for? 

Reddit provides a unique source of data that has been used by hundreds of researchers across disciplines. Respondents to our form noted that Reddit data cannot be found on other platforms, notably because of its quantity, its comprehensiveness, the variety of topics discussed, and the depth of conversations.

Researchers accessed this data both through the API and through existing archives of Reddit data, notably Pushshift. Data archives like Pushshift provided needed functionality that the API does not provide, for example, historical Reddit data that could be queried by timeframe. The lack of historical access is a particular problem for moderators, who rely on archival data to determine whether a user’s current bad behavior is part of a larger pattern.

Respondents also explained that the Reddit API limits the quantity of data that can be queried, hindering research in healthcare for at-risk groups:

“Without the kind of access pushshift provided, there is no practical way to gather the volume of data this project would need. People recovering without supports are a particularly difficult group to research, both in terms of recruiting and due to ethical challenges posed by working with members of an at-risk population who, by definition, are not engaged with formalised supports.

Qualitative research in this area is overwhelmingly retrospective, with much of the existing qualitative enquiry undertaken many years after individuals initiated recovery. This presents significant limitations when it comes to tangible insights into recovery needs and processes during early recovery stages, the time when individuals are most likely to need additional supports. Drawing data in this way would have enabled insight into these early stages with no additional risks to participants’ recovery processes. Without that access, the entire purpose of this research and any benefits it could have provided will disappear.”

While Reddit has promised that researchers will continue to access the API for free, respondents were concerned about interruptions caused by changes to the API, and in particular about disruptions caused by the sudden loss of Pushshift, which is no longer publicly accessible. These real and potential disruptions to access have in turn disrupted various elements of research. For example, respondents noted that they cannot continue current research or move ahead with future research on a variety of topics, including social support and recovery, radicalization, racism, gender, social norms, and empathy, as well as research about Reddit itself and the development of tools to navigate Reddit or support its moderation.

Researchers were also concerned about how these disruptions could impact their careers, as well as those of their students. For example, they questioned whether they would be able to honour the work promised in awarded grants or collect additional data for projects, and whether their students would be able to continue their research and graduate.

What do mods rely on API access for?  

Moderators who responded to our form described using Reddit’s API for a number of purposes, such as growing and maintaining their communities. For example, respondents described using API-provided data to help users find answers to their questions, to identify active users, and to encourage them to continue to participate. Similarly, one respondent used the API to develop bots that automatically reward helpful users, encouraging positive interactions in their community.

They also described using the API to keep their communities safe. Overwhelmingly, the most commonly described use of API access was the ability to search for important context about users, such as whether or not they have a history of posting rule-violating content or engaging in harmful behavior. The ability to search for removed and deleted data allowed moderators to more quickly respond to spam, bigotry, and harassment.

Finally, moderators noted that they rely on third-party apps to moderate. While Reddit has been making improvements to modding on its official app, current functionality is limited. These apps are particularly important for moderators and users who rely on screen readers, as the official Reddit app, and in particular its mod functions, is inaccessible:

“As for some of the other uses, I work with people who are blind and low vision. The reddit native apps are not accessible. They do not work with Voiceover or other screen readers. 3rd party apps do. The native app also doesn’t zoom in well. And spoilers do not work. So 3rd party apps are necessary for any users or mods doing anything not on a computer.”

In addition to third-party apps that draw from the API, moderators also described their reliance on moderator-developed bots and tools that use the API, including indirectly through Pushshift data. For example, mods use bots like safestbot, which uses Pushshift, to help them keep their communities safe from trolls, and plugins such as Moderator Toolbox, which provide critical moderation infrastructure such as easily accessible removal reasons.

Without access to the API, the site risks becoming overrun with spam and other inauthentic behavior, including campaigns to spread disinformation and propaganda:

“The PushShift API also helps us understand brigades/sockpuppets/astroturfing in the instances where individuals post across various subreddits to push an agenda, deleting content quickly or automatically. PushShift allows us to see the whole picture, not just the picture the nefarious user wants us to see.”

Moderators, particularly those supporting marginalized communities, anticipated difficulty keeping their members safe:

“[We use Pushshift for] Finding malicious users, frequently ones seeking to harm members of marginalized groups. It’s one thing to say “it lets us catch trolls better”, but so many trolls are innocuous that doesn’t communicate the real impact. Many of the most prolific trolls and ban evaders we deal with seek to spread hate and bigotry.”

“Easy and efficient search capabilities are crucial to identifying bad actors swiftly in order to protect our community, which is populated with a vast number of people from marginalized communities and users under the age of majority. It is vital in these uncertain political times, that we protect these users from hate speech, bullying and threats.”

Mods also described how the API helps them manage and respond to surges in participation, and helps them proactively identify and remove harmful behavior:

“I moderate a large sports subreddit that drives tons of traffic to reddit from people who wouldn’t normally participate if not for sports. Moderating without RiF pretty much kills my ability to effectively moderate.

And with sports subs, you have thousands of users spiking into the community, drinking is involved, trolling from opposing fans is involved. All concentrated into like 3 or 4 hours of time.

It’s already a nightmare to moderate in those moments. I can’t imagine what losing these moderation tools we rely on will do. It’s going to drive mods away, which will create a toxic environment which will drive other users away.”

“We will simply be forced to take a more hands off attitude in terms of proactively searching out harmful content for removal/reporting to reddit, and there will be even longer delays in approving good content as we will need to wait for the next moderator to get to a desktop computer for the tasks that aren’t possible on the reddit app on mobile.” 

Moderation can also be unfair. Users from marginalized communities, for example, have reported that their posts have been removed at higher rates than anyone else’s, even when they haven’t broken any rules. API-supported tools help mods make better decisions:

“Pushshift helped us be fairer, more thoughtful and more consistent as moderators. It meant that we could take the time to investigate and make better decisions rather than doing it on the spur of the moment because Reddit or the users would delete comments later.”

Reddit is built on volunteer moderation labor. Moderators who do work that costs other companies millions of dollars per year are finding themselves pulling away from moderating their communities:

“I’m totally demotivated to continue moderation.  I’ve almost entirely stopped.”

“I definitely won’t be able to moderate on mobile without access to RiF [a third party mobile app]. Moderating without unddit and reveddit* (I don’t know if reveddit is gone, but it certainly doesn’t usually work) is beyond frustrating. Without toolbox, I don’t think I can. I may try it before giving up, just because I don’t like leaving things unfinished, but I’m not even sure how.”

This is particularly concerning, as moderators responsible for the development and maintenance of key moderation infrastructure, such as bots, services, and plugins, have announced that they are no longer able to provide essential services for the visually impaired, that they will pull bots down, or that they will resign from their roles as moderators and tool developers.

Reddit’s Response So Far

Since we began our survey, Reddit has clarified some uncertainties: moderator-developed tools for moderation purposes will not have to pay for API access. The company has also set a number of goals to minimize disruption to moderating the site. For example, Reddit has negotiated moderator access to Pushshift, promised to greenlight accessibility apps, and made a number of commitments to improve accessibility in its own app. It has also shared a roadmap for the development of moderation tools.

However, there are a number of remaining uncertainties. For example, details about researcher access to the API and to Pushshift have not been shared, so it is not clear when and under what conditions researchers can access the API, or if there will be any changes to the data researchers were previously able to collect through the API.

Further, there has not always been alignment between what moderators view as a “moderation tool” and what Reddit views as a moderation tool. For example, respondents to our form viewed third-party apps as moderation tools and expect to experience disruption when many of these tools are no longer available after the API changes take place. There is also the risk that current mod tools and bots will break when the change takes place, and it is not clear that the improvements outlined in the roadmap will be able to close the gap between what moderators need from mobile moderation tools and what is provided, particularly for visually impaired moderators. Finally, these uncertainties, compounded by statements made by Reddit CEO Steve Huffman, have had a considerable impact on the morale of volunteer moderators and their continued interest in serving their communities through Reddit’s platform.

Stalled Negotiations With Reddit

Reddit was initially very responsive when we launched the letter campaign, meeting with our group of researchers and moderators the day the letter came out. They committed to meeting with us and negotiating an approach that would protect independent research and support communities.

Unfortunately, Reddit has since limited its communication channels with researchers and moderators, making us unsure about the future of the platform. As the company remains silent, we are actively looking for tactics that will allow us to proactively defend communities and independent research even if the company resists.

We will keep you posted as we learn more.