Reddit locks down its public data in new content policy, says use now requires a contract
Reddit on Thursday is rolling out a new policy aimed at balancing its desire to license its content to larger tech companies, like Google, with protecting users' privacy. The newly announced "Public Content Policy" will join Reddit's existing privacy policy and content policy to guide how Reddit's data is accessed and used by commercial entities and other partners. Alongside the policy, the company also announced a subreddit dedicated to researchers working with Reddit's data.
The announcement comes shortly after Reddit's stock market debut, which sees the company positioning itself to grow revenue not only from the ads that run on its platform and API usage by developers but also from its corpus of data. The company said in its IPO prospectus that it had already made $203 million through data licensing agreements and expects that number to increase over time.
While Reddit hadn't historically blocked access to its data for AI training purposes, it changed course last year. Reddit CEO Steve Huffman told The New York Times that it didn't make sense for Reddit to continue to give "all of that value to some of the largest companies in the world for free," signaling the company's plan to move into the data licensing space.
With those efforts now well underway, the new Public Content Policy will further lock down access to Reddit's data without an agreement.
"Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content," Reddit writes in its blog. "Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests. While we will continue our efforts to block known bad actors, we need to do more to restrict access to Reddit public content at scale to trusted actors who have agreed to abide by our policies. But we also need to continue to ensure that users, mods, researchers, and other good-faith, non-commercial actors have access."
In other words, access to Reddit data for research and other non-commercial efforts will continue, but entities that want to use Reddit's data for other purposes, including AI training, will have to pay. A graphic shared on the blog makes this clear, stating that businesses that want to use Reddit data to "power, augment or enhance your product for any commercial purposes" require a contract.
Advertisers, meanwhile, are directed to an ads API for managing campaigns and tracking their performance.
Because the company is essentially just a large website, indexable by search engines, this new policy aims to lock down Reddit content from any unauthorized collection while also respecting users' rights.
For instance, Reddit says that its partners will have to uphold users' decisions to delete their content. So if users don't want their personal posts to become fodder for future AI engines, they should be able to opt out. The new policy also prohibits partners from using Reddit's content to identify individuals or their personal information, including for ad targeting. Nor can partners use Reddit content to spam or harass its users or to conduct "background checks, facial recognition, government surveillance, or help law enforcement do any of the above."
The policy additionally restricts access to adult media and clarifies that Reddit won't sell its users' personal information. The company also notes that it will never license non-public content, such as private messages, or non-public account information, such as users' emails or browsing history, among other things.
To help researchers who want to use Reddit data for non-commercial purposes, the company has established a new subreddit, r/reddit4researchers. The company says it's also partnering with OpenMined to develop a program to guide and grow researchers' collaboration with Reddit.