As Reddit approaches its stock market listing, its future may be more closely tied to its relationships with AI vendors such as OpenAI than you might expect.
In Reddit’s IPO prospectus, which was filed with the U.S. Securities and Exchange Commission, Reddit emphasizes its potential gain from data licensing agreements with companies training AI models on its 1 billion posts and 16 billion comments.
The prospectus states, “In January 2024, we entered into certain data licensing agreements with an aggregate contract value of $203.0 million and terms ranging from two to three years. We expect a minimum of $66.4 million of revenue to be recognized during the year ending December 31, 2024 and the remaining thereafter.”
The AI vendors licensing data from Reddit have not been named. However, recent reports suggest that a large, unnamed AI company, possibly Google, has signed a licensing agreement worth about $60 million on an annualized basis. OpenAI could also be a customer, given that its CEO, Sam Altman, holds a stake in Reddit and has been involved with Reddit in the past.
Reddit’s data is valuable to AI vendors because their models learn from examples of human writing to generate essays, code, emails, articles, and more. OpenAI and similar vendors scrape the web for such examples to add to their training sets. Reddit’s vast corpus of conversational data is thought to play a crucial role in training language models, which is why Reddit has made a strategic shift toward licensing its data for AI training.
Various content producers are turning to data licensing agreements with AI vendors as chatbots like OpenAI’s ChatGPT and Google’s Gemini threaten to divert traffic away from their platforms. For their part, AI vendors are entering into these agreements to head off lawsuits alleging that they trained models on data without permission or payment, the same claim at the center of The New York Times’ recent lawsuit against OpenAI.