X is the latest social media site letting 3rd parties use your data to train AI models

More companies are signing content licensing deals with artificial intelligence firms

Jenna Benchetrit · CBC News · Posted: Oct 18, 2024 6:10 PM EDT | Last Updated: Oct 19, 2024

A photo taken on March 11, 2024 shows the logo of social media platform X. (Kirill Kudryavtsev/AFP/Getty Images)

Elon Musk's X was already using your data to train its own artificial intelligence. Soon, it'll let other companies do the same.

Starting Nov. 15, the social media site formerly known as Twitter will share user data — including posts, likes, bookmarks and reposts — with third-party platforms that may use the information to train AI models.

The company updated its privacy policy on Wednesday to detail the changes. When the policy takes effect, users are automatically opted in until they opt out.

"Depending on your settings, or if you decide to share your data, we may share or disclose your information with third parties," the updated policy reads.

"If you do not opt out, in some instances the recipients of the information may use it for their own independent purposes in addition to those stated in X's Privacy Policy, including, for example, to train their artificial intelligence models, whether generative or otherwise."

As user data becomes an increasingly valuable resource, social media platforms are sitting on a goldmine, and selling that information to artificial intelligence companies is a lucrative business.

"This is the latest arms race. Everyone is working towards AI supremacy," said Ritesh Kotak, a cybersecurity and technology analyst based in Toronto.

"The more data sets you have, the more people that are involved in that data is collected from, the more accurate your model is going to be."

Why sites such as Reddit are selling data to AI firms

The Reddit logo is seen in this illustration taken on Nov. 7, 2022. (Dado Ruvic/Illustration/Reuters)

The change comes just a few months after X quietly shifted its privacy policy, giving itself permission to train the company's Grok chatbot on user data.

But that led to an investigation by the European Union's privacy regulator, which ended with X agreeing to stop collecting user data from that region for the purpose of training Grok.

LinkedIn has also given itself permission to train its artificial intelligence models on user data, and Meta used public Instagram and Facebook posts to train its own AI virtual assistant.

Like X, other social platforms have reportedly signed content licensing deals with AI giants, bringing in a new stream of revenue amid tough competition for advertising dollars, noted Ajay Shrestha, a computer science professor at Vancouver Island University.

"The traditional processes that they have used [to] generate revenue, through advertising or through subscription methods, are not working well," said Shrestha.

The deals include:

Reddit reportedly closed one such agreement with Google this year, with Reuters reporting that the deal is worth $60 million US per year.
Stack Overflow, an online community for developers, started charging AI companies for scraping its data to train their bots last year.
Tumblr and WordPress reportedly struck a deal with generative AI companies Midjourney and OpenAI to sell user data to train their AI tools.

Some news publishers and stock image companies have made similar deals — Shutterstock's licensing business generated more than $100 million US last year, for example. Many others have sued AI giants for scraping their content without permission, or warned them against doing so.

And what's in it for the big tech companies? Social media posts are a valuable form of data because they can convey emotion, reflecting how people actually speak and think, according to Kotak.

"Social media posts may pose very little quality content from a technical perspective or from what's going on in the world, but [they are] rich in sentimental analysis," he said.

Can you opt out?

As of Friday, X didn't appear to have updated its settings with an option to opt out of the change in advance of the Nov. 15 start date. CBC News has reached out to the company.

"As a user, you may just not want your posts or personal information being used to train algorithms that the rest of the world is going to be able to leverage," said Kotak.

"These platforms literally making it by default that your data is going to be used to train these algorithms means that you no longer have a choice in the matter. Unless you go in and you prohibit that from happening."

Normally, users can opt out of such changes by opening settings, selecting "privacy and safety" and, under the "data sharing and personalization" heading, turning off the "data sharing with business partners" option.

But opt-outs aren't always cut and dried, Kotak said, noting that an AI model can't necessarily unlearn the data it's been fed if a user opts out after the training has started.

"There's no way of reversing that and having any of the data that you've already put out essentially being taken out of the learning model as well," he said.

"If you're not paying for the product, you are the product. And in this case, the data is the product."

ABOUT THE AUTHOR

Jenna Benchetrit

Journalist

Jenna Benchetrit is the senior business writer for CBC News. She writes stories about Canadian economic and consumer issues, and has also recently covered U.S. politics. A Montrealer based in Toronto, Jenna holds a master's degree in journalism from Toronto Metropolitan University. You can reach her at jenna.benchetrit@cbc.ca.
