
Everything you've ever posted publicly on Facebook has probably been harvested. So what?

There's "a huge gulf" between what users expect when they give information to Facebook versus the reality — in this case, the mass scraping and aggregating of public profile data without their knowledge.

It may be public data, but it's a gold mine for marketers, spammers, fraudsters and cyber criminals alike

Joel Kjellgren, data centre manager, walks in one of the server rooms at the new Facebook Data Centre in Lulea, Sweden. At some point during the past few years, a person or group probably landed on your Facebook page and gathered the data that was public. (Jonathan Nackstrand/AFP/Getty Images)

Mark Zuckerberg didn't mince words during a phone call with reporters last week: If you had ever publicly posted something to Facebook, there's a pretty good chance that information was now in someone else's hands.

Put another way, at some point during the past few years, a person or group probably landed on your Facebook page. They might have found you by searching for the phone number or email address tied to your account — assuming you had that feature enabled, which Zuckerberg said most users did. But because you probably weren't friends with the person who had landed on your page, they could only see the information you had chosen to publicly share.

For some people, this isn't much: maybe a name, a photo, where you work, live or went to school. But for others, especially those who hadn't bothered or known to check their privacy settings in some time, if ever, there might have been a whole lot more data up for grabs, data you might reasonably consider private or personal. Your phone number, your email address, the events you've attended, the pages you've liked, the groups you're a part of, photos of your kids, even your comments and posts: all were potentially there for the taking if you hadn't chosen to shield that data from outside eyes.

The impact of this is simple: There are now people, groups or companies ("malicious actors," to use Facebook's words) that, through an automated process called scraping, are likely in possession of massive amounts of public Facebook profile data.

That data may not be as personal or private or detailed as what a developer with Facebook's permission to access your account could get. One such developer was Aleksandr Kogan, the Cambridge data scientist who built a quiz app that mined people's profiles and those of their friends, and who later shared that data with Cambridge Analytica.

But it's valuable data nonetheless that can be sold, shared, processed — and used without our consent.

"It's less about you personally," says Sarah Roberts, an assistant professor of information studies at the University of California, Los Angeles, who also studies social media. "It's more about you as a part of a massive, at scale, data hoovering — the sucking up of all the data which can in turn be re-constituted, sliced and diced, and manipulated, and can then have repercussions back on you in a way you can't even perceive. And that's where it becomes disturbing."

"In fact it's your total lack of control and understanding that should worry you," Roberts said.

'It's your total lack of control and understanding that should worry you,' says Sarah Roberts of what happens to your Facebook data. (Associated Press)

Experts had warned Facebook for years that the ability to look up profiles using phone numbers or email addresses could be abused. Someone could use a program to run through a list of every possible phone number, or pull from a list of email addresses stolen in yet another high-profile hack, and see what Facebook profiles matched up, they said. Yet Facebook maintained it had systems in place to prevent such automated abuse.

Those systems were apparently not good enough — and it took the Cambridge Analytica scandal for Facebook to finally realize the extent to which the feature was being abused.

"Given the scale and sophistication of the activity we've seen, we believe most people on Facebook could have had their public profile scraped in this way," Facebook's chief technology officer Mike Schroepfer wrote in a blog post announcing the company's intent to shut down the feature that makes it so easy to look up people's profiles using a phone number or email address.

On the conference call, Zuckerberg didn't elaborate on what specifically the company found in recent weeks that convinced it to act now.

It was an unexpected about-face for a company typically loath to talk about the mishandling of user data, said Fenwick McKelvey, an assistant professor at Concordia University who researches the workings of online social media platforms.

"The fact that Facebook would admit this seems to me that it's more important than perhaps we understand it to be as people on the outside," McKelvey said.

Sheryl Sandberg, chief operating officer of Facebook, speaks at the WSJD Live conference in Laguna Beach in 2016. Facebook is taking criticism for not taking its 'great responsibility' seriously enough. (Mike Blake/Reuters)

'It's a treasure trove'

Scraping — the act of systematically trawling the web's public pages and copying some or all of the data — isn't inherently bad. It's how Google is able to index and rank web pages in its search results, for example, or how the Internet Archive is able to store historical copies of web pages as they change.

But where things get ethically and even legally murky, experts say, is when personally identifiable information is potentially involved, and those people have no knowledge any scraping might occur — let alone the ability to give consent.

"Ethical users can't do this stuff," McKelvey explained. In Canada, he said, "you can't just writ large scrape personal information from Facebook, because that could be seen as computer misuse."

That clearly hasn't stopped everyone: despite the measures Facebook put in place, the company still found that malicious actors were abusing the system to look up profiles en masse. To what end isn't clear.

Roberts says the most obvious reason someone would want to scrape public Facebook profiles for personal information is for marketing and demographic purposes. "It's a treasure trove," she explains. The information may not mean much on an individual level, but becomes more useful when combined with other sources of data, much the same way Facebook itself builds advertising profiles of its users by combining its own data with third-party sources.

A Facebook like button is pictured at Facebook's France headquarters in Paris. You post your name, hometown and birthday and aggregators can link it to a voters list or other public information. (Benoit Tessier/Reuters)

"You can imagine how a combination of name, hometown, plus images, plus birthday, you could possibly start matching those data to other kinds of public records," says Alex Hanna, an assistant professor at the University of Toronto's faculty of information, using voter records as one example.

And there are more malicious uses, too. Harvested public contact information could be used to send people spam. Attackers could open credit cards in your name or commit other acts of identity fraud. Or they could cross-reference your Facebook data with previously leaked usernames and passwords from other apps and services, in an attempt to compromise your other accounts.

Scary? Potentially, yes. But it really depends on how much data a user left publicly exposed. "I'd say for the large majority of people it's probably not going to be an issue," Hanna said.

From public to private and everything in-between

It's not just "malicious actors" to blame here, but Facebook too, say experts such as McKelvey and Roberts. Facebook's ever-changing, difficult-to-understand privacy settings have only added to the confusion around what is public or private by default — especially in the social network's earlier years.

Much of that tension can be traced back to 2009, when Facebook started to make the transition from a private social network to something more public — first, by allowing users to make their profiles publicly accessible, and then by making new posts public by default.

Users are confused about Facebook's privacy settings, and the company's repeated changes to those settings haven't helped. (Toby Melville/Reuters)

Those changes, and the ones that followed — and Facebook's repeated attempts to explain them to confused users — "are tells that this is something that perhaps is quite complicated" to understand, McKelvey says. It's no wonder some users, even today, leave so much data publicly exposed.

Roberts says there's still "a huge gulf" between what users expect when they give information to Facebook versus what ends up being the reality — in this case, the mass scraping and aggregating of public profile data without their knowledge. "Would that decision have been different for that person if they would have understood the likelihood of that happening?" Roberts asks.

"We might also argue it's unethical on the part of the platform to [...] not make those terms of engagement clear."

ABOUT THE AUTHOR

Matthew Braga

Senior Technology Reporter

Matthew Braga is the senior technology reporter for CBC News, where he covers stories about how data is collected, used, and shared. You can contact him via email at matthew.braga@cbc.ca. For particularly sensitive messages or documents, consider using Secure Drop, an anonymous, confidential system for sharing encrypted information with CBC News.