[BreachExchange] What Really Caused Facebook's 500M-User Data Leak?

Destry Winant destry at riskbasedsecurity.com
Thu Apr 8 10:46:07 EDT 2021


https://www.wired.com/story/facebook-data-leak-500-million-users-phone-numbers/

SINCE SATURDAY, A massive trove of Facebook data has circulated
publicly, splashing information from roughly 533 million Facebook
users across the internet. The data includes things like profile
names, Facebook ID numbers, email addresses, and phone numbers. It's
all the kind of information that may already have been leaked or
scraped from some other source, but it's yet another resource that
links all that data together—and ties it to each victim—presenting
tidy profiles to scammers, phishers, and spammers on a silver platter.

Facebook's initial response was simply that the data was previously
reported on in 2019 and that the company patched the underlying
vulnerability in August of that year. Old news. But a closer look at
where, exactly, this data comes from produces a much murkier picture.
In fact, the data, which first appeared on the criminal dark web in
2019, came from a breach that Facebook did not disclose in any
significant detail at the time and only fully acknowledged Tuesday
evening in a blog post attributed to product management director Mike
Clark.

One source of the confusion was that Facebook has had any number of
breaches and exposures from which this data could have originated. Was
it the 540 million records—including Facebook IDs, comments, likes,
and reaction data—exposed by a third party and disclosed by the
security firm UpGuard in April 2019? Or was it the 419 million
Facebook user records, including hundreds of millions of phone
numbers, names, and Facebook IDs, scraped from the social network by
bad actors before a 2018 Facebook policy change, that were exposed
publicly and reported by TechCrunch in September 2019? Did it have
something to do with the Cambridge Analytica third-party data sharing
scandal of 2018? Or was this somehow related to the massive 2018
Facebook data breach that compromised access tokens and virtually all
personal data from about 30 million users?

In fact, the answer appears to be none of the above. As Facebook
eventually explained in background comments to WIRED and in its
Tuesday blog, the recently public trove of 533 million records is an
entirely different data set that attackers created by abusing a flaw
in a Facebook address book contacts import feature. Facebook says it
patched the vulnerability in August 2019, but it's unclear how many
times the bug was exploited before then. The information from more
than 500 million Facebook users in more than 106 countries contains
Facebook IDs, phone numbers, and other information about early
Facebook users like Mark Zuckerberg and US Secretary of Transportation
Pete Buttigieg, as well as the European Union commissioner for data
protection, Didier Reynders. Other victims include 61 people who list
the "Federal Trade Commission" and 651 people who list "Attorney
General" in their details on Facebook.

You can check whether your phone number or email address was exposed
in the leak on the breach-tracking site HaveIBeenPwned. For the
service, founder Troy Hunt reconciled and ingested two different
versions of the data set that have been floating around.

“When there’s a vacuum of information from the organization that’s
implicated, everyone speculates, and there's confusion,” Hunt says.

The closest Facebook came to acknowledging the source of this breach
previously was a comment in a fall 2019 news article. That September,
Forbes reported on a related vulnerability in Instagram's mechanism to
import contacts. The Instagram bug exposed users’ names, phone
numbers, Instagram handles, and account ID numbers. At the time,
Facebook told the researcher who disclosed the flaw that the Facebook
security team was “already aware of the issue due to an internal
finding.” A spokesperson told Forbes at the time, “We have changed the
contact importer on Instagram to help prevent potential abuse. We are
grateful to the researcher who raised this issue." Forbes noted in the
September 2019 story that there was no evidence the vulnerability had
been exploited, but also no evidence that it had not been.

In its Tuesday blog post, Facebook links to a September 2019 article
from CNET as evidence that the company publicly acknowledged the 2019
data exposure. But the CNET story refers to findings from a researcher
who also contacted WIRED in May 2019 about a trove of Facebook data,
including names and phone numbers. The leak the researcher had learned
about was the same one TechCrunch reported on in September 2019. And
according to the September 2019 CNET story, it is the same one CNET
was describing. Facebook told TechCrunch at the time, “This data set
is old and appears to have information obtained before we made changes
last year [2018] to remove people’s ability to find others using their
phone numbers.” Those changes were aimed at reducing the risk that
Facebook's search and account-recovery tools could be exploited for
mass scraping.

Data sets circulating in criminal forums are often mashed together,
adapted, recombined, and sold off in different chunks, which can
account for variations in their exact size and scope. But based on
Facebook's comment in 2019 that the data TechCrunch reported on was
from mid-2018 or earlier, it does not appear to be the currently circulating
data set. The two troves also have different attributes and numbers of
users impacted in each region. Facebook declined to comment for the
September 2019 CNET story.

If all of this feels exhausting to sort through, it's because Facebook
went days without giving a substantive answer and has left open some
degree of confusion.

“At what point did Facebook say, ‘We had a bug in our system, and we
added a fix, and therefore users might be affected’?" says former
Federal Trade Commission chief technologist Ashkan Soltani. “I don't
remember ever seeing Facebook say that. And they’re kind of stuck now,
because they apparently didn’t do any disclosure or notification."

Before publishing its blog post acknowledging the breach, Facebook
pointed to the Forbes story as evidence that it had publicly
acknowledged the 2019 Facebook contact importer breach. But the Forbes
story is about a similar but seemingly unrelated finding in Instagram
rather than in the main Facebook platform, which is where the
533-million-user leak comes from. And Facebook admits that it did not
notify users, either individually or through an official company
security bulletin, that their data had been compromised.

The Irish Data Protection Commission said in a statement on Tuesday
that it “received no proactive communication from Facebook" regarding
the breach.

“Previous data sets were published in 2019 and 2018 relating to a
large-scale scraping of the Facebook website, which at the time
Facebook advised occurred between June 2017 and April 2018 when
Facebook closed off a vulnerability in its phone look-up
functionality," according to the timeline the commission put together.
"Because the scraping took place prior to GDPR, Facebook chose not to
notify this as a personal data breach under GDPR. The newly published
data set seems to comprise the original 2018 (pre GDPR) data set and
combined with additional records, which may be from a later period.”

Facebook says it did not notify users about the 2019 contact importer
exploitation precisely because there are so many troves of semipublic
user data—taken from Facebook itself and other companies—out in the
world. Additionally, for the exploit to work, attackers needed to
supply phone numbers and manipulate the feature into returning the
corresponding names and other associated data, which Facebook argues
means that it did not expose the phone numbers itself. “It is
important to understand that malicious actors obtained this data not
through hacking our systems but by scraping it from our platform prior
to September 2019,” Clark wrote Tuesday. The company aims to draw a
distinction between exploiting a weakness in a legitimate feature for
mass scraping and finding a flaw in its systems to grab data from its
backend. Still, the former is also a form of vulnerability exploitation.

But for those affected, this is a distinction without a difference.
Attackers could simply run through every possible international phone
number and collect data on hits. The Facebook bug provided bad actors
with the missing connection between phone numbers and public
information like names.

Phone numbers used to be public in phone books and often still are,
but as they've evolved to be ubiquitous identifiers, linking you to
different parts of your digital life, they've taken on new
significance and potential value to attackers. They even play a role
in sensitive authentication, by being the path through which you might
receive two-factor authentication codes over SMS or a phone call in
which you provide information to confirm your identity. The idea that
phone numbers are now critical to your digital security is not at all
new.

“It's a fallacy to think that a breach isn't serious just because it
doesn't have passwords in it or other maximally sensitive data,” says
Zack Allen, director of threat intelligence at the security firm
ZeroFox. “It's also a fallacy to say that a situation isn't that bad
just because it's old data. And furthermore, phone numbers scare the
crap out of me as a form of authentication, which unfortunately is how
they're often used these days.”

For its part, Facebook has repeatedly mishandled user phone numbers.
They used to be easily collectible on a large scale through the
company's Graph Search API tool. At the time, the company didn't view
that as a security vulnerability, because Graph Search surfaced only
phone numbers and other data that users set to be public on their
profiles. Over the years, though, Facebook started to recognize that
it was a problem to make such data so easy to scrape, even if
individual users chose to make their data public. In aggregate, the
information could still enable scamming and phishing on a scale that
individuals presumably did not intend.

In 2018, Facebook acknowledged that it targeted ads based on users'
two-factor authentication phone numbers. That same year, the company
also disabled a feature that allowed users to search for other people
on Facebook using their phone number or email address—a mechanism that
was again being abused by scrapers. According to Facebook, this is the
tool cybercriminals used to collect the data TechCrunch reported on in
2019.

Yet somehow, in spite of these and other gestures toward locking user
phone numbers down, Facebook still did not fully disclose the 2019
data breach. The contact import feature is somewhat beleaguered, and
the company also fixed vulnerabilities in it in 2013 and 2017.

Meanwhile, Facebook reached a landmark settlement with the FTC in July
2019 over what can only be described as a massive number of deeply
concerning data privacy failures. In exchange for paying a $5 billion
fine and agreeing to certain terms, like discontinuing its
aforementioned alternate uses of phone numbers provided for security
authentication, Facebook was indemnified for all activity before June 12,
2019.

Whether any of the contact import exploitation occurred after that
date—and therefore should have been reported to the FTC—remains an
open question. The one thing that's certain in all this is that more
than 500 million Facebook users are less safe online than they
otherwise would be—and potentially vulnerable to a new wave of scams
and phishing that Facebook could have alerted them to nearly two years
ago.

