[BreachExchange] Covert 'Replay Sessions' Have Been Harvesting Passwords by Mistake
audrey at riskbasedsecurity.com
Tue Feb 27 18:45:51 EST 2018

Yes, websites track your behavior online. But some go much further than
what you'd reasonably expect, using so-called session replays to create a
detailed log of everything you do and type on a site. And new research
shows that in some cases these movie-like recordings are even storing your
passwords.

Bulk data collection is always a privacy red flag. But the Princeton
research group that first published findings about session replay scripts
has uncovered a troubling series of situations where seemingly
well-intentioned safeguards fail, leading to an unacceptable level of
exposure.

The investigation started with Mixpanel, a product analytics company that
offers a comprehensive user data collection service known as Autotrack. The
company admitted in an email to its customers at the beginning of February
that the feature had been unintentionally collecting password data, even
though Autotrack includes heuristics meant to prevent that very thing.
Autotrack isn't a session replay script, but it collects whole-hog user
interaction data so that Mixpanel's clients can query later for any
information about their users. Mixpanel corrected the password flaw and
issued an SDK update, but the Princeton researchers—Steven Englehardt,
Gunes Acar, and Arvind Narayanan—say they realized that these types of
password redaction failures were probably a larger problem.

"It kind of snowballed and I think it’s likely that there are other design
patterns out there that are also weakened," says Englehardt, a web privacy
PhD candidate. "We’ve highlighted some, but we could continue to go down
this road and find other things again and again just because of the way
that these scripts are designed."

You Shall Not Password

Even after Mixpanel issued fixes for the password retention issue, the
Princeton researchers still found situations in which Autotrack recorded
passwords. The feature tries to avoid retaining passwords by automatically
redacting input fields that have a name or ID that includes the term
"pass." The limitations are obvious: A password field might, say, be named
"pwd," or a site might use a language other than English.
One prevalent example the group found centers on "Show Password"
features—tools offered by many sites and browser extensions that allow
users to see the password they're entering in plaintext so they can catch
typos. The researchers discovered that on certain Mixpanel client sites,
like testbook.com, the feature confused the password redaction protections.
If a user clicked Show Password and then took any other action, like
re-obscuring the password or editing it in the text field, Autotrack
recorded the password, even if the user decided not to log in and didn't
submit it. This happens when the Show Password feature stores the password
in a second invisible field; Autotrack then collects it from that second
field, which it doesn't know to classify as sensitive. The researchers
found that this problem also came up when users added Show Password browser
extensions to the mix, altering website behavior in ways neither the site
nor its third-party services control.

"The structure of the rendered webpage is being modified, changing the type
of input field from a password field to a regular text field. When this
happens, Autotrack loses the ability to identify whether or not a field is
being used to enter a password," Mixpanel said in a statement. "Per our
documentation, if a customer is collecting sensitive information in
non-password fields, they should explicitly blacklist it for collection."
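The two failure modes above, an input whose type is flipped from password to text and a second invisible field that mirrors the password, can be sketched under a few assumptions: a hypothetical recorder that treats a field as sensitive if its `type` is `"password"` or its id contains "pass", plus an optional customer-supplied exclusion list (`blocklist` here) standing in for the explicit blacklist Mixpanel's documentation describes. None of these names come from Autotrack itself.

```python
from collections.abc import Iterable, Set
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InputField:
    """Minimal stand-in for an HTML <input> element (hypothetical model)."""
    id: str
    type: str
    value: str


def looks_sensitive(field: InputField) -> bool:
    # Automatic classification: a password-type input, or an id containing "pass".
    return field.type == "password" or "pass" in field.id.lower()


def record(fields: Iterable[InputField],
           blocklist: Set[str] = frozenset()) -> dict[str, str]:
    # Capture each field's value unless it is auto-detected as sensitive
    # or the site has explicitly listed its id for exclusion.
    return {
        f.id: "[REDACTED]" if looks_sensitive(f) or f.id in blocklist else f.value
        for f in fields
    }


original = InputField(id="pw", type="password", value="hunter2")

# Failure mode 1: a Show Password extension flips the input's type to "text".
toggled = replace(original, type="text")

# Failure mode 2: the site keeps a second, invisible text field holding the
# plaintext password so it can be revealed on request.
mirror = InputField(id="pw-visible", type="text", value=original.value)

print(record([original, mirror]))
# {'pw': '[REDACTED]', 'pw-visible': 'hunter2'}
print(record([toggled]))
# {'pw': 'hunter2'}
print(record([toggled, mirror], blocklist={"pw", "pw-visible"}))
# {'pw': '[REDACTED]', 'pw-visible': '[REDACTED]'}
```

The exclusion list only covers fields the site knows about. A field injected by a browser extension would carry an id the site never listed, so in this model it would still be captured.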
Mixpanel has also put its entire Autotrack feature "on hold" in recent
weeks, making the tool inaccessible to new users while the company
"evaluate[s] how to provide seamless, easy integration of Mixpanel in a way
that’s transparent and predictable to our customers." A spokesperson said
that the company has realized that some of its customers didn't understand
how much data Autotrack collected, and wanted more control over what
information the tool retained. Mixpanel also says it is developing
mechanisms to make it easier for customers to review the totality of the
data the feature collects, so they can more quickly spot things that don't
belong.

The researchers crawled publicly available data for the Alexa top 50,000
sites, looking at samples of thousands of sites at different tiers of
popularity, and found examples of misbehaving session replay scripts at all
levels. Not every site that uses replay sessions will retain sensitive
data—they may not scan pages where users enter personal data or may
correctly implement protective blacklists—but the relative ease of finding
examples indicates that the issue is widespread.

The researchers don't believe that any of the analytics firms they've
studied intend to collect the sensitive data or are acting with malicious
intent, unlike some hackers. And working with the companies and impacted
websites has motivated a number of improvements. But they note that the
privacy concerns are diverse and exist on an increasingly massive scale.

"We've had responses from the vendors, they promise to do more on detecting
these kinds of leaks," says Günes Acar, a postdoctoral researcher at
Princeton who studies online tracking. "But these leaks will happen no
matter what unless they stop collecting all inputs from fields. I’m not
really very optimistic."
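Acar's point is about defaults. A recorder that captures every input and tries to subtract the sensitive ones has to anticipate every field in advance. The alternative, sketched here under the assumption of a hypothetical recorder with a site-maintained allowlist, captures nothing unless a field has been explicitly opted in. The function name and field ids are illustrative only.

```python
def record_allowlisted(fields: dict[str, str], allowlist: set[str]) -> dict[str, str]:
    # Default-deny capture: keep values only for fields the site explicitly
    # opted in; every other field, known or unexpected, is redacted.
    return {
        field_id: (value if field_id in allowlist else "[REDACTED]")
        for field_id, value in fields.items()
    }


page_inputs = {
    "search": "running shoes",
    "pw": "hunter2",
    "pw-visible": "hunter2",   # mirror field the site forgot about
    "ext-show-pw": "hunter2",  # field injected by a browser extension
}

print(record_allowlisted(page_inputs, allowlist={"search"}))
# {'search': 'running shoes', 'pw': '[REDACTED]', 'pw-visible': '[REDACTED]', 'ext-show-pw': '[REDACTED]'}
```

The trade-off is more setup for site owners, but forgotten mirror fields and extension-injected fields fail safe instead of leaking.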
Gotta Catch 'Em All

In addition to Mixpanel, the researchers looked at examples of accidental
password collection involving three other firms that specifically offer
session replay services—UserReplay, FullStory, and SessionCam. The
researchers continued looking at Show Password features and browser
extensions and found a number of situations in which privacy protections
broke down.

Often this occurs even when the user doesn't actually use the Show Password
feature, simply because entering a password creates that additional
invisible field that holds the password in plaintext in case the user wants
to reveal it. Many session replay scripts fail to exclude this second
password field, as was the case with FullStory and a service called

"Session replay is a uniquely effective technology that helps businesses
fix bugs, offer excellent support, and make their websites easier to use,"
a FullStory spokesperson told WIRED. "We believe there is an opportunity to
ensure that session replay and privacy concerns are not at odds, and we
have a team internally dedicated to this effort. The work that Gunes [Acar]
and Steve [Englehardt] are doing at Princeton can only help us get better."

One particularly odd example the researchers found comes from Capella
University's admission login page, which produced a password leak through
the interaction of two third-party tools. When a user typed a password in
the password field, an Adobe Analytics ActivityMap script stored the
password in a cookie. At the same time, the session replay service
UserReplay was set up on the page to record all generated cookies. As a
result, UserReplay was inadvertently collecting passwords.
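The Capella case is a composition problem: each script may look harmless on its own. A minimal sketch, assuming two hypothetical hooks that share one cookie jar (Python's standard `http.cookies.SimpleCookie` stands in for the browser's cookies); neither function reflects the actual Adobe Analytics or UserReplay code.

```python
from http.cookies import SimpleCookie


def analytics_on_input(jar: SimpleCookie, field_id: str, value: str) -> None:
    # Hypothetical analytics hook: remembers the last field the user touched,
    # including its value, by writing both into cookies.
    jar["last_input_field"] = field_id
    jar["last_input_value"] = value


def replay_capture_cookies(jar: SimpleCookie) -> dict[str, str]:
    # Hypothetical replay hook: records every cookie on the page, with no
    # knowledge of which script set it or what it contains.
    return {name: morsel.value for name, morsel in jar.items()}


jar = SimpleCookie()                            # cookies shared by all scripts on the page
analytics_on_input(jar, "password", "hunter2")  # user types into the password field
print(replay_capture_cookies(jar))
# {'last_input_field': 'password', 'last_input_value': 'hunter2'}
```

Neither hook handles passwords on purpose; the leak only appears when both run on the same page.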
"Out of an abundance of caution, we are removing the Adobe Analytics code
that sets the cookie and we’re suspending our use of UserReplay," Capella
public affairs vice president Mike Buttry told WIRED. "We take student data
protection very seriously." UserReplay has not yet responded to a request for
comment, but the researchers say that the situation is extremely unusual.
Rather than reflecting a common problem, it shows how unpredictable
third-party interactions on web pages can be, whether they come from
analytics tools or completely dark horse services like Show Password
extensions.

The password leaks the researchers found aren't direct exposures, because
they leak from one service to another rather than out into the public
sphere. But unintentional data exposures do increase the overall risk that
data will someday publicly leak or be breached. The more copies of
sensitive information that exist, the broader the attack surface, and when
data is being collected accidentally it may not be stored properly or have
adequate protections.

"To me the worst case scenario is nothing happens, nothing changes and this
is just the new normal," Englehardt says. "When we first started talking
about session replays people were surprised, but over time that surprise
all goes away and this becomes an acceptable risk. I hope that our findings
encourage companies to change their practices, not just patching the
specific things we point out, but really change the design of the product."

As long as bulk analytics data and session replays help companies improve
their user experiences, optimize their products, and do better marketing,
though, radically altering these services will be a tough sell.