[BreachExchange] Public Transport Victoria in breach of Privacy Act after re-identifiable data on over 15m myki cards released

Destry Winant destry at riskbasedsecurity.com
Fri Aug 16 08:51:32 EDT 2019


Public Transport Victoria (PTV) has been found in breach of the
Privacy and Data Protection Act 2014 (PDP Act) by the Office of the
Victorian Information Commissioner (OVIC) for releasing data that
exposed the travel history of 15,184,336 myki cards.

The myki dataset contained a record of "touch on" and "touch off"
events recorded by the myki system between 1 July 2015 and 30 June
2018, amounting to approximately 1.8 billion events across the 15
million distinct myki cards.

Each event record comprises multiple data points, including date and
time, location information, card identifier -- a unique number
assigned to each myki card -- and the card type, of which there are
70, including student, police, and asylum seeker categories.

The data allowed individuals to be re-identified and three years of
their travel activity to be exposed.

OVIC on Thursday detailed the activities that led to the data being
easily re-identified, publishing a report [PDF] on the disclosure of
myki travel information.

In releasing the report, Victorian Information Commissioner Sven
Bluemmel said OVIC's investigation into the release of myki data
demonstrates that deficiencies in governance and risk management in
relation to data can undermine the protection of privacy, even where
the project is well-intentioned.

In mid-2018, PTV released the dataset to Data Science Melbourne for
use in its Datathon, a competition in which participants are
encouraged to find innovative uses for a dataset.

The data was provided by the Department of Premier and Cabinet (DPC),
which administers the state government's open data platform, DataVic.

While OVIC said some steps were taken by PTV to de-identify the
dataset before it was released, a Datathon participant successfully
re-identified individuals. The participant raised their concern with a
Victorian public sector representative.

Similarly, academics working at the University of Melbourne -- Dr
Chris Culnane, A/Prof. Benjamin I. P. Rubinstein, and A/Prof. Vanessa
Teague -- had also located the dataset online and were able to
identify themselves and persons known to them. The same research team
re-identified the Medicare Benefits Schedule and Pharmaceutical
Benefits Scheme data in September 2016, later reported that further
information, such as the medical billing records of approximately 2.9
million Australians, was potentially re-identifiable in the same
dataset, and had previously found flaws in the NSW voting system.

Both instances were reported appropriately, OVIC said.

The University of Melbourne researchers similarly published [PDF]
their findings on Thursday, demonstrating the ease with which they
were able to re-identify individuals.

Offering further information on the availability of the dataset, the
researchers said access to it was unrestricted, with a URL provided on
the Datathon's website to download the complete dataset from an Amazon
S3 Bucket.

They said over 190 teams continued to analyse the data throughout the
two-month competition period.

In detailing how they were able to identify individuals, two of the
authors said it was a straightforward exercise to re-identify
themselves, as both have their myki cards registered. Then, knowing
for certain just one trip undertaken by a friend, the researchers were
able to find previous trips made by this individual.

"This type of re-identification is particularly concerning, since it
allows an individual to leverage the ease of re-identifying themselves
to re-identify others, and from potentially only a single co-travel
event," they wrote.

"This presents a risk for anyone who has co-travelled with someone in
the past, for example, an ex-partner, a co-worker, or even just
someone they went on a single date with. Due to the large amount of
data provided, ie, all touch on and off events, it could allow a
malicious party to determine where someone lived, worked, or
socialised -- and when they visit these places and for how long."
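
The co-travel linkage the researchers describe can be illustrated with
a minimal Python sketch on synthetic data. It is not the researchers'
code: the record layout, field names, card ids, stops, and the
one-minute matching tolerance are all assumptions made for
illustration. Once a person has found their own card, any other card
that touched on or off at the same stop at nearly the same time is a
candidate co-traveller, and that card's full history then becomes
readable.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical record layout; the real dataset's schema is not reproduced here.
Event = namedtuple("Event", "card_id timestamp stop kind")

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# Synthetic events for three made-up cards.
EVENTS = [
    Event("A", ts("2018-03-01 08:02"), "Flinders St", "on"),
    Event("B", ts("2018-03-01 08:02"), "Flinders St", "on"),
    Event("C", ts("2018-03-01 08:40"), "Flinders St", "on"),
    Event("A", ts("2018-03-01 08:31"), "Clayton", "off"),
    Event("B", ts("2018-03-01 08:31"), "Clayton", "off"),
    Event("B", ts("2018-03-02 18:15"), "Richmond", "on"),
    Event("B", ts("2018-03-02 18:40"), "Caulfield", "off"),
]

def co_travellers(events, my_card, tolerance_minutes=1):
    """Return ids of cards sharing a stop, event kind, and time with my_card."""
    mine = [e for e in events if e.card_id == my_card]
    matches = set()
    for e in events:
        if e.card_id == my_card:
            continue
        for m in mine:
            same_place = e.stop == m.stop and e.kind == m.kind
            gap = abs((e.timestamp - m.timestamp).total_seconds())
            if same_place and gap <= tolerance_minutes * 60:
                matches.add(e.card_id)
    return matches

def history(events, card_id):
    """Return every event for one card, oldest first."""
    return sorted((e for e in events if e.card_id == card_id),
                  key=lambda e: e.timestamp)
```

In this toy data, the holder of card "A" finds card "B" as a
co-traveller from the shared trip, and calling history(EVENTS, "B")
then reveals a separate journey that "A" was never part of.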

The researchers also found the identity of a stranger in the dataset,
using merely his Twitter account.

OVIC said there were flaws in the process followed by PTV in
de-identifying the dataset, assessing the risk of re-identification,
and deciding to provide the dataset for use in the Datathon.

As the information contained within the dataset was personal
information, it must be handled in accordance with the Information
Privacy Principles (IPP) in the PDP Act.

"As PTV is required under the PDP Act to protect personal information
in the dataset, it is the Deputy Commissioner's view that PTV breached
IPP 2.1 by disclosing personal information for a purpose other than
that for which it was collected," OVIC wrote.

"In disclosing the dataset to Data Science Melbourne in or around July
2018, the Deputy Commissioner found PTV contravened IPP 2.1 and
therefore interfered with the privacy of the individuals whose
personal information was in the dataset. The Deputy Commissioner is
also of the view that PTV breached IPP 4.1 in failing to take
reasonable steps to protect the personal information contained in the
dataset from disclosure.

"The steps taken by PTV in both considering Data Science Melbourne's
request for the provision of myki data, and in preparing the dataset
for release and use in the Datathon, were inadequate and not
reasonable to protect the information contained in the dataset."

OVIC's report also said a request by Datathon in 2015 for the same
information was declined because of concerns about ownership of the
data.

PTV handed the data over last year, however, because it believed a
thorough privacy impact assessment had already been conducted.

"PTV's decision-making processes were not clear or well documented and
appeared to lack both the support of an effective enterprise risk
management framework and suitable rigour in the application of a risk
management process," OVIC continued.

In conducting its investigation, OVIC engaged CSIRO's Data61, which
determined that "the detailed nature of the information in the dataset
created a high risk that some individuals may be re-identified by
linking the dataset with other information sources".
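
One simple way to gauge this kind of linkage risk is to count how many
cards can be singled out by a single known observation. The Python
sketch below is an illustrative measure on synthetic data, not the
method Data61 used; the tuple layout, the 30-minute time buckets, and
all values are assumptions.

```python
from collections import defaultdict

# Synthetic (card_id, stop, time bucket) tuples; all values are made up.
EVENTS = [
    ("A", "Flinders St", "2018-03-01 08:00"),
    ("B", "Flinders St", "2018-03-01 08:00"),
    ("D", "Flinders St", "2018-03-01 08:00"),
    ("C", "Flinders St", "2018-03-01 08:30"),
    ("A", "Clayton", "2018-03-01 08:30"),
    ("B", "Richmond", "2018-03-02 18:00"),
]

def singled_out_cards(events):
    """Cards with at least one (stop, time) observation shared by no other card."""
    cards_at = defaultdict(set)
    for card, stop, when in events:
        cards_at[(stop, when)].add(card)
    return {next(iter(cards)) for cards in cards_at.values() if len(cards) == 1}

def uniqueness_rate(events):
    """Fraction of all cards that a single observation can single out."""
    all_cards = {card for card, _, _ in events}
    return len(singled_out_cards(events)) / len(all_cards)
```

Here three of the four synthetic cards have at least one observation
that no other card shares, so anyone who knows that one place and time
can pick out the card. Only card "D", whose single event coincides
with others, stays hidden.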

In justifying allowing access to the dataset, PTV said: "PTV does not
consider the data extract is personal information as defined in the
[PDP Act]. PTV's view is that there has been no breach or
contravention of the Information Privacy Principles (IPPs) as a result
of disclosing the data extract to the Datathon".

Further justifications from the state entity included the idea that a
myki card may be shared by multiple people and may therefore show the
movements of several people collectively.

"It is significant the dataset was released to Data Science Melbourne
without any restrictions on its use or further dissemination," OVIC
held firm.

"The Deputy Commissioner is of the opinion that the identity of a
substantial proportion of the individuals whose travel movements are
recorded in the dataset can reasonably be ascertained.

"The Deputy Commissioner found neither IPP 2.1(a), 2.1(c), nor any
other exception to IPP 2 permitted the disclosure of the personal
information contained in the dataset. In disclosing the dataset to
Data Science Melbourne on or around 12 July 2018, PTV contravened IPP
2.1 and therefore interfered with the privacy of the individuals whose
personal information was contained in the dataset."
