Research reveals de-identified patient data can be re-identified

A well-known example is Arvind Narayanan's re-identification of a Netflix dataset. In September 2016, we found that the encryption of supplier IDs was easily reversed. Now we find that patients can also be re-identified, by using known information about a person to find their record. Some data can be safely published online, such as information about government, aggregations of large collections of material, or data that is differentially private.


Re-identification is the process of matching anonymized records against public datasets or other known information to find the individuals they describe.
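
To make the idea concrete, here is a minimal sketch of how such matching might work. It is not the method used in the research: the records, the field names (`birth_year`, `sex`, `event_dates`) and the helper `find_matches` are all hypothetical, chosen only to show how a few known facts can narrow a de-identified dataset down to a single record.

```python
from typing import Any, Dict, List

# Hypothetical de-identified records: names are removed, but
# quasi-identifiers (birth year, sex, dates of events) remain.
RECORDS: List[Dict[str, Any]] = [
    {"record_id": "r1", "birth_year": 1975, "sex": "F",
     "event_dates": {"2012-03-14", "2014-07-02"}},
    {"record_id": "r2", "birth_year": 1975, "sex": "F",
     "event_dates": {"2013-01-20"}},
    {"record_id": "r3", "birth_year": 1975, "sex": "M",
     "event_dates": {"2014-07-02"}},
    {"record_id": "r4", "birth_year": 1980, "sex": "F",
     "event_dates": {"2014-07-02"}},
]


def find_matches(records: List[Dict[str, Any]],
                 known: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the records consistent with every known fact about a person."""
    matches = []
    for rec in records:
        if rec["birth_year"] != known["birth_year"]:
            continue
        if rec["sex"] != known["sex"]:
            continue
        # Every event date we know about must appear in the record.
        if not known["event_dates"] <= rec["event_dates"]:
            continue
        matches.append(rec)
    return matches


# Knowing only birth year and sex leaves more than one candidate.
coarse = find_matches(
    RECORDS, {"birth_year": 1975, "sex": "F", "event_dates": set()})
print(len(coarse), "candidate(s) from birth year and sex alone")

# Adding one known event date narrows the candidates further.
precise = find_matches(
    RECORDS, {"birth_year": 1975, "sex": "F", "event_dates": {"2014-07-02"}})
print(len(precise), "candidate(s) after adding one known date")
```

In this toy data, a single extra known date is enough to isolate one record. Real datasets are far larger, but rich records with many dates and events give an attacker correspondingly more facts to match on.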

Narayanan and his team were able to re-identify records in the anonymous database, and the study led to a privacy lawsuit against Netflix, which consequently cancelled a second contest in 2010. Re-identification is time-consuming, requires serious data management and statistics skills, and lacks the easy transmission and transferability of computer viruses. Re-identification is possible only if the perceptual particular exists in something at least analogous to space and time.

De-identification is very unlikely to work for other rich datasets in the government's care, like census data, tax records, mental health records, penal information and Centrelink data.

