Privacy-Preserving Analytics: software for analysing confidential data
CSIRO has developed technology called PPA that allows personal or commercially sensitive data to be statistically analysed while protecting confidentiality and privacy.
22 October 2010 | Updated 14 October 2011
Databases are a treasure trove of information for decision makers. However, statistical analysis of data carries privacy risks, and care must be taken not to compromise confidentiality during analysis.
CSIRO’s Privacy-Preserving Analytics (PPA) allows both trends in data and detailed information to be quickly and easily determined without revealing any individual’s or business’s private details.
The problem to be solved
Many organisations have large databases of confidential or private information. They want to allow analysis of their data to help them make better decisions, the outcomes of which may, in turn, benefit individuals.
These organisations, or data custodians, may include:
Privacy is a complex issue. People and organisations rightly expect that no-one will have access to their private information unless authorised. Some types of data are considered more sensitive than others. Privacy concerns, real or imagined, prevent some valuable analyses being done at all.
CSIRO has identified a clear need for a range of different solutions that allow a balance between the need to:
- maintain privacy
- provide information for evidence-based decision-making.
How we addressed the problem
CSIRO assessed various solutions and approaches being applied around the world.
Traditional approaches include:
- removing data, such as name and gender, from each record
- adding ‘noise’ to the data or swapping data between records.
However, these may:
- compromise data quality by distorting the dataset
- limit data usefulness by removing or changing important information
- introduce difficulties in the analysis stage
- fail to sufficiently confidentialise the data.
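The traditional approaches and their drawbacks can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, and the helper functions `remove_fields`, `add_noise` and `swap_values` are hypothetical names, not part of any CSIRO software.

```python
import random

# Hypothetical unit records, invented purely for illustration.
records = [
    {"name": "Person 1", "gender": "F", "age": 34, "income": 52000},
    {"name": "Person 2", "gender": "M", "age": 45, "income": 61000},
    {"name": "Person 3", "gender": "F", "age": 29, "income": 48000},
    {"name": "Person 4", "gender": "M", "age": 61, "income": 75000},
    {"name": "Person 5", "gender": "F", "age": 52, "income": 69000},
]


def remove_fields(rows, fields):
    """Return copies of the rows with the named fields dropped."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]


def add_noise(rows, field, scale, rng):
    """Return copies of the rows with zero-mean Gaussian noise added to one field."""
    return [{**row, field: row[field] + rng.gauss(0, scale)} for row in rows]


def swap_values(rows, field, rng):
    """Return copies of the rows with one field's values shuffled between records."""
    values = [row[field] for row in rows]
    rng.shuffle(values)
    return [{**row, field: value} for row, value in zip(rows, values)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

deidentified = remove_fields(records, ("name", "gender"))
noisy = add_noise(deidentified, "income", scale=5000, rng=rng)
swapped = swap_values(deidentified, "income", rng=rng)

print(deidentified[0])
print([round(row["income"]) for row in noisy])
print([row["income"] for row in swapped])
```

In this sketch, dropping gender makes any analysis by gender impossible, added noise changes every income value, and swapping keeps the same set of incomes but detaches them from the ages they belonged to. None of the three steps, on its own, guarantees that a record cannot be re-identified from the fields that remain.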
CSIRO’s solution is Privacy-Preserving Analytics, or PPA. These CSIRO-developed methods and demonstrator software perform statistical analyses in a secure environment, then filter the results delivered to the user so that confidentiality and privacy are protected.
PPA performs analysis on 'raw' unit record level data without the need for the data custodian to release the data. The data always remain under the direct control of the data custodian. Even the data analyst has no direct access to the data but works with a purpose-built ‘remote control’ panel.
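The general idea of analysing data in place and filtering what is returned can be sketched in a few lines of Python. This is not PPA's implementation, and the source does not describe PPA's actual filtering rules. The class `SecureAnalysisServer`, the threshold `MIN_GROUP_SIZE` and the small-group suppression rule are all assumptions chosen only to make the idea concrete.

```python
import statistics

# Hypothetical threshold: results for groups smaller than this are withheld.
MIN_GROUP_SIZE = 5


class SecureAnalysisServer:
    """Holds unit records and returns only filtered summary results."""

    def __init__(self, rows):
        # The records stay inside the server object. The underscore is only a
        # Python naming convention; a real system would need a real security boundary.
        self._rows = list(rows)

    def mean_by_group(self, group_field, value_field):
        """Return the mean of value_field per group, suppressing small groups."""
        groups = {}
        for row in self._rows:
            groups.setdefault(row[group_field], []).append(row[value_field])

        results = {}
        for key, values in groups.items():
            if len(values) < MIN_GROUP_SIZE:
                results[key] = None  # suppressed: too few records in the group
            else:
                results[key] = statistics.mean(values)
        return results


# Hypothetical data: six records in region "A", two in region "B".
rows = [{"region": "A", "cost": c} for c in (10, 20, 30, 40, 50, 60)]
rows += [{"region": "B", "cost": c} for c in (100, 200)]

server = SecureAnalysisServer(rows)
summary = server.mean_by_group("region", "cost")
print(summary)
```

The analyst in this sketch only ever calls `mean_by_group` and sees the summary, never the rows. Suppressing small groups is one simple filter; it would not by itself stop a user combining several queries to isolate a record, which is why any real system would need further checks alongside governance and security measures.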
PPA is designed to:
- analyse raw data remotely, in a secure environment
- prevent a user reconstructing any individual record
- be part of a bigger privacy solution which incorporates governance and security.
PPA can be used on a:
- single database
- data warehouse
- virtual data warehouse.
Analyses available in PPA include:
- exploratory data analysis
- statistical modelling
- survival analysis.
Applications for the future
Future application of PPA could open up access to data which is currently not available for research and policy analysis.
For example, the Population Health Research Network has been established to improve the research sector's access to Australian health-related data, and PPA could provide an access pathway.
More widely, service improvement and innovation often require access to personal or confidential service delivery data, and PPA could provide a means of balancing use of the data with privacy and confidentiality protection.
Technology Evaluation licences are available for the PPA research demonstrator software.
To register your interest, contact Dr Christine O’Keefe.