‘The more data the better’ is a recurring request from organisations when it comes to mining data sets to solve complex problems. In some fields, however, exposure to too much data can harm the mental health of those working on the data-driven front line.
The Australian Federal Police are tasked with identifying abhorrent content during investigations, often resulting in higher rates of post-traumatic stress disorder among officers.
To help protect investigators from being exposed to vast amounts of such material, CSIRO’s Data61 is working with Monash University and the AFP to analyse potentially harmful data without the need to view it.
The Data Airlock platform uses artificial intelligence (AI) and machine learning to scan through and filter confronting images faster than previous methods, while also keeping analytics secure and restricted.
“Data Airlock focuses on three key principles: protecting people from data, protecting data from people, and analysing sensitive data in a safe and secure manner,” says Data61’s Dr Surya Nepal, who was the Board Sector Manager on the Data Airlock development task force.
In the case of a child exploitation investigation, AFP officers are given a couple of days to review a large amount of material before reporting to court to support a conviction. The original method required officers to view thousands of images, comparing photo files to identify similarities. In 2018, a method known as ‘perceptual hashing’ was introduced, which used algorithms to generate compact fingerprints from the content of images, so that visually similar images produce similar hashes that can be used to flag various forms of material.
Perceptual hashing measures these recurring similarities to predict potential outcomes: in this case, whether material was created by the same person or includes the same people.
“The old approach lacks predictive analysis. When there’s a minor amount of distortion in the image, such as changing a pixel, that can change the whole hash. So that means the two images, which are perceptually similar, can no longer be detected,” says Dr Nepal.
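The contrast Dr Nepal draws can be sketched with a toy example. The following is a minimal illustration of ‘average hashing’, one simple form of perceptual hashing; it is not the AFP’s actual algorithm, and the tiny grids below stand in for images that would normally be downscaled and converted to grayscale first.

```python
# Toy "average hash" (aHash): each pixel becomes a 1 if it is brighter
# than the image's mean, else a 0. Perceptually similar images yield
# hashes that differ in only a few bits, unlike cryptographic hashes,
# where a one-pixel change scrambles the entire digest.

def average_hash(pixels):
    """Return a bit string: 1 where a pixel is above mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming_distance(h1, h2):
    """Count differing bits; a small distance suggests similar images."""
    return sum(a != b for a, b in zip(h1, h2))

# Two tiny "images" that differ by a single pixel value.
img_a = [[10, 200], [220, 30]]
img_b = [[12, 200], [220, 30]]  # slight distortion

ha, hb = average_hash(img_a), average_hash(img_b)
print(hamming_distance(ha, hb))  # → 0: the perceptual hashes match
```

In practice, libraries compute such hashes over downscaled 8×8 or 16×16 grids, and a small Hamming-distance threshold decides whether two images are treated as matches.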
One of the challenges of using machine learning algorithms to draw predictions from sensitive data is maintaining privacy and security of the data.
“What Data Airlock provides is a kind of isolated and secure environment where people can put their algorithms and models in, execute them against the data, and get the research out,” says Dr Nepal.
“That data never leaves the data owner’s data-isolated environment, so that owner has full control of that data all the time.”
The design enables researchers to develop new algorithms against sensitive data without being exposed to the data, using a Model-to-Data (MTD) paradigm: keeping information in secure vaults and permitting only manually vetted algorithms to operate on the data in isolated environments called airlocks.
Full analytical capability is achieved while keeping data custodians in absolute control. Researchers receive updates during executions and vetted outputs on completion for evaluation and action. Data Airlock’s composition also allows trusted third parties to host the system securely.
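The model-to-data flow described above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Data Airlock’s real API: the class name, the vetting hook, and the example data are all assumptions made for clarity.

```python
# Hypothetical sketch of the model-to-data (MTD) pattern: sensitive
# records stay inside the "vault"; only a vetted algorithm goes in,
# and only its output comes out for the researcher to evaluate.

class Airlock:
    def __init__(self, data, vetter):
        self._data = data        # never leaves the custodian's environment
        self._vetter = vetter    # stand-in for manual vetting of algorithms

    def run(self, algorithm):
        if not self._vetter(algorithm):
            raise PermissionError("algorithm failed vetting")
        result = algorithm(self._data)   # executed in the isolated airlock
        return result                    # only this output is released

# Custodian side: raw records remain private; only approved names run.
vault = Airlock(data=[3, 5, 8, 13],
                vetter=lambda alg: alg.__name__ == "mean")

# Researcher side: submits an aggregate-only model, never sees the data.
def mean(records):
    return sum(records) / len(records)

print(vault.run(mean))  # → 7.25
```

The key property is that the researcher interacts only with the `run` boundary: the raw records and any unvetted computation stay under the data owner’s control.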
“Organisations with sensitive data want to have universities or the researchers involved in analysing their data, but at the same time, they don't want to release their data to them.”
“We provide an environment where new innovations can be created by engaging with researchers in universities, without them having access to the data.”
Using Data Airlock, the AFP aims to engage academia and counterpart agencies around the world to make identification of such material more efficient and accurate.
Interest in Data Airlock has extended to the Department of Home Affairs, NSW Police and the Australian Institute of Health and Welfare (AIHW), and researchers from CSIRO’s Data61 are working to adapt the platform to their specific needs.
Within the next 12 months, CSIRO’s Data61 plans to equip Data Airlock with cryptography and differential-privacy algorithms to improve its usability in domains including healthcare.