
By Alison Donnellan, 5 November 2020

The journey towards understanding user trust and how to design trustworthy systems

Many methods have been developed to promote fairness, transparency, and accountability in the predictions made by artificial intelligence (AI) and machine learning (ML) systems. A technical approach is often the focal point of these methodologies; however, to develop truly ethical machine learning systems that users can trust, it is important to supplement this with a human-centred approach.

A deep understanding of the people, contexts and environments connected to these systems can provide the foundations for designing and presenting information in such a way that the people that use the systems can make balanced decisions on whether or not they should trust them.

In the complex and very technical world of machine learning, User Experience Designers within the Product & Design group and a Social Scientist in the Responsible Innovation Future Science Platform at CSIRO's Data61 established the key considerations for taking a human-centred approach to building machine learning systems.

Here, Product & Design Lead Georgina Ibarra explores the process they took to understand the components of trust and the development of best practice guidelines for designing trustworthy machine learning systems using the example of a recent project involving Australian law enforcement agencies.

Connecting technology with the people that will use it

As a product and design team working for CSIRO's Data61, we take a human-centred design approach to developing technology. This means researching the users of the technology we are creating to ensure we understand their context - their problems, needs and jobs to be done. We collaborate with social scientists to further examine the social identities and dynamics associated with these environments, contexts and people, together establishing the key insights needed to create an inclusive system.

This process lifts our focus from deliverables, enabling us to consider the social and ethical consequences of the technology we are creating. We consider it our responsibility to create responsible technology.

Project context

Partnered with Australian law enforcement agencies, our brief was to extend agencies' graph data capabilities and the maturity of their graph data infrastructure by developing a graph analytics software system driven by machine learning.

High security restrictions placed many limitations on working with and talking to experts in law enforcement, restricting the team's ability to piece together a full picture of the problem we needed to solve, with complete context.

What we could establish was that domain and data experts building criminal investigation cases use data to create intelligence insights, with the goal of identifying connections and patterns within the data that can form the basis of further action by investigators.

We also learned that many of their data-assisted decisions have high consequences.

A human-centred approach to user trust

Important frameworks outlining the risks and weaknesses of artificial intelligence and machine learning have been established in recent years, contributing to the dialogue around the ethical implications of this technology.

The user experience (UX) component of this case study takes a domain-focused and human-centred design approach to facilitating human interaction with a specific machine learning system.

This enables the dissection of critical components that a particular user needs to build their trust in the model outcomes and ultimately the trustworthiness of a particular system.

The Product and Design team at Data61 work closely with stakeholders, users and beneficiaries of the system to design these components so they can be interpreted, evaluated and trusted by the people that use the system.

This lens is what led us to our key research question: For experts making high-risk and high-consequence decisions, what are the conditions required for them to accept and use predicted data from a machine learning model to assist decision-making in their work?

Exploring the social identities and dynamics of our users

To build on our knowledge of the social context of the humans who needed to interact with our machine learning system for this project, we collaborated with the Responsible Innovation Future Science Platform to explore concepts in machine learning, trust in automation and criminal investigation. This culminated in the recently published report Machine Learning and Responsibility in Criminal Investigations.

The report explains the responsibilities of criminal investigators and how these responsibilities could be affected when using ML. It identifies the factors that influence the level of trust users place in automated systems and how this trust can be influenced by different social and environmental factors.

Through a combination of user experience and social science approaches, we examined how the level of trust investigators place in the findings of ML systems can be calibrated to reflect the trustworthiness of those systems.

Three interwoven concepts of trust

The concepts of trust, trustworthiness, and calibrated trust are crucial in the human use of ML systems for high-risk decision making. Because these concepts are interdependent, it is vital to clarify their terms:

  1. User Trust - 'I trust the system to perform this goal.'

    This trust is embodied by the user and can be triggered by multiple factors, such as their general willingness to trust automation, their training and experience, and situational factors such as mood, workload, and the complexity of the task.

  2. System Trustworthiness - 'The system is trustworthy enough to perform this function.'

    The onus is on the system to demonstrate how worthy it is of a user's trust, usually for a particular function in a specific context.

  3. Calibrated Trust - 'The user's trust is calibrated to the trustworthiness of the system.'

    Ideally, the user's trust in an ML system corresponds with the system's capabilities. Regardless of how trustworthy the system is, the user is able to make a judgement on the best use of its predictions.

Like any good design challenge, the issue of trust in machine learning is much easier to comprehend when it is in context. Who needs to trust the outcomes from the machine learning system and why do they need to trust it? What do they need to do with the information, how well can they evaluate it, and what goal are they seeking to achieve by using it?
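The calibration idea above can be sketched in code. The sketch below is purely illustrative and not part of the project: the 0-1 scores and the 0.2 mismatch threshold are hypothetical, standing in for whatever measures of user trust and system trustworthiness a team might actually collect.

```python
# Illustrative sketch only: "calibrated trust" means the user's trust tracks
# the system's measured trustworthiness. Scores and threshold are hypothetical.

def calibration_gap(user_trust: float, system_trustworthiness: float) -> str:
    """Compare a user's trust score with a trustworthiness score
    (both on a 0-1 scale) and classify the mismatch."""
    gap = user_trust - system_trustworthiness
    if gap > 0.2:
        return "over-trust"   # user relies on the system more than is warranted
    if gap < -0.2:
        return "under-trust"  # user discounts predictions the system gets right
    return "calibrated"

print(calibration_gap(0.9, 0.5))   # over-trust
print(calibration_gap(0.4, 0.8))   # under-trust
print(calibration_gap(0.7, 0.75))  # calibrated
```

In this framing, design work on trustworthy systems aims to move users out of the over-trust and under-trust bands, not simply to maximise trust.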

The data users

There are two primary users who employ data to understand networks as part of criminal investigations.

The main reason these people use data is to build intelligence that can influence investigations and ultimately lead to prosecution. Many data-assisted decisions have high consequences, making the trustworthiness of the system critical for user trust.

There is also a secondary user, who we call the 'intermediary': usually a project manager or business analyst managing data projects and processes, often championing new tools and methodologies.

A graphic displaying a conceptual diagram titled "The data users." Top center shows a large blue circle labeled Project teams. Inside this circle are small stylized figures representing different roles, including Research scientist, Strategic decision maker and Subject matter expert.

Below the project teams circle are two overlapping circles forming a Venn diagram.

On the left is a blue circle labeled Data science teams. Within this circle, arranged around the data scientist role are icons and labels for different roles which include:

  • Data scientist (central larger figure)
  • Data engineer
  • Software engineer
  • DevOps/ML engineer
  • Data visualisation developer. 

On the right is a purple circle labeled Investigation teams. Inside it, arranged around the central investigative analyst are icons and labels for roles including:

  • Investigative analyst
  • Intelligence analyst
  • Forensic accountant
  • Data analyst.

Where the two larger circles overlap, there is a shaded purple‑blue area indicating collaboration between the Data science teams and the Investigation teams. Arrows point upward from both the data science and investigation circles toward the Project teams circle, showing that both groups feed into and support the project teams.

The overall layout visually communicates that project teams are supported by both specialised data science roles and investigative analysis roles working together, with shared skills in the overlapping area.

Designing trustworthy machine learning: the data users.

The data ecosystem

The following diagram is a visual representation of where data commonly originates in criminal investigations, the transformation it underwent when applied to our machine learning system, and some of the pathways that may result from the data insights produced by the system.

A horizontal workflow infographic illustrating how data is collected, processed, and used in intelligence or investigative operations. It is divided conceptually into upstream, processing, and downstream stages.

Upstream (left side)

Three types of data sources feed into the process:

  • Warrant data - represented by a small icon of a document with a legal seal.
  • Core structured data - represented by a database cylinder icon.
  • Partner semi-structured data - represented by a folder with connected dots icon.

Arrows from each source converge into the next stage.

Processing (central section)

Data moves sequentially through several steps, each shown with labeled boxes and arrows:

  • Data acquisition - initial collection and ingestion of all data streams.
  • Cleaning and pre-processing - removing errors, standardizing formats, and preparing the data.
  • Convert to graph - organizing data relationships into a graph format for analysis.
  • Model creation - applying analytics, algorithms, or AI to the structured data graph.
  • Visualisation - displaying patterns, trends, and insights in graphs or charts.

Above the visualisation stage are icons representing key personnel: a data scientist and an intelligence analyst, indicating collaboration.

Outputs (right side)

After processing, the data produces two main outputs:

  • Intelligence reports - detailed analytical summaries.
  • Combined case information - aggregation of evidence and processed insights for operational use.

Further downstream, these outputs feed into practical applications represented by three icons:

  • Evidence in court - icon of a government building.
  • Supplemental information in the field - icon of a police car.
  • Information required for another warrant - document with a badge icon.

Dashed lines indicate optional or alternate data pathways between stages.

The overall layout and meaning can be summarised as:

  • The workflow flows left-to-right, showing how raw and partner data is transformed into actionable intelligence.
  • Overlapping stages and personnel icons emphasize collaboration between technical data teams and investigative teams.
  • The image conveys both the sequential processing of data and the interconnected nature of analytic and operational roles.

Designing trust in machine learning: The data ecosystem.
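The sequential stages in the ecosystem diagram (data acquisition, cleaning and pre-processing, converting to graph) can be sketched as a toy pipeline. The records, field names and cleaning rules below are hypothetical illustrations, not the agencies' actual data or the project's implementation.

```python
# Minimal sketch of the diagram's upstream and processing stages,
# using hypothetical toy records rather than real investigative data.

def acquire():
    # Data acquisition: ingest relationship records from the
    # warrant / core / partner sources shown upstream in the diagram.
    return [
        {"source": "warrant", "from": "A", "to": "B"},
        {"source": "core",    "from": "b", "to": "C"},  # inconsistent casing
        {"source": "core",    "from": "A", "to": "B"},  # duplicate relationship
    ]

def clean(records):
    # Cleaning and pre-processing: standardise formats, drop duplicates.
    seen, out = set(), []
    for r in records:
        key = (r["from"].upper(), r["to"].upper())
        if key not in seen:
            seen.add(key)
            out.append({"from": key[0], "to": key[1], "source": r["source"]})
    return out

def to_graph(records):
    # Convert to graph: entities become nodes, relationships become edges,
    # ready for model creation and visualisation further along the pipeline.
    nodes = {r["from"] for r in records} | {r["to"] for r in records}
    edges = [(r["from"], r["to"]) for r in records]
    return {"nodes": nodes, "edges": edges}

graph = to_graph(clean(acquire()))
print(sorted(graph["nodes"]))  # ['A', 'B', 'C']
print(graph["edges"])          # [('A', 'B'), ('B', 'C')]
```

Each stage in the sketch corresponds to a point of intervention in the ecosystem: cleaning is where bias-capture methods apply, and the graph conversion is where the structure and provenance of the data need to be communicated.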

Overlaying the data ecosystem with some of the issues that surfaced during our research (e.g. the information needed to accurately communicate the prediction data, and the need for machine learning literacy in the industry's workforce) gave us a way to promote dialogue around developing user trust, system trustworthiness and calibrated trust.

They also provided a starting point for understanding who is responsible for each part of the system and the role they play in mitigating challenges that arise.

Infographic showing a horizontal process flow from left to right that outlines key considerations when working with predictive data and machine learning insights. The diagram is visually divided into three main sections:

  1. a data collection phase on the left
  2. a central modelling and communication phase
  3. an insights/decision‑making phase on the right.

The infographic shows text grouped around and connected by lines and simple icons. Colors used include black for main text and connection dots, purple for certain key phrases and blue for category headings.

Collecting and preparing data (left section)

At the top left is a blue heading that reads Monitoring the environments that data is collected in

Under this are four bullet categories of data sources, arranged vertically, with dashed lines connecting them to the central flow:

  • Warrant (ad‑hoc, mostly unstructured)
  • Core data (streaming, structured)
  • Partner data (regular, semi‑structured)
  • Upholding the privacy and digital rights of citizens

Below this list and to the right is another cluster of purple and black text:

  • Reliance on historical data (purple)
  • Data acquisition (connected to the data sources list with a dashed line) (in black)

Under that, Complying with legislation restrictions around how data is used (in purple) and Communicating the provenance of the data (in purple) appear left of the main flow.

Data processing and communication (Central Section)

Across the center is a horizontal sequence representing stages of handling data. Black circles represent key steps along this timeline; between them, small descriptive texts are placed:

  1. Data acquisition
  2. Cleaning and pre‑processing
    Near it: Methods for capturing bias (in blue) and Communicating the structure and properties of the data (in purple)
  3. Converting to graph
    Above is a heading: Implementing changes to existing accountability systems (in blue)
  4. Model creation
    In the space above, a heading states Clearly communicating the levels of uncertainty, confidence and probability of predictions (in purple), and below it, Using interpretable ML models instead of a black box model (in purple).

  5. Visualisation – depicted with a stylized graphic of a network diagram icon near the timeline.
    Above this, in blue is a heading Increasing data and machine learning literacy across the workforce where it will be deployed.
  6. To the right of “Visualisation” there are two more icons representing reports or documents arranged in a horizontal cluster. Next to them in black is Combine insights with other info in intelligence report. To the right again, another icon cluster represents combined information in another report, with black text: Combine intelligence report with other case information.

Above these later steps in purple is Regular usability testing to understand how predictive data is perceived.

Outputs / data insights pathways (right section)

On the far right is a black circle labeled Data insights pathways. From this point, several dashed lines branch out to outcomes or considerations, arranged in a radial pattern with short text labels. These include:

  • Other pathways we are not aware of (greyed/lighter text)
  • Evidence in court
  • Supplemental information in the field
  • Information required for another warrant
  • Other pathways we are not aware of (greyed/lighter text)

The infographic ends with headings in blue text stating:

  • Evaluating the validity of ML tools.
  • Upholding the privacy and digital human rights of citizens.
  • Accessibility of data insights.

All text is horizontally oriented; there are no photographic elements, only typographic and simplified icon representations as described.

Points of intervention in the ecosystem. The text highlighted in purple most closely concerns us as a technology team; the text highlighted in blue concerns stakeholders in the broader ecosystem.
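One of the purple intervention points above, clearly communicating the levels of uncertainty, confidence and probability of predictions, can be illustrated with a small sketch. The function, wording and probability thresholds are hypothetical illustrations of the principle, not part of the project's system.

```python
# Hypothetical sketch: surface a model's uncertainty alongside its prediction,
# rather than presenting a bare result. Thresholds and wording are illustrative.

def describe_prediction(entity_a: str, entity_b: str, probability: float) -> str:
    """Turn a predicted link probability into wording an analyst can weigh."""
    if probability >= 0.9:
        qualifier = "a strong indication of"
    elif probability >= 0.6:
        qualifier = "a possible"
    else:
        qualifier = "only weak evidence of"
    return (f"The model suggests {qualifier} a link between "
            f"{entity_a} and {entity_b} (confidence {probability:.0%}).")

print(describe_prediction("Entity A", "Entity B", 0.93))
```

Phrasing predictions this way gives the user material for calibrating their trust: a weak-evidence link invites further corroboration rather than immediate action.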

Existing literature

To understand more about trust in machine learning, a literature review was undertaken to explore the methods and practices currently in use to build trust in machine learning algorithms. The Partnership on AI's Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System provided a robust assessment of the 'paradigmatic example of the potential social and ethical consequences of automated AI decision-making.' It highlights three key challenge areas.

These challenge areas reflect many of the important issues surfaced in our review of other literature, which are detailed in the report Machine Learning and Responsibility in Criminal Investigations mentioned earlier in this article. As user researchers and designers focused on the interaction component of ML systems, we made the third challenge area our cornerstone. However, it's important to note that technology teams building ML systems that are safe for production deployment must make considerations across all of these areas.

Guiding best practice

After completing our research on visualising predicted data and our users' needs, context, and social dynamics, the team intended to dive straight into designing data scenarios and visualisations to test how well users understood and trusted the properties of predicted data in different situations.

But our research uncovered a gap: best practice did not yet appear to have been established. As a result, the team designed a set of practical guidelines (see footnote) to enable informed decisions at each stage of the design and development process, and to ensure a human-centred approach was always front of mind.

Unfortunately, high security restrictions prevented the application of our guidelines to designing realistic scenarios with real data used for criminal investigations.

What next?

During this project, we progressed our knowledge of this complex area significantly, developing our understanding of key issues and considerations in this emerging space of designing for trust.

It is now time for the team at Data61 to put this knowledge to the test with our next research project, and to apply this knowledge in practice. We'll be constructing realistic scenarios using data related to a specific domain, designing the critical components needed for users of a machine learning system to build their trust in the model outcomes and ultimately the trustworthiness of a particular system.

Stay tuned for our next blog post, where we will share the guidelines we developed, “Guidelines for the Creation of AI Tools for Humans,” followed by a case study on how we have applied the guidelines in practice.

CSIRO's Data61 Investigative Analytics program was a three-year technology program funded by the Australian Government's Modernisation Fund with the mission of “increasing the graph analytics capabilities of Australia”. Over the course of the program, an interdisciplinary team of data scientists, engineers, user experience (UX) designers and product managers developed a graph analytics software system driven by machine learning.