The researchers have used this finding to develop a standardised data collection template that can be implemented on repositories like GISAID. The template does not identify the patient and makes it easier for clinical teams to share more of their knowledge.
This enables the scientific community to access important information, including symptoms, vaccine status and travel history, and in doing so build a more complete picture of the impact of COVID-19 on each patient.
SARS-CoV-2, the virus that causes COVID-19, is one of the most sequenced viruses in history, with over 200,000 sequences on GISAID as of 16 November 2020.
The last 100,000 sequences of the virus were uploaded in the past two months, a global record.
The study, a collaboration with GISAID and other academic partners, proposes a standardised data collection method to help scientists and clinicians around the world gather and share vital information in the fight against COVID-19.
CSIRO researcher and senior author of the paper Dr S.S. Vasan said it is critical to collect the 'patient journey' in as much detail as possible to understand the impact of virus evolution on the disease and its consequences.
"We urgently need de-identified patient data associated with these virus genome sequences in order to decipher whether disease outcomes are due to a mutation, or multiple mutations, in the virus or host factors such as age, gender and co-morbidities," Dr Vasan said.
"It's very likely this information is known to the clinical teams who treated the patient but does not make its way to public repositories such as GISAID, due to the number of steps involved."
Recognising this need for clinical data, GISAID has made 'patient status' a compulsory field for uploading virus sequences since 27 April 2020.
However, the study showed a lack of digital infrastructure for collecting clinical information has hampered progress.
It also identified the need for a standardised vocabulary and mechanism for linking in with health systems as key factors for capturing the necessary information.
Lead author and CSIRO researcher Dr Denis Bauer said that, with the adoption of the study's proposed data collection template, future sequences shared through the GISAID initiative could contain more meaningful de-identified patient information.
"We have identified steps in the clinical health data acquisition cycle and workflows that likely have the biggest impact in the data-driven understanding of this virus," Dr Bauer said.
"Following the 'Fast Healthcare Interoperability Resources' implementation guide, we have introduced an ontology-based standard questionnaire consistent with the World Health Organization's recommendations."
Barwon Health's Director of Infectious Diseases Professor Eugene Athan welcomed the new data collection template.
"Barwon Health is leading a study on the long-term biological, physiological and psychological effects of COVID-19, in partnership with CSIRO and Deakin University, and we intend to implement this mechanism for our data collection and reporting," Professor Athan said.
"Having a simplified and standardised approach to sharing relevant patient information alongside genome sequences will enable critical research into COVID-19 and comparisons between different studies and population sets.
"I encourage clinicians and scientists around the world to share, wherever possible, de-identified patient information and clinical outcomes using this template to support ongoing research efforts."
The paper 'Interoperable medical data: the missing link for understanding COVID‐19' was published in the Transboundary and Emerging Diseases journal.
Background information
Note
The institutions that collaborated in this work include:
- Agency for Science, Technology and Research (A*STAR) is Singapore’s lead public sector R&D agency and a statutory board under the Ministry of Trade and Industry of Singapore. www.a-star.edu.sg
- CSIRO, the Commonwealth Scientific and Industrial Research Organisation, is Australia’s national science agency. www.csiro.au
- GISAID Initiative promotes the rapid sharing of data from all influenza viruses and the coronavirus causing COVID-19. Headquartered in Munich, it was set up at the Sixty-first World Health Assembly in May 2008. www.gisaid.org
- Institut Pasteur is a French non-profit private foundation, established in 1887, dedicated to the study of biology, micro-organisms, diseases, and vaccines. www.pasteur.fr
- Macquarie University is a public research university based in Sydney, Australia. www.mq.edu.au
- National University of Singapore, established in 1905, is the oldest and highest-ranked academic institution in Singapore. www.nus.edu.sg
- Sorbonne Université, established in 1257, is a public research university in Paris, France. www.sorbonne-universite.fr
- University of York is a collegiate research university that is part of the UK’s prestigious Russell Group. www.york.ac.uk
For details on CSIRO’s COVID-19 systems biology case-control study in collaboration with Barwon Health, see 'How our researchers are making discoveries about coronavirus'.