- Letter
- Open access
Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges
Systematic Reviews volume 13, Article number: 269 (2024)
Abstract
Artificial Intelligence (AI) is transforming systematic reviews (SRs) in health research by automating processes such as study screening, data extraction, and quality assessment. This letter highlights recent advancements in AI tools that enhance efficiency and accuracy in SRs. It discusses the benefits, challenges, and future directions of AI integration, emphasising the need for human oversight to ensure the reliability of AI outputs in evidence synthesis and decision-making in healthcare.
Introduction
Systematic reviews (SRs) are crucial for synthesising evidence from multiple studies to inform clinical practice and future research. However, they are labour-intensive, particularly in the phases of study screening and data extraction. Since 2005, artificial intelligence (AI) tools have gained increasing attention [1] for their ability to automate these processes, offering increased efficiency and accuracy [2]. This letter outlines recent developments in AI for different SR processes, highlights commonly used tools, and discusses the challenges and future directions.
AI tools for research question development and search strategy
Developing a clear research question and a comprehensive search strategy to identify relevant studies is a crucial yet time-consuming step in conducting SRs. AI tools like OpenAI’s ChatGPT can assist by generating PICO-based research questions and tailored search strings for databases like PubMed and Embase [2]. Additionally, ChatGPT can create custom code to automate the search and retrieval process using the National Center for Biotechnology Information E-utilities application programming interface (API).
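For illustration, the minimal Python sketch below shows the kind of retrieval code such a tool might generate: it queries PubMed through the documented E-utilities `esearch` endpoint. The Boolean query string and the result limit are our own illustrative assumptions, not a recommended search strategy.

```python
# Minimal sketch: querying PubMed via the NCBI E-utilities API.
# The search string below is an illustrative PICO-style query only.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(query: str, retmax: int = 100) -> list[str]:
    """Return PubMed IDs (PMIDs) matching a Boolean query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    response = requests.get(ESEARCH, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    query = ('("diabetes mellitus, type 2"[MeSH Terms]) AND '
             '(metformin[Title/Abstract]) AND '
             '(randomized controlled trial[Publication Type])')
    pmids = search_pubmed(query)
    print(f"Retrieved {len(pmids)} PMIDs")
```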
Platforms like searchrefiner employ automation tools to identify frequent MeSH terms from selected references and categorise them into health condition, treatment, and study design, which are subsequently used to develop Boolean queries [3, 4]. Such AI tools save time and enhance the thoroughness of search strategies, helping ensure relevant studies are not overlooked. However, careful human oversight remains essential to ensure alignment with research goals.
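As a sketch of this categorise-then-combine logic (our illustration, not searchrefiner's actual implementation), terms grouped by category can be joined disjunctively within each category and conjunctively across categories:

```python
# Illustrative composition of a Boolean query from MeSH terms grouped
# by category; real strategies would also tailor field tags (e.g.
# [Publication Type] for study designs).
groups = {
    "condition": ["Diabetes Mellitus, Type 2"],
    "treatment": ["Metformin"],
    "study design": ["Randomized Controlled Trial"],
}
query = " AND ".join(
    "(" + " OR ".join(f'"{term}"[MeSH Terms]' for term in terms) + ")"
    for terms in groups.values()
)
print(query)
```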
AI in study searching and screening
Study searching and screening are laborious but essential steps of SRs, as a significant portion of retrieved records often turns out to be irrelevant. AI-powered tools can greatly reduce this workload by automating or semi-automating relevance assessment using machine learning (ML) techniques.
SR software platforms like Nested Knowledge and DistillerSR have integrated search engine functionalities, allowing automatic import of records from databases via their respective APIs and efficient filtering of duplicates. However, not all database APIs are integrated into these platforms, and delays in API updates can affect accuracy.
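A hedged sketch of the duplicate-filtering step is shown below; the field names and exact-match keys are simplifying assumptions, as production platforms use their own record schemas and fuzzier matching.

```python
# Sketch: de-duplicating imported records by DOI, falling back to a
# normalised title. Field names ("doi", "title") are assumptions.
import re

def dedupe(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower().strip()
        title = re.sub(r"[^a-z0-9]", "", (rec.get("title") or "").lower())
        key = doi or title
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```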
Several software tools support title and abstract screening, with Abstrackr and Rayyan among the most extensively validated [5, 6]. Abstrackr assists one or two reviewers, saving time with minimal risk of missing relevant records [7]. Rayyan, a widely used web-based platform, integrates AI tools to facilitate screening. Table 1 lists commonly used AI-enabled software and platforms developed or enhanced in the past 5 years.
The performance of these tools depends on the effectiveness of the underlying ML paradigms and algorithms, the quality and quantity of training data, and the degree of human involvement [9]. Studies have demonstrated that well-developed AI tools, though not yet fully automated, can match or even surpass human reviewers in screening efficiency and accuracy, thereby accelerating the SR process; AI application in this step is accordingly among the most developed and explored. Nevertheless, effectiveness still hinges on the quality of the training data and the extent of human oversight and verification.
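To make the paradigm concrete, the generic sketch below (not the algorithm of any named tool) trains a classifier on reviewer-labelled abstracts and ranks the unscreened pool by predicted relevance, so that likely includes surface first:

```python
# Generic screening-prioritisation sketch: TF-IDF features plus a
# logistic-regression classifier, trained on reviewer labels
# (1 = relevant, 0 = irrelevant).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rank_unscreened(labelled_texts, labels, unscreened_texts):
    vectoriser = TfidfVectorizer(stop_words="english", max_features=20_000)
    X_train = vectoriser.fit_transform(labelled_texts)
    clf = LogisticRegression(max_iter=1000).fit(X_train, labels)
    scores = clf.predict_proba(vectoriser.transform(unscreened_texts))[:, 1]
    # Present the most-likely-relevant records first, in the spirit of
    # active-learning tools; humans still make the final decision.
    return sorted(zip(unscreened_texts, scores), key=lambda pair: -pair[1])
```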
AI for data extraction
Data extraction is a manual, error-prone task that AI tools are increasingly automating using natural language processing (NLP) and ML algorithms. Tools like RobotReviewer achieve high accuracy in extracting relevant details from randomised controlled trials (RCTs) [10]. Nested Knowledge and Abstrackr (Table 1) highlight relevant sections for data extraction, while SciSpace and Elicit.org offer intuitive interfaces that automatically extract data from multiple papers and present it in tables. SciSpace also provides citations for extracted data and enables PDF interaction via Copilot for validation, and its GPT can extract data from individual uploaded PDFs through specific prompts. Although not formally validated, these tools show promise in improving efficiency and saving time, though variability in content identification may affect reproducibility and accuracy.
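As an illustration of LLM-assisted extraction (our sketch, not the workflow of any tool named above), a chat model can be prompted to return structured PICO fields as JSON. The endpoint follows OpenAI's public chat-completions REST API; the model name and prompt wording are assumptions.

```python
# Hedged sketch: asking a chat-model API for structured PICO fields.
import json
import os
import requests

PROMPT = (
    "Extract the following from the study text as JSON with keys "
    "population, intervention, comparator, outcomes, sample_size. "
    "Use null for anything not reported.\n\nStudy text:\n{text}"
)

def extract_pico(study_text: str) -> dict:
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",  # illustrative model name
            "messages": [{"role": "user",
                          "content": PROMPT.format(text=study_text)}],
            "response_format": {"type": "json_object"},
        },
        timeout=60,
    )
    response.raise_for_status()
    return json.loads(response.json()["choices"][0]["message"]["content"])
```

Extracted values would still need checking against the source PDF by a human reviewer, as the surrounding text emphasises.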
Despite their advancements, AI tools continue to face challenges with nuanced interpretations and poorly reported data. Their accuracy and reliability depend on factors such as algorithms, the quality and diversity of training data, the complexity of source documents, and the precision of prompts. While AI reduces data extraction time, it still requires cautious use and human oversight to ensure accuracy.
AI for quality assessment
AI tools can streamline quality assessment by applying specific criteria to studies. For instance, RobotReviewer uses NLP to identify relevant text and assess the risk of bias in RCTs, achieving an accuracy rate of 71–78.3% [11], though it remains less precise than the human assessments in Cochrane reviews. An AI extension for the Prediction model Risk Of Bias Assessment Tool (PROBAST) is under development to evaluate bias in clinical prediction model studies [12]. Additionally, SciSpace's GPT can provide quality assessments of uploaded papers, with justifications, when properly prompted. While promising, these tools still lack the precision of manual assessments by experienced reviewers, highlighting the need for further development and validation.
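As an illustration of the "properly prompted" point, a prompt along the following lines (our wording, not a validated instrument) asks an LLM to judge the RoB 2 domains with quoted justifications:

```python
# Illustrative prompt for LLM-assisted risk-of-bias appraisal; outputs
# would still need verification by experienced reviewers.
ROB_PROMPT = """For the attached RCT report, rate each Cochrane RoB 2
domain (randomisation process, deviations from intended interventions,
missing outcome data, measurement of the outcome, selection of the
reported result) as low / some concerns / high, quoting the passage
that justifies each judgement. If a domain is not reported, say so
rather than guessing."""
```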
AI for synthesis and summarisation
Traditional ML and NLP tools are not capable of fully automating data synthesis [10]. However, advanced AI tools offer valuable assistance in aggregating information from large volumes of research evidence. Platforms like Nested Knowledge and DistillerSR (Table 1) automatically generate and update PRISMA diagrams and synthesis tables, while ChatGPT and SciSpace's GPT can assist with academic writing. Although generated summaries must be cross-checked by humans, these tools can significantly aid in organising and summarising research findings, accelerating the preparation of SR reports.
Challenges and future directions
AI tools have shown great potential in streamlining various SR processes, yet challenges persist. The risk of AI-induced biases, particularly in sensitive fields like healthcare, raises concerns. To address this, robust evaluation frameworks and standards are needed to ensure these tools produce reliable and unbiased results. Transparency in AI algorithms is also crucial for fostering trust and reproducibility.
The advancement of large language model (LLM) technology, exemplified by models like ChatGPT, offers new avenues for automating and enhancing SRs by improving the understanding and processing of complex language data. APIs enable LLMs to be integrated into code for efficient, large-scale data processing. A GPT-4 API script achieved 96% screening specificity and 93% sensitivity when prompted with inclusion and exclusion criteria [13], showcasing the capability of base models without fine-tuning. Consecutive scripts using LLM APIs could, in principle, automate SRs from screening to report writing. However, errors compound across steps: a 5% error rate at each of four sequential steps yields an overall accuracy of only 0.95^4 ≈ 81.5%. This highlights the need to minimise errors at each stage and to retain human-in-the-loop mechanisms for complex tasks like SRs.
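A screening script of the kind evaluated in [13] might resemble the hedged sketch below; the criteria, model name, and prompt wording are our illustrative assumptions, not the published script.

```python
# Hedged sketch of LLM-based abstract screening. The criteria are
# placeholders ("drug X", "condition Y") to be replaced by a review's
# actual inclusion and exclusion criteria.
import os
import requests

CRITERIA = """Include: adult RCTs of drug X for condition Y.
Exclude: animal studies, protocols, conference abstracts."""

def screen(title: str, abstract: str) -> bool:
    prompt = (
        f"Screening criteria:\n{CRITERIA}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4o",  # illustrative model name
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    r.raise_for_status()
    answer = r.json()["choices"][0]["message"]["content"].strip().upper()
    return answer.startswith("INCLUDE")
```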
While AI tools, particularly LLMs, offer immense potential to streamline various SR processes, the role of AI must be carefully defined to avoid misuse. Clear guidelines for AI integration are essential to prevent AI-generated content from slipping through peer-review processes. AI tools should augment, not replace, the meticulous work of researchers. For example, LLMs can assist in generating initial drafts, but human experts must review, refine, and validate these outputs to ensure scientific rigour and accuracy. Additionally, integrating checks such as provenance tracking and flagging AI-generated sections can enhance accountability. Furthermore, AI-generated summaries or syntheses should always be cross-checked against source data by experienced reviewers to prevent the propagation of errors. Embedding human oversight at each stage ensures that the final SR remains credible and unbiased.
Integrating critically appraised LLM-based AI tools with existing SR software could further optimise the process, although creating effective and efficient prompts remains a challenge. Collaboration between AI developers, SR methodologists, and healthcare experts is essential to refine these tools to maximise their potential.
Conclusion
AI holds transformative potential to enhance SRs by increasing efficiency and quality, but human judgment remains essential to ensure reliability. Defining the complementary roles of AI and human reviewers will help maintain SR integrity while leveraging AI’s efficiencies. Continued development and validation of AI tools are vital to fully realise their benefits. By integrating AI judiciously with ongoing human oversight, researchers can conduct more efficient, accurate, and comprehensive SRs, advancing evidence-based healthcare.
Data availability
Not applicable.
Abbreviations
- AI: Artificial intelligence
- API: Application programming interface
- GPT: Generative pre-trained transformer
- LLM: Large language model
- ML: Machine learning
- NLP: Natural language processing
- RCT: Randomised controlled trial
- SR: Systematic review
References
1. Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13:206–19.
2. Fabiano N, et al. How to optimize the systematic review process using AI tools. JCPP Advances. 2024;4:e12234.
3. Scells H, Zuccon G, Koopman B, Clark J. A computational approach for objectively derived systematic review search strategies. In: Jose JM, et al., editors. Advances in Information Retrieval. Cham: Springer International Publishing; 2020. https://doi.org/10.1007/978-3-030-45439-5_26.
4. Scells H, Zuccon G. searchrefiner: a query visualisation and understanding tool for systematic reviews. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. New York: Association for Computing Machinery; 2018. p. 1939–42. https://doi.org/10.1145/3269206.3269215.
5. Harrison H, Griffin SJ, Kuhn I, Usher-Smith JA. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med Res Methodol. 2020;20:7.
6. Hamel C, et al. An evaluation of DistillerSR's machine learning-based prioritization tool for title/abstract screening – impact on reviewer-relevant outcomes. BMC Med Res Methodol. 2020;20:256.
7. Gates A, et al. The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr's relevance predictions in systematic and rapid reviews. BMC Med Res Methodol. 2020;20:139.
8. van de Schoot R, et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell. 2021;3:125–33.
9. O'Connor AM, et al. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev. 2019;8:143.
10. Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8:163.
11. Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Inform Assoc. 2016;23:193–201.
12. Collins GS, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11:e048008.
13. Li M, Sun J, Tan X. Evaluating the effectiveness of large language models in abstract screening: a comparative analysis. Syst Rev. 2024;13:219.
Acknowledgements
Not applicable.
Funding
No funding was received for the study.
Author information
Contributions
LG and JADCM conceptualised the study. LG prepared Table 1, tested out the tools, and drafted and revised the manuscript. RA and MS substantially revised the manuscript with valuable inputs. PK, KLT, and CWY provided comments on AI and ML techniques. JADCM and JAA supervised the work. All authors reviewed and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ge, L., Agrawal, R., Singer, M. et al. Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges. Syst Rev 13, 269 (2024). https://doi.org/10.1186/s13643-024-02682-2
DOI: https://doi.org/10.1186/s13643-024-02682-2