- Letter
- Open access
Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges
Systematic Reviews volume 13, Article number: 269 (2024)
Abstract
Artificial Intelligence (AI) is transforming systematic reviews (SRs) in health research by automating processes such as study screening, data extraction, and quality assessment. This letter highlights recent advancements in AI tools that enhance efficiency and accuracy in SRs. It discusses the benefits, challenges, and future directions of AI integration, emphasising the need for human oversight to ensure the reliability of AI outputs in evidence synthesis and decision-making in healthcare.
Introduction
Systematic reviews (SRs) are crucial for synthesising evidence from multiple studies to inform clinical practice and future research. However, they are labour-intensive, particularly in the phases of study screening and data extraction. Since 2005, artificial intelligence (AI) tools have gained increasing attention [1] for their ability to automate these processes, offering increased efficiency and accuracy [2]. This letter outlines recent developments in AI for different SR processes, highlights commonly used tools, and discusses the challenges and future directions.
AI tools for research question development and search strategy
Developing a clear research question and a comprehensive search strategy to identify relevant studies is a crucial yet time-consuming step in conducting SRs. AI tools like OpenAI’s ChatGPT can assist by generating PICO-based research questions and tailored search strings for databases like PubMed and Embase [2]. Additionally, ChatGPT can create custom code to automate the search and retrieval process using the National Center for Biotechnology Information E-utilities application programming interface (API).
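For illustration, the minimal Python sketch below shows the kind of retrieval code such a tool might generate: it queries PubMed through the documented E-utilities `esearch` endpoint. The Boolean query string and the result limit are our own illustrative assumptions, not a recommended search strategy.

```python
# Minimal sketch: querying PubMed via the NCBI E-utilities API.
# The search string below is an illustrative PICO-style query only.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(query: str, retmax: int = 100) -> list[str]:
    """Return PubMed IDs (PMIDs) matching a Boolean query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    response = requests.get(ESEARCH, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    query = ('("diabetes mellitus, type 2"[MeSH Terms]) AND '
             '(metformin[Title/Abstract]) AND '
             '(randomized controlled trial[Publication Type])')
    pmids = search_pubmed(query)
    print(f"Retrieved {len(pmids)} PMIDs")
```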
Platforms like searchrefiner employ automation tools to identify frequent MeSH terms from selected references and categorise them into health condition, treatment, and study design, which are subsequently used to develop Boolean queries [3, 4]. Such AI tools save time and enhance the thoroughness of search strategies, helping ensure relevant studies are not overlooked. However, careful human oversight remains essential to ensure alignment with research goals.
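As a sketch of this categorise-then-combine logic (our illustration, not searchrefiner's actual implementation), terms grouped by category can be joined disjunctively within each category and conjunctively across categories:

```python
# Illustrative composition of a Boolean query from MeSH terms grouped
# by category; real strategies would also tailor field tags (e.g.
# [Publication Type] for study designs).
groups = {
    "condition": ["Diabetes Mellitus, Type 2"],
    "treatment": ["Metformin"],
    "study design": ["Randomized Controlled Trial"],
}
query = " AND ".join(
    "(" + " OR ".join(f'"{term}"[MeSH Terms]' for term in terms) + ")"
    for terms in groups.values()
)
print(query)
```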
AI in study searching and screening
Study searching and screening are laborious but essential steps of SRs, as a significant portion of retrieved records often turns out to be irrelevant. AI-powered tools can greatly reduce this workload by automating or semi-automating relevance assessment using machine learning (ML) techniques.
SR software platforms like Nested Knowledge and DistillerSR have integrated search engine functionalities, allowing automatic import of records from databases via their respective APIs and efficient filtering of duplicates. However, not all database APIs are integrated into these platforms, and delays in API updates can affect accuracy.
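A hedged sketch of the duplicate-filtering step is shown below; the field names and exact-match keys are simplifying assumptions, as production platforms use their own record schemas and fuzzier matching.

```python
# Sketch: de-duplicating imported records by DOI, falling back to a
# normalised title. Field names ("doi", "title") are assumptions.
import re

def dedupe(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower().strip()
        title = re.sub(r"[^a-z0-9]", "", (rec.get("title") or "").lower())
        key = doi or title
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```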
Several software tools support title and abstract screening, with Abstrackr and Rayyan among the most extensively validated [5, 6]. Abstrackr assists one or two reviewers, saving time with minimal risk of missing relevant records [7]. Rayyan, a widely used web-based platform, integrates AI tools to facilitate screening. Table 1 lists commonly used AI-enabled software and platforms developed or enhanced in the past 5 years.
The performance of these tools depends on the effectiveness of the underlying ML paradigms and algorithms, the quality and quantity of training data, and the degree of human involvement [9]. Studies have demonstrated that well-developed AI tools, though not yet fully automated, can match or even surpass human reviewers in screening efficiency and accuracy, thereby accelerating the SR process; AI application in this step is accordingly among the most developed and explored. Nevertheless, effectiveness still hinges on the quality of the training data and the extent of human oversight and verification.
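To make the paradigm concrete, the generic sketch below (not the algorithm of any named tool) trains a classifier on reviewer-labelled abstracts and ranks the unscreened pool by predicted relevance, so that likely includes surface first:

```python
# Generic screening-prioritisation sketch: TF-IDF features plus a
# logistic-regression classifier, trained on reviewer labels
# (1 = relevant, 0 = irrelevant).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rank_unscreened(labelled_texts, labels, unscreened_texts):
    vectoriser = TfidfVectorizer(stop_words="english", max_features=20_000)
    X_train = vectoriser.fit_transform(labelled_texts)
    clf = LogisticRegression(max_iter=1000).fit(X_train, labels)
    scores = clf.predict_proba(vectoriser.transform(unscreened_texts))[:, 1]
    # Present the most-likely-relevant records first, in the spirit of
    # active-learning tools; humans still make the final decision.
    return sorted(zip(unscreened_texts, scores), key=lambda pair: -pair[1])
```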
AI for data extraction
Data extraction is a manual, error-prone task that AI tools are increasingly automating using natural language processing (NLP) and ML algorithms. Tools like RobotReviewer achieve high accuracy in extracting relevant details from randomised controlled trials (RCTs) [10]. Nested Knowledge and Abstrackr (Table 1) highlight relevant sections for data extraction, while SciSpace and Elicit.org offer intuitive interfaces that automatically extract data from multiple papers and present it in tables. SciSpace also provides citations for extracted data and enables PDF interaction via Copilot for validation, and its GPT can extract data from individual uploaded PDFs through specific prompts. Although not formally validated, these tools show promise in improving efficiency and saving time, though variability in content identification may affect reproducibility and accuracy.
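As an illustration of LLM-assisted extraction (our sketch, not the workflow of any tool named above), a chat model can be prompted to return structured PICO fields as JSON. The endpoint follows OpenAI's public chat-completions REST API; the model name and prompt wording are assumptions.

```python
# Hedged sketch: asking a chat-model API for structured PICO fields.
import json
import os
import requests

PROMPT = (
    "Extract the following from the study text as JSON with keys "
    "population, intervention, comparator, outcomes, sample_size. "
    "Use null for anything not reported.\n\nStudy text:\n{text}"
)

def extract_pico(study_text: str) -> dict:
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",  # illustrative model name
            "messages": [{"role": "user",
                          "content": PROMPT.format(text=study_text)}],
            "response_format": {"type": "json_object"},
        },
        timeout=60,
    )
    response.raise_for_status()
    return json.loads(response.json()["choices"][0]["message"]["content"])
```

Extracted values would still need checking against the source PDF by a human reviewer, as the surrounding text emphasises.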
Despite their advancements, AI tools continue to face challenges with nuanced interpretations and poorly reported data. Their accuracy and reliability depend on factors such as algorithms, the quality and diversity of training data, the complexity of source documents, and the precision of prompts. While AI reduces data extraction time, it still requires cautious use and human oversight to ensure accuracy.
AI for quality assessment
AI tools can streamline quality assessment by applying specific criteria to studies. For instance, RobotReviewer uses NLP to identify relevant text and assess the risk of bias in RCTs, achieving an accuracy rate of 71–78.3% [11], though it remains less precise than the human assessments in Cochrane reviews. An AI extension for the Prediction model Risk Of Bias Assessment Tool (PROBAST) is under development to evaluate bias in clinical prediction model studies [12]. Additionally, SciSpace's GPT can provide quality assessments of uploaded papers, with justifications, when properly prompted. While promising, these tools still lack the precision of manual assessments by experienced reviewers, highlighting the need for further development and validation.
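As an illustration of the "properly prompted" point, a prompt along the following lines (our wording, not a validated instrument) asks an LLM to judge the RoB 2 domains with quoted justifications:

```python
# Illustrative prompt for LLM-assisted risk-of-bias appraisal; outputs
# would still need verification by experienced reviewers.
ROB_PROMPT = """For the attached RCT report, rate each Cochrane RoB 2
domain (randomisation process, deviations from intended interventions,
missing outcome data, measurement of the outcome, selection of the
reported result) as low / some concerns / high, quoting the passage
that justifies each judgement. If a domain is not reported, say so
rather than guessing."""
```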
AI for synthesis and summarisation
Traditional ML and NLP tools are not capable of fully automating data synthesis [10]. However, advanced AI tools offer valuable assistance in aggregating information from large volumes of research evidence. Platforms like Nested Knowledge and DistillerSR (Table 1) automatically generate and update PRISMA diagrams and synthesis tables, while ChatGPT and SciSpace's GPT can assist with academic writing. Although generated summaries must be cross-checked by humans, these tools can significantly aid in organising and summarising research findings, accelerating the preparation of SR reports.
Challenges and future directions
AI tools have shown great potential in streamlining various SR processes, yet challenges persist. The risk of AI-induced biases, particularly in sensitive fields like healthcare, raises concerns. To address this, robust evaluation frameworks and standards are needed to ensure these tools produce reliable and unbiased results. Transparency in AI algorithms is also crucial for fostering trust and reproducibility.
The advancement of large language model (LLM) technology, exemplified by models like ChatGPT, offers new avenues for automating and enhancing SRs by improving the understanding and processing of complex language data. APIs enable LLMs to be integrated into code for efficient, large-scale data processing. A GPT-4 API script achieved 96% screening specificity and 93% sensitivity when prompted with inclusion and exclusion criteria [13], showcasing the capability of base models without fine-tuning. Consecutive scripts using LLM APIs could, in principle, automate SRs from screening to report writing. However, errors compound across steps: a 5% error rate at each of four sequential steps yields an overall accuracy of only 0.95^4 ≈ 81.5%. This highlights the need to minimise errors at each stage and to retain human-in-the-loop mechanisms for complex tasks like SRs.
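A screening script of the kind evaluated in [13] might resemble the hedged sketch below; the criteria, model name, and prompt wording are our illustrative assumptions, not the published script.

```python
# Hedged sketch of LLM-based abstract screening. The criteria are
# placeholders ("drug X", "condition Y") to be replaced by a review's
# actual inclusion and exclusion criteria.
import os
import requests

CRITERIA = """Include: adult RCTs of drug X for condition Y.
Exclude: animal studies, protocols, conference abstracts."""

def screen(title: str, abstract: str) -> bool:
    prompt = (
        f"Screening criteria:\n{CRITERIA}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4o",  # illustrative model name
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    r.raise_for_status()
    answer = r.json()["choices"][0]["message"]["content"].strip().upper()
    return answer.startswith("INCLUDE")
```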
While AI tools, particularly LLMs, offer immense potential to streamline various SR processes, the role of AI must be carefully defined to avoid misuse. Clear guidelines for AI integration are essential to prevent AI-generated content from slipping through peer-review processes. AI tools should augment, not replace, the meticulous work of researchers. For example, LLMs can assist in generating initial drafts, but human experts must review, refine, and validate these outputs to ensure scientific rigour and accuracy. Additionally, integrating checks such as provenance tracking and flagging AI-generated sections can enhance accountability. Furthermore, AI-generated summaries or syntheses should always be cross-checked against source data by experienced reviewers to prevent the propagation of errors. Embedding human oversight at each stage ensures that the final SR remains credible and unbiased.
Integrating critically appraised LLM-based AI tools with existing SR software could further optimise the process, although creating effective and efficient prompts remains a challenge. Collaboration between AI developers, SR methodologists, and healthcare experts is essential to refine these tools to maximise their potential.
Conclusion
AI holds transformative potential to enhance SRs by increasing efficiency and quality, but human judgment remains essential to ensure reliability. Defining the complementary roles of AI and human reviewers will help maintain SR integrity while leveraging AI’s efficiencies. Continued development and validation of AI tools are vital to fully realise their benefits. By integrating AI judiciously with ongoing human oversight, researchers can conduct more efficient, accurate, and comprehensive SRs, advancing evidence-based healthcare.
Data availability
Not applicable.
Abbreviations
- AI: Artificial intelligence
- API: Application programming interface
- GPT: Generative pre-trained transformer
- LLM: Large language model
- ML: Machine learning
- NLP: Natural language processing
- RCT: Randomised controlled trial
- SR: Systematic review
References
1. Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13:206–19.
2. Fabiano N, et al. How to optimize the systematic review process using AI tools. JCPP Advances. 2024;4:e12234.
3. Scells H, Zuccon G, Koopman B, Clark J. A computational approach for objectively derived systematic review search strategies. In: Jose JM, et al., editors. Advances in Information Retrieval. Cham: Springer International Publishing; 2020. https://doi.org/10.1007/978-3-030-45439-5_26.
4. Scells H, Zuccon G. searchrefiner: a query visualisation and understanding tool for systematic reviews. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. New York: Association for Computing Machinery; 2018. p. 1939–42. https://doi.org/10.1145/3269206.3269215.
5. Harrison H, Griffin SJ, Kuhn I, Usher-Smith JA. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med Res Methodol. 2020;20:7.
6. Hamel C, et al. An evaluation of DistillerSR's machine learning-based prioritization tool for title/abstract screening – impact on reviewer-relevant outcomes. BMC Med Res Methodol. 2020;20:256.
7. Gates A, et al. The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr's relevance predictions in systematic and rapid reviews. BMC Med Res Methodol. 2020;20:139.
8. van de Schoot R, et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell. 2021;3:125–33.
9. O'Connor AM, et al. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev. 2019;8:143.
10. Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019;8:163.
11. Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Inform Assoc. 2016;23:193–201.
12. Collins GS, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11:e048008.
13. Li M, Sun J, Tan X. Evaluating the effectiveness of large language models in abstract screening: a comparative analysis. Syst Rev. 2024;13:219.
Acknowledgements
Not applicable.
Funding
No funding was received for the study.
Author information
Contributions
LG and JADCM conceptualised the study. LG prepared Table 1, tested out the tools, and drafted and revised the manuscript. RA and MS substantially revised the manuscript with valuable inputs. PK, KLT, and CWY provided comments on AI and ML techniques. JADCM and JAA supervised the work. All authors reviewed and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ge, L., Agrawal, R., Singer, M. et al. Leveraging artificial intelligence to enhance systematic reviews in health research: advanced tools and challenges. Syst Rev 13, 269 (2024). https://doi.org/10.1186/s13643-024-02682-2
DOI: https://doi.org/10.1186/s13643-024-02682-2