
Breast thermography: a systematic review and meta-analysis

Abstract

Background

Breast thermography originated in the 1950s but was later abandoned due to the contradictory results obtained in the following decades. However, advances in infrared technology and image processing algorithms in the twenty-first century led to a renewed interest in thermography. This work aims to provide an updated and objective picture of the recent scientific evidence on its effectiveness, both as a screening and as a diagnostic tool.

Methods

We searched for clinical studies published between 2001 and May 31, 2023, in the databases PubMed and Web of Science, that aimed to evaluate the effectiveness of digital, long-wave infrared imaging for detecting breast cancer. Additional documents were retrieved from the studies included in the systematic reviews that resulted from the search and by searching for the names of commercial systems. We limited our selection to studies that reported the sensitivity and specificity of breast thermography (or the data needed to calculate them) using images they had collected themselves, with at least five breast cancer cases. Studies that considered breast diseases other than cancer to be positive or that did not use standard tests to establish the ground truth diagnosis were excluded, as were articles written in a language other than English and documents we could not access. We also conducted meta-analyses of proportions of the sensitivity and specificity values reported in the selected studies and a bivariate meta-analysis to account for the correlation between these metrics.

Results

Our systematic search resulted in 22 studies, with an average pooled sensitivity and specificity of 88.5% and 71.8%, respectively. However, differences in patient recruitment, sample size, imaging protocol, equipment, and interpretation criteria resulted in high heterogeneity (\({I}^{2}\) values of 79.3% and 99.1%, respectively).

Conclusions

Overall, thermography showed high sensitivity in the selected studies, whereas specificity started off lower and increased over time. The most recent studies reported a combination of sensitivity and specificity comparable to that of standard diagnostic tests. Most of the selected studies were small and tended to include only patients with a suspicious mass requiring biopsy. However, larger studies with a wider variety of patient types (asymptomatic women, women with dense breasts, etc.) have been published in recent years.


Introduction

Breast cancer is the most common cancer and the leading cause of cancer deaths in women, accounting for nearly 700,000 fatalities in 2022 [1]. Although the incidence of breast cancer has increased in recent decades, mortality rates remain low and stable in high-income countries due to high adherence to screening programs. Mammography is the standard screening modality, with a reported sensitivity of approximately 90% [2]. However, its high cost, patient discomfort, and concerns about radiation-induced breast cancer have motivated the search for alternative screening techniques. In addition, its sensitivity is lower for dense breasts, which are common in young women. This is critical because high breast density increases the risk of developing breast cancer by a factor of 4 to 6 [3]. By contrast, ultrasonography is low-cost and safe, but less sensitive and more operator-dependent. It is most commonly used to distinguish solid lesions from cysts. Magnetic resonance imaging (MRI) is highly sensitive but slow, sometimes invasive (due to the intravenous administration of contrast agents), operator-dependent, and even more expensive than mammography.

Thermography is an imaging technique that measures skin temperature without exposing the patient to ionizing radiation. The increase in metabolism and blood flow to cancer cells causes a local rise in temperature that may be detected by cameras that capture the radiation emitted spontaneously by the human body [4]. Modern thermal cameras are inexpensive, fast, and portable, making thermography a candidate imaging modality for breast cancer screening as an adjunct to mammography. Because it does not use ionizing radiation and its sensitivity does not depend on breast density, it may be particularly relevant for imaging young women and women with dense breasts. Given its relatively low cost and ease of use, it could even serve as the primary screening technique or as an adjunct to ultrasound in low-income countries where mammographic screening is not affordable. Another potential use of thermography is to avoid the biopsy of suspicious lesions that turn out to be benign, which currently account for about 80% of all biopsies performed; in the case of a doubtful mammogram, a normal thermogram could be indicative of non-malignancy.

Despite the promising results reported in the 1950s–1970s suggesting the potential of thermography as a screening tool for the early detection of breast cancer [5,6,7,8,9,10,11,12,13,14,15,16], critics soon emerged with completely opposite results, claiming that thermography added no value to the existing screening techniques [17,18,19,20,21]. As a result, interest shifted to mammography alone [22]. The FDA approved thermography for breast cancer screening in 1982, but only as an adjunct to mammography [4].

With the development of faster, digital, and high-resolution new-generation thermal cameras in the late 1990s and the boom in artificial intelligence (AI), interest in breast thermography has revived in the twenty-first century. Numerous articles have been published proposing advanced image-processing algorithms for thermography. The creation of a public database containing breast thermograms and medical records of both sick and healthy patients contributed to an exponential rise in the number of scientific publications. This database, namely the Database for Mastology Research (DMR), was developed by the Federal Fluminense University and the Federal University of Pernambuco (UFPE) in Brazil [23]. It currently contains 293 patients, of which 185 are labeled as “healthy,” 104 as “sick,” and the remaining 4 have an unknown diagnosis.

Numerous surveys on breast thermography have been published in recent years, most of them aimed at reviewing different computer methods to process and classify thermograms. However, most of these surveys did not conduct a systematic search or follow a structured approach that guarantees objectivity in their conclusions [4, 22, 24,25,26,27,28,29,30,31,32,33,34]. To date, a few systematic reviews have been published, and overall they conclude that there is insufficient scientific evidence to support the use of thermography, either as a screening tool or as a diagnostic test for patients with breast symptoms, clinical findings, or abnormalities in standard imaging modalities [35,36,37,38,39,40,41]. Some of these systematic reviews were commissioned by national health authorities in Australia [40], New Zealand [35, 41], or Malaysia [39]. Kerr et al. [35] and Beresfold et al. [40] acknowledged the need for large prospective studies and randomized controlled trials. Beresfold et al. [40] and Ammer et al. [42] also pointed out that most of the current research focuses on different image processing algorithms rather than on the actual clinical use of thermography for breast cancer detection. The most recent systematic reviews limited their search to papers using machine learning strategies [43, 44]. The purpose of our paper is to provide an updated overview of the clinical evidence on breast thermography, either as a screening or as a diagnostic tool. To achieve this, we conducted a systematic review of clinical studies published since 2001. In contrast to the most recent systematic reviews, which aimed to survey the different algorithms for processing thermal images, our work focuses on the clinical evaluation of breast thermography. Our search identified new, large, high-quality studies that were published after, and therefore not included in, the most recent clinically focused systematic reviews. We also performed a meta-analysis of the selected studies to estimate the pooled sensitivity and specificity of breast thermography.

Materials and methods

Study design

We carefully designed a protocol according to the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (see Additional file 1). This protocol defined the research question, eligibility criteria, information sources, and search terms to be used, as well as the methods for the subsequent synthesis of the sensitivity and specificity results reported in the selected studies.

Search strategy

We searched the PubMed and Web of Science databases for studies published in the twenty-first century (between 2001 and May 31, 2023) to account for the improved high-resolution thermal cameras developed in the late 1990s. We used the keywords breast and mammary along with cancer-related terms (cancer, carcinoma, malignan*, neoplasm*) and various ways to denote thermography (thermograph*, thermogram*, thermolog*, infrared imag*, infra-red imag*, thermal imag*), combined with the Boolean operators AND and OR (see Additional file 2 for the complete search queries). Additional articles were identified by searching for the names of commercial systems and by examining the references included in the retrieved systematic reviews.

Study selection

Articles were included in the analysis if they met the following criteria:

Topic and technology

The focus of our review is the clinical evaluation of long-wave, digital infrared breast thermography [45]. Studies on a different topic or disease were excluded, as well as those using a different thermography method, including the following:

  • Imagers operating in a spectral range other than the long-wave infrared band (i.e., 8–14 µm), such as the near-infrared or the microwave bands, because infrared emissions from human skin peak at about 10 µm at normal skin temperature [45];

  • Contact thermography, in which the surface temperature is measured by placing either a heat-sensitive liquid crystal-coated film (liquid crystal thermography, LCT) or multiple temperature sensors directly on the skin;

  • Active thermography, in which an external agent is used to enhance the contrast between the target and the surrounding tissue, either by exciting the tissue with antennas or by administering fluorescent dyes that bind to cancer cells and emit radiation when excited with lasers. We did include, however, studies that applied cold stress by means of cool air or another cooling method to induce a vascular response and thus identify unresponsive blood vessels affected by malignancy. This type of imaging protocol is known as dynamic, as opposed to static imaging, in which images are acquired after a period of acclimation to reach thermal equilibrium with room temperature.
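The peak near 10 µm mentioned in the first item above follows from Wien's displacement law, \(\lambda_{\max} = b/T\). This holds if skin is treated as an approximate blackbody, since its emissivity is close to 1. The short Python sketch below illustrates the calculation. The skin and core temperatures it uses (33 °C and 37 °C) are assumed typical values, not data from the reviewed studies, and the function name is hypothetical.

```python
# Wien's displacement law: lambda_max = b / T, with T in kelvin.
WIEN_B_UM_K = 2897.771955  # Wien's displacement constant, micrometre-kelvin

LWIR_BAND_UM = (8.0, 14.0)  # long-wave infrared band considered in this review


def peak_wavelength_um(temp_celsius: float) -> float:
    """Wavelength (micrometres) of peak blackbody emission at the given
    temperature. Treating skin as an ideal blackbody is an approximation."""
    temp_kelvin = temp_celsius + 273.15
    return WIEN_B_UM_K / temp_kelvin


# Assumed, illustrative temperatures (not measured in the reviewed studies).
for label, temp_c in [("skin surface, assumed 33 C", 33.0),
                      ("core body, assumed 37 C", 37.0)]:
    peak = peak_wavelength_um(temp_c)
    in_band = LWIR_BAND_UM[0] <= peak <= LWIR_BAND_UM[1]
    print(f"{label}: peak at {peak:.2f} um, inside 8-14 um band: {in_band}")
```

Both assumed temperatures give a peak of roughly 9.3 to 9.5 µm, well inside the long-wave band. Near-infrared and microwave imagers therefore operate far from the wavelengths at which skin emits most strongly.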

Application of breast thermography

We included only studies that evaluated the diagnostic ability of breast thermography, either for screening asymptomatic patients or for diagnosing patients with symptoms or with abnormal findings on a previous imaging technique, i.e., mammography and/or ultrasonography. Consequently, we excluded publications that studied thermography for other breast cancer applications, such as monitoring ongoing treatment, estimating the prognosis of a malignant lesion, guiding hyperthermia treatment, or predicting treatment-related complications, such as skin toxicity after radiation therapy. We also excluded studies that focused on other aspects of thermal image processing, including segmentation, blood vessel detection, feature extraction, or lesion localization.

Document type

Because our goal is to numerically review the effectiveness of infrared thermography in detecting malignant lesions, we were interested in clinical studies. Other types of documents were excluded because they lacked numerical results (i.e., narrative overviews and opinion articles, such as comments or letters to the editor), a detailed description of the methodology (i.e., conference abstracts, summaries, and posters), or objectivity (i.e., literature review articles). Articles that described a device, an imaging protocol, or an algorithm without including any experimentation to evaluate it quantitatively were also excluded.

Population

We included studies with patients attending either routine screening or follow-up tests, with no restrictions on age, sex, or breast density. Animal studies and articles that used phantoms or computer simulations were excluded, as were studies that used images obtained from an external database instead of collecting their own data. This exclusion also applied to studies that combined their own data with data from external sources. Studies with a setting other than screening or diagnosis, such as those that examined patients with a known diagnosis, were also excluded. In addition, to ensure that sensitivity could be estimated meaningfully, we considered only studies that included at least five cancer cases.

Evaluation metrics

We required that studies report data on sensitivity (defined as the number of true positives divided by the sum of true positives and false negatives) and specificity (defined as the number of true negatives divided by the sum of true negatives and false positives), or at least provide the information necessary to calculate them. We excluded studies that considered non-malignant tumors as positives, as well as articles that did not use standard tests to establish the ground truth diagnosis, i.e., cancer must be confirmed by biopsy, whereas healthy and benign cases should be diagnosed with at least one standard imaging test, either mammography or ultrasound.
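The following minimal Python sketch illustrates these definitions. It computes sensitivity and specificity from a confusion matrix. For studies that report only percentages and group sizes, it also reconstructs approximate counts. The class, the `from_reported` helper and all counts are hypothetical and chosen only for illustration. Reconstructed counts are only as accurate as the rounding of the reported percentages.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # biopsy-proven cancers called positive by thermography
    fn: int  # biopsy-proven cancers called negative
    tn: int  # non-cancer cases called negative
    fp: int  # non-cancer cases called positive

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def from_reported(sens: float, spec: float, n_cancer: int, n_noncancer: int) -> ConfusionMatrix:
    """Hypothetical helper: rebuild approximate counts when a study reports
    only sensitivity, specificity, and the size of each group."""
    tp = round(sens * n_cancer)
    tn = round(spec * n_noncancer)
    return ConfusionMatrix(tp=tp, fn=n_cancer - tp, tn=tn, fp=n_noncancer - tn)


# Invented counts for illustration: 40 cancers and 160 non-cancer cases.
study = ConfusionMatrix(tp=36, fn=4, tn=112, fp=48)
print(f"sensitivity = {study.sensitivity():.2f}")  # 36 / 40 = 0.90
print(f"specificity = {study.specificity():.2f}")  # 112 / 160 = 0.70
print(from_reported(0.95, 0.80, n_cancer=100, n_noncancer=50))
```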

Screening and data extraction

One reviewer screened article titles and abstracts by applying the inclusion and exclusion criteria. The full texts of the remaining articles were retrieved and read to assess eligibility. Documents that did not provide enough information to determine the recruitment process, the reference tests, or the criterion for positivity (i.e., using ambiguous labels like abnormal, sick, or unhealthy without describing them) were excluded. Any doubts were resolved by discussion with the other authors.

Data extracted from each article included the topic or technology (thermography or other), the body part or disease (breast cancer or other), the cancer-related task (diagnosis, screening, or other), the type of article (clinical study or other), the goal of image processing (classification or other), the subject type (human or other), the source of the data (internal or external), patient recruitment (e.g., asymptomatic women undergoing routine screening, patients with a palpable lump, patients scheduled for breast biopsy due to suspicious mammographic findings, etc.), what is considered to be positive (only biopsy-proven cancer or other breast diseases too), the reference tests for both positive and negative cases, the sample size, the number of positive cases, the size of the subset used for evaluation (i.e., the test set if machine learning algorithms were used), the imaging protocol (static or dynamic), the technical characteristics of the thermal camera (spectral range and thermal and spatial resolution), the funding source, and the evaluation metrics (sensitivity and specificity).

To analyze trends in sensitivity and specificity over time, we plotted publication years against the corresponding sensitivity and specificity values reported in the selected studies. Linear regression models were fitted to each dataset to evaluate the relationships between the publication year and the respective metrics. The equations of the fitted regression lines, along with the p values for the slopes, were extracted to assess the statistical significance of any observed trends. A positive or negative trend was considered statistically significant if the p value of the slope was less than 0.05.
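The trend analysis described above amounts to an ordinary least-squares fit of each metric on publication year, followed by a two-sided t test on the slope with n - 2 degrees of freedom. The following Python sketch shows one way to do this using only the standard library. It is not the authors' code, the helper names are hypothetical, and the example data are invented. The Student t p value is obtained through the regularized incomplete beta function, which is evaluated with a standard continued-fraction expansion (modified Lentz method).

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-12) -> float:
    """Continued fraction for the regularized incomplete beta function,
    evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p value of a Student t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def linear_trend(years, values):
    """OLS fit values = intercept + slope * year; returns (slope, intercept, p)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    df = n - 2
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    # A perfect fit would make se_slope zero and the t statistic undefined.
    se_slope = math.sqrt(rss / df / sxx)
    t_stat = slope / se_slope
    return slope, intercept, t_two_sided_p(t_stat, df)


# Invented example: specificity (%) of five hypothetical studies by year.
slope, intercept, p = linear_trend([2000, 2005, 2010, 2015, 2020], [50, 60, 65, 75, 80])
print(f"specificity = {intercept:.1f} + {slope:.2f} * year, slope p = {p:.4f}")
print("significant trend" if p < 0.05 else "no significant trend")
```

In practice, a statistics library would normally be used instead. For example, SciPy's `scipy.stats.linregress` also reports the p value of the slope. The hand-written version only makes each step of the test explicit.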

Risk of bias in individual studies

The methodological quality of the selected studies was assessed independently by the same reviewer who screened the documents using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool [46]. It covers four key domains, each focusing on different aspects of study quality: (1) patient selection, which assesses whether the patients enrolled in the study were representative of those who would typically receive the diagnostic test and whether the selection process avoided bias; (2) index test, which evaluates whether the diagnostic test under investigation was performed and interpreted in a manner that avoided bias and maintained applicability to the review question; (3) reference standard, which examines the reliability and applicability of the reference standard used to verify the results of the index test; and (4) flow and timing, which assesses the time intervals between the index test and the reference standard and the patient flow through the study to ensure consistency and reduce bias. The tool helps reviewers systematically identify potential biases, determine the relevance of study findings to clinical practice, and guide evidence synthesis.

Meta-analysis

For each study that met the eligibility criteria, we computed the confusion matrix with the number of true positive (TP), false negative (FN), true negative (TN), and false positive (FP) cases. If a study compared the results obtained with different algorithms or interpretation criteria, we selected the one with the highest F-score, which is defined as follows:

$$F= 2\times \frac{\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}} ,$$

where

$$\text{Precision}=\frac{\text{TP}}{\text{TP}+\text{FP}},$$

and

$$\text{Recall}=\frac{\text{TP}}{\text{TP}+\text{FN}} .$$

The F-score ranges from 0 to 1, with values closer to 1 reflecting a better balance between the correct detection of positive instances (sensitivity or recall) and the accuracy of positive predictions (precision or positive predictive value, PPV). We chose the F-score instead of metrics that combine sensitivity and specificity (e.g., Youden’s index or balanced accuracy) because of the high class imbalance present in the studies’ datasets and the clinical implications of missing positive cases (i.e., cancer patients). In the context of cancer detection, the primary goal of a diagnostic test is to identify as many true positive cases as possible (high recall) while ensuring that false positives remain at a manageable level (high precision). The F-score effectively balances these two factors.
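To make the selection rule concrete, the Python sketch below computes the F-score, as defined above, for two hypothetical interpretation criteria applied to the same invented study (20 cancers, 180 non-cancer cases). It compares the F-score with Youden's index (sensitivity + specificity - 1), a metric that combines sensitivity and specificity. The counts are illustrative only. Under this class imbalance, the two metrics can favor different criteria, because the F-score penalizes false positives through precision.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, as in the equations above."""
    if tp == 0:
        return 0.0  # precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def youden_j(tp: int, fp: int, fn: int, tn: int) -> float:
    """Youden's index: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Invented results of two interpretation criteria applied to the same
# hypothetical study with 20 cancers and 180 non-cancer cases.
criteria = {
    "criterion A": {"tp": 14, "fn": 6, "tn": 162, "fp": 18},
    "criterion B": {"tp": 18, "fn": 2, "tn": 144, "fp": 36},
}

for name, c in criteria.items():
    print(f"{name}: F = {f_score(c['tp'], c['fp'], c['fn']):.3f}, "
          f"Youden J = {youden_j(c['tp'], c['fp'], c['fn'], c['tn']):.3f}")

best_by_f = max(criteria, key=lambda k: f_score(criteria[k]["tp"], criteria[k]["fp"], criteria[k]["fn"]))
print("selected by F-score:", best_by_f)
```

Criterion A has the higher F-score (about 0.54 versus 0.49). Criterion B has the higher Youden index (0.7 versus 0.6), because it trades 18 extra false positives for 4 extra true positives.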

In machine learning studies, the data are split into a training set, used to build the model, and a test set, used to evaluate it. In the meta-analysis, we selected the results obtained on the test set because they better reflect the expected effectiveness in clinical practice.

We conducted two meta-analyses of proportions, one for sensitivity and one for specificity, to statistically combine the results of the selected studies [47]. For this purpose, we compared two methods: inverse-variance and generalized linear mixed models (GLMMs). The inverse-variance method uses the inverse of the variance of each study’s effect size (sensitivity and specificity) as weights, giving more weight to studies with more precise (less variable) estimates. GLMMs extend generalized linear models by incorporating both fixed effects and random effects, allowing for the analysis of data with complex, hierarchical structures, and accounting for variability at multiple levels. Because proportions can be skewed, especially when they are close to 0 or 1, transformations are often applied to stabilize variances and normalize the data before pooling [48]. We compared the following combinations of transformations and meta-analysis methods:

  • Inverse-variance with no transformation

  • Inverse-variance with logit transformation

  • Inverse-variance with arcsine transformation

  • GLMMs with logit transformation.
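The Python sketch below shows how the three inverse-variance options listed above can be implemented. For simplicity it uses a common-effect (fixed-effect) model, so it will not reproduce the random-effects or GLMM estimates reported in this review. The 0.5 continuity correction for studies with 0 or n events, the function names and the example counts are assumptions made for illustration. The variances are the usual large-sample approximations:

- p(1 - p)/n on the raw scale;
- 1/x + 1/(n - x) on the logit scale, where x is the number of events;
- 1/(4n) for the arcsine of the square root of p.

```python
import math

Z_975 = 1.959963984540054  # standard normal 0.975 quantile


def raw_effect(x: int, n: int):
    p = x / n
    # Variance p(1-p)/n is 0 when x is 0 or n; pooling then raises ZeroDivisionError.
    return p, p * (1.0 - p) / n


def logit_effect(x: float, n: float):
    if x == 0 or x == n:
        x, n = x + 0.5, n + 1.0  # assumed 0.5 continuity correction per cell
    return math.log(x / (n - x)), 1.0 / x + 1.0 / (n - x)


def arcsine_effect(x: int, n: int):
    return math.asin(math.sqrt(x / n)), 1.0 / (4.0 * n)


TRANSFORMS = {
    # name: (forward transform returning (effect, variance), back-transform)
    "raw": (raw_effect, lambda y: y),
    "logit": (logit_effect, lambda y: 1.0 / (1.0 + math.exp(-y))),
    "arcsine": (arcsine_effect, lambda y: math.sin(min(max(y, 0.0), math.pi / 2)) ** 2),
}


def pool_inverse_variance(counts, method="logit"):
    """Common-effect inverse-variance pooling of (events, total) pairs.
    Returns (pooled proportion, lower 95% limit, upper 95% limit)."""
    forward, back = TRANSFORMS[method]
    effects = [forward(x, n) for x, n in counts]
    weights = [1.0 / var for _, var in effects]
    total_weight = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return back(pooled), back(pooled - Z_975 * se), back(pooled + Z_975 * se)


# Invented (true positives, cancers) pairs from four hypothetical studies.
example_counts = [(36, 40), (19, 20), (45, 50), (10, 12)]
for method in ("raw", "logit", "arcsine"):
    est, low, high = pool_inverse_variance(example_counts, method)
    print(f"{method:8s} pooled sensitivity {est:.3f} (95% CI {low:.3f} to {high:.3f})")
```

On the raw scale, the confidence interval is not constrained to [0, 1], and a study with 0 or n events would receive infinite weight. These problems are why a transformation is preferred when proportions lie close to 0 or 1, as sensitivities often do.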

Meta‑bias assessment

Differences between the studies in terms of methodological factors, such as the purpose of the examination, the imaging protocol followed, or the thermal sensitivity of the infrared camera, may lead to differences in the results obtained. Heterogeneity, i.e., the variability between studies, was measured with the \({I}^{2}\) statistic [49], defined as follows:

$${I}^{2}=\frac{Q-(k-1)}{Q}\times 100\% ,$$

where \(Q\) is Cochran’s heterogeneity statistic computed from the reported proportions, which follows a chi-squared distribution with \(k-1\) degrees of freedom under the hypothesis of homogeneity, and \(k\) is the number of studies in the meta-analysis. Negative values (when \(Q<k-1\)) are set to 0. \({I}^{2}\) describes the percentage of variability in the estimation of sensitivity or specificity that is due to differences between studies rather than to sampling error (chance). Heterogeneity was considered statistically significant when p value < 0.05 and/or \({I}^{2}\) > 50%.
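The following Python sketch computes Cochran's Q and \({I}^{2}\) as defined above. It works from study-level estimates and their variances on whatever scale the meta-analysis uses (e.g., logit-transformed sensitivities). It is an illustration that assumes inverse-variance weights for Q, and it is not necessarily identical to what the "meta" package reports for every pooling method. Following standard practice, negative values of \({I}^{2}\) are truncated at 0. The example inputs are invented.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance pooled mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """I^2 = (Q - (k - 1)) / Q * 100%, truncated at 0 when Q < k - 1."""
    k = len(effects)
    q = cochran_q(effects, variances)
    if q <= 0.0:
        return 0.0  # all studies agree exactly: no excess variability
    return max(0.0, (q - (k - 1)) / q) * 100.0


# Invented logit-scale estimates and variances for three hypothetical studies.
print(f"I^2 = {i_squared([2.2, 2.9, 1.6], [0.28, 1.05, 0.6]):.1f}%")
# Two equally precise studies far apart: Q = 4.5, k = 2, I^2 = 3.5 / 4.5 = 77.8%.
print(f"I^2 = {i_squared([0.0, 3.0], [1.0, 1.0]):.1f}%")
```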

We performed a subgroup analysis stratified by imaging protocol (static or dynamic) and type of interpretation to explore possible sources of heterogeneity (see Additional file 3: Tables S2 and S3). However, the results were so similar to those of the overall analysis that we decided not to report them in the main text. We also attempted to perform a cumulative analysis, given the recent improvement in the resolution of thermal cameras, but unfortunately, this did not yield the results we had hoped for.

Publication bias was assessed by visual inspection of funnel plots [50]. In the absence of publication bias, the plot should resemble a symmetrical inverted funnel. By contrast, asymmetry in the funnel plot may indicate the presence of publication bias, where smaller studies with nonsignificant or negative results are less likely to be published.
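A funnel plot shows each study's estimate on the horizontal axis and its standard error on an inverted vertical axis. Pseudo 95% limits are drawn at pooled ± 1.96·SE. The hypothetical Python helper below returns, for each study, those limits and whether the study falls outside them. This is the information a reader extracts visually from a funnel plot. The sketch is purely illustrative, and the example values are invented.

```python
from typing import List, NamedTuple


class FunnelPoint(NamedTuple):
    effect: float
    se: float
    lower: float   # pooled - z * se (left edge of the funnel at this SE)
    upper: float   # pooled + z * se (right edge of the funnel at this SE)
    outside: bool  # True if the study lies outside the pseudo 95% limits


def funnel_points(effects: List[float], std_errors: List[float],
                  pooled: float, z: float = 1.96) -> List[FunnelPoint]:
    """Pseudo confidence limits of a funnel plot, evaluated at each study's SE."""
    points = []
    for y, se in zip(effects, std_errors):
        lower, upper = pooled - z * se, pooled + z * se
        points.append(FunnelPoint(y, se, lower, upper, not (lower <= y <= upper)))
    return points


# Invented logit-scale estimates and standard errors, with a hypothetical pooled value.
pts = funnel_points([2.2, 2.9, 1.0, 2.1], [0.5, 1.0, 0.2, 0.3], pooled=2.0)
for i, pt in enumerate(pts):
    print(f"study {i}: effect {pt.effect:.2f}, limits [{pt.lower:.2f}, {pt.upper:.2f}], outside={pt.outside}")
```

Counting studies outside the funnel, or comparing how small and large studies fall on either side of the pooled value, is still only a descriptive aid. Formal asymmetry tests exist but have low power when few small studies are available.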

The statistical computations and visualization of results were performed using the “meta” package in the R programming language, a user-friendly tool for meta-analysis [51].

Meta-analysis of diagnostic test accuracy

Although it was not initially considered in the pre-defined protocol, we included a bivariate analysis of sensitivity and specificity to account for the correlation between them. For this analysis, we used MetaDTA (Meta-analysis of Diagnostic Test Accuracy), an online tool specifically designed for the meta-analysis of diagnostic test accuracy studies [52]. MetaDTA fits a bivariate random-effects model that estimates sensitivity and specificity jointly. Among other features, MetaDTA produces summary receiver operating characteristic (SROC) plots. These plots graphically summarize diagnostic accuracy across multiple studies by plotting the sensitivity of each study against its false positive rate (FPR), i.e., 1 - specificity. The resulting curve represents the overall trade-off between sensitivity and specificity.

Results

Literature search results

The literature search yielded 1552 documents; 13 additional articles were identified by searching for commercial thermal systems, and another 50 were obtained from the 7 systematic reviews retrieved, as shown in Fig. 1. After removing 406 duplicates, only 26 studies met our eligibility criteria, but 4 were excluded for reporting results obtained from a dataset that had already been used in previous studies.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram of the literature search process

We excluded 5 additional publications written in a language other than English. We also excluded 17 publications whose full text we could not access, because their eligibility could not be determined from their titles and abstracts alone. Table 1 summarizes the 22 selected studies.

Table 1 Studies retrieved from a systematic review of literature on breast thermography published in the twenty-first century

Figure 2 shows that overall, thermography has a high sensitivity but also a high false positive rate (low specificity): Twenty out of the 22 included articles reported a sensitivity above 75%, but only 8 of them had a corresponding specificity over 75%. Figure 2 also shows that sensitivity has remained high over the years (with the exception of the study by Kontos et al. [59]), whereas specificity demonstrates a statistically significant upward trend, indicating that diagnostic accuracy in identifying true negatives has improved significantly in recent years. It is worth noting that studies from the early 2000s included patients undergoing biopsy for suspicion of breast cancer, and thermography was studied as a complementary tool to identify benign lesions and avoid performing unnecessary biopsies. Parisky et al., for instance, reported a sensitivity of 97% with a specificity of 14%, meaning that 14% of biopsies of benign lesions could have been avoided at the cost of missing 3% of cancers [53]. More recently, larger studies with a wider variety of patient types (asymptomatic patients and symptomatic patients who did not necessarily require a biopsy) were conducted, reporting both sensitivity and specificity values over 70%.

Fig. 2

Sensitivity (blue) and specificity (green) reported in the 22 selected studies. Regression lines, computed with the least squares method, are displayed along with the corresponding equations and P values of the slopes, to visualize the trends of sensitivity and specificity over time

Quality of the selected studies

The overall risk of bias in the four domains was low. This is because our exclusion criteria ensured that only high-quality studies were selected. For example, we excluded studies that did not use standard reference tests to determine the ground truth diagnosis and studies that included subjects with a known diagnosis. It is worth noting that most of the studies did not enroll consecutive patients or did not explicitly disclose this detail, which can potentially introduce bias in patient selection. In addition, some studies may have introduced bias in the interpretation of the thermograms by not using a pre-defined threshold to classify them. Instead, some authors determined their cutoff values by statistically comparing the average values of different features between cancer and non-cancer subjects. The results presented in some studies using machine learning to classify thermograms may also be dataset-specific due to the small sample sizes and, therefore, may not generalize well. This is not the case in studies evaluating commercial systems, as the image-processing software was developed in advance. Some studies included patients who did not necessarily require biopsy, such as asymptomatic women or patients with a palpable lump that required further testing. These studies used the result of mammography and/or ultrasound as the reference test for negative cases. Although a biopsy is the definitive method to confirm or rule out cancer, it is invasive. It would therefore be unethical to perform it based solely on suspicion raised by a technique that is not clinically established, such as thermography. Thus, cases with abnormal thermography but negative mammography and/or ultrasound are considered false positives, even though this may not always be accurate. This issue could be solved by follow-up over time to ensure that the patient does not develop cancer in the subsequent months or years. Alternatively, another highly sensitive and non-invasive test such as MRI could be used, as done by Hellgren et al. [68]. However, MRI is very costly, both economically and in terms of time. Consequently, all these studies except the one by Hellgren et al. can be considered to have an unavoidable bias regarding the reference test used for negative cases. Finally, only a handful of studies explicitly disclosed the time interval between thermography and the reference test. This is likely due to the varying diagnostic pathways based on initial test results (e.g., an abnormal mammogram requiring additional tests like ultrasound, MRI, and/or biopsy), which make it impossible to establish a fixed time interval for all patients.

Meta-analysis results

In both the sensitivity and specificity meta-analyses, the GLMM method with logit transformation yielded the lowest heterogeneity value for the selected studies. It was closely followed by the inverse-variance method with logit transformation. Only the results obtained with the first method are discussed in detail next; the results from all approaches are provided in Table S1 of Additional file 3.

Analysis of sensitivity

The pooled sensitivity of the 22 selected studies is 88.67%, with a 95% confidence interval of (83.03%, 92.61%). However, the \({I}^{2}\) value of 79.3% indicates high heterogeneity. This is also clearly illustrated in the forest plot shown in Fig. 3, in which the study by Kontos et al. [59] stands out considerably from the pooled sensitivity value.

Fig. 3

Forest plot of the meta-analysis of sensitivity results across selected studies

Although there are no specific heterogeneity tests for meta-analyses of proportions, Barker et al. proposed using classical techniques like \({I}^{2}\). They noted, however, that some heterogeneity is expected due to the small variance of proportional data [49]. Thus, high heterogeneity in the analysis of proportional data like sensitivity and specificity does not necessarily imply that the results are inconsistent.

The use of funnel plots to assess publication bias in meta-analyses of proportions is not advised [75]. Nevertheless, Fig. 4 shows some asymmetry between the very small studies and the larger studies. The small studies have a higher standard error and are located at the bottom of the graph, whereas the larger studies have a lower standard error and are located at the top. However, because the number of small studies is very low, we cannot conclude that this asymmetry is due to publication bias.

Fig. 4

Funnel plot of the meta-analysis of sensitivity results across selected studies

Analysis of specificity

The pooled specificity is 71.77% (see Fig. 5), with a 95% confidence interval of (61.41%, 80.24%). The tests indicate even higher heterogeneity than for sensitivity, with an \({I}^{2}\) value of 99.1%. In this case, the funnel plot in Fig. 6, with several studies falling outside the funnel, also suggests a greater risk of publication bias.

Fig. 5

Forest plot of the meta-analysis of specificity results across selected studies

Fig. 6

Funnel plot of the meta-analysis of specificity results across selected studies

Meta-analysis of diagnostic test accuracy

Figure 7 shows the SROC plot derived from our bivariate meta-analysis, where the sensitivity of each study is plotted against its FPR (1 - specificity). The blue square indicates a pooled sensitivity of 88.4% and an FPR of 28.2% (corresponding to a specificity of 71.8%), very close to the values estimated in the previous meta-analyses of proportions. These estimates suggest that thermography performs reasonably well in correctly identifying breast cancer while maintaining moderate specificity. The dispersion of individual data points indicates high variability in sensitivity and specificity across studies. The SROC curve depicts the estimated relationship between sensitivity and specificity. It bends towards the upper left corner, which indicates reasonable discriminatory ability. However, its shallow slope at the pooled estimate also reflects the high variability across studies, which is more pronounced for specificity. The 95% confidence region, shown as a dashed blue line around the pooled metrics, appears relatively narrow. Even so, it still reflects some uncertainty in the estimated pooled sensitivity and specificity, more so for specificity. By contrast, the 95% predictive region (represented by a dotted blue line), which forecasts where 95% of new studies might fall, is much wider. This underscores the variability observed among the studies included in the meta-analysis.

There is a clear outlier in Figs. 3, 5, and 7: the study by Kontos et al. [59], which had a high specificity (low FPR) but a sensitivity of 25%. Figure 8 shows the SROC plot after removing this outlier. The estimated pooled sensitivity increased from 88.4% to 89.5%, whereas the pooled specificity decreased from 71.8% to 71.0%. The lower limit of the 95% confidence region for sensitivity increased from 83.0% to 86.2%, whereas the upper limit remained very similar (92.2% vs 92.1%). By contrast, the 95% confidence region for specificity widened slightly, from (61.5%, 80.2%) to (60.3%, 79.8%). The updated SROC curve is closer to the upper left corner and lies within the 78% to 98% sensitivity range, but is almost diagonal. This suggests that when the outlier is excluded, thermography demonstrates consistently high sensitivity but continues to exhibit high variability in specificity. An almost diagonal SROC curve indicates a balanced trade-off between sensitivity and specificity: an increase in sensitivity is expected to be accompanied by a decrease in specificity (i.e., an increase in FPR). The new 95% predictive region is much narrower vertically, indicating reduced variability in sensitivity, but remains relatively unchanged horizontally, reflecting persistent variability in specificity. In other words, although different study settings and populations can lead to varying specificity results, thermography consistently maintains high sensitivity.

Fig. 7

Summary receiver operating characteristic (SROC) plot showing the sensitivity and 1 - specificity of each selected study (white dots), pooled sensitivity and 1 - specificity estimate (blue square), 95% confidence region (dashed blue line), 95% predictive region (dotted blue line), and SROC curve (black line)

Fig. 8

Summary receiver operating characteristic (SROC) plot after removing an outlier

Discussion

Infrared technology has improved greatly since it was first studied as an imaging tool for breast cancer detection in the 1950s. Modern thermal cameras are lightweight, fast, high-resolution, and low-cost. These advances, coupled with progress in computerized image processing, have led to a renewed interest in this technique in the twenty-first century.

Our literature search on breast thermography yielded 1209 documents. During screening, we observed that the quality of the articles was generally poor, so only 22 studies met our inclusion criteria, which were designed to ensure acceptable quality. Many of the articles examined sought to develop an image-processing algorithm rather than to evaluate breast thermography in a clinical population. Consequently, they often described the mathematical reasoning behind the proposed solution in detail while overlooking important methodological details about data acquisition and the patients. In particular, 101 articles were excluded for using images obtained from an external database instead of collecting their own. In 90 of them, one source was the aforementioned public database, DMR. At least 24 other studies that were excluded for other reasons also used this database. Numerous errors and anomalies have been detected in this database [76, 77], and thus, results obtained with it must be treated with caution.

The 22 studies that met our inclusion criteria reported sensitivity and specificity values comparable to those of standard imaging techniques such as mammography and ultrasonography. The pooled sensitivity and specificity resulting from our meta-analyses of proportions and bivariate meta-analysis ranged from 88.4% to 88.67% and from 71.77% to 71.8%, respectively. In comparison, mammography typically demonstrates higher specificity but potentially lower sensitivity, whereas the sensitivity and specificity of ultrasound vary depending on factors such as operator skill and equipment. The high variability in the reported results, shown by the long whiskers in the forest plots in Figs. 3 and 5 and by the large 95% predictive region in Fig. 7, calls for caution when generalizing these findings. This variability may stem from broad differences in sample size, patient selection, imaging methodology, and thermal cameras, which are reflected in high heterogeneity (\({I}^{2}\) values of 79.3% for sensitivity and 99.1% for specificity). The sample size in the selected studies is generally small, with only 13 of them including 100 or more patients. Note that 10 of the 22 selected studies were supported in some way by the company commercializing the imaging system (either financially or by supplying the camera) and/or the analysis algorithm used. Their results should be interpreted with caution, as this support introduces a potential source of bias. On the other hand, studies carried out to validate a commercial product are typically conducted with greater rigor and more resources, which may explain why half of the studies that provided a clear description of their methodology were funded by the respective manufacturers, including 6 of the 10 largest studies.
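The \({I}^{2}\) statistic is conventionally derived from Cochran's \(Q\) as \({I}^{2}=\max\left(0,\frac{Q-\mathrm{df}}{Q}\right)\), where df is the number of studies minus one. It expresses the share of the observed dispersion that exceeds what sampling error alone would produce. The Python sketch below computes both quantities on the logit scale with inverse-variance weights. The estimator may differ from the one used in this review, and the function name `i_squared` and the counts are hypothetical.

```python
import math


def i_squared(events, totals):
    """Cochran's Q and Higgins' I^2 for proportions analysed on the logit scale.

    Uses inverse-variance (fixed-effect) weights and a 0.5 continuity correction.
    """
    y, w = [], []
    for e, n in zip(events, totals):
        e_c, f_c = e + 0.5, n - e + 0.5      # corrected events and non-events
        y.append(math.log(e_c / f_c))        # logit of the proportion
        w.append(1 / (1 / e_c + 1 / f_c))    # inverse of the logit variance
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical specificity data: true negatives / participants without cancer
tn = [40, 150, 30, 600, 25]
no_cancer = [80, 200, 35, 700, 60]
q, i2 = i_squared(tn, no_cancer)
print(f"Q = {q:.1f} on {len(tn) - 1} df, I^2 = {100 * i2:.1f}%")
```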

Other sources of heterogeneity include differences in patient selection and in the methodology and technology used. Although several imaging protocols have been proposed, there are currently no standard guidelines. Generally, the patient is asked to undress from the waist up and, after a 10–15 min acclimation period that allows the body to reach thermal equilibrium with the room temperature, 3 to 5 images of the breasts are taken from different angles. If, instead, a dynamic protocol is followed, cool air or another cooling method is used to induce a vasoconstrictive response, and a sequence of images is acquired over a few minutes to study the thermal recovery. Sometimes the patient is asked to avoid applying ointments and engaging in activities that can affect body temperature before the exam, such as physical exercise, smoking, caffeine and alcohol intake, or bathing. Some studies even schedule the exam within specific days of the menstrual cycle to ensure the most thermally stable conditions. Whereas some researchers followed a strict protocol that tried to avoid all possible sources of thermal instability, others were more relaxed. There are also important differences in the technological setup used to acquire images: although the patient generally stood or sat at a certain distance from the camera, in a few studies the patient lay prone with the breasts suspended through openings in the imaging bed [53, 63]. Another system consisted of two cameras placed at an angle so that a three-dimensional view of the patient was acquired without her having to move [68], whereas another solution employed a chair equipped with two lateral mirrors to obtain multiple views in a single image [56]. The thermal sensitivity of the cameras also varied widely, from 0.02 to 0.1 K, although only 10 studies disclosed this information. Given that commercial systems available today offer a thermal sensitivity of a few tens of mK, a camera with a sensitivity of 0.1 K can now be considered relatively poor. The criteria used to interpret the thermograms also differed considerably, from simple visual inspection looking for areas of increased temperature and asymmetry to sophisticated deep learning algorithms that combine multiple views with clinical information. In a few studies, the interpretation of thermograms required prior manual delineation of the suspicious lesion using other imaging techniques. Such an approach has been strongly criticized because the thermographic reading is then not blinded to the results of other modalities [78]; however, because thermography was intended as an adjunct to other standard tests rather than as a stand-alone modality, this would not be an issue.
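To make the simplest interpretation criterion concrete, the Python sketch below applies a rule-based asymmetry reading to temperatures sampled from mirrored regions of the left and right breasts. The 1.0 K threshold, the region sampling, and the function names are hypothetical assumptions chosen for illustration. They are not criteria taken from any of the included studies. Note that a camera with 0.1 K thermal sensitivity resolves temperature differences ten times smaller than this threshold.

```python
from statistics import mean

# Hypothetical threshold for a contralateral temperature difference (K, i.e. deg C);
# not a clinically validated cut-off.
ASYMMETRY_THRESHOLD_K = 1.0


def asymmetry_features(left_roi, right_roi):
    """Compare temperatures (deg C) sampled from mirrored regions of the two breasts."""
    return {
        "mean_diff": abs(mean(left_roi) - mean(right_roi)),
        "max_diff": abs(max(left_roi) - max(right_roi)),
    }


def flag_asymmetry(left_roi, right_roi, threshold=ASYMMETRY_THRESHOLD_K):
    """Rule-based reading: positive if either contralateral difference exceeds the threshold."""
    feats = asymmetry_features(left_roi, right_roi)
    return feats["mean_diff"] > threshold or feats["max_diff"] > threshold


left = [33.1, 33.4, 33.0, 33.6, 34.9]   # hypothetical samples including a local hot spot
right = [33.0, 33.2, 33.1, 33.3, 33.4]
print(asymmetry_features(left, right), flag_asymmetry(left, right))
```

Here the hot spot raises the maximum-temperature difference well above the threshold even though the mean difference stays small. This is why hand-crafted rules usually combine several features.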

The bivariate meta-analysis was repeated after identifying an outlier. This study reported a sensitivity of only 25% in 63 symptomatic patients, considering each breast as a distinct case [59]. The criteria used to read the thermograms were clearly flawed: instead of using sophisticated algorithms based on exact temperature measurements, the authors based their diagnosis on color differences between adjacent areas within a breast and between breasts, with only 16 colors to represent the wide range of temperatures in the torso. Removing this outlier resulted in only minor changes in the pooled sensitivity and specificity, which suggests that the overall performance of thermography remains relatively stable even after excluding this extreme case. However, the 95% predictive region for sensitivity narrowed considerably, now ranging from 65% to 98%, whereas specificity remained highly variable. Although thermography shows promise as a diagnostic tool owing to its consistently high sensitivity, its moderate and highly variable specificity calls for careful consideration in clinical settings. The test may be better suited to initial screening, where detecting as many true cases as possible is critical, but confirmatory testing would likely be needed to rule out the false positives it produces.
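Bayes' theorem shows why confirmatory testing matters. The Python sketch below plugs the pooled bivariate estimates (88.4% sensitivity, 71.8% specificity) into positive and negative predictive values at two hypothetical prevalences. It then combines thermography with a hypothetical confirmatory test (90% sensitivity and specificity) under a "positive only if both tests are positive" rule. The prevalences, the confirmatory test, and the conditional independence between the two tests are all assumptions for illustration. Conditional independence rarely holds exactly in practice.

```python
def predictive_values(sens, spec, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


def serial_positive(sens1, spec1, sens2, spec2):
    """'Both tests must be positive' strategy, assuming conditional independence."""
    return sens1 * sens2, 1 - (1 - spec1) * (1 - spec2)


SENS_THERMO, SPEC_THERMO = 0.884, 0.718  # pooled estimates from the bivariate analysis

for prev in (0.005, 0.30):  # hypothetical screening vs. referral prevalence
    ppv, npv = predictive_values(SENS_THERMO, SPEC_THERMO, prev)
    print(f"prevalence {prev:.1%}: PPV {ppv:.3f}, NPV {npv:.4f}")

# Hypothetical confirmatory test applied only to thermography-positive women
sens_s, spec_s = serial_positive(SENS_THERMO, SPEC_THERMO, 0.90, 0.90)
print(f"serial strategy: sensitivity {sens_s:.3f}, specificity {spec_s:.3f}")
```

Under these assumptions, at screening-level prevalence most thermography positives are false positives while the NPV stays high. The serial strategy raises specificity at the cost of some sensitivity.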

The main limitation of our work is that a single reviewer screened the documents for eligibility, and the other reviewers intervened only in doubtful cases. This approach may have led to unintentional omissions or to the inclusion of studies that did not strictly meet the eligibility criteria, potentially affecting the robustness and validity of our findings. In standard systematic reviews, it is recommended that at least two independent reviewers assess all documents for eligibility to ensure a more objective and comprehensive evaluation. Discrepancies between reviewers can then be resolved through discussion or by consulting a third reviewer, which improves the accuracy and credibility of the inclusion decisions. In addition, searching only two databases (PubMed and Web of Science) may have resulted in the omission of studies that could meet our inclusion criteria. Although we considered other databases, our institution only had access to those two. However, because PubMed is the largest scientific database for medical research and a primary source for clinical studies, it is unlikely that we missed many relevant articles, as most clinically significant studies are indexed in PubMed.
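When two reviewers screen records independently, their agreement beyond chance is commonly summarized with Cohen's kappa before discrepancies are resolved. The Python sketch below computes it from two lists of include/exclude decisions. The decisions are hypothetical, since this review relied on a single screener.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both reviewers must rate the same, non-empty set of records")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each reviewer's marginal frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both reviewers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical eligibility decisions on 10 records
reviewer_a = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_b = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.3f}")
```

The two reviewers agree on 8 of 10 records, but because most records are excluded by both, much of that agreement is expected by chance, so kappa is considerably lower than the raw agreement.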

Conclusion

Our systematic review identified 22 studies on breast thermography that met our inclusion criteria. The average pooled sensitivity was 88.5%, and specificity showed a clear increasing trend over the years. In particular, some recent studies, not included in previous reviews, showed an increase in sensitivity when this technology was combined with mammography for screening women with dense breasts. This suggests that in high-income countries thermography could be used as an adjunct to mammography, along with other techniques such as ultrasound and MRI. In low-income countries, the combination of ultrasound and thermography seems to be the most effective approach for breast cancer screening.
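A simple "believe-the-positive" rule, in which a woman is recalled if either test is positive, shows how adding thermography to mammography can raise sensitivity. The Python sketch below assumes that the two tests are conditionally independent given disease status. It uses the pooled thermography estimates from the bivariate analysis. The mammography values (60% sensitivity, 90% specificity in dense breasts) are purely illustrative assumptions, not results from the included studies.

```python
def believe_the_positive(sens1, spec1, sens2, spec2):
    """Recall if either test is positive, assuming conditional independence given disease status."""
    sens = 1 - (1 - sens1) * (1 - sens2)
    spec = spec1 * spec2
    return sens, spec


# Pooled thermography estimates from the bivariate analysis
SENS_THERMO, SPEC_THERMO = 0.884, 0.718
# Hypothetical mammography performance in dense breasts (illustrative only)
SENS_MAMMO, SPEC_MAMMO = 0.60, 0.90

sens, spec = believe_the_positive(SENS_MAMMO, SPEC_MAMMO, SENS_THERMO, SPEC_THERMO)
print(f"combined: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Under these assumptions, the combined sensitivity exceeds that of either test alone, while the specificity falls below both. This trade-off is what cost-effectiveness analyses of such combinations would need to weigh.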

Our review also found substantial differences between studies in the selection of patients, the protocols for acquiring the images, and the algorithms for processing them. Not surprisingly, our meta-analyses detected this heterogeneity, with an \({I}^{2}\) value of 79.3% for sensitivity and 99.1% for specificity. Therefore, more studies are needed to assess the value of thermography for breast cancer screening and diagnosis, including cost-effectiveness analyses to find the optimal combination of imaging techniques for each woman in each country.

Data availability

Publications to be reviewed were retrieved from two databases, namely PubMed and Web of Science, applying the search queries provided in Additional file 2.

Abbreviations

DMR: Database for mastology research

FPR: False positive rate

GLMMs: Generalized linear mixed models

MRI: Magnetic resonance imaging

SROC: Summary receiver operating characteristic

References

  1. Bray F, Laversanne M, Sung H, Ferlay J, Siegel RL, Soerjomataram I, Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74:229–63. https://doi.org/10.3322/caac.21834.

  2. Pötsch N, Vatteroni G, Clauser P, Helbich TH, Baltzer PAT. Contrast-enhanced mammography versus contrast-enhanced breast MRI: a systematic review and meta-analysis. Radiology. 2022;305:94–103. https://doi.org/10.1148/radiol.212530.

  3. Melnikow J, Fenton JJ, Whitlock EP, Miglioretti DL, Weyrich MS, Thompson JH, Shah K. Supplemental screening for breast cancer in women with dense breasts: a systematic review for the U.S. preventive services task force. Ann Intern Med. 2016;164:268. https://doi.org/10.7326/M15-1789.

  4. Singh D, Singh AK. Role of image thermography in early breast cancer detection - past, present and future. Comput Methods Programs Biomed. 2020;183:105074. https://doi.org/10.1016/j.cmpb.2019.105074.

  5. Hoffman RL. Thermography in the detection of breast malignancy. Am J Obstet Gynecol. 1967;98:681–6. https://doi.org/10.1016/0002-9378(67)90181-0.

  6. Wallace JD, Dodd GD. Thermography in the diagnosis of breast cancer. Radiology. 1968;91:679–85. https://doi.org/10.1148/91.4.679.

  7. Lilienfeld AM, Barnes JM, Barnes RB, Brasfield R, Connell JF, Diamond E, Gershon-Cohen J, Haberman J, Isard HJ, Lane WZ, Lattes R, Miller J, Seaman W, Sherman R. An evaluation of thermography in the detection of breast cancer. A cooperative pilot study. Cancer. 1969;24:1206–11. https://doi.org/10.1002/1097-0142(196912)24:6%3c1206::AID-CNCR2820240624%3e3.0.CO;2-V.

  8. Isard HJ, Becker W, Shilo R, Ostrum BJ. Breast thermography after four years and 10,000 studies. Am J Roentgenol. 1972;115:811–21. https://doi.org/10.2214/ajr.115.4.811.

  9. Stark AM, Way S. The screening of well women for the early detection of breast cancer using clinical examination with thermography and mammography. Cancer. 1974;33:1671–9. https://doi.org/10.1002/1097-0142(197406)33:6%3c1671::AID-CNCR2820330630%3e3.0.CO;2-4.

  10. Jones CH, Greening WP, Davey JB, McKinna JA, Greeves VJ. Thermography of the female breast: a five-year study in relation to the detection and prognosis of cancer. Br J Radiol. 1975;48:532–8. https://doi.org/10.1259/0007-1285-48-571-532.

  11. Raskin MM, Martinez-Lopez M. Thermographic patterns of the breast: a critical analysis of interpretation. Radiology. 1976;121:553–5. https://doi.org/10.1148/121.3.553.

  12. Haberman JD, Love TJ, Francis JE. Screening a rural population for breast cancer using thermography and physical examination techniques: methods and results - a preliminary report. Ann N Y Acad Sci. 1980;335:492–500. https://doi.org/10.1111/j.1749-6632.1980.tb50774.x.

  13. Gautherie M, Gros CM. Breast thermography and cancer risk prediction. Cancer. 1980;45:51–6. https://doi.org/10.1002/1097-0142(19800101)45:1%3c51::AID-CNCR2820450110%3e3.0.CO;2-L.

  14. Nyirjesy I. Breast thermography. Clin Obstet Gynecol. 1982;25:401–8. https://doi.org/10.1097/00003081-198206000-00023.

  15. Amalric R, Giraud D, Thomassin L, Altschuler C, Spitalier JM. Detection of subclinical breast cancers by infrared thermography. In: Ring EFJ, Phillips B, editors. Recent advances in medical thermology. Boston: Springer New York; 1984. p. 575–579. https://doi.org/10.1007/978-1-4684-7697-2_81.

  16. Hobbins WB. Abnormal thermogram. Significance in breast cancer. Interamer J Rad. 1987;12:337–43.

  17. Hitchcock CR. Thermography in mass screening for occult breast cancer. JAMA. 1968;204:419. https://doi.org/10.1001/jama.1968.03140190001001.

  18. Furnival IG, Stewart HJ, Weddell JM, Dovey P, Gravelle IH, Evans KT, Forrest APM. Accuracy of screening methods for the diagnosis of breast disease. BMJ. 1970;4:461–3. https://doi.org/10.1136/bmj.4.5733.461.

  19. Nathan BE, Burn JI, MacErlean DP. Value of mammary thermography in differential diagnosis. BMJ. 1972;2:316–7. https://doi.org/10.1136/bmj.2.5809.316.

  20. Egan RL, Goldstein GT, McSweeney MM. Conventional mammography, physical examination, thermography and xeroradiography in the detection of breast cancer. Cancer. 1977;39:1984–92. https://doi.org/10.1002/1097-0142(197705)39:5%3c1984::AID-CNCR2820390513%3e3.0.CO;2-Q.

  21. Sterns EE, Curtis AC, Miller S, Hancock JR. Thermography in breast diagnosis. Cancer. 1982;50:323–5. https://doi.org/10.1002/1097-0142(19820715)50:2%3c323::AID-CNCR2820500226%3e3.0.CO;2-S.

  22. Lozano A, Hassanipour F. Infrared imaging for breast cancer detection: an objective review of foundational studies and its proper role in breast cancer screening. Infrared Phys Technol. 2019;97:244–57.

  23. Silva LF, Saade DCM, Sequeiros GO, Silva AC, Paiva AC, et al. A new database for mastology research with infrared image. J Med Imaging Health Inform. 2014;4:92–100.

  24. Faust O, Acharya UR, Ng EYK, Hong TJ, Yu W. Application of infrared thermography in computer aided diagnosis. Infrared Phys Technol. 2014;66:160–75.

  25. Sathish D, Kamath S, Rajagopal KV, Prasad K. Medical imaging techniques and computer aided diagnostic approaches for the detection of breast cancer with an emphasis on thermography - a review. Int J Med Eng Inform. 2016;8:275. https://doi.org/10.1504/IJMEI.2016.077446.

  26. Ibrahim A, Mohammed S, Ali HA. Breast cancer detection and classification using thermography: a review. 2018. p. 496–505. https://doi.org/10.1007/978-3-319-74690-6_49.

  27. Jawzal H, Ekici S. Trends in breast cancer screening using thermography: a review. Int J Latest Technol Eng Manag Appl Sci. 2018;7:100–4.

  28. Raghavendra U, Gudigar A, Rao TN, Ciaccio EJ, Ng EYK, Rajendra Acharya U. Computer-aided diagnosis for the identification of breast cancer using thermogram images: a comprehensive review. Infrared Phys Technol. 2019;102:103041. https://doi.org/10.1016/j.infrared.2019.103041.

  29. Gonzalez-Hernandez JL, Recinella AN, Kandlikar SG, Dabydeen D, et al. Technology, application and potential of dynamic breast thermography for the detection of breast cancer. Int J Heat Mass Transf. 2019;131:558–73.

  30. Zuluaga-Gomez J, Zerhouni N, Al Masry Z, Devalland C, Varnier C. A survey of breast cancer screening techniques: thermography and electrical impedance tomography. J Med Eng Technol. 2019;43:305–22. https://doi.org/10.1080/03091902.2019.1664672.

  31. Hakim A, Awale RN. Thermal imaging - an emerging modality for breast cancer detection: a comprehensive review. J Med Syst. 2020;44:136. https://doi.org/10.1007/s10916-020-01581-y.

  32. Al Husaini MAS, Habaebi MH, Hameed SA, Islam MR, Gunawan TS. A systematic review of breast cancer detection using thermography and neural networks. IEEE Access. 2020;8:208922–37. https://doi.org/10.1109/ACCESS.2020.3038817.

  33. Mashekova A, Zhao Y, Ng EYK, Zarikas V, Fok SC, Mukhmetov O. Early detection of the breast cancer using infrared technology – a comprehensive review. Therm Sci Eng Prog. 2022;27:101142. https://doi.org/10.1016/j.tsep.2021.101142.

  34. Roslidar R, Rahman A, Muharar R, Syahputra MR, Arnia F, Syukri M, Pradhan B, Munadi K. A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection. IEEE Access. 2020;8:116176–94. https://doi.org/10.1109/ACCESS.2020.3004056.

  35. Kerr J. Review of the effectiveness of infrared thermal imaging (thermography) for population screening and diagnostic testing of breast cancer. 2004.

  36. Irwig L, Houssami N, van Vliet C. New technologies in screening for breast cancer: a systematic review of their accuracy. Br J Cancer. 2004;90:2118–22.

  37. Brennan M, Houssami N. Thermography in breast cancer diagnosis, screening and risk assessment: systematic review. Breast Cancer Manag. 2013;2:163–72. https://doi.org/10.2217/bmt.13.4.

  38. Vreugdenburg TD, Willis CD, Mundy L, Hiller JE. A systematic review of elastography, electrical impedance scanning, and digital infrared thermography for breast cancer screening and diagnosis. Breast Cancer Res Treat. 2013;137:665–76. https://doi.org/10.1007/s10549-012-2393-x.

  39. Yussof NA. Infrared regulation thermography for cancer. Putrajaya; 2014. https://www.moh.gov.my/index.php/database_stores/store_view_page/30/247.

  40. Beresford S, Cording J, Gribble A, Haddow J, et al. The future of breast screening: a literature review of emerging technologies in breast cancer screening. 2018.

  41. Fitzgerald A, Berentson-Shaw J. Thermography as a screening and diagnostic tool: a systematic review. N Z Med J. 2012;125:80–91. http://www.ncbi.nlm.nih.gov/pubmed/22426613.

  42. Ammer K, Ring EF. Standard procedures for infrared imaging in medicine. In: Diakides M, Bronzino JD, Peterson DR, editors. Medical infrared imaging. Principles and practices. Boca Raton: CRC Press; 2013. p. 32.1–14.

  43. Yassin NIR, Omran S, El Houby EMF, Allam H. Machine learning techniques for breast cancer computer aided diagnosis using different image modalities: a systematic review. Comput Methods Programs Biomed. 2018;156:25–45.

  44. Magalhaes C, Mendes J, Vardasca R. Meta-analysis and systematic review of the application of machine learning classifiers in biomedical applications of infrared thermography. Appl Sci. 2021;11:842.

  45. Vatansever F, Hamblin MR. Far infrared radiation (FIR): its biological effects and medical applications. Photonics Lasers Med. 2012;1. https://doi.org/10.1515/plm-2012-0034.

  46. Whiting PF. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155:529. https://doi.org/10.7326/0003-4819-155-8-201110180-00009.

  47. Schwarzer G, Rücker G. Meta-analysis of proportions. 2022. p. 159–172. https://doi.org/10.1007/978-1-0716-1566-9_10.

  48. Lin L, Chu H. Meta-analysis of proportions using generalized linear mixed models. Epidemiology. 2020;31:713–7. https://doi.org/10.1097/EDE.0000000000001232.

  49. Barker TH, Migliavaca CB, Stein C, Colpani V, Falavigna M, Aromataris E, Munn Z. Conducting proportional meta-analysis in different types of systematic reviews: a guide for synthesisers of evidence. BMC Med Res Methodol. 2021;21:189. https://doi.org/10.1186/s12874-021-01381-z.

  50. Hunter JE, Schmidt FL. Methods of meta-analysis: correcting error and bias in research findings. Thousand Oaks: Sage; 2004. https://doi.org/10.4135/9781483398105.

  51. Schwarzer G, Carpenter JR, Rücker G. Meta-analysis with R. Cham: Springer International Publishing; 2015. https://doi.org/10.1007/978-3-319-21416-0.

  52. Freeman SC, Kerby CR, Patel A, Cooper NJ, Quinn T, Sutton AJ. Development of an interactive web-based tool to conduct and interrogate meta-analysis of diagnostic test accuracy studies: MetaDTA. BMC Med Res Methodol. 2019;19:81. https://doi.org/10.1186/s12874-019-0724-x.

  53. Parisky YR, Sardi A, Hamm R, Hughes K, Esserman L, Rust S, Callahan K. Efficacy of computerized infrared imaging analysis to evaluate mammographically suspicious lesions. Am J Roentgenol. 2003;180:263–9. https://doi.org/10.2214/ajr.180.1.1800263.

  54. Button TM, Li H, Fisher P, Rosenblatt R, Dulaimy K, Li S, O’Hea B, Salvitti M, Geronimo V, Geronimo C, Jambawalikar S, Carvelli P, Weiss R. Dynamic infrared imaging for the detection of malignancy. Phys Med Biol. 2004;49:3105–16. https://doi.org/10.1088/0031-9155/49/14/005.

  55. Yuan Y, Wang Q, Song ST, Li JY, Liu Z. Image analysis of breast tumors using thermal texture mapping (TTM). In: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. Shanghai: IEEE; 2005. p. 697–9. https://doi.org/10.1109/IEMBS.2005.1616509.

  56. Arora N, Martins D, Ruggerio D, Tousimis E, Swistel AJ, Osborne MP, Simmons RM. Effectiveness of a noninvasive digital infrared thermal imaging system in the detection of breast cancer. Am J Surg. 2008;196:523–6. https://doi.org/10.1016/j.amjsurg.2008.06.015.

  57. Wishart GC, Campisi M, Boswell M, Chapman D, Shackleton V, Iddles S, Hallett A, Britton PD. The accuracy of digital infrared imaging for breast cancer detection in women undergoing breast biopsy. Eur J Surg Oncol (EJSO). 2010;36:535–40. https://doi.org/10.1016/j.ejso.2010.04.003.

  58. Wang J, Chang K-J, Chen C-Y, Chien K-L, Tsai Y-S, Wu Y-M, Teng Y-C, Shih TT-F. Evaluation of the diagnostic performance of infrared imaging of the breast: a preliminary study. Biomed Eng Online. 2010;9:3. https://doi.org/10.1186/1475-925X-9-3.

  59. Kontos M, Wilson R, Fentiman I. Digital infrared thermal imaging (DITI) of breast lesions: sensitivity and specificity of detection of primary breast cancers. Clin Radiol. 2011;66:536–9. https://doi.org/10.1016/j.crad.2011.01.009.

  60. Kolarić D, Herceg Z, Nola IA, Ramljak V, Kulis T, Holjevac JK, Deutsch JA, Antonini S. Thermography--a feasible method for screening breast cancer? Coll Antropol. 2013;37:583–8. http://www.ncbi.nlm.nih.gov/pubmed/23941007.

  61. Collett AE, Guilfoyle C, Gracely EJ, Frazier TG, Barrio AV. Infrared imaging does not predict the presence of malignancy in patients with suspicious radiologic breast abnormalities. Breast J. 2014;20:375–80. https://doi.org/10.1111/tbj.12273.

  62. Yao X, Wei W, Li J, Wang L, Xu Z, Wan Y, Li K, Sun S. A comparison of mammography, ultrasonography, and far-infrared thermography with pathological results in screening and early diagnosis of breast cancer. Asian Biomed. 2014;8:11–9. https://doi.org/10.5372/1905-7415.0801.257.

  63. Francis SV, Sasikala M, Bhavani Bharathi G, Jaipurkar SD. Breast cancer detection in rotational thermography images using texture features. Infrared Phys Technol. 2014;67:490–6. https://doi.org/10.1016/j.infrared.2014.08.019.

  64. Omranipour R, Kazemian A, Alipour S, Najafi M, Alidoosti M, Navid M, Alikhassi A, Ahmadinejad N, Bagheri K, Izadi S. Comparison of the accuracy of thermography and mammography in the detection of breast cancer. Breast Care. 2016;11:260–4. https://doi.org/10.1159/000448347.

  65. Araújo MC, Souza RMCR, Lima RCF, Filho TMS. An interval prototype classifier based on a parameterized distance applied to breast thermographic images. Med Biol Eng Comput. 2017;55:873–84. https://doi.org/10.1007/s11517-016-1565-y.

  66. Morales-Cervantes A, Kolosovas-Machuca ES, Guevara E, Maruris Reducindo M, Bello Hernández AB, Ramos García M, González FJ. An automated method for the evaluation of breast cancer using infrared thermography. EXCLI J. 2018;17:989–98. https://doi.org/10.17179/excli2018-1735.

  67. Sarigoz T, Ertan T, Topuz O, Sevim Y, Cihan Y. Role of digital infrared thermal imaging in the diagnosis of breast mass: a pilot study. Infrared Phys Technol. 2018;91:214–9. https://doi.org/10.1016/j.infrared.2018.04.019.

  68. Hellgren RJ, Sundbom AE, Czene K, Izhaky D, Hall P, Dickman PW. Does three-dimensional functional infrared imaging improve breast cancer detection based on digital mammography in women with dense breasts? Eur Radiol. 2019;29:6227–35. https://doi.org/10.1007/s00330-019-06248-y.

  69. Sun S, Yu X, Li J, Li Z, Zhu S, Wang L, Wu J, Li K, Wu Q, Sun S. Risk of breast cancer based on thermal tomography characteristics. Transl Cancer Res. 2019;8:1148–57. https://doi.org/10.21037/tcr.2019.06.29.

  70. Kakileti ST, Madhu HJ, Krishnan L, Manjunath G, Sampangi S, Ramprakash HV. Observational study to evaluate the clinical efficacy of thermalytix for detecting breast cancer in symptomatic and asymptomatic women. JCO Glob Oncol. 2020:1472–1480. https://doi.org/10.1200/GO.20.00168.

  71. Singh A, Bhat V, Sudhakar S, Namachivayam A, Gangadharan C, Pulchan C, Sigamani A. Multicentric study to evaluate the effectiveness of Thermalytix as compared with standard screening modalities in subjects who show possible symptoms of suspected breast cancer. BMJ Open. 2021;11:e052098. https://doi.org/10.1136/bmjopen-2021-052098.

  72. Da Luz TGR, Coninck JC, Ulbricht L. Comparison of the sensitivity and specificity between mammography and thermography in breast cancer detection. 2022. p. 2163–2168. https://doi.org/10.1007/978-3-030-70601-2_316.

  73. Bansal R, Collison S, Krishnan L, Aggarwal B, Vidyasagar M, Kakileti ST, Manjunath G. A prospective evaluation of breast thermography enhanced by a novel machine learning technique for screening breast abnormalities in a general population of women presenting to a secondary care hospital. Front Artif Intell. 2023;5:1050803. https://doi.org/10.3389/frai.2022.1050803.

  74. Martín-Del-Campo-Mena E, Sánchez-Méndez PA, Ruvalcaba-Limon E, Lazcano-Ramírez FM, Hernández-Santiago A, Juárez-Aburto JA, Larios-Cruz KY, Hernández-Gómez LE, Merino-González JA, González-Mejía Y. Development and validation of an infrared-artificial intelligence software for breast cancer detection. Explor Target Antitumor Ther. 2023:294–306. https://doi.org/10.37349/etat.2023.00135.

  75. Hunter JP, Saratzis A, Sutton AJ, Boucher RH, Sayers RD, Bown MJ. In meta-analyses of proportion studies, funnel plots were found to be an inaccurate method of assessing publication bias. J Clin Epidemiol. 2014;67:897–903. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jclinepi.2014.03.003.

  76. Mammoottil MJ, Kulangara LJ, Cherian AS, Mohandas P, Hasikin K, Mahmud M. Detection of breast cancer from five-view thermal images using convolutional neural networks. J Healthc Eng. 2022;2022:1–15. https://doiorg.publicaciones.saludcastillayleon.es/10.1155/2022/4295221.

  77. Pérez-Martín J, Sánchez-Cauce R. Quality analysis of a breast thermal images database. Health Informatics J. 2023;29:146045822311537. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/14604582231153779.

  78. Moskowitz M. Efficacy of Computerized Infrared Imaging. Am J Roentgenol. 2003;181:596–596. https://doiorg.publicaciones.saludcastillayleon.es/10.2214/ajr.181.2.1810596.

Acknowledgements

We thank Manuela Parras from HM Hospitals for sharing her knowledge of breast radiology with us, our colleague Emilio Letón for his support with the meta-analysis, and Murray Loew, professor at George Washington University, for useful comments.

Funding

This work has been supported by grant PID2019-110686RB-I00 of the Spanish Government, co-financed by the European Regional Development Fund. A.G.A. received a grant (PEJ-2019-AI/TIC-15533) from Universidad Nacional de Educación a Distancia (UNED), co-financed by the Regional Government of Madrid and the Youth Employment Initiative (YEI) of the European Union, and currently holds a predoctoral grant from the Research Promotion Plan of UNED.

Author information

Contributions

All authors contributed to the manuscript. F.J.D. had the idea for the article. A.G.A. performed the literature search. Data analysis was performed by A.G.A. and J.P.M. The first draft of the manuscript was written by A.G.A., and all authors commented on previous versions of the manuscript.

Corresponding author

Correspondence to Ane Goñi-Arana.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. PRISMA checklist.

Additional file 2. Search query used in each database.

Additional file 3. Meta-analysis results obtained with different approaches and subgroup analysis. Table S1. Pooled sensitivity and specificity from the 22 selected studies, obtained using different combinations of meta-analysis methods and data transformations. GLMM stands for Generalized Linear Mixed Models, and 95%-CI represents the 95% confidence interval. Table S3. Subgroup analysis of pooled sensitivity and specificity from the 22 selected studies, categorized by complexity of interpretation criteria and computed using Generalized Linear Mixed Models (GLMMs) with logit transformation. The number in parentheses represents the number of studies in each category, and 95%-CI refers to the 95% confidence interval. Studies that did not specify the imaging protocol were not considered.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Goñi-Arana, A., Pérez-Martín, J. & Díez, F.J. Breast thermography: a systematic review and meta-analysis. Syst Rev 13, 295 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13643-024-02708-9

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13643-024-02708-9

Keywords