Quantitative basis for planning the costs of medical care in oncology: analyses and predictions for 2017 – methodology and data sources

I. Data sources

Predictions are based on valid population-based data obtained from the legally designated administrators (Institute of Health Information and Statistics of the Czech Republic, Czech Statistical Office). Data are analysed in de-identified form, i.e. without any direct or indirect identification of individuals. In particular, the following sources are used:

Sources of demographic population-based data

As part of monitoring population development, the national statistical authority (Czech Statistical Office, CZSO) processes data on the demographic structure of the Czech Republic’s population. These data are available on the CZSO website (https://www.czso.cz/csu/czso/population) and cover the main demographic characteristics of the Czech population, such as the total number of inhabitants, detailed age structure, life expectancy characteristics, and a projection of the age structure of the Czech population up to 2050.

The Death Records Database is the primary source of population-based data on cancer mortality in the Czech Republic; this database is also administered by the Czech Statistical Office. Standardised Death Certificates have been designed to collect precise data on the cause of death in each individual, and causes of death are classified according to the International Classification of Diseases (ICD). The primary cause of death is assigned to each deceased person; these causes of death are subsequently used in official statistical outputs on population-based mortality according to causes of death, as provided by individual countries. These statistics are available in outputs of national statistical authorities, as well as in international databases of Eurostat (official European statistics) and the World Health Organization.

Czech National Cancer Registry (CNCR)

The registration of malignant tumours is enshrined in Czech legislation and is obligatory. The Czech National Cancer Registry (CNCR) is administered by the Institute of Health Information and Statistics of the Czech Republic (IHIS), which is responsible for the cohesion of the registry as regards its methodology and contents. The IHIS regularly checks the correctness of submitted data, distributes the methodology, processes the data, provides and publishes statistical outputs, and defines access rights for authorised users. The Coordination Centre for Departmental Medical Information Systems (CCDMIS) processes data from the CNCR at the national level. The CCDMIS is responsible for smooth operation of the registry, the database status, technical support and data security. It also provides information technology (HW, SW, and communication), authentication and authorisation. The CNCR Council is an advisory body and expert guarantor of the CNCR. Members of the CNCR Council typically include representatives of IHIS, the regional CNCR centres, the Czech Ministry of Health, and the Czech Society for Oncology.

CNCR has become an indispensable part of comprehensive cancer care, containing more than 2.2 million records over the period 1976–2014, with full (100%) coverage of the entire Czech population. In other words, CNCR contains detailed records on all individuals who have been diagnosed with cancer since 1976; these records are based on birth numbers (a number unique to each Czech citizen) and include data describing malignant tumours and diagnostic details, data on patients’ treatment, and data on post-treatment follow-up, among others. Data from CNCR – in a limited and aggregated form – can be viewed and analysed by anyone thanks to the project SVOD (System for Visualisation of Oncology Data), which is available on-line at www.svod.cz. The project SVOD focuses on interactive descriptive analyses which allow users either to explore epidemiological trends in cancer diagnoses of their choice, or to view automatically prepared presentations dealing with several important topics.

Expert panel of the Czech Society for Oncology

Some information, used mainly in the population-based modelling of numbers of cancer patients and in the assessment of medical care results and costs, can be obtained neither from available population-based sources nor from published clinical trials. Time trends in the probability of relapse/progression of primary tumours in different clinical stages, or the probability of administration of higher lines of cancer therapy in patients with different disease burdens, are just two examples. In these cases, expert opinions and estimates by an expert panel of the Czech Society for Oncology (CSO) are applied to calculated predictions and estimates. Data are obtained in the form of authorised electronic forms or in printed form. The expert panel processes these materials based on clearly formulated and data-linked questions. Panel members process these materials individually, without mutual consultations.

The so-called clinical adjustment of population-based estimates is a very valuable input by the expert panel into the entire system of monitoring of costly therapy. Not all newly diagnosed cancer patients can be treated with cancer therapy. Contraindications can involve the patient’s older age, his/her general state of health, or an advanced stage of his/her tumour. The expert panel adjusts (i.e. decreases) population-based estimates of patient numbers with respect to these facts. 

II. Methodology of predictive assessment of cancer epidemiology in the Czech Republic

Definition of a reference dataset for clinically relevant predictions of cancer burden

The reliability of analyses depends on an exact definition and specification of the employed dataset. In order to define a dataset for health care results and costs, data from population-based registries must be drawn with certain limitations:

  • Data must be recent enough, reflecting the current situation of the Czech health care system; historical trends might be very misleading.
  • Most importantly, data employed for analyses must describe the patients who were actually treated in a health care facility (the number of malignant tumours diagnosed at autopsy does have its epidemiological significance, but it does not influence the cost assessment in any way).

The Czech National Cancer Registry has been employed for our analyses. The analysed data have been limited to the period 1995–2014, as these recent CNCR data contain valid records consistent with recent versions of the TNM classification. Data from this period represent a sample large enough to provide trustworthy analyses (Figure 1). Records of patients whose diagnosis was incomplete because of refusal of treatment, complications or early death must be removed, as these data would distort the analyses of cancer treatment costs. In accordance with the literature, early death was defined as a death which occurred within one month of the diagnosis.

The audit of available population-based data, therefore, results in a reference dataset of high-quality and trustworthy records which describe the treatment and health care results in patients in whom the diagnosis was properly completed. Figure 1 shows that even the subsequent separation of treated and untreated patients still provides a sample large enough to perform population-based analyses.
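
The selection rules described above can be illustrated with a minimal Python sketch. The record layout and field names (`diagnosed`, `died`, `autopsy_only`, `refused_treatment`) are hypothetical and do not reflect the actual CNCR schema; "one month" is approximated here as 30 days, which is an assumption of this sketch only.

```python
from datetime import date

# Hypothetical records; field names are illustrative, not the CNCR schema.
records = [
    {"id": 1, "diagnosed": date(2003, 5, 10), "died": None,
     "autopsy_only": False, "refused_treatment": False},
    {"id": 2, "diagnosed": date(2010, 1, 15), "died": date(2010, 2, 1),
     "autopsy_only": False, "refused_treatment": False},  # early death
    {"id": 3, "diagnosed": date(1990, 3, 2), "died": None,
     "autopsy_only": False, "refused_treatment": False},  # outside 1995-2014
    {"id": 4, "diagnosed": date(2012, 7, 7), "died": date(2012, 7, 7),
     "autopsy_only": True, "refused_treatment": False},   # found at autopsy
    {"id": 5, "diagnosed": date(2014, 11, 30), "died": date(2016, 1, 1),
     "autopsy_only": False, "refused_treatment": False},
]

EARLY_DEATH_DAYS = 30  # assumption: "within one month" taken as 30 days


def in_reference_dataset(rec):
    """Return True if the record passes all selection rules of the sketch."""
    if not (1995 <= rec["diagnosed"].year <= 2014):
        return False
    if rec["autopsy_only"] or rec["refused_treatment"]:
        return False
    if rec["died"] is not None:
        days_survived = (rec["died"] - rec["diagnosed"]).days
        if days_survived <= EARLY_DEATH_DAYS:
            return False
    return True


reference = [r for r in records if in_reference_dataset(r)]
reference_ids = [r["id"] for r in reference]
```

In this toy example only records 1 and 5 remain in the reference dataset.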

Figure 1. Definition of reference dataset for the predictions of cancer burden (Czech National Cancer Registry, 1995–2014).

Brief methodological description of performed calculations

The objective of these predictive models is to reach a reliable estimate of the number of patients living in a given period and needing cancer treatment (Figure 2). The proportion of clinical stages in living patients, combined with the knowledge of possible treatment scenarios, makes it possible to estimate expected costs. The following estimates are performed prospectively, as data from population-based registries are always available with a certain delay:

  1. Estimate of incidence rates. Estimates were made separately for clinical stages and in relevant stratifications according to age groups. The methodology is based on long-term epidemiological trends, adjusted with respect to demographic changes in the population. A Poisson regression model was employed, with estimates supplemented by confidence intervals (prediction intervals). Apart from extrapolation with a regression model, extrapolation of the mean value of cancer burden in a recent period was considered for diagnoses with a non-specific epidemiological trend.
  2. Estimate of prevalence rates. The prospective estimate of prevalence rate combines the estimated number of newly diagnosed patients in the years to come with the probability of x-year survival in patients diagnosed in the past. This multi-component estimate, therefore, combines regression estimates of the incidence rates with analyses of x-year survival, taking into consideration that only a certain proportion of patients diagnosed in the past would survive in the assessed year (total prevalence rate).
  3. Estimate of the probability of cancer relapse or progression in a given year. This parameter is essential to estimate the number of patients treated for a relapse or progression of the primary tumour. For this purpose, data on cancer mortality were extracted from CNCR and the Death Records Database. Records of patients’ deaths from malignant tumours make it possible to derive the frequency of relapses, and thus also the probability of their occurrence, in the 1st, 2nd, ..., xth year after the primary diagnosis. This probability was subsequently extrapolated to a future period by a logistic regression model. Population-based predictions were independently verified against estimates made by a selected group of clinical experts.
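
The prevalence estimate in step 2 can be sketched in a few lines of Python: the expected number of living patients in a target year is the sum, over all years of diagnosis, of the number of newly diagnosed patients multiplied by the probability of surviving the elapsed number of years. The function name, the input numbers and the survival table below are illustrative assumptions, not registry values; in addition, the sketch truncates survival after the last tabulated year, which is a simplification compared with a total prevalence estimate.

```python
def predict_prevalence(incident_cases, survival, target_year):
    """Expected number of living patients in target_year.

    incident_cases: {year of diagnosis: number of new patients},
                    observed or predicted (hypothetical values here)
    survival:       {years since diagnosis: probability of being alive}
    """
    total = 0.0
    for year, cases in incident_cases.items():
        elapsed = target_year - year
        if elapsed < 0:
            continue  # diagnosed after the target year, not yet prevalent
        # Simplification: cohorts older than the survival table contribute 0.
        total += cases * survival.get(elapsed, 0.0)
    return total


# Illustrative inputs only (not registry data).
incident_cases = {2014: 1000, 2015: 1100, 2016: 1200, 2017: 1300}
survival = {0: 0.9, 1: 0.8, 2: 0.6, 3: 0.5}

prevalence_2017 = predict_prevalence(incident_cases, survival, 2017)
# 1300*0.9 + 1200*0.8 + 1100*0.6 + 1000*0.5 = 3290 (up to rounding)
```

In a fuller model the same calculation would be repeated for each clinical stage and age group, and the resulting counts would feed into the estimate of patients treated for relapse or progression in step 3.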

Figure 2. The multi-component population-based estimate of the number of patients who might need cancer treatment in a given year.

Localisation of estimates for individual regions of the Czech Republic

All estimates (i.e. those for the entire Czech population) were subsequently localised for catchment areas of comprehensive cancer centres, using the same methodological approaches. In particular, these partial calculations take into consideration the epidemiological situation in a given region, from which weights are derived to distribute the population-based prediction of incidence and mortality rates.
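
A minimal Python sketch of this kind of localisation is shown below. It assumes that the regional weights are simply proportional to some recent regional measure of burden (for example, observed case counts); the region names, the weights and the function name are hypothetical, and the actual derivation of the weights may be more involved.

```python
def localise(national_estimate, regional_weights):
    """Distribute a national estimate across regions in proportion to weights.

    The weights are normalised, so the regional estimates always add up
    to the national estimate.
    """
    total_weight = sum(regional_weights.values())
    return {
        region: national_estimate * weight / total_weight
        for region, weight in regional_weights.items()
    }


# Hypothetical weights, e.g. recent observed case counts per catchment area.
weights = {"Region A": 250, "Region B": 150, "Region C": 100}

regional = localise(3000, weights)
# Region A: 1500, Region B: 900, Region C: 600 (sum = 3000)
```

Normalising the weights guarantees that the regional values sum to the national prediction, whatever scale the weights are expressed on.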

Risk analysis and the probability of bias

All predictions given below are derived from population-based epidemiological data. This implies a certain probability of inaccuracy; therefore, all point estimates are supplemented with a 90% confidence interval. Each point estimate must be interpreted together with these probability limits, which specify its statistical reliability and help prevent misinterpretation. The accuracy of predictions at the regional level can be affected in some less common diagnoses and clinical stages, due to insufficient sample size. Despite this, predictions have been made using strictly the same methodology in all subsets, and regional estimates have been calculated in such a way that their total equals the population-based estimates.

Prediction scenarios and adjustments made in selected diagnoses due to significant changes in trends in epidemiological parameters

Predictive modelling of the epidemiological burden allows different scenarios for the development of population characteristics to be considered, which inevitably leads to different estimates of incidence, and of prevalence in particular. Consider, for example, the incidence of a specific cancer type, which can either follow a certain long-term trend or fluctuate around a certain value (burden expressed per 100,000 population). Extrapolation of this central value is the preferred scenario for diagnoses with ambiguous time trends and high variability over a recent period.

Survival analysis of cancer patients is another example where scenarios are employed. On the one hand, we can assume that survival will stay the same over time, and use in our calculations the survival values identified in the most recent data set of patients (corresponding to the period 2010–2014, for example). On the other hand, we can suppose that patient survival will continue to develop at the same pace as it has until now. We can then define a sufficiently representative and recent data set of patients in order to identify a trend, which is subsequently projected onto the calendar years for which population-based survival values are not yet available. This scenario has been preferred in the published predictions.

Both scenarios can of course be combined; together with the consideration of other characteristics such as cancer therapy, this leads to predictive models corresponding to various possibilities of epidemiological and social development in the Czech Republic. A study by the Institute of Biostatistics and Analyses of Masaryk University, which predicted the number of colorectal cancer patients likely to be treated with cancer therapy in 2015 (Pavlík et al., 2012), is one example of employing such scenarios for predictive modelling in cancer epidemiology.

For the purpose of the presented predictions, the experts selected an adequate scenario of future development; any changes in the scenario (as compared to the previous edition) are described and justified in this chapter.
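
The two incidence scenarios can be contrasted in a short Python sketch. It assumes a log-linear Poisson model with a single time covariate and the population size as an offset, fitted by Newton-Raphson iterations; this is a simplified stand-in for a Poisson regression and not necessarily the exact model specification used for the published predictions. The function names, the constant population and the case counts are illustrative assumptions.

```python
import math


def fit_poisson_trend(years, cases, population, iterations=50):
    """Fit log(E[cases]) = a + b*(year - t0) + log(population) by Newton-Raphson."""
    t0 = years[0]
    x = [y - t0 for y in years]
    a = math.log(sum(cases) / sum(population))  # start at the overall crude rate
    b = 0.0                                     # start with no trend
    for _ in range(iterations):
        mu = [p * math.exp(a + b * xi) for p, xi in zip(population, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(c - m for c, m in zip(cases, mu))
        g1 = sum((c - m) * xi for c, m, xi in zip(cases, mu, x))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b, t0


def predict_trend(years, cases, population, target_year, target_population):
    """Scenario 1: extrapolate the fitted time trend to the target year."""
    a, b, t0 = fit_poisson_trend(years, cases, population)
    return target_population * math.exp(a + b * (target_year - t0))


def predict_recent_mean(cases, population, target_population):
    """Scenario 2: extrapolate the mean rate of the recent period."""
    return target_population * sum(cases) / sum(population)


# Illustrative data: incidence growing by 2% per year in a constant population.
years = [2010, 2011, 2012, 2013, 2014]
population = [10_000_000] * 5
cases = [5000 * 1.02 ** k for k in range(5)]

trend_2017 = predict_trend(years, cases, population, 2017, 10_000_000)
mean_2017 = predict_recent_mean(cases, population, 10_000_000)
```

With a steadily rising series such as this one, the trend scenario yields a higher prediction for 2017 than the recent-mean scenario; for a series that merely fluctuates around a stable value, the two would be close and the mean would be the more robust choice.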

In the following diagnoses, recent data from the Czech National Cancer Registry show significant changes in time trends of epidemiological parameters. For this reason, predictions of the population burden had to be adjusted, mainly with respect to changes in trends, which cannot be unequivocally predicted. The following table shows the list of diagnoses in which newly published data and changes led to adjustments in published values as compared to the previous edition of epidemiological predictions.


Diagnosis | Revision of prediction algorithm for year 2017 | Description of the impact on predicted values | Justification for revision
--- | --- | --- | ---
Non-small-cell lung cancer | time trend extrapolation: use of a regression model based on the last 5 years | decrease in predicted values of incidence of advanced stages | confirmed decrease in the last two data points
Prostate cancer (C61) | time trend extrapolation: use of a regression model based on the last 6 years | decrease in predicted values of incidence | significant changes in time trend in recent years in early stages
Bladder cancer (C67) | time trend extrapolation: use of a regression model based on the last 10 years | slight increase in the number of patients, adjustment in advanced stages | taking long-term trends into consideration
Stomach cancer (C16) | time trend extrapolation: use of a regression model based on the last 5 years | decrease in incidence | confirmed decrease in the last two data points
Pancreatic cancer (C25) | time trend extrapolation: use of a regression model based on the last 10 years | increase in incidence | increase in the last two data points
Malignant melanoma of skin (C43) | time trend extrapolation: use of a regression model based on the last 10 years | increase in incidence of early stages | confirmed increase in early stages in the last two data points
Cervical cancer (C53) | time trend extrapolation: use of a regression model based on the last 10 years | decrease in incidence | confirmed decrease in the last two data points

III. References

  • DE ANGELIS, R., M. SANT, M. P. COLEMAN, S. FRANCISCI, et al. Cancer survival in Europe 1999-2007 by country and age: results of EUROCARE-5 - a population-based study. Lancet Oncol, Jan 2014, 15(1), 23-34.
  • CAPOCACCIA, R., M. COLONNA, I. CORAZZIARI, R. DE ANGELIS, et al. Measuring cancer prevalence in Europe: the EUROPREVAL project. Ann Oncol, Jun 2002, 13(6), 831-839.
  • DUSEK, L., et al. Czech Cancer Care in Numbers 2008-2009. Praha: Grada Publishing, a.s., 2009.
  • DUSEK, L., J. MUZIK, D. MALUSKOVA, O. MAJEK, et al. Cancer incidence and mortality in the Czech Republic. Klin Onkol, 2014, 27(6), 406-423.
  • DYBA, T. AND T. HAKULINEN. Comparison of different approaches to incidence prediction based on simple interpolation techniques. Statistics in Medicine, Jul 15 2000, 19(13), 1741-1752.
  • DYBA, T. AND T. HAKULINEN. Do cancer predictions work? Eur J Cancer, Feb 2008, 44(3), 448-453.
  • ESTEVE, J., E. BENHAMOU AND L. RAYMOND. Statistical methods in cancer research. Volume IV - Descriptive epidemiology. Lyon: International Agency for Research on Cancer, 1994. 1-350 p.
  • GAIL, M. H., L. KESSLER, D. MIDTHUNE AND S. SCOPPA. Two approaches for estimating disease prevalence from population-based registries of incidence and total mortality. Biometrics, Dec 1999, 55(4), 1137-1144.
  • HAKULINEN, T. AND T. DYBA. Precision of incidence predictions based on Poisson distributed observations. Statistics in Medicine, Aug 15 1994, 13(15), 1513-1523.
  • MALVEZZI, M., G. CARIOLI, P. BERTUCCIO, T. ROSSO, et al. European cancer mortality predictions for the year 2016 with focus on leukaemias. Ann Oncol, Apr 2016, 27(4), 725-731.
  • MARIOTTO, A. B., K. R. YABROFF, Y. SHAO, E. J. FEUER, et al. Projections of the cost of cancer care in the United States: 2010-2020. J Natl Cancer Inst, Jan 19 2011, 103(2), 117-128.
  • MOLLER, B., H. FEKJAER, T. HAKULINEN, L. TRYGGVADOTTIR, et al. Prediction of cancer incidence in the Nordic countries up to the year 2020. Eur J Cancer Prev, Jun 2002, 11 Suppl 1, S1-96.
  • MOLLER, B., H. FEKJAER, T. HAKULINEN, H. SIGVALDASON, et al. Prediction of cancer incidence in the Nordic countries: empirical comparison of different approaches. Statistics in Medicine, Sep 15 2003, 22(17), 2751-2766.
  • PAVLIK, T., O. MAJEK, J. MUZIK, J. KOPTIKOVA, et al. Estimating the number of colorectal cancer patients treated with anti-tumour therapy in 2015: the analysis of the Czech National Cancer Registry. BMC Public Health, 2012, 12, 117.
  • PAVLIK, T., O. MAJEK, T. BUCHLER, R. VYZULA, et al. Trends in stage-specific population-based survival of cancer patients in the Czech Republic in the period 2000-2008. Cancer Epidemiol, Feb 2014, 38(1), 28-34.
  • VERDECCHIA, A., G. DE ANGELIS AND R. CAPOCACCIA. Estimation and projections of cancer prevalence from cancer registry data. Statistics in Medicine, Nov 30 2002, 21(22), 3511-3526.
  • WEIR, H. K., T. D. THOMPSON, A. SOMAN, B. MØLLER, et al. The past, present, and future of cancer incidence in the United States: 1975 through 2020. Cancer, 2015, 121(11), 1827-1837.