Validity of Data in Thailand Cancer-Based Registry with the Application of Probabilistic Record Linkage

Wannaporn Wattanawong
Sawaeng Watcharathanakij

Abstract

Objectives: To develop and validate probabilistic record linkage models for linking cancer registry data with data from one tertiary care hospital, and to validate the cancer registry data using this method. Method: The study linked data from the two sources using deterministic record linkage with 1-1 matching on newly encoded identification numbers, and using 17 models based on probabilistic record linkage. The models drew on identifiers and non-identifiers: encoded identification number, encoded hospital number, first name, last name, contact address in the house registration, date of birth, date of death, zip code of residence in the house registration issued by the Department of Provincial Administration, sex, and ICD-10 code. Each model used 3 to 8 of these variables, and the optimal cutoff points were determined. Result: A total of 7,243 patients were matched across both databases, accounting for 89.72% of the 8,073 patients in the cancer registry and 36.72% of the 19,725 patients in the two databases. Across all probabilistic record linkage models, positive predictive values ranged from 99.83% to 99.94%, negative predictive values from 97.04% to 99.98%, sensitivity from 94.74% to 99.97%, and specificity from 99.90% to 99.97%. Conclusion: Linking data on cancer patients between the two databases using probabilistic record linkage yielded more matched patients than deterministic record linkage. The method could therefore be applied in any work or research where data anonymity and confidentiality are important. However, the validity of probabilistic record linkage models depends largely on the variables selected for the model, so researchers should select the variables used in the prediction equation carefully.
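The general idea behind the method and the validity measures reported above can be illustrated with a minimal Python sketch. It is not the study's implementation. It assumes a Fellegi-Sunter style scoring scheme, and the field names, the m- and u-probabilities, and the cutoff value are all hypothetical values chosen for illustration only.

```python
import math

# Hypothetical per-field probabilities (illustrative, not taken from the study):
#   m = P(field agrees | the two records belong to the same patient)
#   u = P(field agrees | the two records belong to different patients)
FIELD_PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "zip_code": (0.90, 0.02),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement and disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def is_link(weight, cutoff=10.0):
    """Classify a record pair as a link when its weight reaches the cutoff.

    The cutoff of 10.0 is a hypothetical value; in practice it is tuned per model.
    """
    return weight >= cutoff


def validity(tp, fp, fn, tn):
    """Validity measures of a linkage model against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)
```

In a sketch like this, a pair that agrees on every field receives a large positive weight and a pair that disagrees on every field receives a negative one, so the cutoff separates likely links from likely non-links. The `percent` helper reproduces the reported shares: 7,243 of 8,073 registry patients is 89.72%, and 7,243 of 19,725 patients is 36.72%.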

Section
Research Articles
