Using Approximate String Matching for Verifying Cosmetic Product Names in the Cosmetic Notification Request Form


Jiraporn Wonglert
Verayuth Lertnattee

Abstract

Objectives: To develop a program that checks cosmetic product names in the cosmetic notification request form using approximate string matching based on the Levenshtein edit distance, and to evaluate the program's performance by comparing its verification results with those of experts.

Methods: The program was developed following the system development life cycle. Analysis of the current verification of cosmetic product names identified two problems: the process is time-consuming, and it is inaccurate. A web-based application for verifying cosmetic product names was then developed, with approximate string matching by Levenshtein edit distance as its main algorithm. The application's performance was assessed using the F1 score.

Results: The n-gram technique was applied to build a set of features that improve string-matching performance. The most suitable n-gram in this study was the trigram, at similarity thresholds of 80, 85, and 90. A threshold of 80 means that two strings whose text similarity is greater than or equal to 80 percent are treated as the same string. When the program's performance was compared across eight Thai word segmentation engines, the best engines at thresholds of 85 and 90 were attacut and tltk, respectively. There was no significant difference between the program's verification results for cosmetic product names and those of the experts.

Conclusion: The program was developed to verify the names of cosmetic products in the notification request form using approximate string matching. Its verification results for cosmetic product names are accurate and consistent with those of experts, and it can support authorities in the preliminary verification of information in the cosmetic notification request form.
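The matching rule described in the abstract (two names are treated as the same when their similarity is at least the threshold) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The conversion from Levenshtein distance to a percentage, 100 × (1 − distance / length of the longer string), is an assumption, as are the helper names `levenshtein`, `similarity_percent`, `is_same_name`, and `f1_score`. The trigram feature construction and the Thai word segmentation step (attacut, tltk, and the other engines) are not reproduced, because the abstract does not say how their output feeds into the distance calculation. The F1 helper uses the standard definition (harmonic mean of precision and recall).

```python
"""Sketch: Levenshtein-based name matching with a percentage threshold."""


def levenshtein(a, b) -> int:
    """Minimum number of single-element insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 elements of a
    # and the first j elements of b (dynamic programming, one row at a time).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity_percent(a, b) -> float:
    """ASSUMPTION: similarity = 100 * (1 - distance / length of longer input)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty names are identical
    return 100.0 * (1.0 - levenshtein(a, b) / longest)


def is_same_name(a, b, threshold: float = 80.0) -> bool:
    """Treat two names as the same when similarity >= threshold (in percent)."""
    return similarity_percent(a, b) >= threshold


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical product names that differ by one character (similarity ~85.7%).
    for threshold in (80, 85, 90):
        print(threshold, is_same_name("Serum A", "Serum B", threshold))
    # prints: 80 True / 85 True / 90 False
```

Because Python strings are sequences of Unicode code points, Thai vowel signs and tone marks each count as one character in this distance, so a single missing tone mark costs one edit. The same `levenshtein` function also accepts lists, so it could be applied to sequences of segmented words rather than characters.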


Section: Research Articles
