Using Machine Learning for Detection of Illegal Food Advertising Text

Wannakan Nitirotsuphaphak
Verayuth Lertnattee

Abstract

Objective: To identify the most suitable machine learning model for classifying food advertising texts as legal or illegal. Methods: A set of 200 food advertising texts was prepared, comprising 100 legal and 100 illegal texts. In preprocessing, irrelevant information that could be linked to product owners, such as advertising license numbers, trade names, and company names, was removed from the original texts. Thai word segmentation with the longest matching algorithm was then used to split sentences and phrases into words, and a list of Thai stopwords was applied to remove unimportant words. Unigrams and bigrams were used as features in the document vectors. Models were created and tested with both the full feature set and a subset of features chosen with the select-k-best method. The programs were written in PHP using the PHP-ML machine learning library. Three supervised learning techniques were applied to create the models: support vector machine, k-nearest neighbors, and naïve Bayes. Using stratified random sampling, 80% of the collection, with equal proportions of legal and illegal texts, was used to create the models and the remaining 20% was used to test them. Each test was performed 10 times, and the average F1 score was used as the performance indicator. The model with the highest average F1 score for each learning technique was then used to build a web application for detecting illegal food advertising text. The performance of each model was tested with 40 food advertising texts. Results: The support vector machine was the most effective classifier for food advertising text, achieving the highest F1 score of 0.987 when the model was built with the full set of unigram features after stopword removal. Conclusion: Machine learning techniques can be applied effectively to classify food advertising texts as legal or illegal.
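The preprocessing and evaluation steps named in the Methods can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the authors' PHP-ML code. The function names, the toy dictionary, and the toy stopword list are hypothetical, and treating the illegal class as the positive class for F1 is an assumption. Classifier training and select-k-best feature selection are omitted.

```python
import random
from collections import Counter


def segment_longest_match(text, dictionary):
    """Greedy forward longest matching: take the longest dictionary word at each position."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            # No dictionary word starts here: emit a single character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


def extract_features(tokens, stopwords, use_bigrams=False):
    """Drop stopwords, then count unigrams (and optionally bigrams of adjacent kept words)."""
    kept = [t for t in tokens if t not in stopwords]
    grams = list(kept)
    if use_bigrams:
        grams += [f"{a} {b}" for a, b in zip(kept, kept[1:])]
    return Counter(grams)


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_indices, test_indices) with the same class proportions in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        cut = int(round(train_fraction * len(idx)))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (assumed here to be 1 = illegal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
toy_dictionary = {"อาหาร", "อาหารเสริม", "ลด", "น้ำ", "น้ำหนัก", "ได้", "จริง"}
toy_stopwords = {"ได้"}
tokens = segment_longest_match("อาหารเสริมลดน้ำหนักได้จริง", toy_dictionary)
print(tokens)
print(extract_features(tokens, toy_stopwords, use_bigrams=True))

labels = [0] * 100 + [1] * 100  # 100 legal (0) and 100 illegal (1) texts
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Repeating the split with ten different `seed` values and averaging the resulting F1 scores would mirror the "each test was performed 10 times" protocol, under the assumption that each repetition draws a fresh stratified split.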

Section
Research Articles
