Automatic Thai Sentences Generation Engine Using Cross-Product Operation in Relational Database

Authors

  • Suphrattra Daengcharoen, Information Technology Department, Faculty of Science and Technology, Rajabhat Rajanagarindra University
  • Kanida Charungchit, Information Technology Department, Faculty of Science and Technology, Rajabhat Rajanagarindra University
  • Chouvalit Khancome, Computer Science Department, Faculty of Science, Ramkhamhaeng University

Keywords:

Thai sentence generation, Thai sentence corpus, Thai sentence engine, Cross-Product Relation

Abstract

Natural Language Processing (NLP) is the field that enables computers to understand, interpret, and use human language for communication. One of its tasks is automatic text generation, which produces coherent narratives and is particularly useful for creating complex, narrative-rich textual content. The core mechanism assembles words, phrases, or word groups of various types into coherent sentences, from which meaningful content that humans can understand is then built. This research designs and develops a software engine that generates Thai sentences for storage in a Thai sentence corpus, to be used in subsequent research on text summarization. The design covers both the architecture of the engine and its methods, and is divided into two main parts: (1) building a community dictionary of Thai words classified by word type, and (2) generating sentences using the cross-product operation of relational algebra on a database, which serves as the control rule for producing sentences that follow Thai syntax patterns. In the experiment, 30,000 words were imported from a small city dictionary and sentences were generated for 21 Thai sentence patterns. The results show that the engine can generate a very large number of sentences, up to 7.63926 × 10^16. Output quality was assessed by whether the generated sentences were readable and semantically correct. On average, 36.70% of the sentences were readable and semantically correct, with a minimum of 13.33% and a maximum of 64.00%. By sentence length, two-word sentences had a readability and correctness rate between 44.00% and 64.00% (average 53.05%); three-word sentences, between 22.33% and 57.67% (average 34.57%); and four-word sentences, between 13.33% and 21.00% (average 18.00%).
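The abstract states that sentences are generated by applying the relational cross-product operation to word-type data, with Thai syntax patterns acting as the control rule. The sketch below illustrates that idea under stated assumptions: it uses Python's built-in `sqlite3` module, a hypothetical five-word table named `word` with a `type` column, and two hypothetical patterns (`N-V` and `N-V-N`). The paper's actual schema, word-type labels, and 21 patterns are not described here. Each pattern slot is an aliased copy of the word table, `CROSS JOIN` forms their Cartesian product, and a selection on each slot's word type keeps only combinations in the pattern's order. The number of sentences per pattern is then the product of the word counts of its slot types, which suggests why totals such as 7.63926 × 10^16 can arise from a 30,000-word dictionary.

```python
import math
import sqlite3

# Hypothetical miniature dictionary (word, word type). The real engine imports
# about 30,000 Thai words; these entries and type labels are illustrative only.
WORDS = [
    ("แมว", "noun"),  # cat
    ("เด็ก", "noun"),  # child
    ("ปลา", "noun"),  # fish
    ("กิน", "verb"),  # eat
    ("วิ่ง", "verb"),  # run
]

# Hypothetical sentence patterns: each is an ordered list of word types.
PATTERNS = {
    "N-V": ["noun", "verb"],
    "N-V-N": ["noun", "verb", "noun"],
}


def build_db(words):
    """Load the word list into an in-memory SQLite table named `word`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word (text TEXT NOT NULL, type TEXT NOT NULL)")
    conn.executemany("INSERT INTO word (text, type) VALUES (?, ?)", words)
    return conn


def pattern_query(types):
    """Build a SELECT over the cross product of one `word` copy per pattern slot."""
    aliases = [f"w{i}" for i in range(len(types))]
    # Thai is written without spaces between words, so concatenate directly.
    sentence = " || ".join(f"{a}.text" for a in aliases)
    product = " CROSS JOIN ".join(f"word AS {a}" for a in aliases)
    # Selection: slot i must hold a word of the i-th type in the pattern.
    condition = " AND ".join(f"{a}.type = ?" for a in aliases)
    return f"SELECT {sentence} FROM {product} WHERE {condition}"


def generate(conn, types):
    """Return every sentence that matches the given word-type pattern."""
    rows = conn.execute(pattern_query(types), list(types))
    return [row[0] for row in rows]


def expected_count(conn, types):
    """Cardinality of the filtered cross product: product of per-type word counts."""
    counts = [
        conn.execute("SELECT COUNT(*) FROM word WHERE type = ?", (t,)).fetchone()[0]
        for t in types
    ]
    return math.prod(counts)


if __name__ == "__main__":
    conn = build_db(WORDS)
    for name, types in PATTERNS.items():
        print(name, len(generate(conn, types)), expected_count(conn, types))
    # N-V 6 6
    # N-V-N 18 18
```

With three nouns and two verbs, the patterns yield 3 × 2 = 6 and 3 × 2 × 3 = 18 sentences. The output mixes sensible sentences such as แมวกินปลา ("the cat eats fish") with odd ones such as แมววิ่งปลา ("cat runs fish"), because the cross product enforces word-type order but not meaning; this is consistent with the abstract's finding that only a fraction of generated sentences are readable and semantically correct. At the reported scale, a real engine would presumably stream or sample rows rather than collect them all in a Python list.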

Published

2024-04-26

How to Cite

Daengcharoen, S., Charungchit, K., & Khancome, C. (2024). Automatic Thai Sentences Generation Engine Using Cross-Product Operation in Relational Database. EAU Heritage Journal Science and Technology (Online), 18(1), 176–193. Retrieved from https://he01.tci-thaijo.org/index.php/EAUHJSci/article/view/265365

Section

Research Articles