Developing an Association Framework of Automatic Question and Answering for Community Tourism

Title:	Developing an Association Framework of Automatic Question and Answering for Community Tourism (Thai title: "Development of an Automatic Question-Answer Relationship Framework for Community Tourism")
Authors:	Umaporn Chaisoong (อุมาพร ไชยสูง); Chatklaw Jareanpon (ฉัตรเกล้า เจริญผล); Mahasarakham University, The Faculty of Informatics
Keywords:	Information retrieval; Document retrieval; Thai word segmentation; TF-IDF; Cosine similarity; Natural language processing; Tourism
Issue Date:	7
Publisher:	Mahasarakham University
Abstract:	Information retrieval (IR) is a branch of Natural Language Processing (NLP). The purpose of this research is to develop an association framework for automatic question answering that addresses the problem of tourists' access to spatial information, using the lower Northeastern region (southern Isan) of Thailand as a case study. Thai is difficult to process because it is written without spaces between words, so there are no explicit word boundaries; correct tokenization (word segmentation) therefore directly affects precision and accuracy. Giving tourists efficient access to information is the central challenge of this research. The framework uses cosine similarity, a well-known and efficient similarity measure based on the Vector Space Model (VSM), and applies text vectorization to compute the key term features that represent each document. This stage produced a bag of words (BoW) of 19,501 terms from 1,237 documents. In evaluation, the model achieved an accuracy of 96%, indicating its ability to explain the answers and its overall effectiveness.
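The retrieval step described above (text vectorization followed by cosine similarity in a vector space model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes documents and questions are already segmented into tokens, it uses one common TF-IDF weighting variant (the keywords mention TF-IDF, but the exact weighting used in the thesis is not given here), and the function names (`tf_idf_vectors`, `vectorize_query`, `best_answer`) and the toy English-token corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table for tokenized docs."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so terms found in every document keep a small weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency normalized by document length, multiplied by IDF.
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def vectorize_query(tokens, idf):
    """Weight a tokenized question with the corpus IDF, ignoring out-of-vocabulary terms."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_answer(question_tokens, docs):
    """Return (index, score) of the document most similar to the question."""
    vectors, idf = tf_idf_vectors(docs)
    q = vectorize_query(question_tokens, idf)
    scores = [cosine(q, d) for d in vectors]
    best = max(range(len(docs)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy corpus of pre-segmented answer documents.
docs = [
    ["homestay", "price", "night", "village"],
    ["silk", "weaving", "village", "workshop"],
    ["temple", "opening", "hours", "festival"],
]
question = ["where", "silk", "weaving", "workshop"]
idx, score = best_answer(question, docs)
print(idx, round(score, 3))
```

Because the question is weighted only with corpus vocabulary, an unseen word such as "where" is simply ignored, and the second document wins because it shares the most heavily weighted terms. Cosine similarity is length-normalized, so short questions can still match longer answer documents.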
Thai-language abstract (English translation): Information retrieval (IR) is a branch of natural language processing (NLP). The objective of developing this automatic question-answer association framework for community tourism is to solve the problem of tourists' access to spatial information on community-based tourism in southern Isan. This is a challenge for Thai language processing because of the way Thai is written: for example, words run together without spaces or explicit word separation (tokenization). Correct word segmentation therefore affects the precision and correctness with which tourists can access information efficiently. This research therefore applies the cosine similarity measure, which operates on the widely used Vector Space Model (VSM), and exploits text vectorization to compute term features that can represent documents effectively. The experiments produced a bag of words (BoW) of 19,501 terms from a total of 1,237 documents. In evaluating the model's performance, it achieved an accuracy of 99 percent, indicating the model's ability to explain the answers and its effectiveness.
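Since the abstract stresses that Thai is written without word boundaries, the segmentation step can be illustrated with dictionary-based longest (maximal) matching, a simple classic baseline. This sketch is an assumption for illustration only: the abstract does not say which segmenter the thesis used, and the `longest_matching` function and the five-word dictionary are hypothetical. Practical systems typically rely on a much larger lexicon or on a library such as PyThaiNLP's `word_tokenize`.

```python
def longest_matching(text, dictionary):
    """Greedy left-to-right longest-matching word segmentation."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical mini-dictionary; a real system would use a full Thai lexicon.
dictionary = {"การ", "ท่องเที่ยว", "ชุมชน", "อีสาน", "ใต้"}
# "Community tourism in southern Isan", written as Thai normally is: no spaces.
text = "การท่องเที่ยวชุมชนอีสานใต้"
tokens = longest_matching(text, dictionary)
print(tokens)
```

Greedy longest matching is fast but can mis-segment ambiguous strings where a long early match forces a bad split later; the resulting tokens are what a TF-IDF bag of words would then be built from.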
Description:	Doctor of Philosophy (Ph.D.)
URI:	http://202.28.34.124/dspace/handle123456789/1872
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
62011261003.pdf		3.77 MB	Adobe PDF