Feature Selection Of Thai People’s Sentiment Towards Covid-19 On Social Media With Text Mining

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/1681

Title:	Feature Selection Of Thai People’s Sentiment Towards Covid-19 On Social Media With Text Mining การคัดเลือกคุณลักษณะความรู้สึกของคนไทยต่อโรคโควิด 19 บนสื่อสังคมออนไลน์ด้วยเหมืองข้อความ
Authors:	Sarawut Kedtarwon ศราวุฒิ เกิดถาวร Jaree Thongkam จารี ทองคำ Mahasarakham University. The Faculty of Informatics
Keywords:	เหมืองความคิดเห็น จำแนกความคิดเห็น โควิด19 Opinion Mining classification COVID19
Issue Date:	24
Publisher:	Mahasarakham University
Abstract:	This research had two objectives: to study feature selection methods based on term weighting for building effective models that classify Thai people's opinions on COVID-19 using text mining, and to build and compare the performance of models of Thai people's sentiment towards COVID-19. A total of 2,920 opinions were collected through an opinion mining process, yielding a vocabulary of 9,037 words. Only adverbs, which express sentiment well, were kept as feature words and labeled as positive or negative, leaving 236 words. Feature selection was then carried out in two schemes. In Scheme 1, features were selected with Chi-Square, TFIDF and BM25. In Scheme 2, Chi-Square selection was applied first, leaving 83 feature words, and features were then selected with TFIDF and BM25. Classification models were built and evaluated with six techniques: decision tree, support vector machine, Naive Bayes, k-nearest neighbor (KNN), multi-layer perceptron (MLP), and deep learning. 10-fold cross-validation was used to split the data into training and test sets and to measure model performance. The results showed that reducing the dimensionality with Chi-Square and then selecting features with TFIDF and BM25 was the scheme best suited to the classification techniques. Under this scheme, the KNN and MLP techniques achieved the highest precision, recall, and F-measure, at 99.20%.
งานวิจัยฉบับนี้มีวัตถุประสงค์เพื่อศึกษาวิธีการการคัดเลือกคุณลักษณะด้วยการให้น้ำหนักคำเพื่อใช้ในการสร้างแบบจำลองที่ได้ประสิทธิภาพในการจำแนกความคิดเห็นของคนไทยต่อโรคโควิด 19 โดยใช้หลักการเหมืองข้อความ และเพื่อสร้างและเปรียบเทียบประสิทธิภาพของ แบบบจำลองความรู้สึกของคนไทยต่อโรคโควิด 19 ได้รวบรวมความคิดเห็นด้วยข้อมูลจำนวน 2,920 ความคิดเห็น ผ่านกระบวนการเหมืองความคิดเห็น แล้วสร้างคลังคำศัพท์ได้ทั้งหมด 9,037 คำศัพท์ แล้วคัดเลือกเฉพาะคำวิเศษณ์ที่เป็นคำที่บ่งบอกถึงความรู้สึกได้ดี มาเป็นคำคุณลักษณะ ระบุตามความหมายเชิงบวกและเชิงลบ คงเหลือจำนวน 236 คำ แล้วการคัดเลือกคุณลักษณะ 2 รูปแบบ รูปแบบที่ 1 การคัดเลือกคุณลักษณะด้วย Chi-Square TFIDF และ BM25 รูปแบบที่ 2 เมื่อผ่านการคัดเลือกคุณลักษณะด้วย Chi-Square คงเหลือคำที่เป็นคุณลักษณะ จำนวน 83 คำ แล้วจึงทำการคัดเลือกคุณลักษณะด้วย TFIDF และ BM25 แล้วทำการสร้างแบบจำลองจำแนกแล้วนำมาวัดประสิทธิภาพ 6 เทคนิค ได้แก่ เทคนิคต้นไม้ตัดสินใจ เทคนิคซัพพอร์ทเวกเตอร์แมชชีน เทคนิคนาอีฟเบย์ เทคนิคเคเนียเรสเนเบอร์ เทคนิคเพอร์เซปตรอนหลายชั้น เทคนิคแบบระบบเรียนรู้เชิงลึก จากนั้นใช้หลักการ 10-โฟลด์ครอสวาลิเดชั่น ในการแบ่งกลุ่มข้อมูลเป็นชุดข้อมูลการเรียนรู้และชุดข้อมูลทดสอบ และวัดประสิทธิภาพของแบบจำลอง พบว่าเมื่อลดมิติข้อมูลลงด้วยการคัดเลือกคุณลักษณะด้วย Chi-Square คงเหลือคำที่เป็นคุณลักษณะ แล้วจึงทำการคัดเลือกคุณลักษณะด้วย TFIDF และ BM25 การคัดเลือกคุณลักษณะเหมาะสมกับเทคนิคการจำแนกมากที่สุด พบว่า ค่าความแม่นยำ (Precision) ค่าความระลึก (Recall) ค่าความถ่วงดุล (F-measure) ผลการคัดเลือกคุณลักษณะด้วย TFIDF และ BM25 ที่เทคนิค KNN และ MLP สูงที่สุด ร้อยละ 99.20
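The second scheme in the abstract (Chi-Square selection followed by TFIDF and BM25 weighting) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy English corpus, the function names, the `top_k` cut-off, the BM25 parameters `k1=1.2` and `b=0.75`, and the non-negative BM25 IDF variant are all assumptions chosen for illustration, since the abstract does not state the exact formulas or settings used.

```python
import math
from collections import Counter

def chi_square(docs, labels, term):
    """Chi-square statistic of the 2x2 table (term present/absent) x (pos/neg)."""
    n11 = sum(1 for doc, y in zip(docs, labels) if term in doc and y == "pos")
    n10 = sum(1 for doc, y in zip(docs, labels) if term in doc and y == "neg")
    n01 = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == "pos")
    n00 = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == "neg")
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0

def select_by_chi_square(docs, labels, vocab, top_k):
    """Keep the top_k terms with the highest chi-square score (ties keep vocab order)."""
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t), reverse=True)
    return ranked[:top_k]

def tfidf(docs, vocab):
    """Raw term frequency times log(N / df) for each document and term."""
    n = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(n / df[t]) if df[t] else 0.0 for t in vocab])
    return rows

def bm25(docs, vocab, k1=1.2, b=0.75):
    """BM25 term weights; k1, b and the IDF variant are assumed defaults."""
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        rows.append([
            math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in vocab
        ])
    return rows

# Hypothetical, already-tokenised toy opinions (not data from the thesis).
docs = [
    ["vaccine", "good", "safe"],
    ["staff", "good", "kind"],
    ["lockdown", "bad", "scary"],
    ["news", "bad", "scary", "bad"],
]
labels = ["pos", "pos", "neg", "neg"]
vocab = sorted({t for doc in docs for t in doc})

# Step 1: reduce the vocabulary with chi-square.
selected = select_by_chi_square(docs, labels, vocab, top_k=3)
# Step 2: weight the surviving feature words with TF-IDF and with BM25.
tfidf_rows = tfidf(docs, selected)
bm25_rows = bm25(docs, selected)

print(selected)
print(tfidf_rows[3])
print(bm25_rows[3])
```

Each row of `tfidf_rows` or `bm25_rows` is a feature vector for one opinion. In a fuller experiment these vectors would be passed to the classifiers named in the abstract and evaluated with 10-fold cross-validation, which this sketch leaves out.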
Description:	Master of Science (M.Sc.) วิทยาศาสตรมหาบัณฑิต (วท.ม.)
URI:	http://202.28.34.124/dspace/handle123456789/1681
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
63011283001.pdf		3.01 MB	Adobe PDF