Improvement of prediction technique for adolescent depression disorder

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/1873

Title:	Improvement of prediction technique for adolescent depression disorder (การปรับปรุงเทคนิคการพยากรณ์โรคซึมเศร้าในวัยรุ่น)
Authors:	Wongpanya Nuankaew (วงษ์ปัญญา นวนแก้ว); Chatklaw Jareanpon (ฉัตรเกล้า เจริญผล); Mahasarakham University. The Faculty of Informatics
Keywords:	Improving prediction technique; Improving data mining; Classification; Probability-weighted; Depressive disorder classification; Ensemble learning
Issue Date:	27
Publisher:	Mahasarakham University
Abstract:	Depressive disorder is a mental illness caused by an imbalance of brain chemicals that affects emotions, thoughts, and feelings. Symptoms vary with severity and can extend to self-harm and suicide; adolescent and working-age patients are at particularly high risk of suicide. Early identification of depression is therefore needed so that appropriate preventive and curative care can be provided in time. Many studies have used social media posts to find indications of each user's emotions, feelings, and thoughts. Data mining techniques have been widely proposed to improve the classification of depression, and most of them employ a single classifier. This research proposes two weight adjustment methods that fit the weights of a weighted voting ensemble more closely: 1) true positive weighted rate, based on the rate at which each class is predicted correctly, and 2) average probability weighted, based on the average probability of each class. The study proceeds in four steps: 1) extract opinion features and image features using binary term occurrence; 2) select the optimal number of features using the information gain method, where the test data are publicly available posts from Twitter and Instagram and form multi-class and binary-class datasets; 3) compare the effectiveness of three classification models (single classifier, unweighted voting ensemble, and weighted voting ensemble), with the ensembles grouped by number of member classifiers into 3-Ensemble, 4-Ensemble, 5-Ensemble, and 6-Ensemble; and 4) test statistical significance with a paired samples t-test comparing the effectiveness of the weighted and unweighted voting ensembles. The results indicate that the weighted voting ensemble with better-fitted weights is more effective than both the unweighted voting ensemble and the single classifier.
The results show maximum accuracies of 66.67% and 87.23%, precisions of 72.73% and 88.89%, recalls of 80.00% and 92.57%, and F1 scores of 70.59% and 88.89%, respectively. The weighted voting ensemble performs significantly better than the unweighted voting ensemble at the 0.05 significance level for both the normal and depression classes. The method can be applied or developed further in the future, provided that the ratio between the two classes in the training set is considered carefully: the normal class had far fewer samples than the depression class, so the model learned fewer terms for it.
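The two weighting schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so everything below is an assumption made for illustration: each classifier's weight for a class is taken to be either its true positive rate on that class or the mean probability it assigns to that class on samples that truly belong to it, both measured on held-out validation data. The function names, class labels, and validation data are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def true_positive_rates(y_true, y_pred, classes):
    """Per-class true positive rate (recall) of one classifier.

    Assumption: this is the quantity behind the "true positive weighted rate".
    """
    rates = {}
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        rates[c] = hits / support if support else 0.0
    return rates


def average_probability_weights(y_true, probas, classes):
    """Mean probability a classifier gives to class c on samples truly of class c.

    Assumption: one possible reading of "average probability weighted".
    `probas` holds one {class: probability} dict per validation sample.
    """
    weights = {}
    for c in classes:
        vals = [p[c] for t, p in zip(y_true, probas) if t == c]
        weights[c] = sum(vals) / len(vals) if vals else 0.0
    return weights


def unweighted_vote(votes):
    """Plain majority vote over the labels predicted by each classifier."""
    return Counter(votes).most_common(1)[0][0]


def weighted_vote(votes, weights):
    """Each classifier adds its weight for the class it voted for."""
    scores = defaultdict(float)
    for label, w in zip(votes, weights):
        scores[label] += w[label]
    return max(scores, key=scores.get)


# Made-up validation data for three hypothetical classifiers A, B, C.
classes = ["depression", "normal"]
y_val = ["depression"] * 4 + ["normal"] * 4
pred_a = ["depression"] * 7 + ["normal"]
pred_b = ["depression"] * 3 + ["normal"] * 4 + ["depression"]
pred_c = ["depression"] + ["normal"] * 4 + ["depression"] * 3

weights = [true_positive_rates(y_val, p, classes) for p in (pred_a, pred_b, pred_c)]

# Votes of A, B, C on one new post: two weak "normal" votes, one strong "depression" vote.
votes = ["normal", "depression", "normal"]
print(unweighted_vote(votes))         # normal
print(weighted_vote(votes, weights))  # depression
```

In this toy case the majority vote follows the two classifiers that rarely identify the normal class correctly, while the weighted vote follows the one classifier with a higher true positive rate for the class it chose. Passing the output of average_probability_weights as the weights would apply the second scheme in the same way.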
Description:	Doctor of Philosophy (Ph.D.) ปรัชญาดุษฎีบัณฑิต (ปร.ด.)
URI:	http://202.28.34.124/dspace/handle123456789/1873
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
62011262001.pdf		2.7 MB	Adobe PDF
