Automatically Correcting Data with Noisy Labels for Improving Training Set of Sentiment Classification Domain

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/3631

Full metadata record

DC Field	Value	Language
dc.contributor	Thananchai Khamket	en
dc.contributor	ธนันชัย คำเกตุ	th
dc.contributor.advisor	Jantima Polpinij	en
dc.contributor.advisor	จันทิมา พลพินิจ	th
dc.contributor.other	Mahasarakham University	en
dc.date.accessioned	2026-04-22T09:47:56Z	-
dc.date.available	2026-04-22T09:47:56Z	-
dc.date.created	2025
dc.date.issued	19/5/2025
dc.identifier.uri	http://202.28.34.124/dspace/handle123456789/3631	-
dc.description.abstract	Sentiment classification is a core task in natural language processing, but noisy or mislabeled data can significantly degrade model performance. This study proposes an automated label correction method that improves training data quality before sentiment classification models are applied. It introduces the Polarity Label Analyzer, a predictive model built on sentence-level sentiment analysis that detects and corrects mislabeled sentiment data to enhance classification accuracy. Three datasets of TripAdvisor hotel reviews were used. The first dataset, manually validated by linguistic experts, was used to train the Polarity Label Analyzer. The second dataset, containing a mix of correctly and incorrectly labeled reviews, was used to analyze the impact of label noise on model performance. The third dataset, also validated by experts, served as a test set for assessing the impact of label correction on various sentiment classification models. Seven classification models (KNN, Logistic Regression, Multinomial Naïve Bayes, Random Forest, SVM with a linear kernel, CNN, and BERT Base) were used to evaluate the effect of label correction. The results show significant improvements in accuracy and F1-score across all models when trained on corrected data. SVM performed best among the traditional models, while BERT Base achieved the highest accuracy (0.95) and F1-score (0.94), highlighting the importance of label quality for deep learning models. The findings suggest that correcting noisy labels before training significantly enhances sentiment classification models, especially deep learning architectures such as CNN and BERT. The Polarity Label Analyzer proves to be a valuable tool for improving training set quality, reinforcing the importance of data reliability in sentiment analysis tasks.	en
dc.description.abstract	-	th
dc.language.iso	en
dc.publisher	Mahasarakham University
dc.rights	Mahasarakham University
dc.subject	Sentiment classification	en
dc.subject	Noisy label correction	en
dc.subject	Polarity Label Analyzer	en
dc.subject	Machine learning	en
dc.subject	Deep learning	en
dc.subject.classification	Computer Science	en
dc.subject.classification	Information and communication	en
dc.title	Automatically Correcting Data with Noisy Labels for Improving Training Set of Sentiment Classification Domain	en
dc.title	การแก้ไขข้อมูลที่ลาเบลไม่ถูกต้องแบบอัตโนมัติเพื่อปรับปรุงคุณภาพข้อมูลชุดสอนสำหรับโดเมนการจำแนกความรู้สึก	th
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	Jantima Polpinij	en
dc.contributor.coadvisor	จันทิมา พลพินิจ	th
dc.contributor.emailadvisor	Jantima.p@msu.ac.th
dc.contributor.emailcoadvisor	Jantima.p@msu.ac.th
dc.description.degreename	Doctor of Philosophy (Ph.D.)	en
dc.description.degreename	ปรัชญาดุษฎีบัณฑิต (ปร.ด.)	th
dc.description.degreelevel	Doctoral Degree	en
dc.description.degreelevel	ปริญญาเอก	th
dc.description.degreediscipline	Information Technology	en
dc.description.degreediscipline	สาขาเทคโนโลยีสารสนเทศ	th
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
65011293501.pdf		3.5 MB	Adobe PDF
