Please use this identifier to cite or link to this item:
http://202.28.34.124/dspace/handle123456789/3631
| Title: | Automatically Correcting Data with Noisy Labels for Improving Training Set of Sentiment Classification Domain (การแก้ไขข้อมูลที่ลาเบลไม่ถูกต้องแบบอัตโนมัติเพื่อปรับปรุงคุณภาพข้อมูลชุดสอนสำหรับโดเมนการจำแนกความรู้สึก) |
| Authors: | Thananchai Khamket (ธนันชัย คำเกตุ); Jantima Polpinij (จันทิมา พลพินิจ), Jantima.p@msu.ac.th; Mahasarakham University |
| Keywords: | Sentiment classification; Noisy label correction; Polarity Label Analyzer; Machine learning; Deep learning |
| Issue Date: | 19 |
| Publisher: | Mahasarakham University |
| Abstract: | Sentiment classification is a core task in natural language processing, but noisy or mislabeled training data can substantially degrade model performance. This study proposes an automated label correction method that improves training data quality before sentiment classification models are trained. It introduces the Polarity Label Analyzer, a predictive model built on sentence-level sentiment analysis that detects and corrects mislabeled sentiment data. Three datasets of TripAdvisor hotel reviews were used. The first, manually validated by linguistic experts, was used to train the Polarity Label Analyzer. The second, containing a mix of correctly and incorrectly labeled reviews, was used to analyze the impact of label noise on model performance. The third, also validated by experts, served as a test set for assessing the effect of label correction on sentiment classification models. Seven classification models were evaluated: KNN, Logistic Regression, Multinomial Naïve Bayes, Random Forest, SVM with a linear kernel, CNN, and BERT Base. Accuracy and F1-score improved significantly for all models when they were trained on corrected data. SVM performed best among the traditional models, while BERT Base achieved the highest accuracy (0.95) and F1-score (0.94). The findings suggest that correcting noisy labels before training benefits sentiment classification models, especially deep learning architectures such as CNN and BERT, and that the Polarity Label Analyzer is a useful tool for improving training set quality. |
| URI: | http://202.28.34.124/dspace/handle123456789/3631 |
| Appears in Collections: | The Faculty of Informatics |
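The abstract describes the correction workflow only at a high level: a model trained on expert-validated reviews scores a noisy review sentence by sentence and its label is corrected where the evidence disagrees. The following is a minimal sketch of that idea under stated assumptions, not the thesis's actual Polarity Label Analyzer, whose design is not given in this record. The class `TinyNaiveBayes`, the function `correct_labels`, the majority-vote rule, and the toy reviews are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def correct_labels(model, reviews, labels):
    """Relabel a review when a strict majority of its sentences point to another polarity."""
    corrected = []
    for review, label in zip(reviews, labels):
        sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
        if not sentences:
            corrected.append(label)
            continue
        votes = Counter(model.predict(s) for s in sentences)
        top_label, count = votes.most_common(1)[0]
        # Assumption: override only on a strict majority, otherwise keep the given label.
        corrected.append(top_label if count * 2 > len(sentences) else label)
    return corrected


# Toy "expert-validated" training data (invented for this sketch).
trusted_texts = [
    "The room was clean and the staff were friendly",
    "Great location and lovely breakfast",
    "The room was dirty and the staff were rude",
    "Terrible service and noisy room",
]
trusted_labels = ["pos", "pos", "neg", "neg"]

# Toy noisy set: the first review carries a wrong label.
noisy_reviews = [
    "Lovely breakfast. Friendly staff.",
    "Dirty room. Rude staff. Terrible service.",
]
noisy_labels = ["neg", "neg"]

model = TinyNaiveBayes().fit(trusted_texts, trusted_labels)
corrected = correct_labels(model, noisy_reviews, noisy_labels)
print(corrected)  # ['pos', 'neg']
```

In this sketch the corrected labels would then replace the noisy ones before any downstream classifier is trained. A strict-majority rule is one conservative choice; a confidence threshold on the model's scores would be another.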
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 65011293501.pdf | | 3.5 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.