Multiclass Classification Approach for Detecting Software Bug Severity Level from Bug Reports

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/3632

Title:	Multiclass Classification Approach for Detecting Software Bug Severity Level from Bug Reports
Authors:	Kamthorn Sarawan (กำธร สารวรรณ); Jantima Polpinij (จันทิมา พลพินิจ), Jantima.p@msu.ac.th; Mahasarakham University
Keywords:	Multiclass classification; bug severity level; machine learning; deep learning; BERT; T5 Summarization
Issue Date:	26
Publisher:	Mahasarakham University
Abstract:	The detection and analysis of software bug reports play a critical role in enhancing problem-solving efficiency and improving software quality. However, the growing volume of bug reports in bug tracking systems makes it difficult to classify bug severity levels accurately. This study proposes a multiclass classification approach for the automated detection of software bug severity levels, using both machine learning and deep learning techniques. The models evaluated in the experiments are Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Ensemble Stacking, and a Transformer-based model, BERT. To improve the accuracy of severity classification, the study introduces a data augmentation strategy based on T5 Summarization to address class imbalance, and compares it against established alternatives: Synthetic Minority Over-sampling Technique (SMOTE), class weight adjustment, and synonym replacement. The experiments were conducted on a Mozilla Bugzilla bug report dataset, partitioned into training and testing sets in an 80:20 ratio. The BERT2 model, which was fine-tuned and trained with T5 Summarization-based data augmentation, achieved the highest performance, with an F1-score of 64.25% and an accuracy of 65.20%, surpassing the other tested models. These results suggest that combining Transformer-based models with data augmentation can substantially improve software bug severity classification.
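The evaluation setup named in the abstract (an 80:20 split, class weight adjustment, and F1-score reported alongside accuracy) can be illustrated with a small standard-library sketch. This is not the authors' code. The severity labels, the class counts, the per-class (stratified) split, the "balanced" weighting formula, and the macro averaging of F1 are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx), holding out test_ratio of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_ratio))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def balanced_class_weights(labels):
    """Assumed weighting: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging mode is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, deliberately imbalanced severity labels (not from the thesis).
labels = ["critical"] * 10 + ["major"] * 30 + ["minor"] * 60

train_idx, test_idx = stratified_split(labels, test_ratio=0.2)
train_labels = [labels[i] for i in train_idx]
weights = balanced_class_weights(train_labels)

# Majority-class baseline: always predict the most frequent training label.
majority = Counter(train_labels).most_common(1)[0][0]
y_true = [labels[i] for i in test_idx]
y_pred = [majority] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"train={len(train_idx)} test={len(test_idx)}")
print({c: round(w, 3) for c, w in sorted(weights.items())})
print(f"baseline accuracy={acc:.2f} macro-F1={f1:.2f}")
```

On this toy data the majority-class baseline scores far lower on macro F1 than on accuracy, which is one reason imbalanced severity data is usually evaluated with both metrics, and why rebalancing methods such as class weights, SMOTE, or augmentation are compared.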
URI:	http://202.28.34.124/dspace/handle123456789/3632
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
65011294001.pdf		4.18 MB	Adobe PDF
