Historical Document Transcription using Deep Learning

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/2168

Full metadata record

DC Field	Value	Language
dc.contributor	Sarayut Gonwirat	en
dc.contributor	สรายุทธ กรวิรัตน์	th
dc.contributor.advisor	Olarik Surinta	en
dc.contributor.advisor	โอฬาริก สุรินต๊ะ	th
dc.contributor.other	Mahasarakham University	en
dc.date.accessioned	2023-09-07T14:25:21Z	-
dc.date.available	2023-09-07T14:25:21Z	-
dc.date.created	2023
dc.date.issued	2023-01-22
dc.identifier.uri	http://202.28.34.124/dspace/handle123456789/2168	-
dc.description.abstract	This thesis addresses handwritten text recognition and investigates deep learning methods for improving recognition performance on historical documents. Chapter 1 briefly introduces deep learning for handwritten text recognition and its use in analyzing and recognizing historical documents, and describes the research questions, the objectives of the dissertation, and its contributions. Chapter 2 proposes two deep convolutional neural networks (CNNs), VGGNet and InceptionResNet, for handwritten character recognition. It investigates two learning strategies, training from scratch and transfer learning, and compares them with a traditional machine learning approach that combines local descriptors with a support vector machine (SVM). The results showed that the VGGNet architecture with transfer learning reduced training time and also improved recognition performance. Chapter 3 presents solutions to problems that reduce handwritten character recognition performance, such as image degradation, lighting conditions, low-resolution images, and the quality of the capture device. It combines the deblur generative adversarial network (DeblurGAN) architecture with a CNN, yielding DeblurGAN-CNN, which transforms noisy character images into clean ones and recognizes the clean characters simultaneously. The proposed DeblurGAN-CNN architectures were evaluated and compared with existing methods on four handwritten character datasets: n-THI-C68, n-MNIST, THI-C68, and THCC-67. Chapter 4 proposes an architecture combining a CNN and a recurrent neural network (RNN), called CRNN, to predict the sequence pattern of handwritten text images. It also proposes a novel cyclical data augmentation strategy, CycleAugment, to discover various local minima and prevent overfitting.
Each cycle rapidly decreased the training loss to reach a new local minimum. Chapter 5 comprises two main sections: 1) answers to the research questions and 2) future work. It briefly explains the proposed approaches and answers three main research questions in handwritten text recognition using deep learning techniques. Two main directions are planned for future work. First, handwritten text images may need to be synthesized and used as the training set, and a GAN is a good choice for studying and synthesizing such a training set. Second, to enhance deep learning performance, the plan is to work on an ensemble of CNNs and to include the DeblurGAN-CNN architecture as part of the ensemble.	en
dc.description.abstract	วิทยานิพนธ์นี้มุ่งเน้นแก้ปัญหาการรู้จำข้อความที่เขียนด้วยลายมือ เพื่อค้นหาวิธีการเรียนรู้เชิงลึกที่ใช้ปรับปรุงประสิทธิภาพของการรู้จำในเอกสารโบราณ โดยมีรายละเอียดแต่ละบทดังต่อไปนี้ บทที่ 1 เกี่ยวกับการเรียนรู้เชิงลึกซึ้งสำหรับระบบการรู้จำข้อความที่เขียนด้วยลายมือเบื้องต้น และใช้เทคนิคการเรียนรู้เชิงลึกสำหรับการวิเคราะห์และจดจำเอกสารลายมือเขียนที่เป็นเอกสารโบราณ รวมถึงคำถามของงานวิจัย วัตถุประสงค์และผลงานที่สร้างขึ้นในวิทยานิพนธ์ฉบับนี้ ในบทที่ 2 มีการเสนอโครงข่ายประสาทเทียมแบบคอนโวลูชัน (Convolutional Neural Network) ประกอบด้วย VGGNet และ InceptionResNet เพื่อใช้สำหรับการรู้จำอักษรลายมือเขียน โดยได้นำเสนอกลยุทธ์การเรียนรู้สองแบบ ได้แก่ การเรียนรู้แบบใหม่จากการสุ่มเริ่มต้น (Scratch Learning) และแบบจากการส่งต่อความรู้เดิม (Transfer Learning) และเปรียบเทียบกับเทคนิคการเรียนรู้ของเครื่องแบบเดิมการใช้วิธีการตัวอธิบายแบบโลคอล (Local Descriptor) และการเรียนรู้ด้วยวิธีซัพพอร์ตเวกเตอร์แมชชีน (Support Vector Machine: SVM) ผลการวิจัยพบว่าสถาปัตยกรรมแบบ VGG ด้วยวิธีการเรียนรู้จากการส่งต่อความรู้เดิมสามารถลดเวลาการเรียนรู้ได้ นอกจากนี้ยังเพิ่มประสิทธิภาพในการจดจำ บทที่ 3 นำเสนอวิธีแก้ไขปัญหาที่ประสิทธิภาพการรู้จำอักษรลายมือเขียนถูกทำให้ลดลงจากผลกระทบ เช่น การลดลงของภาพ สภาพแสง ภาพที่มีความละเอียดต่ำ และคุณภาพของอุปกรณ์จับภาพที่ไม่ดี เป็นต้น ได้เสนอการรวมสถาปัตยกรรม DeblurGAN กับ CNN เรียกว่า DeblurGAN-CNN โดยวิธี DeblurGAN-CNN สามารถเปลี่ยนภาพตัวอักษรที่ถูกรบกวนให้เป็นอักษรที่ความชัดเจนมากขึ้น และเรียนรู้จดจำภาพตัวอักษรไปพร้อมกัน การทดลองได้เปรียบเทียบวิธีการ DeblurGAN-CNN และ CNN แสดงให้เห็นประสิทธิภาพที่เพิ่มขึ้นกับชุดข้อมูล ประกอบด้วย n-THI-C68 n-MNIST THI-C68 และ THCC-67 บทที่ 4 ได้นำเสนอการรวมกันของ CNN และโครงข่ายประสาทเทียบแบบป้อนย้อนกลับ (Recurrent Neural Network: RNN) เรียกว่า CRNN เพื่อทำนายรูปแบบลำดับของภาพข้อความที่เขียนด้วยลายมือ และได้ประยุกต์ใช้กลยุทธ์การเพิ่มข้อมูลแบบวัฏจักร ซึ่งเป็นวิธีการใหม่ เรียกว่า CycleAugment โดยวิธีการนี้ได้ค้นหาค่าโลคอลต่ำสุด (Local Minima) ที่หลากหลายและป้องกันการเทรนเกินพอดี (Overfitting) 
ซึ่งแต่ละรอบของการเทรนลดค่าความผิดพลาด (Loss) ที่ลดลงเพื่อเปลี่ยนไปยังโลคอลต่ำสุดใหม่ บทที่ 5 ประกอบด้วยสองส่วนหลัก ได้แก่ การตอบคำถามของงานวิจัยและงานวิจัยที่จะทำต่อไปในอนาคต โดยได้อธิบายเกี่ยวกับแนวทางที่นำเสนอแบบสรุปและตอบคำถามงานวิจัยสามข้อหลักทางด้านการรู้จำข้อความลายมือเขียนโดยใช้เทคนิคการเรียนรู้เชิงลึก ทั้งนี้งานวิจัยที่จะทำต่อไปในอนาคต ได้เสนอให้ใช้การสังเคราะห์รูปภาพข้อความที่เขียนด้วยลายมือและใช้เป็นชุดการเทรน ซึ่งวิธี GAN เป็นทางเลือกที่ดีในการศึกษาและใช้สังเคราะห์ภาพขึ้น นอกจากนี้ได้นำเสนอการเพิ่มประสิทธิภาพการเรียนรู้เชิงลึกด้วยเทคนิคการตัดสินใจรวมหลายโมเดล (Ensemble Learning) และรวมวิธี DeblurGAN-CNN เป็นส่วนหนึ่งของโมเดลนั้น	th
dc.language.iso	en
dc.publisher	Mahasarakham University
dc.rights	Mahasarakham University
dc.subject	การรู้จำตัวอักขระลายมือเขียน	th
dc.subject	การปรับปรุงคุณภาพของภาพจากสิ่งรบกวน	th
dc.subject	การรู้จำข้อความลายมือเขียน	th
dc.subject	โครงข่ายประสาทเทียมก่อกำเนิดแบบมีคู่ปรปักษ์	th
dc.subject	โครงข่ายแบบคอนโวลูชัน	th
dc.subject	เทคนิคการเพิ่มข้อมูลรูปภาพแบบวัฏจักร	th
dc.subject	โครงข่ายแบบคอนโวลูชันย้อนกลับ	th
dc.subject	โครงข่ายประสาทเทียมก่อกำเนิดแบบมีคู่ปรปักษ์สำหรับการถอดภาพเบลอ	th
dc.subject	Handwritten character recognition	en
dc.subject	Denoising image	en
dc.subject	Handwritten text recognition	en
dc.subject	Generative adversarial network	en
dc.subject	Convolutional neural network	en
dc.subject	DeblurGAN	en
dc.subject	Convolutional recurrent neural network	en
dc.subject	Cycle augmentation	en
dc.subject.classification	Computer Science	en
dc.subject.classification	Information and communication	en
dc.subject.classification	Computer science	en
dc.title	Historical Document Transcription using Deep Learning	en
dc.title	การถอดความเอกสารโบราณด้วยการเรียนรู้เชิงลึก	th
dc.type	Thesis	en
dc.type	วิทยานิพนธ์	th
dc.contributor.coadvisor	Olarik Surinta	en
dc.contributor.coadvisor	โอฬาริก สุรินต๊ะ	th
dc.contributor.emailadvisor	olarik.s@msu.ac.th
dc.contributor.emailcoadvisor	olarik.s@msu.ac.th
dc.description.degreename	Doctor of Philosophy (Ph.D.)	en
dc.description.degreename	ปรัชญาดุษฎีบัณฑิต (ปร.ด.)	th
dc.description.degreelevel	Doctoral Degree	en
dc.description.degreelevel	ปริญญาเอก	th
dc.description.degreediscipline	Information Technology	en
dc.description.degreediscipline	สาขาเทคโนโลยีสารสนเทศ	th
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
61011262003.pdf		3.42 MB	Adobe PDF
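
Note on the abstract: it describes CycleAugment only as a cyclical data augmentation strategy in which each cycle drives the training loss toward a new local minimum. It does not give the schedule itself. The sketch below is one possible reading, not the thesis's implementation: it assumes that each cycle begins with augmented epochs and ends with non-augmented epochs. The function names and the parameters `cycle_length` and `augment_fraction` are hypothetical and do not come from the record.

```python
from typing import List


def use_augmentation(epoch: int, cycle_length: int = 10,
                     augment_fraction: float = 0.5) -> bool:
    """Return True if data augmentation is switched on for a 0-based epoch.

    Assumption (not from the thesis record): every cycle starts with
    augmented epochs and finishes with non-augmented epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    if not 0.0 <= augment_fraction <= 1.0:
        raise ValueError("augment_fraction must lie in [0, 1]")

    # Number of augmented epochs at the start of each cycle.
    augmented_epochs = int(cycle_length * augment_fraction)
    # Position of this epoch inside its cycle.
    position = epoch % cycle_length
    return position < augmented_epochs


def augmentation_schedule(num_epochs: int, cycle_length: int = 10,
                          augment_fraction: float = 0.5) -> List[bool]:
    """Build the on/off augmentation flag for every epoch of a training run."""
    return [use_augmentation(e, cycle_length, augment_fraction)
            for e in range(num_epochs)]


if __name__ == "__main__":
    # Show which epochs would be trained on augmented data.
    for epoch, flag in enumerate(augmentation_schedule(8, cycle_length=4)):
        print(epoch, "augmented" if flag else "original")
```

In a training loop, the flag would select between an augmenting and a non-augmenting data pipeline at the start of each epoch. The cycle length and the split within a cycle are tuning choices that this record does not specify.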
