Historical Document Transcription using Deep Learning

Please use this identifier to cite or link to this item: http://202.28.34.124/dspace/handle123456789/2168

Title:	Historical Document Transcription using Deep Learning
Authors:	Sarayut Gonwirat (สรายุทธ กรวิรัตน์); Olarik Surinta (โอฬาริก สุรินต๊ะ); Mahasarakham University; olarik.s@msu.ac.th
Keywords:	Handwritten character recognition; Image denoising; Handwritten text recognition; Generative adversarial network; Convolutional neural network; DeblurGAN; Convolutional recurrent neural network; Cycle augmentation
Issue Date:	22
Publisher:	Mahasarakham University
Abstract:	This thesis focuses on handwritten text recognition. The research aimed to apply deep learning methods to improve recognition performance on historical documents. Chapter 1 briefly introduces deep learning for handwritten text recognition systems and its use for analyzing and recognizing historical documents, and describes the research questions, the objectives of the dissertation, and its contributions.
In Chapter 2, two deep convolutional neural networks (CNNs), VGGNet and InceptionResNet, are proposed for handwritten character recognition. Two learning strategies, training from scratch and transfer learning, were investigated and compared with a traditional machine learning approach that combines local descriptors with a support vector machine (SVM). The results showed that the VGGNet architecture with transfer learning reduced training time and also improved recognition performance.
Chapter 3 presents solutions to problems that can reduce handwritten character recognition performance, such as image degradation, lighting conditions, low-resolution images, and even the quality of the capture devices. We combine the deblurring generative adversarial network (DeblurGAN) with a CNN in an architecture called DeblurGAN-CNN, which transforms noisy character images into clean ones and recognizes the cleaned characters simultaneously. We evaluated the proposed DeblurGAN-CNN architectures and compared them with existing methods on four handwritten character datasets: n-THI-C68, n-MNIST, THI-C68, and THCC-67. Compared with a CNN alone, DeblurGAN-CNN showed improved performance on these datasets. For the n-THI-C68 dataset.
Chapter 4 proposes an architecture that combines a CNN with a recurrent neural network (RNN), called CRNN, to predict the character sequence in handwritten text images. We also propose a novel cyclical data augmentation strategy, called CycleAugment, to explore different local minima and prevent overfitting. Each cycle rapidly decreased the training loss to reach a new local minimum.
Chapter 5 comprises two main sections: 1) answers to the research questions and 2) future work. It briefly summarizes the proposed approaches and answers the three main research questions on handwritten text recognition using deep learning techniques. Future work will focus on two main approaches. First, handwritten text images could be synthesized and used as training data; generative adversarial networks (GANs) are a good choice for studying and synthesizing such training sets. Second, to further improve deep learning performance, we plan to work on ensembles of CNNs (ensemble learning) and to include the DeblurGAN-CNN architecture as one member of the ensemble.
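Chapter 4 describes CycleAugment only at a high level: augmentation is applied cyclically, and each cycle rapidly lowers the training loss toward a new local minimum. The sketch below is purely illustrative and is not the thesis's implementation. It assumes, hypothetically, that each cycle opens with augmented epochs and closes with epochs on the original images, so that the loss can fall quickly at the end of each cycle. The function names and the `cycle_length` and `aug_epochs` parameters are invented for this example.

```python
def augment_enabled(epoch: int, cycle_length: int, aug_epochs: int) -> bool:
    """Return True if augmentation is used at this 0-based epoch.

    Hypothetical schedule (an assumption, not the thesis's method): within
    each cycle of `cycle_length` epochs, the first `aug_epochs` epochs train
    on augmented images and the remaining epochs train on original images.
    """
    if cycle_length <= 0 or not 0 <= aug_epochs <= cycle_length:
        raise ValueError("need cycle_length > 0 and 0 <= aug_epochs <= cycle_length")
    # Position of this epoch inside its cycle decides whether to augment.
    return epoch % cycle_length < aug_epochs


def schedule(num_epochs: int, cycle_length: int, aug_epochs: int) -> list[bool]:
    """Augmentation on/off flag for every epoch of a training run."""
    return [augment_enabled(e, cycle_length, aug_epochs) for e in range(num_epochs)]


if __name__ == "__main__":
    # Two cycles of 5 epochs: 3 augmented epochs, then 2 clean epochs each.
    for epoch, aug in enumerate(schedule(10, cycle_length=5, aug_epochs=3)):
        print(epoch, "augmented" if aug else "clean")
```

With `cycle_length=5` and `aug_epochs=3`, epochs 0-2 and 5-7 are augmented and epochs 3-4 and 8-9 use the original images. In a real training loop, the flag would select between an augmenting and a plain data transform for that epoch.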
URI:	http://202.28.34.124/dspace/handle123456789/2168
Appears in Collections:	The Faculty of Informatics

Files in This Item:

File	Description	Size	Format
61011262003.pdf		3.42 MB	Adobe PDF
