Development of an on-Premise Indonesian Handwriting Recognition Backend System Using Open Source Deep Learning Solution for Mobile User

Existing handwriting recognition solutions for mobile apps provide off-premise services, meaning the handwriting is processed on overseas servers. Data sent to servers abroad are not under our control and could be mishandled or misused. Because recognizing handwriting is a complex problem, deep learning is needed. This research aims to develop an on-premise Indonesian handwriting recognition system using an open-source deep learning solution. Various deep learning solutions are compared to select the one used in development, and the selected solution is then used to build the architectures. Various database formats are also compared to decide which format is suitable for gathering an Indonesian handwriting database. The gathered Indonesian handwriting database and the built architectures are used in experiments covering the number of Convolutional Neural Network (CNN) layers, rotation and noise data augmentation, and Gated Recurrent Unit (GRU) versus Long Short-Term Memory (LSTM). Experimental results show that rotation data augmentation is the parameter to change in order to improve word accuracy and Character Error Rate (CER): word accuracy improves from 64.8% to 69.6% and CER from 23.2% to 20.6%.


Introduction
A salesperson who roams around looking for clients typically hands out forms after a client purchases the goods being sold. Later, the values on the forms (names, addresses, etc.) are entered into the computers at the office, either by scanning the documents or by typing each value manually. This way of storing data is time-consuming and prone to human error (mistyped values).
The solution to this problem is to use handwriting recognition software. Existing handwriting recognition software is categorized into two types, online and offline (Plamondon & Srihari, 2000). Both types run on a desktop or laptop with an operating system, which means that a salesperson who wants to store the values from the forms while out in the field needs to carry a laptop or desktop everywhere and boot it up every time a form has to be scanned. Valuable time is wasted because of this low mobility.
Developing the handwriting recognition system for mobile users gives its users high mobility. It is worth mentioning that mobile app versions of handwriting recognition systems already exist. However, the services they provide are off-premise and cloud-based, meaning the handwriting fed into the app is processed on the respective company's servers. This raises a new problem, especially if the data is personal data (name, address, etc.). Off-premise servers are physically far away, and we have no idea what the company will do with the data sent to them, since we have no control over data sent to these off-premise servers. One example is Facebook authorizing Cambridge Analytica to access the personal data of more than 87 million unknowing users (Isaak & Hanna, 2018). The solution is to use an on-premise server, which keeps the data under our control. Furthermore, there is a regulation prohibiting cases such as the Facebook one.
Article 32 of the Rancangan Undang-Undang tentang Perlindungan Data Pribadi (Draft Law on the Protection of Personal Data) can be roughly translated as follows: companies are not allowed to transfer users' personal data abroad unless the destination country has the same level of protection as this regulation, except when: 1. There is a contract between the company and the data receiver located outside of Indonesia. 2. There is an international agreement between the countries. In short, the transfer of users' personal data by companies is prohibited, with some exceptions. To comply, an on-premise system implementation is needed that ensures users' data are stored only on the respective company's servers. The on-premise requirement can be fulfilled by using an open-source library such as TensorFlow, Keras, PyTorch, etc.
Handwriting recognition is a complex problem: difficulties arise when reading different styles of handwriting, since everybody has their own personal style of writing. Humans can recognize handwriting with nearly 100 percent accuracy when presented with fairly simple and clear (noise-free) pictures of handwriting. However, when presented with increasing distortion, noise, and slant, humans are outperformed by computers by 5 percent (Chellapilla et al., 2005).

Related works
The literature review shows that existing deep learning libraries have been compared [4]. The libraries compared are Theano, Torch, Caffe, TensorFlow, and DeepLearning4J. All libraries were tested on the MNIST database (Deng, 2012) with the task of classifying the digits in the database. The architecture used was a fully connected neural network. The comparison categories were training time, prediction time, accuracy, and lines of code. The results show that TensorFlow and Theano performed better than the rest. TensorFlow performed best in prediction time and lines of code, while Theano performed best in training time, accuracy, and lines of code. TensorFlow was eventually chosen because further research showed that Theano has been discontinued.
Papers show that the IAM Database is generally used to train and test architectures for recognizing handwriting (Chowdhury & Vig, 2018) (Shi, Bai, & Yao, 2015). In the IAM Database format, each image contains a single word with close margins between the word and the image border. The images are stored in PNG format.
The architecture used in the papers for detecting and translating handwritten words is CRNN (Chowdhury & Vig, 2018) (Shi, Bai, & Yao, 2015). A CRNN consists of CNN layers followed directly by RNN layers. The papers discuss the use of 7 CNN layers and 2 bidirectional LSTM layers for the RNN. The CNN layers extract the image features used for classification, while the RNN layers handle pattern recognition, which in this case is the sequence of letters in a word.
The papers also show that two metrics are used to evaluate and measure the performance of the architecture: Word Error Rate (WER) and Character Error Rate (CER) (Chowdhury & Vig, 2018) (Shi, Bai, & Yao, 2015). Word Error Rate measures the number of wrongly classified words in a line of words or a sentence, while Character Error Rate measures the number of wrongly classified characters in a word.
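To make the two metrics concrete, the following is a minimal sketch that assumes the common edit-distance formulation, in which errors are counted as the insertions, deletions, and substitutions needed to turn the prediction into the ground truth. The function names and the example words are illustrative assumptions, not taken from the cited papers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref_word, hyp_word):
    """Character errors divided by the ground-truth length (assumed definition)."""
    if not ref_word:
        raise ValueError("reference word must not be empty")
    return edit_distance(ref_word, hyp_word) / len(ref_word)


def word_error_rate(ref_line, hyp_line):
    """Word errors divided by the number of ground-truth words (assumed definition)."""
    ref_words = ref_line.split()
    if not ref_words:
        raise ValueError("reference line must not be empty")
    return edit_distance(ref_words, hyp_line.split()) / len(ref_words)


# Illustrative values only: one wrong character in a seven-letter word,
# and one wrong word in a three-word line.
cer = character_error_rate("bandung", "bandang")
wer = word_error_rate("jalan merdeka bandung", "jalan merdeka bandang")
```

Under these assumptions, a single substituted character in a seven-letter word gives a CER of 1/7, and one wrong word in a three-word line gives a WER of 1/3.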
The Python backend is developed using Flask (Aslam, 2015). Flask is a microframework for Python that is useful for creating web applications.

Research methods
The research goes through several steps, which are shown in Figure 1. It starts by finding a suitable on-premise deep learning solution, which is used to build the handwriting recognition system. After the solution is found, the next step is finding a suitable handwriting database format, which is used to gather handwriting. The collected handwriting then goes through pre-processing, with the chosen database format as a guide. After pre-processing, the database is split into a training set and a testing set. An initial deep learning architecture is built to act as a baseline for later experiments, with the literature review as the reference for building it. The database is then used to train and test the architecture, in that order.
VOL. 7, NO. 2, 2020

Figure 1. Research Overview
The test results are used for evaluation. After the optimal architecture is found, a backend is constructed, and a frontend is built afterwards to prove that the backend system works. Table 1 summarizes all the charts presented in (Kovalev, Kalinovsky, & Kovalev, 2016). For training time, prediction time, and accuracy, the average is used. The first and second rankings in each category are in bold. Theano performed best in Training Time, Accuracy, and Lines of Code. TensorFlow followed in Prediction Time and Lines of Code. However, further research found that Theano has been discontinued.

Evaluate Architecture
The initial architecture consists of 7 CNN layers and 2 bidirectional LSTM layers. The result obtained with this architecture is used as the baseline for improvement. The evaluation is carried out through 3 experiments: number of CNN layers, data augmentation, and GRU vs LSTM.
The number-of-CNN-layers experiment is carried out by adding or removing CNN layers. 5 and 6 layers are used to find out whether fewer layers yield a better result, while 8 and 9 layers are used to test a higher number of layers.
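The experiment design above can be read as changing one factor at a time relative to the baseline. The sketch below enumerates the resulting runs under that assumption. The class name, field names, and string labels are hypothetical, and the augmentation modes are the ones named in the data augmentation description that follows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for one training run's settings."""
    cnn_layers: int = 7          # baseline: 7 CNN layers
    rnn_cell: str = "LSTM"       # baseline: 2 bidirectional LSTM layers
    augmentation: str = "none"   # baseline: no data augmentation


BASELINE = ExperimentConfig()


def one_factor_variants(baseline):
    """Return configs that each differ from the baseline in exactly one factor."""
    variants = [replace(baseline, cnn_layers=n) for n in (5, 6, 8, 9)]
    variants += [replace(baseline, augmentation=a)
                 for a in ("rotation", "noise", "rotation+noise")]
    variants.append(replace(baseline, rnn_cell="GRU"))
    return variants


runs = one_factor_variants(BASELINE)
for run in runs:
    print(run)
```

Keeping the baseline fixed and varying a single field per run makes it easier to attribute a change in Word Accuracy or CER to that one factor.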
As illustrated in Figure 3, data augmentation is carried out by adding rotation, noise, and a combination of rotation and noise. Rotation values are randomized between 1 and 4. Noise values are randomized between 0.05 and 0.8, rounded to two decimal places. The experiments are evaluated using Word Accuracy and Character Error Rate (CER).

$$\text{Word Accuracy} = \frac{\text{Number of words spelled correctly}}{\text{Total number of words}} \quad (1)$$

$$\text{CER} = \frac{\text{Number of character errors}}{\text{Total number of characters}} \quad (2)$$

Applying rotation augmentation to the gathered handwriting improves on the initial experiment the most, compared with changing the number of CNN layers or replacing LSTM with GRU. This is because the system learns from more variations of each image. With the IAM Database, however, more CNN layers (in this case 9) improve on the initial experiment the most, compared with data augmentation and replacing LSTM with GRU.
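The augmentation settings described in this section can be sketched as a small parameter sampler. This is only an illustration: the text does not state the rotation unit, whether rotation values are integers or continuous, or the type of noise, so the sketch assumes a continuous rotation value (presumably in degrees) and leaves the actual image transformation to whichever image library is used. The function and constant names are hypothetical.

```python
import random

ROTATION_RANGE = (1, 4)     # from the text; unit assumed to be degrees
NOISE_RANGE = (0.05, 0.8)   # from the text; rounded to two decimal places


def sample_augmentation(mode, rng):
    """Draw random augmentation parameters for one image.

    mode is one of "none", "rotation", "noise", or "rotation+noise".
    """
    params = {}
    if mode in ("rotation", "rotation+noise"):
        # Assumption: continuous value; the source only says "between 1 and 4".
        params["rotation"] = rng.uniform(*ROTATION_RANGE)
    if mode in ("noise", "rotation+noise"):
        params["noise"] = round(rng.uniform(*NOISE_RANGE), 2)
    return params


rng = random.Random(0)  # fixed seed so that runs are repeatable
sample = sample_augmentation("rotation+noise", rng)
```

Passing a seeded `random.Random` instance, rather than using the module-level functions, keeps the sampled augmentations reproducible between experiments.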

Results and discussion
Observation shows that using 8 CNN layers and adding rotation augmentation each improve the Word Accuracy and Character Error Rate (CER) of the initial experiment. Therefore, the two are combined in a further experiment. Table 3 shows the result of using 8 layers with rotated input images. By combining the findings of the number-of-CNN-layers experiment and the data augmentation experiment, the initial experiment's results are improved by 5.3% and 1.3% for Word Accuracy and CER respectively.

Conclusion
In conclusion, the on-premise Indonesian Handwriting Recognition Backend System using an Open Source Deep Learning Solution for Mobile Users is built using TensorFlow to provide the on-premise capability and the IAM Database format to gather Indonesian handwriting data, and adding rotation augmentation improves on the result of the initial experiment. However, the parameter that should be changed depends on the data set. Adding data augmentation, in particular rotation, helps improve Word Accuracy and CER when the database consists of 2,617 samples, 54 distinct words, and 50 writers, whereas when the database consists of 115,000 samples, more than 54 distinct words, and 657 writers, more CNN layers help achieve improvement. It is also important to note that combining rotation augmentation with 8 CNN layers, since both show the best result in their respective experiments, improves on the result of the 8-layer experiment by 0.5% in Word Accuracy.