Project Architecture

02.16.09

The project, OCRDROID, is a two-tier system divided into two parts: the first is application dependent, while the second is application independent. We focus here on the first part, using the application POCKETPAL, which is built on this architecture, as the example. Other applications follow the same OCRDROID structure, with some differences in utility and design. The central idea of OCRDROID remains the same for every application built on it.

1. Front End Part: This part is the Android phone, which is the client side of the application. The user photographs the receipt and has the option to perform OCR. When the user selects OCR, the image is sent to the web server for OCR processing. The front end handles the problems of perception distortion, blur, lighting and misalignment. It uses the phone's orientation sensors (pitch and roll) to calculate the orientation of the device, which avoids perception distortion. We have implemented two algorithms on the front end so that alignment can be calculated as accurately as possible: one is derived from a lightweight, modified version of Sauvola's algorithm, and the other performs noise reduction for the recognition of receipts. The full flow is shown in the image below.
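The two front-end ideas above can be illustrated with a minimal sketch. It is not the project's code: the function names, the 5 degree tolerance and the 3x3 window are assumptions chosen for illustration, and the thresholding shown is the standard Sauvola formula T = m * (1 + k * (s / R - 1)) rather than the lightweight modified version used in OCRDROID.

```python
from math import sqrt


def is_level(pitch_deg, roll_deg, tolerance_deg=5.0):
    """Return True when the phone is held close to flat.

    tolerance_deg is a hypothetical value, not one taken from the project.
    """
    return abs(pitch_deg) <= tolerance_deg and abs(roll_deg) <= tolerance_deg


def sauvola_threshold(mean, std, k=0.5, r=128.0):
    """Standard Sauvola threshold for one pixel from its local mean and std."""
    return mean * (1.0 + k * (std / r - 1.0))


def binarize(gray, window=3, k=0.5, r=128.0):
    """Binarize a grayscale image given as a list of rows of 0..255 values.

    Each pixel is compared with a threshold computed from the pixels in a
    window x window neighbourhood (clipped at the image border).
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = sauvola_threshold(mean, sqrt(variance), k, r)
            # Pixels at or below the local threshold become text (black).
            row.append(0 if gray[y][x] <= threshold else 255)
        result.append(row)
    return result
```

In this sketch the application would only allow a capture while is_level returns True, and binarize would run on the captured image. A production version would normally use integral images so that the local mean and deviation are not recomputed for every pixel.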

Before moving to the other part, i.e. the back end, let us discuss the web server configuration and why we chose it. We use Linux servers with Apache2 installed to provide the web server. On that machine we have installed PHP as the web scripting language, along with libraries for image processing. On the back end we use the Google OCRopus project for image processing and Tesseract as the OCR engine. All the components used in this project are open source. The server configuration is shown in the image below.

2. Back End Part: This part is the web server, where the OCR processing takes place. We use a Linux server running Apache, with PHP as the scripting language, to handle the images uploaded from Android phones. The OCR system is built on open source components and platforms. The receipt's image is uploaded to the Apache web server, and the PHP script then calls shell scripts in sequence to perform binarization, rotation and OCR of the image using the popular OCR engine Tesseract. At each step, the image conversion is done with the popular open source tool ImageMagick. Once the operation is complete, the server stores the text content and responds to the Android phone with an ‘OK’ signal. When the phone receives the ‘OK’ signal, the application knows that OCR has been performed successfully on the server, and it requests the stored OCR text from the web server. When the application receives the text, it opens an intent and processes the data on the phone to present it to the user in an appropriate form. The full flow is shown in the image below.
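The sequence of server-side steps described above (binarize, rotate, OCR, then report ‘OK’) can be sketched as follows. This is an illustration only: the project drives these steps from PHP and shell scripts, whereas the sketch uses Python, and the function names, file naming scheme and fixed 50% global threshold are assumptions. It relies only on the ImageMagick `convert` options `-colorspace`, `-threshold` and `-rotate`, and on the basic `tesseract image outputbase` invocation, which writes `outputbase.txt`.

```python
import subprocess
from pathlib import Path


def build_pipeline(image_path, work_dir, rotation_deg=0):
    """Return the commands for binarization, rotation and OCR, in order.

    The intermediate file names are hypothetical; each step reads the file
    written by the previous one.
    """
    work = Path(work_dir)
    stem = Path(image_path).stem
    binarized = work / f"{stem}_bin.tif"
    rotated = work / f"{stem}_rot.tif"
    text_base = work / stem  # tesseract appends ".txt" to this base name
    return [
        # Assumed stand-in for the project's binarization step.
        ["convert", str(image_path), "-colorspace", "Gray",
         "-threshold", "50%", str(binarized)],
        ["convert", str(binarized), "-rotate", str(rotation_deg), str(rotated)],
        ["tesseract", str(rotated), str(text_base)],
    ]


def run_pipeline(image_path, work_dir, rotation_deg=0):
    """Run every step and return the status plus the path of the stored text."""
    for command in build_pipeline(image_path, work_dir, rotation_deg):
        # check=True stops the sequence as soon as one step fails.
        subprocess.run(command, check=True)
    text_file = Path(work_dir) / f"{Path(image_path).stem}.txt"
    return "OK", text_file
```

Separating build_pipeline from run_pipeline keeps the command sequence inspectable without ImageMagick or Tesseract installed; only run_pipeline needs the tools on the server.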

Now that we have discussed the details of the front end, the back end and the server, the flow of the PocketPal system can be visualized in the diagram below.
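The client side of this flow (upload the image, wait for ‘OK’, then request the stored text) can be sketched as below. The real client is an Android application; this Python sketch only shows the order of the exchange. The function name and the two injected callables, upload and fetch_text, are hypothetical stand-ins for the HTTP requests to the web server.

```python
def fetch_receipt_text(image_bytes, upload, fetch_text):
    """Upload an image, wait for the 'OK' signal, then fetch the OCR text.

    upload(image_bytes) is assumed to return the server's status string and
    fetch_text() the stored OCR text; both are placeholders for real requests.
    """
    status = upload(image_bytes)
    if status.strip() != "OK":
        # Without the 'OK' signal there is no stored text to request.
        raise RuntimeError(f"OCR did not complete, server replied: {status!r}")
    return fetch_text()


# Usage with in-memory stand-ins for the server.
stored = {}


def fake_upload(data):
    stored["text"] = "TOTAL 12.50"  # pretend the server ran OCR
    return "OK"


def fake_fetch():
    return stored["text"]


text = fetch_receipt_text(b"jpeg bytes", fake_upload, fake_fetch)
```

Passing the transport in as functions keeps the two-request protocol visible and lets the sequence be exercised without a running server.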

The test cases for the project can be seen at this link.

The full report is available for download.