OCR based on Deep Learning, implemented on a Raspberry Pi 4 with the Coral USB Accelerator.

This mini-project is part of ICT730: Hardware Designs for Artificial Intelligence and the Internet of Things in the TAIST-Tokyo Tech program, written by P. Saengthong, P. Jungjariyanon, and P. Kumpipot, who are master's students.


This project demonstrates OCR (Optical Character Recognition) technology based on deep learning, using a Raspberry Pi as the host computer. To improve performance, pairing it with the Coral USB Accelerator is an interesting choice: the accelerator adds an Edge TPU coprocessor to your system, enabling high-speed inferencing with low power consumption, and it supports TensorFlow Lite, a lightweight format for mobile deployment. The finished project can work in real time, converting virtually any image or video stream containing typed and printed text into machine-readable text data. It is very easy to use: just open your camera and get the text data!


Related Works

OCR is one of the computer vision tasks revolutionizing many fields, from banking to medicine and everything in between. AI at the edge is a popular trend: systems that use machine learning (ML) algorithms to process data on-device. In this experiment, we create a new version of OCR based on machine learning to handle more complex data with higher accuracy. The project is broken into two parts to extract text from any provided source.

Part 1: Text Detection. There are several alternative pre-trained models that can detect text in an image or video, but we focus on text detectors that can be used for edge deployment, such as the EAST model, the CRAFT model, TextBoxes++, and PaddleOCR. CRAFT and EAST perform almost the same, but still differ; EAST was chosen for this project following its better performance in the battle-of-text-detectors comparison article [9].

  • EAST (Efficient and Accurate Scene Text detector) is a very robust deep learning text detector for natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with a single neural network, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning). This makes it a great option as our text detector.
Figure 1: The structure of the EAST text detection Fully-Convolutional Network (by Zhou et al.)

Part 2: Text Recognition, which is among the most important and challenging tasks in image-based sequence recognition. Unlike general object recognition, recognizing such sequence-like objects often requires the system to predict a series of object labels instead of a single label. There is a lot of great work in this category, such as Tesseract, keras-ocr, and CRNN. In this mini-project, the CRNN model was chosen for the recognition part.

  • CRNN (Convolutional Recurrent Neural Network) is an implementation of a deep neural network for scene text recognition. The model consists of a CNN stage extracting features that are fed to an RNN stage (Bi-LSTM), with a CTC loss for image-based sequence recognition tasks such as scene text recognition and OCR. For details, please take a look at the paper.
Figure 2: The structure of the Convolutional Recurrent Neural Network (by Shi et al.)
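To make the CNN-to-Bi-LSTM idea concrete, here is a minimal Keras sketch of a CRNN-style model. This is not the paper's exact configuration: the layer sizes, the 37-class output (0-9, a-z, plus a CTC blank), and the 32x100 input are illustrative assumptions, and the model would be trained with a CTC loss.

```python
import tensorflow as tf
from tensorflow.keras import layers

# CNN feature extractor -> image width treated as the time axis -> Bi-LSTM -> per-step scores.
inputs = tf.keras.Input(shape=(32, 100, 1))                 # grayscale word image
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)                          # -> (16, 50, 32)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 1))(x)                          # -> (8, 50, 64)
x = layers.Permute((2, 1, 3))(x)                            # width first: (50, 8, 64)
x = layers.Reshape((50, 8 * 64))(x)                         # 50 time steps of features
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(37, activation="softmax")(x)         # 0-9, a-z + CTC blank
crnn = tf.keras.Model(inputs, outputs)
```

Each of the 50 output time steps corresponds to a vertical slice of the input image, which is what lets CTC align predictions with characters of varying widths.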

Notice: This project targets TensorFlow Lite, a set of tools for converting and optimizing TensorFlow models to run on mobile and edge devices, and uses it to gather results on the models' inference latency and memory usage. The details of how to convert are described in the Approach part.


Let’s review the hardware requirements for this project:

Hardware requirement

  1. Raspberry Pi 4 2GB
  • Raspberry Pi: This project assumes you are using a Raspberry Pi 4 with 2GB, 4GB, or 8GB of RAM.
  • Operating system: These instructions only apply to Raspbian Buster.
  • 32GB microSD: I recommend the high-quality SanDisk 32GB 98MB/s cards. (You can use an 8GB or 16GB microSD as well.)
  • microSD adapter: You’ll need to purchase a microSD to USB adapter so you can flash the memory card from your laptop.

2. Coral USB Accelerator: it adds an Edge TPU coprocessor to your system, enabling high-speed machine learning inferencing on a wide range of systems with low power consumption, and supports TensorFlow Lite, simply by connecting it to a USB port.


The software included with the Coral USB Accelerator includes APIs and demos for:

  • Image Classification: For a given image/video returns a list of labels and confidence scores.
  • Object Detection: For a given image returns a list of objects found, each with a label, a confidence score and the coordinates of the object.
  • Transfer Learning: Allows for retraining of classification models on the Edge TPU.
Author’s algorithm diagram.

Software requirement

1). Set up the Raspberry Pi 4 and install Raspbian Buster.

Once you have the hardware ready, you’ll need to flash a fresh copy of the Raspbian Buster operating system to the microSD card (there might be some differences for each OS)

Download Raspberry Pi Imager following your OS
  • Open Raspberry Pi Imager, and click on OS (Figure 2).
Select Operating system and SD Card.
  • Select Raspberry Pi OS Full (32-bit) as shown in Figure 3, and click write. The writing process will take a while.
Select Raspberry Pi OS Full (32-bit) (recommended).

2). Getting started with Google Coral USB Accelerator.

Up to this point, we hope you have already completed the previous section and are ready to continue with the configuration of the Coral USB Accelerator, which is covered in the following steps.

  • Step 1: Setting up your Google Coral virtual environment. We'll be using Python virtual environments, a best practice when working with Python and pip.

You can install pip using the following commands:

$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo python get-pip.py
$ sudo python3 get-pip.py
$ sudo rm -rf ~/.cache/pip

Let’s install virtualenv and virtualenvwrapper:

$ sudo pip install virtualenv virtualenvwrapper

Once both virtualenv and virtualenvwrapper have been installed, open up your ~/.bashrc file:

$ nano ~/.bashrc

and append the following lines to the bottom of the file:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

Save and exit via ctrl+x, y, enter.

From there, reload your ~/.bashrc file to apply the changes to your current bash session:

$ source ~/.bashrc

Next, create your Python 3 virtual environment:

$ mkvirtualenv coral -p python3

Here we are creating a Python virtual environment named coral using Python3. Going forward, we recommend Python 3.

  • Step 2: Installing the Coral Edge TPU runtime and Python API

First, let’s add the package repository:

$ echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ sudo apt-get update

Now we’re ready to install the EdgeTPU runtime library:

$ workon coral
$ sudo apt-get install libedgetpu1-std
  • Step 3: Reboot your device

Rebooting your Raspberry Pi or computer is critical for the installation to complete.

You can use the following command:

$ sudo reboot

3). Install OpenCV on Raspberry Pi 4.

Start by inserting your microSD into your Raspberry Pi and booting it up with a screen attached. Once booted, configure your WiFi/Ethernet settings to connect to the internet; you'll need an internet connection to download and install the required packages for OpenCV. Open a terminal and run the following steps.

  • Step 1: Expand filesystem and reclaim space

The first step is to run raspi-config and expand your filesystem:

$ sudo raspi-config

and then select the 7 Advanced Options menu item, followed by A1 Expand filesystem, as shown in Figure 4 and Figure 5.

The `raspi-config` configuration screen for Raspbian Buster. Select “7 Advanced Options” so that we can expand our filesystem.
The “A1 Expand Filesystem” menu item allows you to expand the filesystem on your microSD card containing the Raspberry Pi Buster operating system. Then we can proceed to install OpenCV 4.

Once prompted, you should select the first option, A1 Expand File System, hit enter on your keyboard, arrow down to the <Finish> button, and then reboot your Pi. You may be prompted to reboot, but if you aren't you can execute:

$ sudo reboot

You can verify that the disk has been expanded by executing the following command and examining the output:

$ df -h

We would suggest deleting both Wolfram Engine and LibreOffice to reclaim ~1GB of space on your Raspberry Pi:

$ sudo apt-get purge wolfram-engine
$ sudo apt-get purge libreoffice*
$ sudo apt-get clean
$ sudo apt-get autoremove
  • Step 2: Install dependencies

The following commands update and upgrade any existing packages, then install dependencies, I/O libraries, and optimization packages for OpenCV.

First, update and upgrade any existing packages:

$ sudo apt-get update && sudo apt-get upgrade

We then need to install some developer tools, including CMake, which helps us configure the OpenCV build process:

$ sudo apt-get install build-essential cmake pkg-config

Next, we need to install some image I/O packages that allow us to load various image file formats from disk. Examples of such file formats include JPEG, PNG, TIFF, etc.:

$ sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
$ sudo apt-get install libxvidcore-dev libx264-dev

The OpenCV library comes with a sub-module named highgui which is used to display images to our screen and build basic GUIs. We need to install the GTK development library and prerequisites:

$ sudo apt-get install libfontconfig1-dev libcairo2-dev
$ sudo apt-get install libgdk-pixbuf2.0-dev libpango1.0-dev
$ sudo apt-get install libgtk2.0-dev libgtk-3-dev

and install a few extra dependencies:

$ sudo apt-get install libatlas-base-dev gfortran
  • Step 3: pip install OpenCV 4

You can install OpenCV into the virtual environment:

$ workon coral
$ pip install opencv-contrib-python
  • Step 4: Testing your OpenCV 4 install on Raspbian Buster

As a quick sanity check, access the coral virtual environment, fire up a Python shell, and try to import the OpenCV library:

$ cd ~
$ workon coral
$ python
>>> import cv2
>>> cv2.__version__
'4.1.1'

You've now finished installing OpenCV 4 on your Raspberry Pi.

Last but not least, TensorFlow Lite with Python is great for Linux-based embedded devices such as the Raspberry Pi and Coral devices with an Edge TPU, so we will install the TensorFlow Lite runtime for Python in step 5 below.

4). Install TensorFlow version 2.3.0 on Raspberry Pi 4

  • Step 1: Upgrade pip (>19.0) and setuptools (≥41.0.0), and install TensorFlow 2 dependencies.
$ workon coral
$ sudo pip install --upgrade pip
$ sudo pip3 install numpy==1.19.0
$ sudo pip3 install --upgrade setuptools
$ sudo apt-get install -y libhdf5-dev libc-ares-dev libeigen3-dev gcc gfortran python-dev libgfortran5 libatlas3-base libatlas-base-dev libopenblas-dev libopenblas-base libblas-dev liblapack-dev cython libatlas-base-dev openmpi-bin libopenmpi-dev python3-dev
$ sudo pip3 install keras_applications==1.0.8 --no-deps
$ sudo pip3 install keras_preprocessing==1.1.0 --no-deps
$ sudo pip3 install h5py==2.9.0
$ sudo pip3 install pybind11
$ pip3 install -U --user six wheel mock
  • Step 2: Install TensorFlow 2
$ workon coral
$ wget "https://raw.githubusercontent.com/PINTO0309/Tensorflow-bin/master/tensorflow-2.3.0-cp37-none-linux_armv7l_download.sh"
$ sudo chmod +x tensorflow-2.3.0-cp37-none-linux_armv7l_download.sh
$ ./tensorflow-2.3.0-cp37-none-linux_armv7l_download.sh
$ sudo pip3 uninstall tensorflow
$ sudo -H pip3 install tensorflow-2.3.0-cp37-none-linux_armv7l.whl

Other versions can be found at this link.

  • Step 3: Check that TF 2 has been installed correctly
$ workon coral
$ python3
>>> import tensorflow
>>> tensorflow.__version__
>>> exit()

5). Install TensorFlow Lite for Python

$ workon coral
$ echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install python3-tflite-runtime

Implementation of OCR on Raspberry Pi 4 with Coral USB accelerator and results

Text detection

Text detection is the first stage: finding horizontal and rotated bounding boxes in any provided image or video. The EAST model is a pre-trained text detector implemented with the TensorFlow library. A small board like the Raspberry Pi is not suited to heavy workloads, so we need to convert the model into .tflite format.

To make the EAST model compatible with the Edge TPU USB, post-training quantization is required. This is a conversion technique that reduces model size for edge deployment, with little degradation in model accuracy, and comes in "dr" (dynamic range), "int8", and "float16" formats, as shown in the figure below.

Each post-quantization technique

Full integer quantization

You can get further latency improvements, reductions in peak memory usage, and compatibility with integer-only hardware devices or accelerators by making sure all model math is integer quantized.

For full integer quantization, you need to calibrate or estimate the range, i.e., the (min, max) of all floating-point tensors in the model. Unlike constant tensors such as weights and biases, variable tensors such as the model input, activations (outputs of intermediate layers), and the model output cannot be calibrated unless we run a few inference cycles. As a result, the converter requires a representative dataset to calibrate them. This dataset can be a small subset (around 100–500 samples) of the training or validation data. Refer to the representative_dataset() function below.

Additionally, to ensure compatibility with integer only devices (such as 8-bit microcontrollers) and accelerators (such as the Coral Edge TPU), you can enforce full integer quantization for all ops including the input and output, by using the following steps:
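The steps above can be sketched as follows. This is a minimal example using a tiny stand-in Keras model and random calibration data; in the real project you would load the EAST model instead and feed 100–500 real preprocessed images from representative_dataset().

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the EAST model (the real input would be e.g. 320x320x3).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
])

def representative_dataset():
    # Calibration samples shaped like the model input; use real images in practice.
    for _ in range(100):
        yield [np.random.rand(1, 64, 64, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Enforce integer-only ops, inputs, and outputs for Edge TPU compatibility.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("east_model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

If any op in the graph has no integer implementation, the converter raises an error at this point, which is how you discover ops that block full integer quantization.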

After converting the model into int8 format, we also need to compile it into an Edge TPU-readable format using the Edge TPU Compiler (edgetpu_compiler), a command-line tool that compiles a TensorFlow Lite model (.tflite file) into a file that's compatible with the Edge TPU:

$ edgetpu_compiler east_model_int8.tflite

Finally, we can compile east_model_int8.tflite, which is 23.8 MB, into east_model_int8_edgetpu.tflite, which comes out at around 23.4 MB and is ready to use.

Demonstration code for text detection.
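The full detection script is embedded as a gist on Medium. As a stand-alone sketch of the key post-processing step, here is a simplified decoder for the EAST outputs, assuming the common layout of one score plus four edge distances per cell (each cell covering 4x4 input pixels) and ignoring the rotation angle for clarity:

```python
import numpy as np

def decode_east(scores, geometry, score_thresh=0.5):
    """Decode EAST score/geometry maps into axis-aligned boxes.

    scores:   (H, W) text-confidence map, one value per 4x4 input cell
    geometry: (5, H, W) distances to the top/right/bottom/left edges + angle
    Rotation is ignored here (angle assumed 0).
    """
    boxes, confidences = [], []
    ys, xs = np.where(scores > score_thresh)
    for y, x in zip(ys, xs):
        d_top, d_right, d_bottom, d_left = geometry[:4, y, x]
        cx, cy = x * 4.0, y * 4.0          # map the cell back to image coordinates
        end_x = int(cx + d_right)
        end_y = int(cy + d_bottom)
        start_x = int(end_x - (d_right + d_left))
        start_y = int(end_y - (d_bottom + d_top))
        boxes.append((start_x, start_y, end_x, end_y))
        confidences.append(float(scores[y, x]))
    return boxes, confidences

# Synthetic example: one confident cell at (y=3, x=4) describing a 24x16 box.
scores = np.zeros((8, 8), np.float32)
scores[3, 4] = 0.9
geometry = np.zeros((5, 8, 8), np.float32)
geometry[:4, 3, 4] = [8, 12, 8, 12]        # top, right, bottom, left distances
boxes, confs = decode_east(scores, geometry)
```

In the real pipeline these maps come from the TFLite interpreter running east_model_int8_edgetpu.tflite, and the decoded boxes would then go through non-maximum suppression before drawing.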

Result GIF of the text detection implementation.

Result no. 1 of text detection.
Result no. 2 of text detection.

Text detection + text recognition

Since we selected the CRNN model as our text recognition engine, we need to convert its weight file to TensorFlow Lite as well. In this mini-project we use two formats, dr and float16, since the int8 model could not be converted due to an error in one of the operations. The most accurate format is float16, but its heavier weights may give higher latency than dr. Both the dr and float16 weights will be computed on the Raspberry Pi 4.

Before converting to .tflite, we download a model trained on the MJSynth dataset, provided by FLming/CRNN.tf2; this model can only predict 0–9 and a–z (ignoring case).

For dr format
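The dr conversion gist is embedded on Medium. A minimal sketch of dynamic-range quantization, using a tiny stand-in Keras model in place of the trained CRNN (dr stores the weights as int8 while activations stay in float32):

```python
import tensorflow as tf

# Stand-in model; substitute the trained CRNN loaded from FLming/CRNN.tf2.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # dynamic-range quantization
tflite_dr = converter.convert()
open("crnn_dr.tflite", "wb").write(tflite_dr)
```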

For float16 format
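The float16 conversion gist is embedded on Medium. For float16, the converter stores the weights as half-precision floats; a minimal sketch with a tiny stand-in model rather than the real CRNN weights:

```python
import tensorflow as tf

# Stand-in model; substitute the trained CRNN here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]   # float16 weight quantization
tflite_fp16 = converter.convert()
open("crnn_float16.tflite", "wb").write(tflite_fp16)
```

This roughly halves the model size relative to float32 with almost no accuracy loss, at the cost of heavier weights than the dr variant.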

Text detection + text recognition code.
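The combined script is embedded as a gist. One piece worth showing stand-alone is how the CRNN output is turned into a string: a minimal greedy CTC decoder, assuming the 36-character charset plus a trailing blank index (the blank's actual position should be checked against the trained model):

```python
import numpy as np

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # the model predicts 0-9 and a-z
BLANK = len(CHARSET)                              # assumed CTC blank index

def ctc_greedy_decode(logits):
    """Greedy CTC decoding: argmax per time step, merge repeats, drop blanks."""
    best = logits.argmax(axis=-1)
    out, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)

# Synthetic 5-step output spelling "hi": h, h, blank, i, blank.
logits = np.zeros((5, len(CHARSET) + 1), np.float32)
for t, idx in enumerate([17, 17, BLANK, 18, BLANK]):
    logits[t, idx] = 1.0
word = ctc_greedy_decode(logits)
```

Merging repeats is why CTC inserts blanks between genuinely doubled letters; without the blank, "oo" would collapse to "o".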

After we run the code, a window showing real-time text detection will appear. Wait for the moment you want to capture the box wrapping the text, then press any key to exit that window, and the text recognition window will appear!

Result GIF of the text detection + text recognition implementation.

Result of OCR


As we can see from the real-time text detection results, the quantized model east_model_int8_edgetpu.tflite, at around 23.4 MB, reaches the highest frame rate of about 3 frames per second in real-time detection using the Edge TPU USB (in our experiment, when we used east_model_int8.tflite we achieved around 0.5 fps). We then tried to run text detection and text recognition together in a real-time scene; even though the two models use different resources for inference, the frames did not show up. Therefore, our approach works as shown in the Implementation part.

The EAST model we use in the text detection part has a ResNet-50 backbone, which has many parameters and requires heavy computation from the hardware. If we instead changed the backbone to MobileNet-v2, which has fewer parameters, it might give more speedup in inference at a slight cost in precision; for more details please see the GitHub repo of ZMLLLL/EAST-Pytorch, which provides models with both ResNet-50 and MobileNet-v2 backbones. Unfortunately, we could not convert the MobileNet-v2 model into .tflite: the conversion path is .pth → .onnx → .pb → .tflite, but we could not convert the PyTorch model to .onnx format properly, since the checkpoint is in .pth.tar format and we could not find a way around it.

Our GitHub repository



References

[1] Y. Baek, B. Lee, D. Han, S. Yun, H. Lee. Character Region Awareness for Text Detection. arXiv:1904.01941, 2019.

[2] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang (Megvii Technology Inc., Beijing, China). EAST: An Efficient and Accurate Scene Text Detector. IEEE CVPR, 2017.


[4] B. Shi, X. Bai and C. Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. arXiv:1507.05717, Jul 21, 2015.

[5] Docparser, What Is OCR And What Is It Used For?,<https://docparser.com/blog/what-is-ocr/#:~:text=Literally%2C%20OCR%20stands%20for%20Optical,into%20machine%2Dreadable%20text%20data.>

[6] fcakyon, CRAFT: Character-Region Awareness For Text detection, <https://github.com/fcakyon/craft-text-detector>

[7] SakuraRiven , PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector,<https://github.com/SakuraRiven/EAST>

[8] Adrian Rosebrock , August 20,2018 , OpenCV Text Detection (EAST text detector), <https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/>

[9] Sayak Paul & Tulasi Ram Laghumavarapu, Nov 27, 2020, A Battle of Text Detectors for Mobile Deployments: CRAFT vs. EAST, <https://sayak.dev/optimizing-text-detectors/#EAST---320x320-Dynamic-Range-&-float16>

[10] Mark West, 2019, Hands-on with the Google Coral USB Accelerator, <https://www.bouvet.no/bouvet-deler/hands-on-with-the-google-coral-usb-accelerator>

[11] TensorFlow, TensorFlow Guide, <https://www.tensorflow.org/lite/guide/python>
