### From OpenCV and Tesseract to exploring recent research results in Computer Vision [Challenge 2]

I’ve been away for a while… Actually, I didn’t notice the time going by, even though I feel like I’ve been running every day since my last article 😀 I was deep in a new dimension of machine learning, one that is huge and has many, many doors to knock on!

What is this “new dimension” you are talking about?

Okay, let me tell you. If you followed my previous post about OpenCV and Tesseract, you may already know that I’ve been working on a computer vision project involving Optical Character Recognition (OCR) for text in natural scene images. (By the way, I really like the problem that I’m trying to solve. That’s why I accepted the project despite having no previous knowledge of the topics and technologies used in such contexts.)

Back to the subject 🙂 So, after trying out Tesseract and OpenCV, I found that many parameters can affect the result. These parameters relate especially to one phase that experienced people emphasize.

What is that phase?

The preprocessing phase of the images. What I mean by preprocessing is the preparation of the input data (in this case, a natural scene image containing text) in a way that allows the algorithm to deliver good results.

How is that? Preparing an image? What can someone do after getting the image?

Yes, preparing it! It’s like when you want to make a cake: you have to prepare its ingredients. Or when you want to teach your child to tell dogs from cats: you have to collect photos of cats and dogs beforehand, and that will be your material in the process.

And you can prepare an image by playing around with the color intensity of each pixel, regions, contours, edges, alignment… all these are image properties. These properties are what create the variations present in an image, e.g. blur, noise, contrast, etc. Adjusting these properties to control variations is necessary in order to identify (detect) text regions, segment these regions correctly, and recognize the text as the last phase in the pipeline.

I think that at this point you are starting to understand why it’s a very important phase, aren’t you? Obviously, this phase influences all subsequent phases. When I realized that, I started looking into how to do it: I checked various research papers, OpenCV functions, and tutorials. I tried some methods (check my GitHub for the code snippets that I tested) like image thresholding (classifying pixels into two groups: below the threshold or above the threshold), dilation and erosion, contour approximation, and MSER region detection using the method proposed by Neumann. I came to this conclusion: improving text detection and recognition takes a lot of parameter understanding and tuning for these methods, whereas I’m extremely limited by deadlines imposed by my client!
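To make the thresholding idea concrete, here is a minimal sketch in plain Python (no OpenCV needed). The tiny image and the threshold of 127 are made-up values for illustration; the rule itself (pixels above the threshold get the maximum value, the rest become 0) is the same one OpenCV's binary threshold mode applies.

```python
def threshold(image, thresh=127, max_value=255):
    """Binary thresholding: pixels above `thresh` become `max_value`, the rest 0."""
    return [[max_value if pixel > thresh else 0 for pixel in row] for row in image]


# A tiny 3x4 grayscale "image" (hypothetical values): dark strokes on a lighter background.
gray = [
    [200, 190, 60, 210],
    [180, 50, 40, 205],
    [195, 185, 70, 220],
]

binary = threshold(gray, thresh=127)
for row in binary:
    print(row)
```

On a real low-contrast image a single global threshold like this is often not enough, which is exactly where the parameter tuning starts.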

Wow, how funny is that! What should I do? HEEELP! Deadlines, and a newbie in computer vision!… Yeah, I felt frustrated! You have surely faced such a situation at some phase of a project.

Me, surprised by how much I have to learn and deal with.

Fortunately, remembering my motivation and goal, I decided to keep calm and take on Challenge 2 on my project path! So I read some other papers, paying more attention to the details of the work presented, and I reached a conclusion!

Oh, finally we are at the conclusion. What is it? And I still don’t get what you mean by that “new dimension”!

First, the images I’m dealing with have some specific characteristics:
- The contrast between the text and the background is low.
- The distance between characters varies.
- The number of characters is not the same everywhere.
- The text doesn’t consist of words but of a mix of numeric and alphabetical characters, which means it can’t be recognized using a dictionary or a list of choices.

With that in mind, I examined papers with methods based on Convolutional Neural Networks (CNNs) and decided to go that way because, according to research and some applications, CNNs deliver much better results! The method in Text-Attentional Convolutional Neural Networks for Scene Text Detection interested me in particular, because it relies on more informative supervised information and on improving contrast in order to detect text.

The presented method involves:
- training a CNN with more informative supervised information, such as text region mask, character label, and binary text/non-text information;
- introducing a deep multi-task learning mechanism to learn the Text-CNN efficiently, making it possible to learn more discriminative text features;
- developing a new type of MSERs method by enlarging the local contrast between text and background regions.

(If you know a better paper or method that seems to fit my problem, let me know 😉 )

And since then, which was about two weeks ago, I have been learning Deep Learning for computer vision! That new dimension in machine learning! I tried to familiarize myself with the concepts quickly, so I read these articles, which I recommend if you want to start with deep learning and computer vision:

A quick introduction to Neural Networks, An intuitive explanation of Convolutional Neural Networks, and A beginner’s guide to understanding Convolutional Neural Networks. After that, I installed the CPU version of TensorFlow, as it’s easier to install than the GPU version, and Keras, because it’s easier (more high level) than TensorFlow, and followed the Machine Learning Mastery tutorial: develop your first Neural Network in Python with Keras step by step. There, I discovered something very important! I really need to switch to the GPU version because the CPU one is very slow! Buuuuut… to run the GPU version, you have to use the graphics card (NVIDIA), which, as I found out later, doesn’t work on a Virtual Machine!!! Oh my God! A new time-consuming thing with an approaching deadline 😀

Apart from that, I haven’t mentioned that I have to solve another very important problem, which was the first reason for trying out other methods before switching to CNNs: I don’t have a large amount of data! (You may ask me: how will you make it work with deep learning then? Everyone knows that data is the most important component of Neural Networks! Without it, you can’t get good results, simply because your CNN won’t learn enough from your input images!)…

It’s possible! There are some methods, some of them are less evident than others, but it’s possible!

I will explore this 3rd challenge further later on, but for now, I’m very happy to have found a complete article about Setting Up Ubuntu + Keras + GPU for Deep Learning, which was published yesterday by PyImageSearch! (Or maybe there is another way… Do you have another possible way in mind to get a CNN running faster?)

By the way, if you are starting out like me in Deep Learning for computer vision, I would be happy for us to share feedback on what we try along the way, because that will help us move faster ;).

And if you are experienced in the field, let me know what you think about the strategy I’ve taken in dealing with the project and which methods could help.

### Starting with OpenCV and Tesseract OCR on Visual Studio 2017 [Challenge 1]

I have recently started working on a freelance project where I need to do scene text recognition using OpenCV and Tesseract as libraries. I was very motivated to enter the world of computer vision combined with machine learning and to gain experience developing applications in the field, so I welcomed the challenges that come with it!
Here I’ll be talking about the first challenge and how I tackled it.

I found myself with multiple new things to prepare in order to start coding, not to mention that it had been a long time since I last coded in C++ (back in my university days)!

At first, I was asked to use OpenCV 3.0 and Tesseract 3.02 in order to run the part of the project that was already available. So I installed OpenCV 3.0 and Tesseract 3.02 with the Leptonica library, following the documentation provided in this link on how to build applications with OpenCV in Visual Studio on Windows. Then I tried to run the project in Visual Studio 2017. I got more than 800 errors!!! Most of them were LINK errors of type:

mismatch detected for '_MSC_VER': value '1700' doesn't match value '1900' in <filename.obj> 

and

error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MTd_StaticDebug' doesn't match value 'MDd_DynamicDebug' in file.obj.

The first error was caused by the fact that the objects were compiled by different versions of the compiler, which is not supported because different versions of the standard library are binary incompatible. In fact, the numbers 1700 and 1900 mean that binaries built with v11 (Visual Studio 2012) are not compatible with binaries built with v14 (Visual Studio 2015).[1][2]
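If you want to decode such a message quickly, a small lookup helps. This is only a sketch: the table lists a few well-known `_MSC_VER` values, and the function name `explain_msc_ver_mismatch` is something I made up for illustration.

```python
# A few well-known _MSC_VER values and the Visual Studio release behind them.
MSC_VER_TO_VS = {
    1600: "Visual Studio 2010 (v10)",
    1700: "Visual Studio 2012 (v11)",
    1800: "Visual Studio 2013 (v12)",
    1900: "Visual Studio 2015 (v14)",
}


def explain_msc_ver_mismatch(library_value, project_value):
    """Turn the two numbers from the linker message into a readable sentence."""
    library = MSC_VER_TO_VS.get(library_value, "an unknown compiler version")
    project = MSC_VER_TO_VS.get(project_value, "an unknown compiler version")
    if library_value == project_value:
        return f"OK: both were built with {library}"
    return f"Mismatch: the library was built with {library}, the project with {project}"


print(explain_msc_ver_mismatch(1700, 1900))
```

The fix is always the same: rebuild the library (or get a build of it) with the same toolset as the project.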

As for the second one, the reason is that both the library and your project must be linked with the same C Runtime Library settings, whether in Release or Debug mode [3]. Here is the solution in case you need to use static libs; alternatively, you can deactivate the static runtime when building with CMake by setting:

-DBUILD_WITH_STATIC_CRT=OFF
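The values in the RuntimeLibrary message map directly to the compiler's runtime flags (/MT, /MTd, /MD, /MDd), so you can check a set of libraries against your project before linking. A minimal sketch, assuming you already know which flag each library was built with; the library names and flags below are hypothetical.

```python
# Compiler runtime flag -> value the linker reports for 'RuntimeLibrary'.
RUNTIME_LIBRARY_VALUES = {
    "/MT": "MT_StaticRelease",
    "/MTd": "MTd_StaticDebug",
    "/MD": "MD_DynamicRelease",
    "/MDd": "MDd_DynamicDebug",
}


def find_runtime_mismatches(project_flag, library_flags):
    """Return the libraries whose runtime flag differs from the project's."""
    return {name: flag for name, flag in library_flags.items() if flag != project_flag}


# Hypothetical setup: the project uses the dynamic debug runtime.
libraries = {"tesseract": "/MTd", "leptonica": "/MDd"}
mismatched = find_runtime_mismatches("/MDd", libraries)
for name, flag in mismatched.items():
    print(f"{name}: built with {flag} ({RUNTIME_LIBRARY_VALUES[flag]}), "
          f"project uses /MDd ({RUNTIME_LIBRARY_VALUES['/MDd']})")
```

Note that Debug and Release builds use different flags, so the check has to be done once per configuration.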

So I decided to change the version of Tesseract so that it would be compatible with a newer version of Visual Studio and an x64 target project. As compiling Tesseract 3.05.1 with the cppan client was not possible for me (cppan crashed right after launching), I decided to go for VS2015_Tesseract. I followed these steps and additionally used the v140 toolset: Visual Studio 2017 ships with the v141 toolset, but it also supports v140.

Besides, there were some other points to consider. Unfortunately, I got errors again after configuring my project exactly as in tesseract\tesseract.vcxproj (extracted from VS2015_Tesseract), and I found out that the reason was a space in the Tesseract directory path.
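This kind of problem is easy to catch before building. A tiny sketch that flags any path containing a space; the two example paths are hypothetical.

```python
def paths_with_spaces(paths):
    """Return the paths that contain at least one space character."""
    return [path for path in paths if " " in path]


# Hypothetical install locations to check before configuring the project.
candidate_paths = [
    r"C:\libs\tesseract",
    r"C:\Program Files\tesseract",
]

bad = paths_with_spaces(candidate_paths)
for path in bad:
    print("Move or rename:", path)
```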

One last thing to consider when using this Tesseract version is its initialization: the TESSDATA_PREFIX environment variable is required to locate the Tesseract data folder (tesseract/tessdata). You can also set the data path via api->Init(<data path here>, <language>).
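Before blaming the code, it is worth checking that the variable really points at the language data. A hedged sketch: the helper name `find_traineddata` is made up, and since I am not sure whether a given Tesseract build expects the prefix to be the tessdata folder itself or its parent, the helper simply looks in both places for the `<language>.traineddata` file.

```python
import os


def find_traineddata(lang="eng", env=None):
    """Return the path of <lang>.traineddata found via TESSDATA_PREFIX, or None."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # the variable is not set at all
    candidates = [
        os.path.join(prefix, lang + ".traineddata"),
        os.path.join(prefix, "tessdata", lang + ".traineddata"),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


found = find_traineddata("eng")
print(found if found else "eng.traineddata not found via TESSDATA_PREFIX")
```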

I also wanted a newer version of OpenCV and went for OpenCV 3.2. Fortunately, that was easier than the Tesseract installation, but of course not without some difficulties. I followed this Tutorial in order to install it. In these steps, you have to pay attention to the version of Visual Studio you choose when configuring CMake, as there is one generator for a win64 (or x64) target project and one for win32 (or x86). I chose “Visual Studio 15 2017 Win64”.

And yes! It worked! I tested OpenCV 3.2 on a sample project, and Tesseract on a sample project with OpenCV! I was relieved! Finally, I could start coding!…

After launching the project “End to end text recognition”, I discovered that I wasn’t so close! There was a problem with the opencv_text module. I got the error: Tesseract not found at runtime. So I asked for help on the OpenCV forum and StackOverflow and got a suggested solution. In the OpenCV configuration step using CMake, besides providing the Lept_LIBRARY path, the Tesseract_INCLUDE_DIR path and tesseract_LIBRARY, check the Build_examples checkbox and uncheck Build_tests and Build_perf_tests (you don’t need tests when you are a beginner, and as far as I know they take a lot of space). Also, be sure that:

- the tesseract305.dll file is in the tesseract-3.05.01\build\bin\Release directory;
- the leptonica-1.74.4.dll file is in leptonica-1.74.4/build/bin/Release;
- both files are copied to opencv\build\bin\Release (if you will use Release mode, or bin\Debug if you will use Debug mode).
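The copy step above can be scripted so that it fails loudly when a DLL is missing. A minimal sketch using only the standard library; the function name `copy_runtime_dlls` is my own, and the paths in the commented usage are the ones from the list above, relative to wherever you unpacked the sources.

```python
import shutil
from pathlib import Path


def copy_runtime_dlls(dll_paths, target_dir):
    """Copy each DLL into target_dir; raise if one of them does not exist."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in map(Path, dll_paths):
        if not dll.is_file():
            raise FileNotFoundError(f"missing DLL: {dll}")
        shutil.copy2(dll, target / dll.name)
        copied.append(dll.name)
    return copied


# Example usage (adjust the paths to your own layout):
# copy_runtime_dlls(
#     [r"tesseract-3.05.01\build\bin\Release\tesseract305.dll",
#      r"leptonica-1.74.4\build\bin\Release\leptonica-1.74.4.dll"],
#     r"opencv\build\bin\Release",
# )
```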

After that, you have to open INSTALL.vcxproj from the OpenCV build directory and build it in Debug (if you will need Debug mode later) and then in Release. Be sure to select the right platform, x64 (if you chose x64 when configuring CMake) or x86 (if you chose Win32 when configuring CMake), or you can get errors of type:

“fatal error LNK1112: module machine type ‘x64’ conflicts with target machine type ‘X86’” [4]
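When you are not sure which platform a DLL was built for, you can read it from the file header. This sketch follows the PE file layout (the offset of the PE header is stored at byte 0x3C, and the machine field comes right after the `PE\0\0` signature). It works for .dll and .exe files only, not for .lib or .obj files; the function name is my own.

```python
import struct

# Machine field values from the PE/COFF header.
MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64"}


def pe_machine_type(data):
    """Return 'x86' or 'x64' for the raw bytes of a PE file (.dll or .exe)."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ header)")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, hex(machine))


# Example usage (the path is whatever DLL you want to inspect):
# with open(r"opencv\build\bin\Release\tesseract305.dll", "rb") as f:
#     print(pe_machine_type(f.read()))
```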

That’s all! Now you can start creating projects in Visual Studio. Don’t forget to configure the project correctly: