### From OpenCV and Tesseract to exploring recent research results in Computer Vision [Challenge 2]

I’ve been away for a while… Actually, I didn’t notice how fast time was passing, even though I felt like I had been running every day since my last article 😀. I was completely absorbed in a new dimension of machine learning, one that is huge and has many, many doors to knock on!

What is it that you call a “new dimension”?

Okay, let me tell you. If you followed my previous post about OpenCV and Tesseract, you may already know that I’ve been working on a computer vision project involving Optical Character Recognition (OCR) of text in natural scene images. (By the way, I really like the problem I’m trying to solve. That’s why I accepted the project even though I had no previous knowledge of the topics and technologies used in this context.)

Back to the subject 🙂. After trying out Tesseract and OpenCV, I found that many parameters can affect the result, and these parameters relate especially to one phase that experienced people emphasize.

What is that phase?

The preprocessing phase of the images. By preprocessing, I mean preparing the input data (in this case, a natural scene image containing text) in a way that allows the algorithm to deliver good results.

How is that? Preparing an image? What can someone do once they have the image?

Yes, preparing it! It’s like baking a cake: you have to prepare the ingredients first. Or it’s like teaching your child to tell dogs from cats: you have to collect photos of cats and dogs beforehand, and those photos will be your material in the process.

You can prepare an image by playing around with the color intensity of each pixel, regions, contours, edges, alignment... All of these are image properties. These properties create the variations present in an image, e.g., blur, noise, contrast, etc. Adjusting them to control those variations is necessary in order to identify (detect) text regions, segment these regions correctly and, as the last phase of the pipeline, recognize the text.

I think you are starting to understand why this phase is so important, aren’t you? Obviously, it influences all subsequent phases. When I realized that, I started looking around for how to do it: I checked various research papers, OpenCV functions and tutorials. I then tried some methods (check my GitHub for the code snippets I tested). These included image thresholding (classifying pixels into two groups: below or above a threshold), dilation and erosion, contour approximation, and MSER region detection using the method proposed by Neumann. After that, I came to a conclusion: improving text detection and recognition with these methods requires a lot of parameter understanding and tuning, and I’m extremely limited by the deadlines imposed by my client!
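To make thresholding, dilation and erosion more concrete, here is a minimal from-scratch sketch in pure Python. It works on a tiny grayscale image stored as a list of lists and only illustrates what these operations do conceptually. In a real pipeline you would use OpenCV functions such as `cv2.threshold`, `cv2.dilate` and `cv2.erode`. The threshold value of 128, the 3x3 square window and the clipped border handling are assumptions chosen for the illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale (0-255) or binary (0/1) image, row by row


def threshold(img: Image, thresh: int) -> Image:
    """Binary threshold: pixels above `thresh` become 1 (foreground), the others 0."""
    return [[1 if px > thresh else 0 for px in row] for row in img]


def neighborhood(img: Image, r: int, c: int) -> List[int]:
    """Values in the 3x3 window centered on (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def dilate(binary: Image) -> Image:
    """A pixel becomes 1 if ANY pixel in its 3x3 window is 1: foreground regions grow."""
    return [[max(neighborhood(binary, r, c)) for c in range(len(binary[0]))]
            for r in range(len(binary))]


def erode(binary: Image) -> Image:
    """A pixel stays 1 only if ALL pixels in its 3x3 window are 1: foreground regions shrink."""
    return [[min(neighborhood(binary, r, c)) for c in range(len(binary[0]))]
            for r in range(len(binary))]


# Tiny example: a bright 3x3 blob plus one isolated bright "noise" pixel (top right).
gray = [
    [10, 10, 10, 10, 200],
    [10, 180, 190, 175, 10],
    [10, 185, 200, 190, 10],
    [10, 170, 175, 180, 10],
    [10, 10, 10, 10, 10],
]
binary = threshold(gray, 128)
opened = dilate(erode(binary))  # erosion followed by dilation ("opening")
for row in opened:
    print(row)
```

Erosion followed by dilation (an "opening") removes the isolated pixel and restores the blob. The reverse order (a "closing") would instead fill small gaps. The threshold value, the window size and the order of operations are exactly the kind of parameters that have to be tuned for each type of image.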

Wow, how funny is that! What should I do? HEEELP! Deadlines, and a newbie in computer vision!… Yeah, I felt frustrated! You have surely faced such a situation at some point in a project.

Me, surprised by how much I have to learn and deal with.

Fortunately, when I remembered my motivation and goal, I decided to keep calm and take up challenge 2 on my project path! So I read some other papers, paying more attention to the details of the work presented, and I came to a conclusion!

Oh, finally, the conclusion. What is it? And I still don’t get what you mean by that “new dimension”!

First, the images I’m dealing with have some specific characteristics:
- low contrast between the text and the background;
- the distance between characters varies;
- the number of characters is not the same everywhere;
- the text doesn’t consist of words, so it can’t be recognized using a dictionary or a list of choices: it is a mix of numeric and alphabetical characters.

With that in mind, I examined papers presenting methods based on Convolutional Neural Networks (CNNs) and decided to go for it. According to the research and some applications, CNNs delivered much better results! The method that interested me most is Text-Attentional Convolutional Neural Networks for Scene Text Detection. It relies on more informative supervised information and on improving contrast in order to detect text.

The presented method involves:
- training a CNN with more informative supervised information, such as a text region mask, character labels, and binary text/non-text information;
- introducing a deep multi-task learning mechanism to learn the Text-CNN efficiently, which makes it possible to learn more discriminative text features;
- developing a new type of MSER method that enlarges the local contrast between text and background regions.
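The multi-task idea in the list above boils down to training one network against several targets at once, optimizing a weighted sum of their losses. The sketch below is a generic illustration of that idea in pure Python and not the paper’s exact formulation. Three choices in it are assumptions made for clarity: binary cross-entropy for text/non-text, softmax cross-entropy for the character label, and mean squared error for the region mask. The weights `lambda_char` and `lambda_mask` are assumptions too.

```python
import math
from typing import Sequence


def binary_cross_entropy(p_text: float, is_text: int) -> float:
    """Main task: text vs non-text. p_text is the predicted probability of 'text'."""
    eps = 1e-12  # keeps log() away from 0
    return -(is_text * math.log(p_text + eps)
             + (1 - is_text) * math.log(1 - p_text + eps))


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Auxiliary task: character label, computed from unnormalized class scores."""
    m = max(logits)  # shift by the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mask_mse(pred_mask: Sequence[float], true_mask: Sequence[float]) -> float:
    """Auxiliary task: text region mask, as mean squared error over pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred_mask, true_mask)) / len(true_mask)


def multi_task_loss(p_text: float, is_text: int,
                    char_logits: Sequence[float], char_label: int,
                    pred_mask: Sequence[float], true_mask: Sequence[float],
                    lambda_char: float = 0.5, lambda_mask: float = 0.5) -> float:
    """Main loss plus weighted auxiliary losses (weights are illustrative, not from the paper)."""
    return (binary_cross_entropy(p_text, is_text)
            + lambda_char * softmax_cross_entropy(char_logits, char_label)
            + lambda_mask * mask_mse(pred_mask, true_mask))


# One hypothetical training sample: a text patch of character class 2 (out of 3), 4-pixel mask.
loss = multi_task_loss(p_text=0.9, is_text=1,
                       char_logits=[0.1, 0.2, 2.0], char_label=2,
                       pred_mask=[0.8, 0.1, 0.9, 0.0], true_mask=[1.0, 0.0, 1.0, 0.0])
print(round(loss, 4))
```

In Keras, the same structure is typically expressed as a model with several outputs. You pass one loss per output, plus a `loss_weights` argument, to `model.compile`.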

(If you know a paper whose method fits my problem better, let me know 😉)

About two weeks ago, I started learning Deep Learning for computer vision! That’s the new dimension of machine learning! I tried to get familiar with the concepts quickly, so I read these articles, which I recommend if you want to get started with deep learning and computer vision:

A quick introduction to Neural Networks, An intuitive explanation of Convolutional Neural Networks, and A beginner’s guide to understanding Convolutional Neural Networks. After that, I installed the CPU version of TensorFlow, because it’s easier to install than the GPU version. I also installed Keras, because it’s easier to use (higher level) than TensorFlow. Then I followed the Machine Learning Mastery tutorial “Develop your first Neural Network in Python with Keras step by step”. There, I discovered something very important: I really need to switch to the GPU version, because the CPU version is very slow! Buuuuut… to run the GPU version, you need an (NVIDIA) graphics card. I found out later that it doesn’t work from inside my Virtual Machine!!! Oh my God! Another time-consuming task with an approaching deadline 😀

Apart from that, I haven’t mentioned that I also have a very important problem to solve. It was the first reason I tried other methods before switching to CNNs: I don’t have a large amount of data! (Then you may ask how I will make it work with deep learning! Everyone knows that data is the most important ingredient of Neural Networks! Without enough of it, you can’t get good results, simply because your CNN won’t learn enough from your input images!)…

It’s possible! There are some methods, some less obvious than others, but it’s possible!

I will explore this third challenge further later on. For now, I’m very happy to have found a complete article about Setting Up Ubuntu + Keras + GPU for Deep Learning, published yesterday by PyImageSearch! (Or maybe there is another way… Do you have another idea for getting a CNN to run faster?)

By the way, if you are starting out in Deep Learning for computer vision like me, I would be happy to exchange feedback on what we try along the way. That will help us both move faster ;).

And if you are experienced in the field, let me know what you think about the strategy I’ve taken on this project and which methods could help.

### Starting with OpenCV and Tesseract OCR on Visual Studio 2017 [Challenge 1]

I have recently started working on a freelance project where I need to use scene text recognition based on the OpenCV and Tesseract libraries. I was very motivated to enter the world of computer vision combined with machine learning and to gain experience developing applications in the field, so I welcomed the challenges that came with it!
Here I’ll be talking about the first challenge and how I tackled it.

I found myself with multiple new things to prepare before I could start coding. And it had been a long time since I last coded in C++ (back in my university days)!

At first, I was asked to use OpenCV 3.0 and Tesseract 3.02 in order to run the part of the project that was already available. So I installed OpenCV 3.0 and Tesseract 3.02 with the Leptonica library. I followed the documentation provided in this link about how to build applications with OpenCV in Visual Studio on Windows. Then, I tried to run the project in Visual Studio 2017. I got more than 800 errors!!! Most of them were linker (LNK) errors of the type:

mismatch detected for '_MSC_VER': value '1700' doesn't match value '1900' in <filename.obj> 

and

error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MTd_StaticDebug' doesn't match value 'MDd_DynamicDebug' in file.obj.

The first error means that the object files were compiled by different versions of the compiler. This is not supported, because different versions of the standard library are not binary compatible. Here, the '_MSC_VER' value 1700 corresponds to Visual C++ 11.0 (Visual Studio 2012) and 1900 to Visual C++ 14.0 (Visual Studio 2015), and binaries built with one can’t be linked with binaries built with the other.[1][2]
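If you run into this kind of mismatch, a small lookup table helps decode the numbers. The sketch below maps the '_MSC_VER' values of a few Visual Studio releases, including the ones mentioned in this post, to their product names and platform toolsets. It also extracts the two values from a '_MSC_VER' mismatch message. The helper names are hypothetical, and the table only covers the versions listed.

```python
import re
from typing import Optional, Tuple

# (lowest _MSC_VER, highest _MSC_VER, Visual Studio release, platform toolset)
MSC_TABLE = [
    (1600, 1600, "Visual Studio 2010", "v100"),
    (1700, 1700, "Visual Studio 2012", "v110"),
    (1800, 1800, "Visual Studio 2013", "v120"),
    (1900, 1900, "Visual Studio 2015", "v140"),
    (1910, 1916, "Visual Studio 2017", "v141"),
    (1920, 1929, "Visual Studio 2019", "v142"),
]


def describe_msc_ver(msc_ver: int) -> Optional[Tuple[str, str]]:
    """Return (release, toolset) for a _MSC_VER value in the table, or None."""
    for low, high, release, toolset in MSC_TABLE:
        if low <= msc_ver <= high:
            return release, toolset
    return None


def parse_msc_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Extract the two _MSC_VER values from a linker '_MSC_VER' mismatch message."""
    match = re.search(r"'_MSC_VER': value '(\d+)' doesn't match value '(\d+)'", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


msg = "mismatch detected for '_MSC_VER': value '1700' doesn't match value '1900' in <filename.obj>"
for value in parse_msc_mismatch(msg):
    print(value, describe_msc_ver(value))
```

Reading the message this way shows directly which side (the prebuilt library or your project) was built with the older toolset.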

As for the second error, the library and your project must be linked with the same C Runtime Library settings, in both Release and Debug mode [3]. Here is the solution in case you need to use static libraries. Otherwise, you can turn off the static runtime when building OpenCV with CMake by setting:

-DBUILD_WITH_STATIC_CRT=OFF
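The 'RuntimeLibrary' values reported in such linker messages correspond to the four Visual C++ runtime compiler switches: /MT, /MTd, /MD and /MDd. Here is a small sketch of that correspondence to help read these errors. `check_crt_match` is a hypothetical helper, not part of any toolchain.

```python
# RuntimeLibrary value reported by the MSVC linker -> (compiler switch, CRT linkage, configuration)
CRT_VARIANTS = {
    "MT_StaticRelease": ("/MT", "static", "Release"),
    "MTd_StaticDebug": ("/MTd", "static", "Debug"),
    "MD_DynamicRelease": ("/MD", "dynamic", "Release"),
    "MDd_DynamicDebug": ("/MDd", "dynamic", "Debug"),
}


def check_crt_match(library_value: str, project_value: str) -> str:
    """Explain whether a library and a project use the same C runtime variant."""
    lib = CRT_VARIANTS[library_value]
    proj = CRT_VARIANTS[project_value]
    if lib == proj:
        return "OK: both use " + lib[0]
    return (f"Mismatch: library built with {lib[0]} ({lib[1]} CRT, {lib[2]}), "
            f"project uses {proj[0]} ({proj[1]} CRT, {proj[2]})")


print(check_crt_match("MTd_StaticDebug", "MDd_DynamicDebug"))
```

With BUILD_WITH_STATIC_CRT=OFF, OpenCV is built against the dynamic runtime (/MD or /MDd), which is also what new Visual Studio C++ projects use by default. The two sides then match.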

So I decided to switch to a version of Tesseract that is compatible with newer versions of Visual Studio and with an x64 target project. Compiling Tesseract 3.05.1 with the cppan client was not possible for me (cppan crashed right after launching), so I decided to go for VS2015_Tesseract. I followed these steps and also used the v140 toolset: Visual Studio 2017 ships with the v141 toolset, but it also supports v140.

There were some other points to consider as well. Unfortunately, I got errors again after configuring my project exactly like tesseract\tesseract.vcxproj (from VS2015_Tesseract). I found out that the cause was a space in the Tesseract directory path.

One last thing to consider when using this Tesseract version is initialization: Tesseract needs the TESSDATA_PREFIX environment variable to locate its data folder (tesseract/tessdata). You can also pass the path directly via api->Init(<data path here>, <language>).
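Before calling `Init`, it can save time to check that the language file Tesseract will look for actually exists. Tesseract language data files are named `<language>.traineddata`. Depending on the Tesseract version, `TESSDATA_PREFIX` is expected to point either at the `tessdata` folder itself or at its parent, so the hypothetical helper below checks both locations. It is a sanity check written for illustration, not part of the Tesseract API.

```python
import os
from pathlib import Path
from typing import Optional


def find_traineddata(language: str = "eng", prefix: Optional[str] = None) -> Optional[Path]:
    """Return the path of <language>.traineddata under prefix (or $TESSDATA_PREFIX), or None."""
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # no data path configured at all
    base = Path(prefix)
    # Check both conventions: prefix is the tessdata folder itself, or its parent.
    for candidate in (base / f"{language}.traineddata",
                      base / "tessdata" / f"{language}.traineddata"):
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_traineddata("eng")
    print(found if found else "eng.traineddata not found: check TESSDATA_PREFIX")
```

If the helper finds the file, you can pass the corresponding folder to `Init` explicitly instead of relying on the environment variable.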

I also wanted a newer version of OpenCV, so I went for OpenCV 3.2. Fortunately, that was easier than the Tesseract installation, but of course not without some difficulties. I followed this tutorial to install it. In these steps, pay attention to the Visual Studio version you choose when configuring CMake: there is one generator for Win64 (x64) target projects and one for Win32 (x86). I chose “Visual Studio 15 2017 Win64”.

And yes! It worked! I tested OpenCV 3.2 on a sample project, and Tesseract on a sample project with OpenCV! I was relieved! Finally, I could start coding!…

After launching the project “End to end text recognition”, I discovered that I wasn’t that close after all! There was a problem with the opencv_text module: I got the error “Tesseract not found at runtime”. So I asked for help on the OpenCV forum and StackOverflow and got a suggested solution. It applies to the OpenCV configuration step with CMake. Besides providing the Lept_LIBRARY path, the Tesseract_INCLUDE_DIR path and Tesseract_LIBRARY, check the BUILD_EXAMPLES checkbox and uncheck BUILD_TESTS and BUILD_PERF_TESTS. (As far as I know, you don’t need tests as a beginner, and they take up a lot of space.) Also, make sure that:

- tesseract305.dll is in the tesseract-3.05.01\build\bin\Release directory;

- leptonica-1.74.4.dll is in leptonica-1.74.4\build\bin\Release;

- you put them both in opencv\build\bin\Release if you will use Release mode, or in opencv\build\bin\Debug if you will use Debug mode.
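If you rebuild often, copying the two DLLs by hand gets tedious. Here is a minimal Python sketch of that step. The paths in the commented example call are taken from the list above and are assumptions about your directory layout: adjust them, and use `Debug` instead of `Release` if needed. `copy_runtime_dlls` is a hypothetical helper, not part of any of these libraries.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_runtime_dlls(dlls: Iterable[str], target_dir: str) -> List[Path]:
    """Copy each DLL into target_dir (created if missing) and return the copied paths."""
    sources = [Path(dll) for dll in dlls]
    # Check every source first, so nothing is created or copied if one DLL is missing.
    missing = [str(src) for src in sources if not src.is_file()]
    if missing:
        raise FileNotFoundError("DLL(s) not found: " + ", ".join(missing))
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 also preserves file timestamps and returns the destination path.
    return [Path(shutil.copy2(src, target / src.name)) for src in sources]


# Example call (paths from the list above; adjust them to your own layout):
# copy_runtime_dlls(
#     [r"tesseract-3.05.01\build\bin\Release\tesseract305.dll",
#      r"leptonica-1.74.4\build\bin\Release\leptonica-1.74.4.dll"],
#     r"opencv\build\bin\Release",
# )
```

The helper checks all sources before touching the target folder, so a typo in one path fails early instead of leaving a half-copied setup.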

After that, open INSTALL.vcxproj from the OpenCV build directory and build it in Debug (if you will need Debug mode later) and then in Release. Be sure to select the right platform: x64 (if you chose Win64 in the CMake configuration) or x86 (if you chose Win32). Otherwise, you may get errors like:

“fatal error LNK1112: module machine type ‘x64’ conflicts with target machine type ‘X86’” [4]

That’s all! Now you can start creating projects in Visual Studio. Don’t forget to configure the project correctly:

To test one of the OpenCV text module samples, put tesseract.dll and lept.dll in the same directory as your .exe (opencv\build\bin\Release\<projectname.exe>). Then copy all files from opencv\opencv_contrib-3.2.0\modules\text\samples to opencv\build\bin\Release\.

Note: I didn’t actually test creating a project myself, because updating Visual Studio 2017 introduced a bug!!! I tried to resolve it, and in the end I switched to Ubuntu 16.04!! 😀 And it was so easy! These libraries have a bigger community on Ubuntu, and there is a video series on installing the latest versions: just follow the videos on Francesco Piscani’s YouTube channel on how to install OpenCV, Tesseract and Leptonica. You can also choose to download Tesseract 3.04 and Leptonica 1.73, which are older versions that don’t require building from source. However, I don’t know how to run the project in Debug mode, or whether that requires building the OpenCV library differently (as on Windows, where you have to choose the Release or Debug build configuration).

What about your experience with OpenCV, Tesseract and Leptonica? Are you working on a project involving text recognition in natural images? Share in a comment!

Please don’t hesitate to leave me a comment if you tried the steps and failed! I will do my best to help! 🙂

### Is cracking the coding interview the only benefit of learning algorithms?

Algorithms are often considered only when someone is looking for a new job. This narrow view of what algorithms are for keeps us from seeing what they allow us to achieve!

Actually, algorithms are everywhere! They are involved in every aspect of computer science, and they are also used in a wide range of other fields: recommendations, social media, medicine, psychology, transportation, and the list goes on!

Anything you do can be broken down into small steps, and that is an algorithm. Imagine you wake up in the morning to go to work and can’t remember where your car keys are: how would you find them? One approach is to apply an algorithm, which is a step-by-step logical procedure. First, you look in the places where you usually put them. You try to remember the last time you used them. You check the places you went to when you came home. Sooner or later, by following these steps, you find the car keys. An algorithm is a sequence of instructions that ensures a certain task gets completed. We humans apply algorithms to take actions in every aspect of our lives.
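The car-keys story can be written down as a tiny algorithm, a sequential search: check candidate places one by one, most likely first, and stop as soon as the keys turn up. The places and the `keys_are_at` check below are made up for illustration. The point is that the procedure is finite (the list of places ends) and each step is well defined.

```python
from typing import Callable, List, Optional


def find_keys(places: List[str], keys_are_at: Callable[[str], bool]) -> Optional[str]:
    """Sequential search: return the first place where the keys are found, or None."""
    for place in places:          # places are ordered from most to least likely
        if keys_are_at(place):
            return place          # stop as soon as we find them
    return None                   # every place checked: the keys are somewhere else


# Hypothetical morning: the keys are in yesterday's jacket.
candidates = ["key hook", "entrance table", "jacket pocket", "kitchen counter"]
print(find_keys(candidates, lambda place: place == "jacket pocket"))  # jacket pocket
```

Ordering the places by likelihood does not change the result, but it usually shortens the search.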

So, knowing algorithms can be an asset for improving your life, or even other people’s lives! You can turn a decision that takes time into an algorithm that assists you in the decision process. You can create an algorithm to prioritize your tasks, instead of relying on a simple to-do list. You can make an algorithm to predict the best time to visit a city. You can make an algorithm that suggests films to watch based on your preferences and historical data. There is no limit to what you can imagine and make real by applying algorithms!
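As an example of turning a decision into an algorithm, here is a minimal sketch of task prioritization: sort tasks by due date, then by importance. The `Task` fields and the ordering rule are assumptions made for illustration. Any rule you can state precisely can be encoded the same way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    due_in_days: int   # 0 means due today
    importance: int    # 1 (low) to 5 (high)


def prioritize(tasks: List[Task]) -> List[Task]:
    """Earliest due date first; among equal due dates, most important first."""
    return sorted(tasks, key=lambda t: (t.due_in_days, -t.importance))


todo = [
    Task("write blog post", due_in_days=3, importance=3),
    Task("client demo", due_in_days=1, importance=5),
    Task("reply to emails", due_in_days=1, importance=2),
    Task("read CNN paper", due_in_days=7, importance=4),
]
for task in prioritize(todo):
    print(task.name)
```

Unlike a plain to-do list, the rule is written down once and applied consistently every time the list changes.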

Now, I think it’s time to talk about how to gain that set of skills and make a difference with algorithms! First, let’s look at the characteristics of an algorithm.

What are the characteristics of an algorithm?

1. It should be finite: If your algorithm never finishes trying to solve the problem it was designed to solve, then it is useless.
2. It should have well-defined instructions: Each step of the algorithm has to be precisely defined; the instructions should be unambiguously specified for each case.
3. It should be effective: The algorithm should solve the problem it was designed to solve, and each of its steps should be basic enough to be carried out exactly with just paper and pencil.
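Euclid’s algorithm for the greatest common divisor is a classic example with all three characteristics. Each step is precisely defined. The algorithm is finite, because the second number strictly decreases until it reaches zero. And every step is a simple remainder you could compute with paper and pencil.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    while b != 0:
        # Well-defined step: replace (a, b) by (b, a mod b).
        # Finite: 0 <= a % b < b, so b strictly decreases and must reach 0.
        a, b = b, a % b
    return a


print(gcd(48, 18))  # 6
```

Tracing it by hand, (48, 18) becomes (18, 12), then (12, 6), then (6, 0), so the answer is 6.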

What are the steps to learn to code algorithms?

1. Develop your programming logic. Here are some ways to improve it.
2. Pick a programming language you are comfortable with (if you don’t have one, I suggest learning Python: it’s simple and easy).
3. Learn data structures: Start with the basic ones (strings, vectors, lists, arrays, maps), then queues and stacks, and finish with the advanced ones: trees, graphs, tries.
4. Practice coding data structures. Here is a good link.
5. Learn simple algorithms first, and then move on to the most common algorithms. Make sure you understand every step, and don’t be frustrated by being slow. Remember: « It does not matter how slowly you go as long as you don’t stop ». Videos can help, because they make the material easier to assimilate.
6. Start tackling problems (e.g., on HackerRank, which has fun and challenging algorithm problems). You need a strategy for solving problems; I’d suggest following the method proposed by Gayle Laakmann McDowell.
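To give an idea of what steps 3 and 5 can look like in Python, here is a short sketch. It builds a stack and a queue from the standard library (`list` and `collections.deque`) and shows binary search as a first "simple algorithm". These are common textbook choices, not the only way to learn these topics.

```python
from collections import deque
from typing import List

# Stack: last in, first out. A list's append() and pop() work at the end in O(1).
stack: List[str] = []
stack.append("a")
stack.append("b")
print(stack.pop())      # b

# Queue: first in, first out. deque.popleft() is O(1), unlike list.pop(0).
queue = deque()
queue.append("a")
queue.append("b")
print(queue.popleft())  # a


def binary_search(sorted_items: List[int], target: int) -> int:
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:                 # the search range shrinks every iteration
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1              # target can only be in the right half
        else:
            high = mid - 1             # target can only be in the left half
    return -1


print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1
```

Binary search is a good exercise for "test it with corner cases": try an empty list, a single element, and targets smaller or larger than every item.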

Finally, from my own modest experience in problem-solving, here is my advice:

• Ask questions: « why » and « what » are really important for understanding the basics of anything! When you question a choice, you memorize the answer better and dig deeper into the source of the information.
• To stay motivated and keep your learning momentum, create a simple algorithm that is useful in your daily life once you have the basics. Accomplishment fuels your energy.
• Try to solve the problem manually by yourself first. Take a sheet of paper and start writing down steps. Above all, take your time and don’t rush to the solution.
• Even if you figure out the solution, don’t tell yourself « It’s ok now, no need to implement it, I know how it works ». That’s a trap, because you never know what you missed until you implement it and test it with corner cases.
• Be patient. If you begin to feel frustrated with a problem, leave it and go do something less stressful. Then, come back.

I hope this helps you leverage the power of your brain by discovering its super capabilities for executing algorithms and learning how to translate them into code! 😉

Remember how Rome was built (not in a day, right)? With each small step in learning algorithms, you grow your potential to contribute to making life better in this world!

Suggested books:

GitHub resources:

mission-peace

Online resources:

Topcoder: an explanation of data structures and algorithms.

ideserve: contains a visual explanation of some algorithm problems