Data Natives Berlin 2017: my talk “Building a Recommender System using Collaborative Filtering”

My activities

I was invited to speak at Data Natives 2017 on the tech trends track, and I chose Recommender Systems because it's a topic that fascinates me. It's changing the way we shop for clothes, order food, buy books, and more!

Behind those systems, a lot of work is being done by machine learning algorithms, which analyze your behavior through your clicks, views, likes, purchases, social network, and location!

One of the most famous recommender systems is Amazon's. When you see phrases like "Recommended for you, Sarah", "Frequently bought together", or "Customers who bought this item also bought...", you are actually looking at the output of these machine learning algorithms.

I know, it seems like…

But it's not! And you can do it as well! You can build your own recommender system; you just need answers to "How do I start?" and "What should I consider when building it?", and I'm here to help you take your first steps. :)

I cited Amazon as an example because its recommendation engine leverages a method called "Collaborative Filtering". The method is used so often in recommender systems that people sometimes assume "recommendation engine = collaborative filtering". In fact, recommender systems are not the only application of collaborative filtering, and recommender systems can be built with other methods too.
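
To give a feel for the idea, here is a minimal item-based sketch of my own (a toy example, not Amazon's actual system): store ratings in a user-item matrix, measure item-to-item similarity, and score unrated items by their similarity to what each user already liked.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 0 means "not rated yet" (toy data).
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [1, 0, 5, 4],
                    [0, 1, 4, 5]])

item_sim = cosine_similarity(ratings.T)  # item-to-item similarity matrix
scores = ratings @ item_sim              # score items by similarity to rated ones
scores[ratings > 0] = 0                  # don't re-recommend already-rated items
print(scores.argmax(axis=1))             # best unseen item for each user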

Others like Netflix and Last.fm use collaborative filtering too, and the method pays off:

“35% of Amazon.com’s revenue is generated by its recommendation engine” [source]

“More than 80% of the TV shows people watch on Netflix are discovered through the platform’s recommendation system.” [source]

What are you waiting for? Let's pull back the curtain on recommender systems using collaborative filtering!

Check my talk slides here.

And if you want to watch a video, here is the official video.

If you have any questions about the topic, don’t hesitate to post a comment and I will be happy to answer!

Applied text classification on Email Spam Filtering [part 1]

My activities, projects, Technology

A few months ago, I started working on the online Machine Learning Specialization offered by the University of Washington. The first course was about ML foundations, the second about linear regression, and the third, which I'm currently taking, is about classification. I liked the courses in almost every aspect, as they teach how to implement ML algorithms from scratch; that was my goal when I decided to explore the field in more depth. But honestly, I felt there was something of a gap, because many questions were left unanswered along the way. Then, after reading about how to get started with machine learning, I found that most articles emphasize the importance of combining courses with practical projects, in order to apply what you learn and assimilate it better... and it's so true! Just try combining both and you will soon notice the difference!

So, here I am with my first practice application! 😀 I chose email spam filtering, as it's a very common topic in applied classification. It's easy to understand because we experience spam filtering in our inboxes every day.

I followed a simple starting tutorial: Email Spam Filtering: A Python implementation with Scikit-learn. Soon after finishing it, my brain started analyzing the steps, and a bunch of questions flooded my mind!

Why an "equal number of spam and non-spam emails"? What's stemming? Are there methods to clean data other than removing stop words and lemmatization? How is the split between training set and test set done? Why no validation set? Why were the Naive Bayes classifier and SVM (Support Vector Machines) specifically used? What makes Naive Bayes so popular for document classification? Etc.

As William S. Burroughs said, "Your mind will answer most questions if you learn to relax and wait for the answer."

I took a breath and started answering the questions one by one, sometimes searching the web, sometimes experimenting with changes in the code and analyzing the output. I'm happy to share the results:

1) The data we need

– how many emails we have seen (used to build the train and test sets)
– how many emails fall under each label (used to detect imbalanced data)
– how often a word is associated with each label (used to calculate the probability of an email being spam or ham (class 0 or class 1))

2) Cleaning data

Why clean the word list? Cleaning the data is essential to reduce the probability of getting wrong results: some words have no influence on the classification (they can be associated with neither the spam class nor the ham class), and other words can be normalized to group same-meaning words and reduce redundancy. By acting on the quality of the training data, we improve the accuracy of the classifier. So removing stop words, stemming, and lemmatization all help improve the results of machine learning algorithms.
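
As a concrete illustration, here is a minimal cleaning sketch using NLTK (my choice for the example; the tutorial does its own cleaning while building the dictionary):

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time setup: nltk.download('stopwords') and nltk.download('wordnet')
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def clean(words):
    # Drop non-alphabetic tokens and stop words, then normalize the rest.
    kept = [w.lower() for w in words if w.isalpha() and w.lower() not in stop_words]
    return [stemmer.stem(lemmatizer.lemmatize(w)) for w in kept]

print(clean(["The", "winners", "are", "winning", "amazing", "prizes"]))
# e.g. ['winner', 'win', 'amaz', 'prize']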

3) Naive Bayes

Why was Naive Bayes used? Naive Bayes learns and predicts very efficiently, and it's often used as a baseline against more sophisticated methods because it's fast and highly scalable (it works well with high-dimensional data). As Andrew Ng suggests, when tackling an ML problem, start with a quick and dirty algorithm and expand from that point.

What makes Naive Bayes simple and easy? Naive Bayes is based on Bayes' theorem, and it's called "naive" because it assumes that features are independent of each other given the class (no or little correlation between features), which is not realistic. Thus, Naive Bayes can learn the importance of individual features but can't capture relationships among features. Besides, its training time is significantly smaller than that of alternative methods, and it doesn't require much training data.
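
To make the "naive" part concrete: under that independence assumption, the probability that an email containing words w1, ..., wn is spam factors into per-word terms, each of which can be estimated from simple counts:

P(spam | w1, ..., wn) ∝ P(spam) × P(w1 | spam) × ... × P(wn | spam)

The classifier computes the same product for ham and picks the larger one, which is why the per-label word counts from section 1 are essentially all it needs.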

Why Multinomial Naive Bayes? What about other models like Gaussian Naive Bayes or Bernoulli Naive Bayes?

Well, Multinomial NB considers the frequency count (occurrences) of the features (words, in our case), while Bernoulli NB cares only about the presence or absence of a particular feature (word) in the document. The latter is adequate for binary-valued (Bernoulli, boolean) features. With Gaussian NB, features are real-valued or continuous and assumed to follow a Gaussian distribution; the Iris flower dataset is an example with continuous features.
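
Comparing the three variants takes only a few lines (a sketch, assuming the dense word-count matrices train_X/test_X and labels train_y/test_y produced by the tutorial's feature extraction):

from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB
from sklearn.metrics import confusion_matrix

for model in (MultinomialNB(), BernoulliNB(), GaussianNB()):
    # BernoulliNB binarizes the counts internally; GaussianNB needs dense input.
    model.fit(train_X, train_y)
    print(type(model).__name__)
    print(confusion_matrix(test_y, model.predict(test_X)))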

4) Support Vector Machines (SVM)

Why was SVM used? I didn't find a specific reason for that, but what I learned is that SVM delivers high accuracy because it uses an optimization procedure: it builds a classifier by searching for the separating hyperplane (the optimal hyperplane) that maximizes the margin between the categories (in our case, spam and ham). SVM is thus robust in general and effective when the number of dimensions is greater than the number of samples.

Unlike Naive Bayes, SVM is a non-probabilistic algorithm.

What's the difference between LinearSVC and SVC (scikit-learn)? They don't accept the same parameters. For example, LinearSVC does not accept a kernel parameter, as it is assumed to be linear. SVC supports more parameters (C, gamma, ...) since it supports all the kernel functions (linear, polynomial, rbf or radial basis function, sigmoid).
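
For instance (the C and gamma values below are arbitrary placeholders):

from sklearn.svm import SVC, LinearSVC

linear_clf = LinearSVC(C=1.0)                       # linear boundary only, no kernel argument
kernel_clf = SVC(kernel='rbf', C=1.0, gamma=0.001)  # kernel is selectable here
# Both expose the same fit/predict interface, e.g. linear_clf.fit(train_X, train_y).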

How can SVM parameter tuning be done? Tuning SVM parameters improves the performance of the algorithm. Some parameters have a higher impact than others:

-Kernel: a kernel is like a similarity function. It's a way of computing the dot product of two vectors in a possibly high-dimensional feature space, without having to explicitly transform the data into that more complex space. Kernel functions are sometimes called a "generalized dot product".

-Gamma: the kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The higher the value of gamma, the more the model tries to fit the training data set exactly, which hurts generalization and causes over-fitting.

-C: "The factor C in (3.15) is a parameter that allows one to trade off training error vs. model complexity. A small value for C will increase the number of training errors, while a large C will lead to a behavior similar to that of a hard-margin SVM." (Joachims 2002, page 40)

5) Analyzing output in different cases

What if I vary dictionary size?

Varying the dictionary size means changing the number of features (words). I wanted to explore the impact of having more features, and where good results stop, based on the confusion matrix.

I tested sizes {3000, 5000, 6000, 7000} and discovered that at size = 7000, SVM classification starts dropping slightly (more false identifications), while Naive Bayes delivered the same results regardless of the size.

I think that at that point the target classes may have started overlapping, or the model may have started over-fitting the training data; I'm not yet sure how to explain the result.
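
For context, the tutorial's dictionary is just the N most frequent words of the training emails, so the experiment boils down to varying one parameter (a sketch; train_emails, a list of raw email texts, is an assumed name):

from collections import Counter

def make_dictionary(emails, size=3000):
    # Count every word across the training emails and keep the `size` most common.
    counts = Counter(word for email in emails for word in email.split())
    return [word for word, _ in counts.most_common(size)]

dictionary = make_dictionary(train_emails, size=7000)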

What if I try Gaussian and Bernoulli?

Obviously, introducing Bernoulli won't help because, as I explained above, it doesn't provide enough information in our case: we need word counts, not mere presence/absence.

Multinomial NB:
[[129   1]
[  9 121]]
Gaussian NB:
[[129   1]
[ 11 119]]
Bernoulli NB:
[[130   0]
[ 53  77]]

As we can see, Multinomial NB outperformed both Gaussian NB and Bernoulli NB.

What if I try GridSearch on SVM to tune its parameters?
Params GridSearch: param_grid = {‘C’:[0.1,1,10,100,1000],’gamma’:[1,0.1,0.01,0.001,0.0001]}
Best params found: Gamma: 0.0001 ; C: 100

Linear SVM:
[[126   4]
[  5 125]]
Multinomial NB:
[[129   1]
[  9 121]]
SVM:
[[129   1]
[ 62  68]]
GridSearch on SVM:
[[126   4]
[  2 128]]

Tuning the SVM parameters with GridSearch yielded better results.
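
For reference, here is a minimal sketch of how that search can be run with scikit-learn (assuming the rbf kernel and the train/test matrices from the tutorial):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)  # 5-fold cross-validation
search.fit(train_X, train_y)
print(search.best_params_)     # e.g. {'C': 100, 'gamma': 0.0001}
pred = search.predict(test_X)  # refit on the best parameters by default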

Conclusion

So, those were my first steps in the email spam filtering application! I hope this is helpful if you are thinking about starting a text-classification project! I will continue sharing reflections and experiments along the way. Next time, I will explore improving and changing the training data and features.

The project is on Github too.

Tell me about your experience with text classification! What did you apply it to? What methods do you suggest? What were the challenges?

Helpful Resources:

[1] Naive Bayes and Text Classification.
[2] Naive Bayes by Example.
[3] Andrew Ng's explanation of Naive Bayes: video 1 and video 2.
[4] Please explain SVM like I am 5 years old.
[5] Understanding Support Vector Machines from examples.

Contributing to the CS community... starting with Dzone!

My activities

When I made the decision to follow my passion and choose the career that fits me best, I started a different kind of journey, one that requires a lot of effort to stand out from the crowd.

Along the way, since I like being involved in a community, supporting and being supported, I thought about launching a lot of initiatives. However, there was one constraint: each idea that came to my mind required a lot of time, which I couldn't commit, as my schedule was already full with other priorities.
One day, I told myself: why not share the knowledge I gain while learning? Since I read a lot of articles, if I summarized the best links about a topic and the best way to learn it in one article, I would make it easier for others to find what they need without spending much time searching the internet.

So, I read various suggestions on how to write a good article (a habit of mine: before doing something, I look for tips and advice on how to do it best) and chose to write about a subject that is classical in computer science but that I saw differently! It fascinated me when I learned it, because I discovered not only how powerful the human brain is, but also how much impact this thing has on our lives!
The selected subject was: algorithms.

Writing the article was a difficult phase; I struggled to formulate what I wanted to say. It took me many days, and still I was not satisfied with the quality of what I wrote. I didn't publish it.
Then I remembered a sentence a friend once told me about exactly this situation: "Don't let perfection be the enemy of good." It was like a wake-up call! So I took the step and published it on my website! To my big surprise, the article was a success! I even found some top data science influencers sharing it and encouraging me!

The success of this small, imperfect action pushed me to try something new. I wanted to contribute to a bigger community in order to reach a bigger audience and help more people!

As every accomplishment starts with the decision to try, here is the result!

I submitted my article to Dzone, a CS community I greatly admire, and it got accepted! My article "The benefits of learning algorithms" was there! I enjoyed how they placed it in the "Big Data Zone"!

Today, one month after it went live on Dzone, I'm happy to see that more than 9,000 people have viewed it, and I hope the content supported and motivated them! It's such a rewarding experience!

See you at the next step toward my purpose: making a positive impact with my passion for CS!

Stay tuned! I’m preparing a new contribution for the coming week!

J On the Beach 2017

My activities, Technology

This time I made it to Malaga, Spain!

I attended the J On the Beach 2017 conference, a Big Data event targeting the developer and DevOps communities, held over three days.

The first day was dedicated to workshops. In the morning session, I participated in "Building Microservices on DC/OS", run by Jorg Schad, where I got an introduction to containers, Mesos, and clusters. It was a little difficult for me, as all of these were new to me.


The afternoon session was "Hands on Elastic Stack 5.3". The presenters, Pablo Musa and David Pilato, were very helpful, and I really liked how discovering the tools was combined with practice; I believe the best way to motivate people to learn something is by showing results!

I discovered how Elasticsearch can return the results of a search query over a very large dataset in milliseconds, thanks to its unique indexing process. Not only can you process a query rapidly, you can also visualize and explore your query results easily with Kibana! Those were the parts I enjoyed most, apart from Logstash and Filebeat.
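
As an illustration, a full-text query is a single REST call, and the response even reports how long the search took (a sketch, assuming a local node and a hypothetical emails index):

import requests

resp = requests.get(
    "http://localhost:9200/emails/_search",
    json={"query": {"match": {"body": "invoice attached"}}},
)
print(resp.json()["took"], "ms")  # Elasticsearch reports the query time in milliseconds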

The second day was the conference opening day, and it started BIG! Eric Ladizinsky presented the evolution of quantum computers.

Eric discussed the weakness of binary classification in machine learning and how quantum computing could make machines truly intelligent by enabling probabilistic machine learning. More about the quantum age can be found on the D-Wave company website.

Various sessions were held after that. The schedule was partitioned by field of interest and level, to ensure there was food for everyone's mind.

Since my interest in machine learning hasn't displaced my interest in Java, I was excited to attend the "Real World Java 9" talk by Trisha Gee. And I wasn't the only one! The room was full!

Trisha started by mentioning that Java 9 is still not stable, but that it's a very promising version with its new features! She gave her presentation with live coding (oh, yes!), trying out the new Flow API and Streams API. More about her presentation here.

If we speak about BIG Data, then for sure there is a subject not to miss: Performance.

Rob Harrop presented a session about designing and running performance experiments. It was an interesting one! Rob emphasized that wait times are A THING: they are drawn from a distribution, and if you don't know the distribution, the exponential is a good fallback.
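
In load-testing terms, that means pacing simulated users with exponentially distributed think times instead of a fixed sleep (a sketch; the 2-second mean is my own assumption):

import random
import time

MEAN_WAIT_S = 2.0  # assumed average think time between requests

def paced_requests(send_request, n=100):
    for _ in range(n):
        send_request()
        # Exponential inter-arrival times model a Poisson request process.
        time.sleep(random.expovariate(1.0 / MEAN_WAIT_S))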

James Allerton spoke about how to develop, deploy, and iterate often using the Oracle management API. And since showing something concrete is always the best way to convince, he demonstrated with a live demo!

The last session of the day was about "Dynamic data visualization". Santiago Ortiz drew the link between human learning capabilities and models created by machine learning (classification, labeling, prediction). Trying things out with his children, like classification games, led him to conclude that understanding human perception and cognition is instrumental in creating environments in which models, machines, and humans collaborate.

Before heading to the next day, I'd like to mention one other thing: the Oracle IoT digital demo. Fascinating, isn't it?

I can't sum up that day in enough words; it was such a great one! Especially since it ended with something related to women in tech (obviously a highlight, as I'm proudly one of them 😀).

Carmel Hassan presented a visualization of the data collected from JOTB17 day one that showcased diversity facts. Surprisingly, the share of women attendees decreased from 13% last year to approximately 10% this year! And while women are not well represented "live", they account for a greater share "online", according to data collected from interactions with JOTB17 publications. It seems there is still a lot of work to be done for women in tech! But no worries, superhero women are on it! Check out the Yes We Tech community if you want to support them!

Here we are! Now, day 3.

To me, what marked the final conference day most was the "Distributed Sagas" session by Caitie McCaffrey. A distributed saga is a collection of requests and compensating requests that together represent a single business-level action. It makes building microservices easier because it lets you compose different services, for example renting a car, booking a hotel, and booking a flight, into one action before proceeding to payment (one payment action for all three). Caitie's presentation really added to my knowledge of distributed services. It can be found here.
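
In rough Python, the pattern might look like this (my own sketch; the service calls are hypothetical):

def run_saga(steps):
    # Each step pairs a request with its compensating request.
    done = []
    try:
        for request, compensate in steps:
            request()
            done.append(compensate)
    except Exception:
        # Roll back by applying compensating requests in reverse order.
        for compensate in reversed(done):
            compensate()
        raise

# run_saga([(rent_car, cancel_car), (book_hotel, cancel_hotel), (book_flight, cancel_flight)])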

Also, for Elasticsearch users, here is a piece of advice from Pablo Musa: do not over-shard (don't just keep the default settings).

So, that was the J On the Beach event! But was that all? Where's the fun part?

Well, I didn't want to mention all the fun at first; otherwise, those who missed it would be in tears! 😀

We had a lot of fun! Spanish omelette, paella, the JOTB afterparty, lovely organizers, and Malaga, which welcomed us with its beautiful weather!

I would like to finish this article with something I cherish deeply. I googled for a quote about it, and I think this one describes it well:

"The richest people in the world look for and build networks. Everyone else looks for work." Robert Kiyosaki

The world may seem big to some people, but not to me! The world is small and magical! I got to meet new friends from Ireland, Spain, Russia, and Italy! We discussed many topics together and discovered each other's cultures! We ate ice cream :D (one of my friends will recognize himself in the ice cream part :p), and yes, of course, we laughed a lot!

Thank you all and see you next time!

Note: You can find out more about the other sessions via the #JOTB17 tag on Twitter!

A glimpse into AI potential between today and tomorrow (part 1)

My activities, Technology

On April 23rd, I was honored to be invited by GDG Monastir (Tunisia) to speak at the Google Next 2017 Extended event. My talk was about artificial intelligence, and what follows is a written version of the ideas I wanted to communicate.

I bet that when you think or hear about artificial intelligence, many things pop into your mind: Big Data, IBM Watson, Alexa, the Google self-driving car, NAO, Turing, Google Cloud... But what is artificial intelligence? Is it a brain that can outperform the human brain? How does AI impact our everyday life? What are the technologies behind it?

What’s AI?

Let's go back to 1950, when Alan Turing, a computer science pioneer, started asking "What if machines could learn?". He invented the Turing Test as a way of assessing machine intelligence. The test was an adaptation of a Victorian-style competition called the imitation game. At the time, Turing pinpointed a possible horizon for AI that many didn't believe in or had never imagined.

It was only five years later, at the Dartmouth Conference, that John McCarthy coined the term "Artificial Intelligence", defining it as "the science and engineering of making intelligent machines".

Intelligent machines, artificial intelligence... well, let's define intelligence before digging deeper.

What intelligence consists of [1]

As we can see, intelligence consists of:

  1. the ability to process data,
  2. the ability to learn,
  3. the ability to solve complex problems, in special-purpose as well as general-purpose domains,
  4. the ability to reason and draw inferences based on the situation.

Do machines have all these abilities? Some of you will say "I think so!", others will object... What I suggest is that we discover the answer together by checking where artificial intelligence is now, because there is no more convincing argument than real facts!

What’s the impact of AI?

AI has an impact on different industries: automotive, e-commerce, marketing, robotics, mobile, and the list goes on! And one of the most important pillars of human life is, without any doubt, health. No one could advance technology if their health were failing. So investing in AI in the health sector is a big deal for many companies, including IBM and Google.

Artificial intelligence is helping doctors better explore the human body and deal with its complexity. In April 2017, research conducted by a team at the University of Nottingham in the United Kingdom showed that doctors can better predict a patient's heart attack risk when using AI. This could save thousands, even millions, of lives every year.

The fact is, each year between 15 and 20 million people fall victim to cardiovascular disease, which includes heart attacks, strokes, and blocked arteries. Doctors use a method called ACC/AHA, based on guidelines like age, cholesterol level, and blood pressure, to predict heart attack risk. In practice, the method has turned out not to be very effective: the complexity of the human body still hides unknown biological factors.

Fortunately, as Stephen Weng, an epidemiologist at the University of Nottingham, told Science Magazine, "What computer science allows us to do is to explore those associations." Trying machine learning algorithms like logistic regression and neural networks on the problem resulted in better predictions! The best algorithm predicted correctly 7.6 percent more often than the ACC/AHA method and produced 1.6 percent fewer false positives. That means that in a sample of around 83,000 patient records, 355 additional lives could have been saved. [2]

Achievements of AI in medicine don’t end there:

  1. IBM Watson for Oncology is working on cancer diagnosis,
  2. Google DeepMind's blindness detection: two million people are living with sight loss in the UK, of whom around 360,000 are registered as blind or partially sighted. [3] The research project is investigating how technology could help better analyze scans, giving doctors a better understanding of eye disease. The team developed an AI algorithm that can automatically identify diabetic retinopathy, a leading cause of blindness among adults. "We were able to take something core to Google—classifying cats and dogs and faces—and apply it to another sort of problem," says Lily Peng, the physician and biomedical engineer who oversees the project at Google.
  3. Robots as surgical and patient assistants, like Romeo, developed by SoftBank for elderly assistance.

While it's true that a lot of work still needs to be done to truly improve medical services and patient assistance, AI's impact is already here, and doctors believe collaboration with AI will be beneficial. "I can't stress enough how important it is," Elsie Ross, a vascular surgeon at Stanford University in Palo Alto, California, told Science, "and how much I really hope that doctors start to embrace the use of artificial intelligence to assist us in the care of patients."

Moving on to the first field that grew up with AI: games.
The story started with computers beating humans at games like chess or Jeopardy! by calculating vast numbers of possible moves on a board, or by rote learning.

Today, computers may even have the ability to think ahead and reason! Researchers weren't seriously considering these possibilities before Google's AlphaGo win!

In March 2016, Lee Se-dol, one of the world's top Go players, won just one of his matches against the AlphaGo program. Go is considered much more complicated than chess, as it has far more possible moves (more than 10^170 legal positions); thus it usually calls for special human skills: intuition and creativity.

DeepMind researchers said that this kind of computing power has real-world promise outside of gaming, especially in health care and science.

Health, Gaming, what else?

Google self-driving car

Well, people are now talking about vehicles that not only transport us but also think for us!

Google began testing a self-driving car in 2012, and since then, the U.S. Department of Transportation has released definitions of different levels of automation, with Google's car classified as the first level down from full automation. Other transportation methods, such as buses and trains, are closer to full automation.

AI application areas in the transportation field are quite diverse. Thanks to pattern recognition, system identification, classification, and many other subfields of artificial intelligence, transportation has evolved hugely over the years!

Traffic state tracking, advanced driver assistance systems (ADAS), and driver alert systems (e.g. drowsiness detection) already exist to give drivers a better driving experience. Google, Tesla, Volvo, and many other companies are focusing on road safety, to the point that "artificial intelligence could change our emotional as well as our practical relationship with cars".

OK, you may tell me: that all sounds good, but I just don't see all these AI applications around me! "I'm not in the USA to see self-driving cars", "I'm not a doctor who can try out AI assistance at work", "I don't play games!" So you're not sure about the impact of AI on your life!

You are already using AI! It's not science fiction; it's real. Maybe you are just not aware of it! Whether you like it or not, artificial intelligence influences every aspect of your everyday life.

While surfing the internet, you experience user behavior tracking. Companies like Google, Amazon, Facebook, and others use your preferences, history, searches, and purchases to predict your interests and target you according to the situation (the Facebook news feed, ads on the sides of a web page, suggestions of books to buy, ...). So through Google search, Facebook, movie recommendations on Netflix, and the Amazon store, you encounter aspects of artificial intelligence.

AI is inside your mobile too! Siri, Google Now, and Cortana are all intelligent assistants using AI technologies, designed to help you get answers to your questions or perform actions with voice control. For example, you can say: "What's the weather today?" and the assistant will find the information and relay it to you.

Not only that: AI is also used for online customer support. Not every website you visit has a live person communicating with you; in many cases, you're talking to a rudimentary AI (a chatbot).

You see? AI is everywhere! Your smartphone, your car, your bank, and possibly your house too!


At this point, the story is starting to feel like a horror film: talking about all these AI impacts raises some questions... It seems controversial, right? Intuition, creativity, assistance, holding your data and maybe risking your privacy, using data to train machines, helping doctors, spotting distracted drivers, humanoid robots... What's next? Does artificial intelligence hold real "intelligence" (as we defined it at the beginning)? Is AI going to take our jobs?

To be continued…

Resources:

[1] Artificial intelligence defined, Deloitte.

[2] Can machine-learning improve cardiovascular risk prediction using routine clinical data?

[3] Google DeepMind research in health.

Is cracking the coding interview the only benefit of learning algorithms?

challenges, learning, My activities, Technology

Often, algorithms are considered only when someone is looking for a new job. This narrow perception of what algorithms are for keeps us away from what they can help us achieve!

Actually, algorithms are everywhere! They are involved in every aspect of computer science, and they're also used in a wide range of fields: recommendations, social media, medicine, psychology, transportation, and the list goes on!

Anything you do can be broken down into small steps, and that is an algorithm. Imagine you wake up in the morning to go to work and you can't remember where your car keys are; how would you find them? One approach might be to apply an algorithm, which is a step-by-step logical procedure. First, you check the places where you usually put them. You try to remember the last time you used them. You retrace your path from when you entered the home. Sooner or later, following the flow of steps, you eventually find the car keys. An algorithm is a sequence of instructions that ensures the completion of a task. We humans apply algorithms to take action in every aspect of our lives.
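
Translated into code, the key hunt might look like this (a toy sketch; the places and the check are invented for illustration):

def find_keys(places, keys_are_in):
    # Check the likeliest places first and stop as soon as the keys turn up.
    for place in places:
        if keys_are_in(place):
            return place
    return None

usual_places = ["key hook", "coat pocket", "kitchen table", "yesterday's bag"]
print(find_keys(usual_places, keys_are_in=lambda p: p == "coat pocket"))  # coat pocket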

Or think of how to prepare a smoothie: it needs an input (fluid, frozen fruit, flavorings) and a sequence of ordered steps executed to deliver a smoothie as an output. These are the components of an algorithm.

So, algorithmic knowledge can be an asset for improving your life, or even others' lives! You can turn a time-consuming decision into an algorithm that assists you in the decision process. You can create an algorithm to prioritize your tasks instead of keeping a simple to-do list. You can write an algorithm to predict the best time to visit a city, or one that suggests films to watch based on your preferences and history. There is no limit to what you can imagine and make real with algorithms!

Now, I think it's time to talk about how to gain this set of skills so you can make a difference with algorithms! First, let's go over the characteristics of an algorithm.

What are the characteristics of an algorithm?

  1. It should be finite: if your algorithm never finishes trying to solve the problem it was designed for, then it is useless.
  2. It should have well-defined instructions: each step of the algorithm has to be precisely defined; the instructions should be unambiguously specified for each case.
  3. It should be effective: the algorithm should solve the problem it was designed to solve, and it should be possible to demonstrate with just paper and pencil that it converges.

What are the steps to learn algorithm coding?

  1. Develop your programming logic. Here are some ways to improve it.
  2. Pick a programming language you are comfortable with (if you don't have one, I suggest learning Python; it's easy and simple).
  3. Learn data structures. Start with the basic ones: strings, vectors, lists, arrays, maps. Then queues and stacks. Finish with the advanced ones: trees, graphs, tries.
  4. Practice coding data structures. Here is a good link.
  5. Learn simple algorithms first, then move on to the most common algorithms. Make sure you understand every step, and don't get frustrated about being slow. Remember: « It does not matter how slowly you go as long as you don't stop ». You can look for videos, as they make things easier to assimilate.
  6. Start tackling problems (e.g. on HackerRank, there are fun and challenging ones). You need a problem-solving strategy; I'd suggest following the method proposed by Gayle Laakmann McDowell.

Finally, from my own modest experience in problem-solving, here are my pieces of advice:

  • Ask questions: « why » and « what » are really important for understanding the basics of anything! When you question a choice, you will memorize the answer better and dig deeper into the source of the information.
  • To stay motivated and keep the learning curve up, create a simple, useful everyday algorithm once you have the basics. Accomplishment fuels your energy.
  • Try to solve the problem manually by yourself first. Take paper and start writing the steps. Above all, take your time and don't rush to the solution.
  • Even if you figure out the solution, don't tell yourself « It's OK now, no need to implement it, I know how it works ». That's a big trap, because you never know what you missed if you don't implement it and test it with corner cases.
  • Be patient. If you begin to feel frustrated with a problem, leave it and go do something less stressful. Then come back.

I hope this helps you leverage the power of your brain, by discovering its super capability to execute algorithms and by learning how to translate them into code! 😉

Remember how Rome was built (not in a day, right)? With each small step in learning algorithms, you grow your potential to contribute to making life better in this world!

Github resources:

mission-peace

Online resources:

Topcoder: an explanation of data structures and algorithms.

ideserve: contains visual explanations of some algorithm problems.

Youtube resources:

Tushar Roy: explanations of some advanced algorithms.

Celebrate each step! Welcome to my website!

My activities

Welcome to my newly born website, with my own domain name! (I know, it's an achievement, as I had been planning to do this for a looong time 😀)

Speaking of achievements, I would like to share some thoughts with you:

1- Set a goal and believe in yourself!

2- Don't think about the height of the mountain; visualize how you will feel at its top!

3- Take risks, and you will be surprised by what you can achieve.

4- Celebrate each step! You are making your dream real! You're getting closer!

5- Organization and planning are the key to relieving stress along the way and staying productive.

6- As I learned from Classy Career Girl: "You don't find time. You make it!"

I hope you enjoy the website! It's still taking its first steps 🙂

Data Natives Berlin 2016 day 2

My activities, Technology

Berlin, October 28th: Data Natives day 2. It covered tech trends across numerous topics: deep learning, source code abstracts, analyzing and searching unstructured medical data, smart-data-based lending, intelligent hiring, and more.

Klaas Bollhoefer, Chief Data Scientist at *um, tackled the gap between departments in a traditional company, taking as an example an "operational gap" study made for one of *um's clients. He demonstrated how *um approached Metro Group's issue: what does Metro need to do to become a leading analytics-driven omni-channel retailer? Along the way, Klaas stated that "every company requires unique capabilities" to reach data maturity and achieve digital transformation. To that end, analytics should take a bottom-up approach to explore how a company is framed: respect its traditions and use them to build something new. It's a step-by-step data thinking model based on capabilities, insights, and recommended actions derived from what the company really needs.

Moving to the finance world, where massive data collection obviously plays a main part: what is all the data collected from clients used for? What's the benefit for the client?! Here came Patrick Koeck, COO of CreamFinance, with a presentation about "Smart Data-based lending". Patrick focused on the difference between using Big Data and Smart Data. He showcased how smart assessment of data, by putting it in context, leads to a better client experience and digital transformation. With its consistency, higher stability, less noisy information, and more transparent models, smart data helps with customer segmentation and decreases the risk of sensitive data being stolen.

From data collection and usage comes the struggle to create value! Martin Loetzsch, CDO of Project A Ventures, spoke about the necessity for a data scientist in early-stage startups to create value by linking the tech team and the business team. He gave examples of low value creation, such as being unclear, for instance just asking developers to do something great without taking care of their needs. Hence the need to kill black boxes in order to achieve high value creation, because people don't trust black boxes.

Then Lisa Winter, Senior Analyst for Executive Compensation, and Nicolas Rossetti, Risk Analyst at Willis Towers Watson, presented how to build the bridge between skill set and use case to become a data scientist. This involves subject matter expertise, statistics as an extension of subject matter expertise, and technology as an extension of statistics. As an illustration, they used the example of building personalized health insurance pricing and offerings. Lisa also took the initiative to present her "Faces of Data Science" series of real-life data scientists, a list of women as well as men in data science. By the way, I was happy to be mentioned on her list!

The hiring process can also benefit from AI and emotional analytics. This is what Darja Gutnick, Founder and CEO of 12 Grapes, explained by laying out five principles of intelligent hiring, among them goal commitment, accountability, and a focus on results. She affirmed that "alignment on values is everything" and walked through how to predict the success of new hires by measuring skill complementarity and value alignment according to the 12 Grapes model, thus achieving "good hires".

So much for the talks, but you know what? With Data Natives, there are also surprises! The conference welcomed on stage Anish Mohammed, Advisory Board Member at Blockchain Advisory Group, for a cybersecurity talk! Anish explained the link between smart contracts and cybersecurity. He showcased the Advanced Persistent Threat life cycle, consisting of incursion, discovery, capture, and exfiltration. He also presented the evolution of security analytics and future cybersecurity scenarios.

I couldn't finish this post without mentioning the "Big Data in FinTech" panel! It was an exceptional panel discussing the evolution of FinTech and its horizons, with Dr. Stephan Wachenfeld, MD at Savedo, Christian Rebernik, CTO at N26, and Jacqueline Faridani, Associate Director of Quantitative Analytics & Credit Risk at VendorMach, as panelists, and Devie Mohan, CEO at Burnmark, as moderator. During the panel, Stephan affirmed that banking is no longer done only by bankers, and Christian stressed that it's about creating value and using data for the unique benefit of the customer! Number26 is a concrete illustration of that.

And that's not all; the day held a lot more! It was incredible! I had the pleasure of meeting Evelyn Münster, a data visualization addict, who encouraged me to follow my dream and keep it up. I was also honored to meet, at the end of the event, Elena Poughia, the founder of Data Natives, and Warren Channel, Operations and Community Manager at Dataconomy Media, as well as Maya Lumbroso, Business Development Manager at Dataconomy Media.

I would like to thank all the organizers and volunteers. I would be pleased to join their team for Data Natives Berlin 2017, which will be held on the 23rd and 24th of November.

Data Natives 2016 was a real success!

As a closing to this event overview, I'd like to present Lisa and Nicolas's conclusion:

[Photo: Lisa and Nicolas's concluding slide]

See you soon, my dear readers! I hope you enjoyed my overview of Day 1 and Day 2 of Data Natives 2016 in Berlin, Germany.

Finally, I'm excited to announce a scoop! I'm eager to attend the next conference dedicated to big data developers, to which Luis Sánchez, Digital Marketing Manager at valo.io, invited me! So my next tech stop will be on the beach in Malaga, Spain, for the event hosted by valo.io. I'm very thankful to Luis; it was nice to meet him. See you there!

Note: My Data Natives day one overview is here.

Visit my Twitter account @MestiriSarah for more feedback on Data Natives 2016.

Data Natives Berlin 2016 Day 1

My activities, Technology

My journey in Berlin started with two days enjoying its winter landscape and counting down the days to the conference I had been determined to attend, making the whole trip from my country, Tunisia. Thursday, here it is! Data Natives opened its doors!

I got up that morning very excited to meet people who share my passion for technology! I had already chatted with some of them on Twitter, like Lisa Winter and Evelyn Münster, so I was eager to get to know them in person.

Looking at the schedule, it was difficult to choose which talks to join, since all of them were interesting. There were subjects like visualizing and communicating high-dimensional data, clustering data, big data semantics, connected things, and much more (full schedule here). Moreover, as artificial intelligence is a field full of diversity in every respect, points of view also vary, which is why there was a panel on "the future of AI and Universal Basic Income", moderated by Prof. Dr. Hans Uszkoreit, Scientific Director at DFKI, with Nima Asghari, Senior UAV Applications Expert at Drone Industry Insights, Dr. Abdourahmane Faye, Big Data SME Lead, and Michael Bohmeyer, CPO at admineo, as panelists. The discussion was mainly about the drop in job opportunities due to robots taking over many tasks currently done by humans, which raises some questions: is it really a curse? Should we be optimistic? Are robots killing our jobs? Is Universal Basic Income a solution to cover living costs for those would-be jobless people?

My top talks? Well, I enjoyed starting the day with Kim Nilsson, CEO and Co-Founder of Pivigo, where I discovered "what it takes to be a successful data scientist". DS potential consists of Interest + Technology (tech skills in at least one of the most common tools/languages) + Communication (as data science is not easily understandable for non-DS people, it requires communication skills so the information can get across) + Motivation + Hard Work + Academia.

After that, I was pleased to see an example of how data processing and integration can be a benefit when accessible to everyone! This can be done using open data. Christina Schoenfeld, Business Developer at Open Data Soft, presented the advantages of harnessing open data in Germany by making it easy to publish and to understand, which holds huge innovative potential, with impact on society and the economy.

On the other side, Alexandra Deschamps-Sonsino, Director of Designswarm, shared an important moment of reflection about the Internet of Things and the use of data, asking: "When things speak, are we ready to listen?" Many hidden truths behind the data were showcased... such as that people using fitness tracking systems do not actually lose weight, while those who are NOT using them do! More intriguing, tools used to track the elderly in order to help take care of them actually made them more likely to be left alone, with miserable company (TV and pets), instead of having their children around them.

Oh, yes! There is data hypocrisy...

Alexandra mentioned many more interesting points, and I'll relate one more: she presented alternatives like the "Good Night Lamp" and the Data Brick, which encourage people to take control over their own data experience.

I'll say no more, and I'll let the last words of this blog post be the advice from Alexandra's presentation. See you tomorrow!

"Question what is hiding behind your data. Find out what behaviors you are really enabling, good and bad. Think about what the alternative to using your product is." Because... "Data is a slave to use. Use is a slave to experience. Experience is a slave to culture."

Note:  Data Natives day 2 is available here.

Tunisia Women Hack Day

My activities

Two days ago, I was invited onto one of the most popular Tunisian radio stations, "Express FM", to speak about a first-time event in Tunisia: Women Hack Day. As its name suggests, it's for every woman interested in coding. You may ask: why just for women? Well, keep reading and you will find the answer.

Let's start with a story. I remember being eager to try a hackathon during my university studies, because I like being challenged; as usual, I wanted to discover my own limits and push them even further. Fortunately, the opportunity presented itself with the "Google INSAT Club", and I didn't think twice. I went looking for colleagues, especially girls, to compete with me, because (as you probably know) teaming up with friends creates a more challenging and of course cozier environment. I didn't think at the time that it would be hard to find a team, but it was! Maybe girls don't want to stay out all night, or maybe it's just scary for them... Anyway, I finally decided to compete with who was available: boys.

To my big surprise, when I arrived at the venue, I discovered that I was actually the only girl participating, among about 25 boys!

I think you have the answer now, don't you? Similar accounts come from different people in tech, among them Access Now, a digital rights organization, Netlinks, a student association, and ArabWIC (Arab Women In Computing). So it was decided to break the ice and create an opportunity for girls studying computer science to show their coding skills through Women Hack Day, on November 28th at INSAT.

We believe that women give a lot to the world of technology and computing, and we are committed to the success of this event, which is not only a hackathon but also an opportunity to meet Tunisian women with bright success stories.

To present the event, I had the pleasure of speaking about it on Express FM as ArabWIC Tunisia Chapter Coordinator. It was my first time on the radio, and you know what? It felt great!

Finally, I’m happy to share the link with you so that you can get more details about the event and the radio show: here.

Me on the left, during the radio show "DATA" on Express FM, 23/11/2015.