Events Jan 03, 2017

Partech Shaker hosted Benjamin Guinebertière, Machine Learning expert at Microsoft France, for an Entrepreneurs’ Talk session introducing machine learning through various scenarios that illustrate data-driven innovation.

 

In the realm of big data, the main challenge today is extracting value from data. To this end, machine learning scenarios can now be considered among the most innovative and technologically advanced.

 

Definition of machine learning

One way of defining machine learning is to say that it makes it possible to predict the near future using past data. Examples of this include churn analysis (predicting the departure of a customer or employee), spam filtering and fraud detection, and anomaly detection (such as using artificial intelligence connected to drones to analyze the state of power lines).

Another definition is programming by example. In this model, the machine extrapolates new behaviors from what it has learned, becoming able to accomplish tasks by itself. It is capable of a certain kind of learning, such as fitting a line of the form y = ax + b to a series of points, or solving the same problem in higher dimensions.
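The line-fitting example can be sketched in a few lines of Python. Here is a minimal illustration using NumPy's least-squares fit on made-up points (not code from the talk):

```python
import numpy as np

# A few noisy points that roughly follow y = 2x + 1 (invented data)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit a line y = a*x + b by least squares
a, b = np.polyfit(x, y, deg=1)
print(round(a, 2), round(b, 2))  # close to the underlying a=2, b=1
```

Given new x values, the learned a and b can then be used to predict y — the "predict the near future from past data" idea in its simplest form.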

 

The data

In order to work, machine learning tools need access to a lot of data. These data can be born digital (e.g. from a website) or come from the physical world (requiring digitization). This is where connected devices become particularly useful. The possibilities are considerable: every company is a data company. At the 2015 Strata + Hadoop World conference, Joseph Sirosh, who is in charge of machine learning products at Microsoft, presented a method for increasing the number of calf births in a herd of cattle. By equipping the cows with connected devices that counted their steps, the team was able to detect health issues early and prevent losses, and to improve cattle production through accurate detection of estrus.

https://www.youtube.com/watch?v=oY0mxwySaSo

 

Typical big data machine learning architecture

 

Data are collected from various sources: connected devices, social networks, websites and databases. The data can be processed in real time, or stored in a data lake upstream of NoSQL or relational databases, with no constraints on format. Big data engines then process the data to make them usable, or write the newly processed data back to the data lake. The advantage of this architecture is that it allows numerous iterations over very varied types of data.

 

Architecture and techniques

There are two major types of architecture. Lambda architecture makes it possible both to process events in near real time (the hot path) and to process aggregated historical data in batches (the cool path). In non-Lambda architectures, a single engine does both.
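The hot path / cool path split can be sketched with a toy example. The sketch below (invented data and names, not from the talk) maintains a real-time view as events arrive while keeping an append-only master dataset from which the batch view is periodically recomputed:

```python
from collections import deque

# Illustrative event stream: page views tagged with a user id (made-up data)
events = [{"user": u} for u in ["a", "b", "a", "c", "b", "a"]]

# Hot path: update a real-time view as each event arrives
realtime_counts = {}
master_dataset = deque()  # append-only store playing the role of the data lake
for e in events:
    master_dataset.append(e)
    realtime_counts[e["user"]] = realtime_counts.get(e["user"], 0) + 1

# Cool path: periodically recompute the batch view from the full history
batch_counts = {}
for e in master_dataset:
    batch_counts[e["user"]] = batch_counts.get(e["user"], 0) + 1

# A serving layer would merge both views; here they coincide
# because the toy stream has been fully processed.
print(realtime_counts == batch_counts)
```

In a real system the hot path trades accuracy for latency, and the batch recomputation periodically corrects it — which is precisely why Lambda architecture keeps both.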

Two major languages have stood out so far. R is a language for statistical computing with a strong open-source community. Microsoft acquired Revolution Analytics, a company built around R; the language itself remains open source. Microsoft distributes Microsoft R Open (an enhanced open-source distribution of R) alongside a commercial offering, Microsoft R Server. Today, R is included in SQL Server, Azure and other Microsoft products.

The other language is Python, which is used both in development and in numerical modeling. It has a large community and many libraries, including NumPy, SciPy, pandas, matplotlib, statsmodels, and scikit-learn (developed with major contributions from Inria, the French Institute for Research in Computer Science and Automation).
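As a concrete taste of this stack, here is a minimal churn-style classifier combining NumPy and scikit-learn — invented data and a deliberately tiny model, purely to show how few lines the libraries require:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy churn data: [logins last month, support tickets] -> churned (1) or not (0)
X = np.array([[20, 0], [15, 1], [2, 5], [1, 4], [18, 0], [3, 6]])
y = np.array([0, 0, 1, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# Score a new customer with few logins and many support tickets
print(model.predict([[2, 5]]))
```

Real churn analysis would of course use far more data and features, but the workflow — build arrays, fit, predict — is the same.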

Several Microsoft sites bring these concepts together and offer courses and hands-on experimentation, notably azure.com/documentations and aka.ms/data.

 

Deep learning

 

Deep learning refers to a computer's ability to recognize representations (images, text, videos, sounds) after being shown many examples. Championed by Yann LeCun, it is the new name for vaguely biologically inspired neural networks. Multiple layers of neurons produce a hierarchy of representations, making it possible to move towards increasingly abstract concepts. From 2012 onwards, deep learning models improved as their learning capacity expanded. In 2015, machine vision surpassed human-level performance on large-scale image classification tasks using deep learning. These advances drive progress in artificial vision, image recognition and automatic translation.
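The "hierarchy of representations" idea can be illustrated with a tiny forward pass through stacked layers. The weights below are random — nothing is actually learned here — but the structure shows how each layer re-describes the previous one at a higher level of abstraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectified linear unit, the most common activation between layers
    return np.maximum(0.0, z)

# A tiny 3-layer network with random (untrained) weights, for illustration only
x = rng.normal(size=(8,))  # raw input, e.g. pixel values
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(16, 16))
W3 = rng.normal(size=(4, 16))

h1 = relu(W1 @ x)   # low-level features (edges, in the image analogy)
h2 = relu(W2 @ h1)  # mid-level features (motifs, object parts)
out = W3 @ h2       # abstract representation / class scores
print(out.shape)
```

Training consists of adjusting W1, W2 and W3 from many labeled examples so that those successive representations become genuinely useful — that adjustment is what frameworks like CNTK automate.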

There are several frameworks for deep learning, including ones created by Microsoft, Facebook and Google. Microsoft developed the Computational Network Toolkit (CNTK), which has been available under an open-source license since January 2016. It works with BrainScript and Python.

 

Moving towards interactive machine learning

 

Artificial intelligence can adapt automatically to its environment via methods known as reinforcement learning. Google DeepMind developed an AI that learned to play the Space Invaders video game by observing only the pixels and the score. Microsoft launched Project Malmo, an AI research and development platform led by Katja Hofmann in which agents learn to play Minecraft.
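The core loop of reinforcement learning — act, observe a reward, update — can be shown with tabular Q-learning on a toy problem (far simpler than DeepMind's DQN, and entirely invented for illustration): an agent in a five-cell corridor must discover that walking right reaches the reward.

```python
import random

random.seed(0)

# A 5-cell corridor: start at cell 0, reward of +1 for reaching cell 4.
N_STATES, ACTIONS, GOAL = 5, [0, 1], 4  # actions: 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy should prefer "right" (1) in every non-goal state
policy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(GOAL)]
print(policy)
```

DQN applies this same update rule, but with a deep neural network estimating Q from raw pixels instead of a small table — which is what makes learning directly from the screen possible.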

https://www.microsoft.com/en-us/research/project/project-malmo/

One of Microsoft's missions is to democratize machine learning – to build a Cortana for everyone. The goal is to offer the general public a range of cognitive services: pre-trained models for natural language understanding, chatbots, emotion detection and facial recognition.

 

 

Join the Entrepreneurs' Talk MeetUp group and don't miss any of our next events!
