The Three Paradigms of Machine Learning
Machine Learning by Rohan Chhipa
Machine Learning? Whaaa?
In this day and age, machine learning models and algorithms can perform amazing feats of computation that rival state-of-the-art algorithms and human performance. Applications of machine learning include playing games, uncovering patterns and anomalies in data, generating artwork, generating music and many more. People who witness these feats often chalk them up to "AI" without knowing the complexity and intricacies behind it. In this article, we will look at the three paradigms of machine learning and how each one can be used to solve a different class of problems.
The first and most common form of machine learning is known as supervised learning. For this type of learning, labelled data is extremely important. Labelled data consists of a set of attributes and some classification. The following table is an example of a labelled dataset, where the first four columns show attributes of a vehicle and the final column shows the vehicle type.
For the above dataset, the size, wheels, fuel type and number of seats can be considered the attributes of the dataset, while the vehicle type is the classification. If we were to ignore the vehicle type and simply look at the attributes, we would still be able to determine what type of vehicle the attributes describe. This classification that we are able to make stems from the basic knowledge of vehicles that we have learnt over time.
This is the behaviour that supervised learning attempts to mimic. A labelled dataset acts as a source of knowledge for an AI and is used to train it. We provide the AI with some form of input; in the above example, the inputs would be the attributes of a vehicle. The AI then processes that input in some manner and produces an output, which is its classification of the input. In the above example the classifications are different types of vehicles, so the AI would output one of those types. But what if the AI outputs the wrong classification? Well, this is where the classification in the labelled dataset becomes important. During training, we compare the classifications produced by the AI to the actual classifications in the labelled dataset. If the two are the same, we don't have to do anything, since the AI got it right. If the AI produces a wrong classification, we adjust the AI to make it less likely to get that example wrong again.
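The compare-and-adjust loop described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a simple multi-class perceptron as the "AI"; the article does not prescribe a particular model, and the rows in `LABELLED_DATA` are made up for the example rather than taken from the table.

```python
# Hypothetical labelled dataset: (wheels, seats) -> vehicle type.
# These rows are invented for illustration; they are not the article's table.
LABELLED_DATA = [
    ((2, 1), "motorcycle"),
    ((2, 2), "motorcycle"),
    ((4, 4), "car"),
    ((4, 5), "car"),
    ((6, 2), "truck"),
    ((6, 3), "truck"),
]
CLASSES = ["motorcycle", "car", "truck"]


def predict(weights, biases, attributes):
    """Return the class whose linear score is highest for these attributes."""
    def score(label):
        return sum(w * a for w, a in zip(weights[label], attributes)) + biases[label]
    return max(CLASSES, key=score)


def train(data, max_epochs=10_000):
    """Mistake-driven training: adjust the model only when it gets an example wrong."""
    n_features = len(data[0][0])
    weights = {label: [0.0] * n_features for label in CLASSES}
    biases = {label: 0.0 for label in CLASSES}
    for _ in range(max_epochs):
        mistakes = 0
        for attributes, actual in data:
            guess = predict(weights, biases, attributes)
            if guess == actual:
                continue  # correct classification: nothing to adjust
            mistakes += 1
            # Wrong classification: pull the correct class towards this example
            # and push the wrongly chosen class away from it.
            for i, a in enumerate(attributes):
                weights[actual][i] += a
                weights[guess][i] -= a
            biases[actual] += 1.0
            biases[guess] -= 1.0
        if mistakes == 0:
            break  # a full pass with no errors: training is done
    return weights, biases


weights, biases = train(LABELLED_DATA)
for attributes, actual in LABELLED_DATA:
    print(attributes, "->", predict(weights, biases, attributes), "| actual:", actual)
```

The labels are only consulted inside `train`, which is exactly the "supervision": once training is done, `predict` works from the attributes alone.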
When training an AI using this type of learning, we are effectively creating a supervised environment for that AI to learn in. Using the labelled dataset, we can help the AI learn a concept (such as vehicles) and help it improve and adjust when it makes a mistake. This is the core idea behind supervised learning.
In the previous section, we dealt with supervised learning, which is a learning approach that can be used to classify different types of inputs, assuming we have a labelled dataset available. However, what if we don't have labelled data available? For example, suppose we have a smart home where multiple sensors are set up across the house and report on all sorts of data, such as geyser water temperature, motion in rooms, the brightness of lights in rooms, etc. This is data for which no obvious classification can be made; it's just a large amount of data that accumulates over time.
What do we do with this type of data?
Well, we can try to learn something from it. The goal of unsupervised learning is for an AI to analyse unlabelled data and detect patterns within that data. Unsupervised AIs are amazing at finding odd correlations and patterns that are not necessarily intuitive to humans. Of course, given enough time and statistical modelling, we humans would be able to find the same patterns in the data, but why not have an AI find them faster? Unsupervised learning algorithms are often used in data mining, where once again we tend to deal with large amounts of unlabelled data.
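To make "detecting patterns in unlabelled data" a little more concrete, here is a minimal sketch using k-means clustering. The article does not name an algorithm, so k-means is just one common choice picked for illustration, and the sensor readings and starting centroids below are invented.

```python
from math import dist

# Hypothetical unlabelled sensor readings:
# (geyser temperature in degrees C, light brightness in %).
READINGS = [(60, 80), (62, 75), (58, 85), (30, 5), (28, 10), (32, 0)]


def k_means(points, centroids, max_iterations=100):
    """Assign each point to its nearest centroid, then move each centroid to its group's mean."""
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # the groups stopped changing: converged
        assignments = new_assignments
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a group ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids


# Assumption: we start from the first and last reading as initial centroids.
groups, centres = k_means(READINGS, centroids=[READINGS[0], READINGS[-1]])
print(groups)   # [0, 0, 0, 1, 1, 1]
print(centres)  # [(60.0, 80.0), (30.0, 5.0)]
```

Nobody told the algorithm what the two groups mean. It only discovered that the readings fall into two clumps, which a human might then interpret as, say, "someone is home" and "house is empty".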
The third and final form of machine learning is reinforcement learning. Initially, this type of learning was not as popular as the first two. In recent times, however, more attention has been paid to reinforcement learning due to its applications.
By definition, reinforcement learning is a form of training where an AI observes some environment and, depending on the state of that environment, performs actions that affect it. If those actions lead to an improvement in the environment, the AI is rewarded; if they lead to a decrease in the quality of the environment, the AI is punished. For example, think of a new dog that you've just brought home. You want to make sure that the dog doesn't mess in the house. To stop it from doing this, every time the dog messes in the house you spray it with some water, and every time it messes outside you give it a little dog treat. This entices the dog to mess outside and be rewarded rather than mess inside and be punished.
Reinforcement learning is a goal-driven method of learning: the actions available to the AI enable it to achieve its goal, which for all reinforcement learning AIs is to maximize the reward. This idea of rewarding versus punishing during training causes the AI to learn how to adapt to different states of the environment and to always try to perform actions that lead to improvement within that environment, and so move closer to its goal.
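The dog example can be turned into a tiny sketch of learning from rewards and punishments. This is a deliberately simplified illustration: it assumes a single-state environment (so the agent only learns a value per action, not per state), an epsilon-greedy way of choosing actions, and made-up reward values of +1 and -1. None of these choices come from the article.

```python
import random

# Hypothetical rewards for the dog example: a treat (+1) or a spray of water (-1).
REWARDS = {"mess outside": 1.0, "mess inside": -1.0}
ACTIONS = list(REWARDS)


def train_dog(episodes=200, learning_rate=0.1, explore_rate=0.2, seed=0):
    """Learn an estimated value for each action from rewards and punishments."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in ACTIONS}
    for _ in range(episodes):
        if rng.random() < explore_rate:
            action = rng.choice(ACTIONS)           # explore: try something at random
        else:
            action = max(ACTIONS, key=values.get)  # exploit: pick the best-looking action
        reward = REWARDS[action]
        # Nudge the estimate for this action towards the reward just received.
        values[action] += learning_rate * (reward - values[action])
    return values


values = train_dog()
print(values)
print("preferred action:", max(values, key=values.get))
```

Because punished actions only ever drift towards -1 and rewarded ones towards +1, the agent ends up preferring the action that maximizes its reward, which is the goal described above.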
Well-known examples of reinforcement learning are OpenAI's Dota bots and DeepMind's StarCraft bots, which were created to play Dota and StarCraft, respectively, against humans. To train an agent for these games, we need to determine the set of actions available to the AI as well as the rewards and punishments. The actions are fairly easy: the AI's task is to play a game, so it should be able to make any move that a human could make. The rewards and punishments are much trickier to figure out. A general rule of thumb is that if the AI makes a move that negatively affects the opponent, it is rewarded; similarly, if the opponent makes a move that negatively affects the AI, the AI is punished for allowing that move in the first place. However, there are situations where the rewards can be complex.
An important concept to note here is that this form of training is conducted in a controlled environment. Usually, these environments are virtual, i.e., they are built using code. The core concept is that the AI learns to solve a particular problem within this controlled environment, and once training is complete it can be placed in a real-world setting where it can be used. Another important point is that once the AI is placed in a real-world setting, one of two things can happen:
1: The AI stops learning completely and only applies what it learned during the training phase.
2: The AI continues learning from its new environment. This requires feedback from the environment after each action is carried out, because we need to know whether the action was beneficial, i.e., should we reward or punish the AI?
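The difference between the two options can be sketched with a toy agent that has a `keep_learning` switch. Everything here is hypothetical: the `Agent` class, the action names "a" and "b", the trained values, and especially the assumption that the real world rewards actions differently from the training environment.

```python
class Agent:
    """Greedy agent over a table of action values (hypothetical, for illustration)."""

    def __init__(self, values, keep_learning, learning_rate=0.5):
        self.values = dict(values)
        self.keep_learning = keep_learning
        self.learning_rate = learning_rate

    def act(self):
        # Always pick the action that currently looks best.
        return max(self.values, key=self.values.get)

    def feedback(self, action, reward):
        # Option 1 ignores the feedback; option 2 uses it to keep adjusting.
        if self.keep_learning:
            self.values[action] += self.learning_rate * (reward - self.values[action])


def deploy(agent, real_world_rewards, steps=10):
    """Run the agent in the 'real world' and return the total reward it collects."""
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = real_world_rewards[action]
        agent.feedback(action, reward)
        total += reward
    return total


# Values learned during training, where action "a" looked best (made-up numbers).
trained_values = {"a": 1.0, "b": 0.0}
# Assumption: the real world differs from the training environment.
real_world_rewards = {"a": -1.0, "b": 1.0}

frozen = Agent(trained_values, keep_learning=False)
online = Agent(trained_values, keep_learning=True)
print(deploy(frozen, real_world_rewards))  # -10.0: keeps repeating what it learned
print(deploy(online, real_world_rewards))  # 6.0: two bad moves, then it adapts
```

The frozen agent is predictable but cannot recover when its environment changes, while the online agent adapts at the cost of needing a reward signal after every action.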