Supervised learning is where we provide the model with the actual outputs from the data. This lets it build a picture of the data and form links between the historic input parameters (or features) and the outputs they influenced. Expressed as a formula, supervised learning looks like the one below, where Y is the output predicted by the model and X is the input data. In other words, by applying a learned function f to X, we can predict Y.
Y = f(X)
The goal of supervised learning is to model the influence of the input parameters (X) so well that we can accurately predict the output (Y).
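To make Y = f(X) concrete, here is a minimal Python sketch of supervised regression. It assumes f is a straight line, f(x) = slope * x + intercept, and fits it to a handful of hypothetical labelled (X, Y) pairs using ordinary least squares. The names fit_line and predict are illustrative and do not come from any library.

```python
# Minimal supervised-learning sketch: learn f in Y = f(X) from labelled pairs.
# Assumption: f is a straight line, f(x) = slope * x + intercept, fitted by least squares.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimise squared error on the labelled data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of X and Y divided by variance of X gives the least-squares slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned function f to a new input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: inputs X with their known outputs Y (generated by y = 2x + 1).
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(X, Y)
print(model)              # (2.0, 1.0)
print(predict(model, 5))  # 11.0
```

Because the training outputs were generated from y = 2x + 1, least squares recovers that line exactly here; with real, noisy data the learned f would only approximate the underlying relationship.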
Supervised learning is used for regression and classification problems, which are commonly tackled with algorithms such as decision trees and neural networks. A few examples are included below:
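As a small illustration of classification, the sketch below uses a 1-nearest-neighbour rule, chosen here only for simplicity (it is neither a decision tree nor a neural network): a new example receives the label of the closest labelled training example. The features (hours studied, hours slept) and the pass/fail labels are hypothetical.

```python
import math

# Hypothetical labelled data: (hours_studied, hours_slept) -> known outcome.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def classify(point, data):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(data, key=lambda item: math.dist(point, item[0]))
    return label


print(classify((7.0, 7.0), training))  # pass
print(classify((1.5, 4.0), training))  # fail
```

The "training" step here is simply storing the labelled examples; the supervision comes from the known outputs attached to each one.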
As supervised learning forms the bulk of real-world machine learning projects, I will be focussing almost exclusively on supervised learning models in subsequent posts on Netshock. We will look at end-to-end projects, from definition through feature extraction, modelling and training to deployment.
Unsupervised learning is where we do not provide the model with the actual outputs from the data. Instead, it aims to model the underlying structure or distribution of the data so that we can learn more about it. One of the most popular use cases of unsupervised learning is association rule mining, where we uncover rules that describe large portions of our data. For example, Amazon uses this kind of learning to tell shoppers that people who bought this item also bought that one.
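A rule like "people who bought this also bought that" can be sketched with two standard measures: support (the share of all baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second). The Python below is a simplified, pairs-only version of what association-rule algorithms such as Apriori do. The baskets and the thresholds min_support=0.4 and min_confidence=0.6 are hypothetical, and this is not a description of how Amazon actually builds its recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; no outputs/labels are given, only raw transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Find rules 'lhs -> rhs' between single items using support and confidence."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count every unordered pair of items bought together (sorted for a stable key).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair can produce a rule in either direction: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
# bread -> butter: support=0.60, confidence=0.75
# butter -> bread: support=0.60, confidence=1.00
# jam -> bread: support=0.40, confidence=1.00
```

Note that "bread -> jam" is rejected even though the pair is frequent enough: only half of the bread baskets contain jam, which falls below the confidence threshold.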
Semi-supervised learning sits between supervised and unsupervised learning. It is where only some of our input records have associated outputs; for the rest, we do not have the actual result or output.
A good use case for semi-supervised learning is web page classification. Let’s say we wish to classify web pages as ‘news’, ‘learning’, ‘entertainment’, etc. It’s very cheap and easy to crawl the web and extract a list of web pages, but it’s very expensive for humans to sit and classify them manually. So, we may choose to classify a subset of the pages manually and use that subset to help train the model on the rest.
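One common semi-supervised technique is self-training: train on the hand-labelled pages, label the unlabelled pages the model is confident about, add those to the training set, and repeat. The sketch below assumes each page is reduced to its plain words, uses a crude word-overlap score in place of a real classifier, and treats a score of at least min_score=2 as "confident". All page texts and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical crawl: only two pages have been classified by hand.
labelled = [
    ("election results announced today", "news"),
    ("learn python tutorial for beginners", "learning"),
]
unlabelled = [
    "breaking election news today",
    "python tutorial lesson two",
    "minister speech breaking news",
]


def class_profiles(examples):
    """Count how often each word appears in the pages of each class."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(text.split())
    return profiles


def score(text, profile):
    """Total count of this page's words within a class profile (missing words count 0)."""
    return sum(profile[word] for word in text.split())


def self_train(labelled, unlabelled, min_score=2):
    """Self-training: pseudo-label confident pages, add them to the training set, repeat."""
    examples = list(labelled)
    pending = list(unlabelled)
    changed = True
    while pending and changed:
        changed = False
        profiles = class_profiles(examples)  # "retrain" on everything labelled so far
        for text in list(pending):
            best_label, best = max(
                ((label, score(text, profile)) for label, profile in profiles.items()),
                key=lambda pair: pair[1],
            )
            if best >= min_score:
                examples.append((text, best_label))
                pending.remove(text)
                changed = True
    return examples, pending


examples, leftover = self_train(labelled, unlabelled)
for text, label in examples[len(labelled):]:
    print(label, "<-", text)
# news <- breaking election news today
# learning <- python tutorial lesson two
# news <- minister speech breaking news
```

The third page shares no words with either hand-labelled page, so it is only labelled in the second round, after the first pseudo-labelled news page has widened the news vocabulary. A real system would use stronger features and a proper classifier, and pseudo-labels can also reinforce the model's own mistakes.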