Supervised Learning


In supervised learning, the algorithm trains on labeled data: every input is paired with its expected output.

In supervised learning, the algorithm uses the input to make a prediction and compares that prediction against the expected output. If the prediction is wrong, the algorithm adjusts its internal parameters so that it makes better predictions in the future.
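The predict, compare, adjust loop described above can be illustrated with a perceptron, one of the simplest supervised learners. This is a minimal sketch rather than a production implementation: the toy data (a logical AND of two inputs), the learning rate, and the function names `predict` and `train_perceptron` are all assumptions chosen for illustration.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Adjust the weights whenever a prediction disagrees with its label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, expected in zip(samples, labels):
            # Compare the prediction against the expected output.
            error = expected - predict(weights, bias, features)
            if error != 0:
                # The prediction was wrong, so nudge the parameters.
                weights = [
                    w + learning_rate * error * x
                    for w, x in zip(weights, features)
                ]
                bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: the output is 1 only when both inputs are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)  # [0, 0, 0, 1]
```

The perceptron only changes its weights when it makes a mistake, which mirrors the "modify itself when incorrect" idea directly. Most practical algorithms use a smoother measure of error, but the overall loop is the same.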

Supervised learning is powerful and simple if you have the data that you need. If you don’t have neatly labeled data, or if your data is poorly labeled, then supervised learning is not going to be as effective.

Example for Supervised Machine Learning

In a model that predicts churn, the data would be various historical facts about customers (the inputs the model will receive in production), each paired with whether that customer churned (the outcome we expect the model to predict).

The dataset is broken into two parts: the training set and the test set.

The training set is used, as the name implies, to train the model to map certain patterns in the data to the historical outcomes. Once the model is created, the test set is used to verify the accuracy of the model by comparing the model’s predictions to the known outputs.

You can imagine this scenario as being something like a textbook with an answer key. After studying, you can try to do the exercises in the textbook, and then compare those answers to the answer key to see how you did.

Typically in data science, a model trained through supervised learning is considered successful if it can make predictions that match the known outcomes at an acceptable level of accuracy.
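The split, train, and verify workflow described above can be sketched in plain Python. Everything here is an assumption made for illustration: the churn records are made-up numbers, the "model" is a single learned threshold on support tickets, and the helper names (`split_train_test`, `fit_threshold`, `accuracy`) are hypothetical.

```python
import random

# Hypothetical churn records: (support_tickets_last_quarter, churned).
records = [
    (0, 0), (1, 0), (1, 0), (2, 0), (2, 0), (3, 0),
    (4, 1), (5, 1), (5, 1), (6, 1), (7, 1), (8, 1),
]


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def fit_threshold(train_rows):
    """Pick the ticket-count threshold with the fewest training errors."""
    candidates = sorted({tickets for tickets, _ in train_rows})

    def errors(threshold):
        return sum(
            (tickets >= threshold) != bool(churned)
            for tickets, churned in train_rows
        )

    return min(candidates, key=errors)


def accuracy(threshold, rows):
    """Fraction of rows where the prediction matches the known outcome."""
    correct = sum(
        (tickets >= threshold) == bool(churned) for tickets, churned in rows
    )
    return correct / len(rows)


train_rows, test_rows = split_train_test(records)
threshold = fit_threshold(train_rows)  # the "studying" step
test_accuracy = accuracy(threshold, test_rows)  # checking the answer key
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

The important design point is that `fit_threshold` never sees `test_rows`. Accuracy measured on held-out data is what tells you whether the model has learned a pattern or merely memorized the training set.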

Types of Supervised Learning Algorithms

1. Regression

Regression algorithms are used when the output variable is continuous and depends on the input variables. They predict continuous values, for example in weather forecasting or market trend analysis. Below are some popular regression algorithms that come under supervised learning:

  • Linear Regression
  • Regression Trees
  • Non-Linear Regression
  • Bayesian Linear Regression
  • Polynomial Regression
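As an illustration of the first algorithm in the list, here is a minimal sketch of linear regression with a single input variable, fitted by ordinary least squares. The data points are made up and lie exactly on a line so that the result is easy to check by hand. The function name `fit_line` is an assumption for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0

# Predict a continuous value for an input the model has not seen.
prediction = slope * 6 + intercept
print(prediction)  # 13.0
```

Real data is noisy, so the fitted line will not pass through every point. In that case the same formula returns the line that minimizes the squared prediction error.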

2. Classification

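Classification algorithms are used when the output variable is categorical, meaning that it takes one of a fixed set of classes. In the simplest (binary) case there are two classes, such as Yes/No, Male/Female, or True/False. Spam filtering is a typical application. Below are some popular classification algorithms: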

  • Random Forest
  • Decision Trees
  • Logistic Regression
  • Support Vector Machines
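To make the idea of a categorical output concrete, here is a minimal sketch of binary logistic regression trained with batch gradient descent. The spam data (number of suspicious links per message), the learning rate, and the epoch count are all assumptions for illustration, not tuned values.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=10000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between predicted probability and true label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def classify(w, b, x):
    """Turn the predicted probability into a class label (0 or 1)."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical data: suspicious links in a message -> spam (1) or not (0).
link_counts = [0, 1, 2, 3, 4, 5]
is_spam = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(link_counts, is_spam)
print(classify(w, b, 1), classify(w, b, 4))  # 0 1
```

Despite its name, logistic regression is a classification algorithm: the model outputs a probability, and the final answer is a class chosen by thresholding that probability.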

Advantages of Supervised learning:

  • With the help of supervised learning, the model can predict the output on the basis of prior experiences.
  • In supervised learning, we can have an exact idea about the classes of objects.
  • Supervised learning models help us solve various real-world problems such as fraud detection and spam filtering.

Disadvantages of supervised learning:

  • Supervised learning models can struggle with very complex tasks, particularly when enough labeled data is hard to obtain.
  • Supervised learning may predict poorly if the test data differs substantially from the training data.
  • Training can require a lot of computation time.
  • In supervised learning, we need enough prior knowledge about the classes of objects.

Unsupervised Learning

In contrast to supervised learning, unsupervised learning has input but no expected output. Unsupervised algorithms automatically learn patterns or groupings that exist in the data.

Unsupervised machine learning is useful for transactional data, such as sorting potential customers into categories based on shared attributes for more efficient marketing, or identifying the qualities that separate one group of customers from another.

Unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Example for Unsupervised Machine Learning

Consider a data set containing images of animals. An unsupervised algorithm may group the animals into categories such as those with fur, those with scales, and those with feathers. It may then split the images into increasingly specific subgroups as it learns to identify distinctions within each category.

Types of Unsupervised Learning Algorithms

  • Clustering: Clustering groups objects so that objects in the same cluster are as similar to each other as possible and have little or no similarity to objects in other clusters. Cluster analysis finds commonalities between data objects and categorizes them according to the presence or absence of those commonalities.
  • Association: Association rule learning is an unsupervised method for finding relationships between variables in a large database. It determines which sets of items occur together in the dataset, which can make a marketing strategy more effective. For example, people who buy item X (say, bread) also tend to buy item Y (butter or jam). A typical application of association rules is Market Basket Analysis.
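The bread and butter example can be quantified with the two standard measures used for association rules, support and confidence. The sketch below computes them over a handful of made-up shopping baskets. It is not a full rule-mining algorithm such as Apriori, and the helper names `count`, `support`, and `confidence` are assumptions for illustration.

```python
# Hypothetical shopping baskets (made-up transactions).
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the share with the consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)


# Bread and butter appear together in 3 of the 5 baskets.
print(support({"bread", "butter"}, baskets))  # 0.6

# Of the 4 baskets containing bread, 3 also contain butter.
print(confidence({"bread"}, {"butter"}, baskets))  # 0.75
```

A rule such as "bread implies butter" is usually considered interesting only when both numbers exceed thresholds chosen by the analyst. No labels are involved at any point, which is what makes this an unsupervised method.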

Below is the list of some popular unsupervised learning algorithms:

  • K-means clustering
  • KNN (k-nearest neighbors; the neighbor search itself needs no labels, although KNN classification is a supervised method)
  • Hierarchical clustering
  • Anomaly detection
  • Neural Networks
  • Principal Component Analysis
  • Independent Component Analysis
  • Apriori algorithm
  • Singular value decomposition
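As an illustration of the first algorithm in the list, here is a minimal sketch of K-means clustering on two-dimensional points. The points are made up. To keep the run deterministic, the first k points serve as the initial centroids, which is an assumption of this sketch. Random or k-means++ initialization is more common in practice.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points and moving centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No assignment changed, so the clustering has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical data: two visibly separate groups of points, no labels given.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]

labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which group a point belongs to. The cluster numbers it returns are arbitrary identifiers, and it is up to the analyst to interpret what each cluster means.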

Advantages of Unsupervised Learning

  • Unsupervised learning can be applied to more complex tasks than supervised learning, because it does not need labeled input data.
  • Unlabeled data is easier to obtain than labeled data, which often makes unsupervised learning the more practical choice.

Disadvantages of Unsupervised Learning

  • Unsupervised learning is intrinsically more difficult than supervised learning, because the data has no corresponding output to learn from.
  • The results of an unsupervised learning algorithm may be less accurate, because the input data is not labeled and the algorithm does not know the expected output in advance.