Supervised vs. unsupervised machine learning

Noelle Robillard

4 years ago

Machine learning is a powerful tool that many companies can use to their advantage. Algorithms that make decisions based on large-scale data sets enable teams to build efficient, scalable tools. Some of these algorithms require frequent monitoring and management from data scientists in order to get up to speed and continue learning. Others can operate and learn on their own, generating new information to act on!

Supervised and unsupervised machine learning algorithms both have their time and place. Let’s discuss a few examples, the difference between the two, and how they can be used together to create a powerful, AI-driven strategy for your company!

Supervised Machine Learning

Supervised learning algorithms are trained on foundational data in which each example is labeled with the correct answer. The features of each example serve as the data points that teach the algorithm how to generate correct predictions. Figure 1, below, provides an example of a binary classifier and a set of data about cats and dogs that will teach the algorithm how to identify one or the other!

These models function best in situations in which there is an expected, intentionally designed output. In the example above, the expected output is that the algorithm can properly separate cats from dogs. In digital debt collection, it may be separating accounts that will be easy to collect on from ones that are more difficult.
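To make the cat-and-dog idea concrete, here is a minimal sketch of a binary classifier in plain Python. It uses a nearest-centroid rule, which is only one simple way to build a classifier and is not named in this article. The two features (weight in kilograms and shoulder height in centimeters) and all of the training values are made up for illustration.

```python
from math import dist


def train_centroids(examples):
    """Average the feature vectors of each label to get one centroid per class."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical labeled training data: (weight_kg, height_cm), label
training = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]

centroids = train_centroids(training)
print(classify(centroids, (4.5, 24.0)))   # cat
print(classify(centroids, (25.0, 58.0)))  # dog
```

The labels in the training data are what make this supervised: the algorithm is told which examples are cats and which are dogs before it is asked to predict anything.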

Classification vs. Regression

Both examples above are supervised learning models that perform classification, but supervised learning can also be used to build regression models. The key difference between the two is that a regression model outputs a numerical value rather than a category.

A regression-based model may use input features such as a person’s income and whether or not they have children to predict that person’s age. When using a regression-based model in combination with consumer data, you can even segment demographics for communication and marketing.

For full transparency we want to state that TrueAccord does not use its customer demographic data for these purposes. This is strictly an example.
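As a rough illustration of how a regressor differs from a classifier, the sketch below fits a straight line with ordinary least squares and uses it to predict a number. To keep it short it uses a single input feature (income) and leaves out the "has children" feature mentioned above. The income and age values are invented and deliberately fall on a perfect line; real data would be much noisier.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: income in thousands, age in years
incomes = [30.0, 40.0, 50.0, 60.0, 70.0]
ages = [25.0, 30.0, 35.0, 40.0, 45.0]

slope, intercept = fit_line(incomes, ages)


def predict_age(income):
    """Predict a numerical age from income using the fitted line."""
    return slope * income + intercept


print(predict_age(80.0))
```

Note that the output is a number on a continuous scale, not one of a fixed set of categories.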

With proper supervision, these models can become more accurate over time, and the data scientists building them can adjust them as business needs change. Whether you are using a regressor or a classifier, it is up to the data scientists to build the most effective inputs in order to get the “correct” output.

Unsupervised Machine Learning

While supervised models require careful curation in building proper features that will lead to the “correct” output, unsupervised models can take large sets of unlabeled data and identify patterns without aid. The output variables (e.g. dog or cat) are never specified because it is now the algorithm’s job to process and sort the data based on similarities that it can identify. Using this method, you can learn things about your data that you didn’t even know!

Clustering vs. Association

Just as supervised models are typically built as either classification or regression models, unsupervised models typically rely on either clustering or association. Clustering algorithms gather data into groups based on similar features that exist in the data set.

If you have thousands upon thousands of customer accounts in your system, a clustering algorithm can learn from the customer data and sort the accounts into distinct (but unlabeled) groups. Once it has assigned these clusters, data scientists can review the output data and make inferences such as:

This cluster is all of the accounts that have not yet established a payment plan
This cluster is all of the users that started signing up for a payment plan but didn’t finish the process

This new data set then provides the foundation for a new outreach strategy!
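The clustering step described above can be sketched with k-means, one common clustering algorithm (the article does not say which algorithm is used in practice). The account features here (emails opened, sign-up steps completed) and their values are hypothetical, and the starting centroids are simply the first k points so that the result is repeatable.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(values) / len(members) for values in zip(*members)
                )
    return labels, centroids


# Hypothetical accounts: (emails_opened, signup_steps_completed)
accounts = [(0, 0), (5, 3), (1, 0), (6, 4), (0, 1), (5, 4)]

labels, centroids = k_means(accounts, k=2)
print(labels)
```

The algorithm only returns cluster numbers. Deciding that one cluster means "has not established a payment plan" is still an inference a person makes after reviewing the output.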

Building the infrastructure to process this data is the hardest part. Learn more about how TrueAccord is laying the foundation for scalable machine learning systems!

Association algorithms are the other main type of unsupervised learning algorithm. Associations take the idea of grouping data points one step further and look for relationships between them. Continuing on from our account creation example, an association-based model can relate two data points and surface the pattern that connects them. One such pattern may be:

A person that signed up for an account the first time they opened an email is more likely to pay off their balance.

The algorithm recognizes that multiple steps in a customer’s journey, taken together, create another data point. Because association algorithms are still unsupervised, a team of data scientists will be responsible for interpreting and labeling the output data, but the algorithm can surface previously unnoticed patterns.
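One common way to quantify a pattern like this is with the support and confidence of an association rule. The sketch below computes both for a single rule. The event names and the five account records are invented for illustration, and a real association algorithm would search over many candidate rules rather than checking one by hand.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    with_antecedent = sum(1 for r in records if antecedent <= r)
    with_both = sum(1 for r in records if (antecedent | consequent) <= r)
    # Support: share of all records that contain both sides of the rule.
    support = with_both / len(records)
    # Confidence: share of antecedent records that also contain the consequent.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical account histories, each a set of observed events
records = [
    {"signed_up_on_first_open", "paid_off"},
    {"signed_up_on_first_open", "paid_off"},
    {"signed_up_on_first_open"},
    {"paid_off"},
    set(),
]

support, confidence = rule_stats(
    records, {"signed_up_on_first_open"}, {"paid_off"}
)
print(support, confidence)
```

In this made-up data, two of the three accounts that signed up on the first email open also paid off their balance, which is the kind of relationship an association model would flag for review.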

The power of teamwork

By leveraging both supervised and unsupervised machine-learning algorithms, you can make decisions based on previously unfathomable scales of data. While they cannot necessarily be substituted for one another, they can be used together to create a perpetually improving cycle. Using unsupervised models to extract meaningful information from large data sets, then building new supervised models to further hone your data, creates more opportunities than ever before.
