Multi-armed bandit models and machine learning

February 19th, 2020 in Machine Learning

The term “multi-armed bandit” in machine learning comes from a problem in the world of probability theory. In a multi-armed bandit problem, you have a limited amount of resources to spend and must maximize your gains. You can divide those resources across multiple pathways or channels; you do not know the outcome of each path in advance, but you can learn over time which ones are performing better.

The name is drawn from the “one-armed bandit,” a nickname for the slot machine, and comes from the idea that a gambler will attempt to maximize their gains by either trying different slot machines (exploring) or staying with the one in front of them (exploiting).

How do multi-armed bandits fit into machine learning?

Applying this hypothetical problem to a machine-learning model involves using an algorithm to process performance data over time and optimize for better gains as it learns what is successful and what is not. 
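To make that loop concrete, here is a minimal sketch of one classic bandit strategy, epsilon-greedy, choosing between three hypothetical message variants. The variant names and response rates are made up for illustration; this is a generic example, not a description of any specific production system.

```python
import random

# Hypothetical message variants and their true response rates
# (unknown to the algorithm; used here only to simulate outcomes).
TRUE_RATES = {"message_a": 0.10, "message_b": 0.12, "message_c": 0.08}

counts = {arm: 0 for arm in TRUE_RATES}   # times each variant was sent
rewards = {arm: 0 for arm in TRUE_RATES}  # positive responses per variant

def choose_variant(epsilon=0.1):
    """Send untried variants first, explore at random with probability
    epsilon, and otherwise exploit the best observed response rate."""
    untried = [arm for arm, c in counts.items() if c == 0]
    if untried:
        return random.choice(untried)
    if random.random() < epsilon:
        return random.choice(list(TRUE_RATES))
    return max(counts, key=lambda arm: rewards[arm] / counts[arm])

for _ in range(10_000):
    arm = choose_variant()
    responded = random.random() < TRUE_RATES[arm]  # simulated consumer response
    counts[arm] += 1
    rewards[arm] += int(responded)

for arm in TRUE_RATES:
    print(arm, counts[arm], round(rewards[arm] / counts[arm], 3))
```

Over time the best-performing variant receives most of the sends, while the others still get enough traffic for the algorithm to notice if their performance changes.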

A commonly used model that follows this type of structure is an A/B/n test or split test where a single variable is isolated and directly compared. While A/B testing can be used for any number of experiments and tests, in a consumer-facing world, it is frequently used to determine the impact and effectiveness of a message.

You can test elements like the content of a message or the timing of its delivery against an alternative, measure the outcomes, and compare the results. These tests are designed to determine the optimal version of a message, but once that winner is crafted and set, you’re stuck with your “perfect” message until you decide to test again.
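For contrast, a traditional A/B/n test can be summarized roughly like the sketch below: run each variant for a fixed window, crown a single winner, and send only that message from then on. The counts are invented purely for illustration.

```python
# Illustrative A/B/n results: sends and responses per variant (made-up numbers).
results = {
    "message_a": {"sent": 5000, "responded": 450},
    "message_b": {"sent": 5000, "responded": 520},
    "message_c": {"sent": 5000, "responded": 380},
}

# Crown a single winner by observed response rate; every future consumer
# gets this one message until someone decides to run another test.
winner = max(results, key=lambda m: results[m]["responded"] / results[m]["sent"])
print("Winner:", winner)
```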


Anyone who works directly with customers or clients knows that there is no such thing as a perfect, one-size-fits-all solution. Message A, when pitted against Message B, may perform better overall, but there will still be people in your audience who prefer Message B.

Testing different facets of your communication in the context of specific subsets of your audience can lead to higher engagement and more dynamic outreach. Figure 1 below outlines how a multi-armed bandit approach can optimize for the right content at the right time for the right audience rather than committing to a single option.

Rather than entirely discarding Message A, the bandit algorithm recognizes that roughly 10% of people still prefer it to other options. This more fluid model is also more efficient: you don’t have to wait for a clear winner to emerge, and the algorithm’s choices become more effective as you gather more relevant data.
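One common way to get that fluid behavior is Thompson sampling, sketched below with a simple Beta-Bernoulli model. The message names and counts are hypothetical; the point is that the weaker variant keeps receiving a small share of traffic rather than being cut off entirely.

```python
import random

# Observed sends broken into responses and non-responses (hypothetical counts).
stats = {"message_a": {"responses": 45, "no_responses": 455},
         "message_b": {"responses": 60, "no_responses": 440}}

def pick_message():
    """Thompson sampling: draw a plausible response rate for each message
    from its Beta posterior and send whichever draw comes out highest."""
    draws = {
        msg: random.betavariate(s["responses"] + 1, s["no_responses"] + 1)
        for msg, s in stats.items()
    }
    return max(draws, key=draws.get)

# Even though message_b looks better on average, message_a still gets chosen
# some of the time -- it is never discarded outright, and if the audience's
# preferences shift, the allocation shifts with them.
picks = [pick_message() for _ in range(10_000)]
print({msg: picks.count(msg) / len(picks) for msg in stats})
```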

Multi-armed bandits and digital debt collection

Collections continues to expand its digital footprint, and by combining more in-depth data tracking with an omni-channel communication strategy, teams can clearly understand what’s working and what isn’t. Adapting a bandit algorithm to machine learning-powered digital debt collection provides endless opportunities to craft a better consumer experience.

Following from Figure 1, digital collections strategies can determine which messaging is right for which consumer. Sorting this data in context can mean distinguishing groups based on the size or the age of the debt and determining which message is most appropriate. In a fully connected omni-channel strategy, the bandit can take a step back, determine which channel is most effective for each account, and then select the messaging.
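As a rough illustration (not a description of TrueAccord’s actual implementation), a contextual version of the bandit can keep separate statistics for each audience segment, for example keyed on hypothetical debt-size and debt-age buckets, and run the same sampling logic within each segment:

```python
import random
from collections import defaultdict

CHANNELS = ["email", "sms"]

# Per-context statistics: context -> channel -> [responses, non_responses].
# The context key is a (debt-size bucket, debt-age bucket) pair -- hypothetical groupings.
stats = defaultdict(lambda: {c: [1, 1] for c in CHANNELS})  # Beta(1, 1) priors

def choose_channel(context):
    """Thompson sampling within a single context (audience segment)."""
    draws = {c: random.betavariate(a, b) for c, (a, b) in stats[context].items()}
    return max(draws, key=draws.get)

def record_outcome(context, channel, responded):
    """Update the segment's statistics after seeing the consumer's response."""
    stats[context][channel][0 if responded else 1] += 1

# Example: a small, recent debt may end up favoring a different channel
# than a large, older one as outcomes accumulate for each segment.
context = ("small_balance", "under_30_days")
channel = choose_channel(context)
record_outcome(context, channel, responded=True)
```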

These decisions take time and thousands upon thousands of data points to get “right,” but the wonder of a contextual multi-armed bandit algorithm is that it doesn’t stop learning after making the right choice. It makes the right choice, at the right time, for the right people, and you can reach your consumers the way they want to be reached.

TrueAccord is optimizing how our multi-armed bandit algorithms create the ideal consumer experience. Come learn more about how we collect better!