Machine Learning Based Liquidation Looks Different – and That’s a Good Thing

By on June 26th, 2017 in Industry Insights, Machine Learning, Product and Technology

TrueAccord serves major issuers, debt buyers and lenders across the US. We compete with traditional collection agencies and beat them: TrueAccord collects more than 1.5 times what the competition does in a typical 90 day placement period. We use a machine learning based system, HeartBeat, that replaces the traditional call-heavy model with digital first communications that complement consumer behavior and preferred mode of communication. That approach also makes our liquidation curves look different from those of traditional agencies.

Traditional liquidation curves

Traditional liquidation curves typically shoot up in the first 30-45 days, followed by a plateau around days 60-80, with a possible bump towards the end of the placement window. This pattern is driven by several factors.

Routine: agents receive fresh accounts and are eager to call them. They fire up dialers and quickly reach consumers who can either pay or be lightly pressured to pay. After a few weeks of calls, agents are tired of calling the same consumers. They’ve heard what they think are excuses, have driven all the easy payments they could drive, and are ready for new accounts. “Old” accounts, as old as 30 days, get worse treatment. Collection managers know this, and try to trick collectors into thinking they have fresh accounts by pulling accounts out of the system and re-entering them. This rarely works. Collectors lose focus and, with it, performance.

Net present value: settlements are better than payment plans for collectors – they mean more money now, versus a payment plan that may fail and requires reminders and additional work from the collector. Collectors therefore push for settlements early, if they can get the consumer on the line. Under the pressure of a call, the consumer may commit to a payment plan. In that case the collector prefers as high a monthly payment as possible, assuming the plan will fail early anyway. Consumers, struggling with irregular cash flow and a large payment they shouldn’t have committed to, fail payment plans at a staggering rate: as many as 50% of payment plans fail.

Remorse: consumers who agree to settlements or plans often feel remorse after getting off the call, and tend to charge back payments they made. Chargeback rates in the debt collection industry are so high (rates as high as 2% are not rare) that most payment providers won’t work with collection companies.

The initial bump in liquidation is often enough to beat other phone based agencies. Since all agencies use the same methods, a slight advantage in selecting the right accounts to call first can get an agency ahead of its unsophisticated peers.

The TrueAccord liquidation curve

In contrast to the traditional curve, TrueAccord’s liquidation curve is roughly linear. It often starts lower than the traditional agency’s, but continues to rise through the placement period until it crosses and then exceeds its competitors’. That crossover can happen as late as day 80 (before the algorithms have been tuned, early in a pilot) or as early as day 15 (once the algorithms have learned how to handle a new product). The difference is driven by several factors.

Data driven treatment at scale: TrueAccord’s system is machine learning based and digital first. Because it starts with an email, it can initiate contact with all consumers easily, without having to call them – and consumers are much more likely to respond to digital communications than to a phone call: while Right Party Contact rates often hover around 4-5%, email open rates on TrueAccord’s platform reach 65-70% and click-through rates reach 30-35%. Once it sends its first email, the system uses real time tracking of consumer responses to tailor its next steps, drawing on hundreds of millions of historic contact attempts to optimize its contact strategy. If the consumer doesn’t reply, it can automatically switch channels (from email to text, call, letter, and so on), and it uses data to pick the time of day most likely to yield a response. Call centers, by contrast, are limited to phone calls, which consumers often ignore because they are busy or simply don’t pick up calls from unknown numbers. And since a machine doesn’t get bored, it continues contact attempts (three a week on average) until it is told to stop. Targeted, consistent communication at scale means more consumers interact with our system than with a call center.
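To make this concrete, here is a minimal sketch of what such a cadence policy could look like in code. The channel order, the weekly cap, and the scheduling rule are illustrative assumptions made for the sketch – they are not HeartBeat’s actual logic or configuration.

```python
from datetime import datetime, timedelta

# Illustrative assumptions, not HeartBeat's real configuration.
CHANNEL_ORDER = ["email", "sms", "call", "letter"]
MAX_ATTEMPTS_PER_WEEK = 3

def next_contact(history, now=None):
    """Pick the next channel and send time for a consumer, or None to pause.

    `history` is assumed to be a list of dicts like
    {"channel": "email", "sent_at": datetime, "responded": bool}.
    """
    now = now or datetime.utcnow()
    recent = [a for a in history if a["sent_at"] > now - timedelta(days=7)]
    if len(recent) >= MAX_ATTEMPTS_PER_WEEK:
        return None  # respect the pacing cap; try again next week

    if history and not history[-1]["responded"]:
        # No response to the last attempt: escalate to the next channel.
        idx = CHANNEL_ORDER.index(history[-1]["channel"])
        channel = CHANNEL_ORDER[min(idx + 1, len(CHANNEL_ORDER) - 1)]
    else:
        channel = CHANNEL_ORDER[0]  # start (or restart) with email

    # A real system would predict the best send time from response data;
    # this placeholder simply schedules for 10am the next day.
    send_at = (now + timedelta(days=1)).replace(hour=10, minute=0,
                                                second=0, microsecond=0)
    return {"channel": channel, "send_at": send_at}
```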

Optimizing for liquidation: a data driven system can use historical data to understand what best fits consumer needs and leads to better liquidation. It doesn’t need to push for early settlements because its automation lets it serve each consumer according to their needs – making custom tailored plans viable. Consumers get easier payment terms that fit their needs, and end up paying more. We convinced several of our clients to move from a default payment plan length of 6 months to 12 months. Contrary to call center based intuition, these longer plans get more consumers to sign up and don’t cannibalize settlements, in turn leading to an increase in liquidation. The machine learning system can service these plans at scale and reduce failure rates: TrueAccord payment plans complete as much as 89% of the time (as low as 11% breakage). By the time payment plans for traditional collectors fail their second payment, TrueAccord’s liquidation rates start soaring.

Best in class user experience: consumers don’t like phone calls or letters. They prefer 24/7, personalized, easy to use services – and collections aren’t any different. Using our system, they can customize and sign up for settlements or payment plans, or ask for debt verification. With access to their account information and a sense of control over payment options, consumers don’t feel pressured or remorseful after paying. TrueAccord’s chargeback rates are next to nonexistent.

Bottom line

Machine learning based debt collection is different in many ways that benefit creditors and consumers. Our liquidation curve tells the story of how our system behaves differently from call center based collections – serving consumers at scale, on their preferred communication channel, with payment solutions tailored to work for them.

Personalized Digital Experiences Drive Engagement and Liquidity for Consumers

By on June 15th, 2017 in Industry Insights

Debt collection has existed for as long as consumers have been taking out loans. For the past few decades, collectors have been building call center businesses – hundreds or thousands of calling agents using automated dialers to contact indebted consumers, compensated with commission once they reach their collection goals. Consumers are often harassed by overzealous collectors looking to meet their goals, calling as much as 6 times per day. It’s a stressful environment focused on one thing – get the money or get out.

Continue reading “Personalized Digital Experiences Drive Engagement and Liquidity for Consumers”

Fintech Companies Are Learning to Work with Regulators

By on April 24th, 2017 in Compliance, Industry Insights

This article, written by our In-House Counsel Adam Gottlieb, first appeared in RMA Insights Magazine.

The word “startup” conjures images of stereotypical open offices, complete with ping pong tables, standing desks, and people in hoodies feverishly hammering at keyboards. Startups are often associated with high risk, scrappiness, and the ability to break things and move fast–all a stark contrast to the bureaucratic and highly-regulated environment that most debt buyers and collectors operate in. Yet, as startups begin venturing into the area of financial technology, they have had to adjust to new operating principles and new stakeholders, with the government chief among them.

Continue reading “Fintech Companies Are Learning to Work with Regulators”

How Tax Season Affects Debt Collection – and TrueAccord

By on March 29th, 2017 in Industry Insights

By Roger Lai, TrueAccord’s Head of Analytics.

Tax Season in Debt Collection

Tax season is to debt collection as holiday season is to retail. According to the National Retail Federation, of the 66% of consumers who are expecting a tax refund this year, 35.5% plan to spend their refund on paying down debt. For this reason, mid-February through May is considered the most productive time of the year for debt collection by many in the industry.

Continue reading “How Tax Season Affects Debt Collection – and TrueAccord”

Live from LendIt: TrueAccord on AI in FinTech

By on March 20th, 2017 in Industry Insights

In case you missed it, our CEO Ohad Samet spoke on a panel at the LendIt Conference about the use of artificial intelligence in FinTech.

He was joined by other industry leaders for a compelling discussion – this video is not to be missed.

Continue reading “Live from LendIt: TrueAccord on AI in FinTech”

Using phone in a digital world. A Data Science story.

By on March 16th, 2017 in Data Science, Debt Collection, Machine Learning, Product and Technology

Contributors: Vladimir Iglovikov, Sophie Benbenek, and Richard Yeung

It is Wednesday afternoon and the Data Science team at TrueAccord is arguing vociferously. The whiteboard is covered in unintelligible handwriting and fancy-looking diagrams. We’re in the middle of a heated debate about something the collections industry has had a fairly developed playbook on for decades: how to use the phone for collections.

Why are we so passionately discussing something so basic? As it turns out, phone is a deceptively deep topic when you are re-inventing recoveries and placing phone in the context of a multi-channel strategy.


 

Solving Attribution of Impact

The complexity of phone within a multi-channel strategy is revealed when you ask a simple question: “What was the impact of this phone call to Bob?”

In a world with only one channel, this question is easy. We call a thousand people and measure what percentage of them pay. But in a multi-channel setting where these people are also getting emails, SMS and letters, there is an attribution problem. If Bob pays after the phone call, we do not know if he would have paid without the phone call.

To complicate matters further, our experiments have shown that phone has two components of impact:

  1. The direct effect — the payments that happen on the call.
  2. The halo effect — the remaining impact of phone; for example seeing a missed call from us and going back to an email from us to click and pay.

To solve the attribution problem and capture both components of impact, we define the concept of incremental benefit as:

incremental benefit = (debt amount) × [P(pay | call) − P(pay | no call)]

Intuitively, the incremental benefit of a phone call is the additional expected value from that customer due to the phone call. For example, assume Bob has a 5% chance of paying his $100 debt. If we know that by calling him, the probability of him paying increases to 7%, then the incremental benefit is $2 (100 * (0.07 – 0.05)).
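As a tiny sketch in code (the function name is illustrative, not something from our codebase), the same arithmetic is:

```python
def incremental_benefit(debt_amount, p_pay_with_call, p_pay_without_call):
    """Expected extra dollars collected because of the call."""
    return debt_amount * (p_pay_with_call - p_pay_without_call)

# Bob: $100 debt, 7% chance of paying if called vs. 5% if not.
print(round(incremental_benefit(100, 0.07, 0.05), 2))  # 2.0, i.e. $2
```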

 

How we calculate incremental benefit

Consider the incremental benefit equation in the last section. It requires us to predict the probability of Bob paying in each scenario: when we call him and when we do not.

Hence we created models that predict the probability of a customer paying. These models take as inputs everything we know about the customer, including:

  • Debt features: debt amount, days since charge-off, client, prior agencies worked, etc
  • Behavioral features: entire email history, entire pageview history, interactions with agents, phone history, etc
  • Temporal features: time of the day, day of the week, day of the month, etc

The output of the model is the probability of payment by the customer given all of this information. We then have the same model output two predictions: probability of payment with the current event history, and probability of payment if we add one more outbound phone call to the event history.

Returning to our example of Bob, the model would output probabilities of 7% and 5% of paying with and without an additional phone call, respectively.

[Diagram: a simplified view of the model’s inputs and its two output probabilities – it omits many variables and the actual architecture of our models.]
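As a rough illustration of the two-prediction idea, here is a toy version in code. The feature set, the model, and its weights are stand-ins invented for this sketch; they do not reflect our production features or architecture.

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    debt_amount: float
    event_history: list = field(default_factory=list)  # e.g. ["email_open"]

def build_features(event_history):
    # Toy featurization: simple counts. The real models use far richer debt,
    # behavioral, and temporal features.
    return {
        "n_email_opens": event_history.count("email_open"),
        "n_page_views": event_history.count("page_view"),
        "n_outbound_calls": event_history.count("outbound_call"),
    }

class ToyPaymentModel:
    """Stand-in for a trained payment-probability model."""
    def predict_proba_of_payment(self, features):
        # Arbitrary illustrative weights; a real model is learned from data.
        score = (0.05
                 + 0.01 * features["n_email_opens"]
                 + 0.02 * features["n_page_views"]
                 + 0.02 * features["n_outbound_calls"])
        return min(score, 1.0)

def incremental_benefit_of_call(model, customer):
    """Score the same customer with and without one extra outbound call."""
    p_without = model.predict_proba_of_payment(build_features(customer.event_history))
    p_with = model.predict_proba_of_payment(
        build_features(customer.event_history + ["outbound_call"]))
    return customer.debt_amount * (p_with - p_without)

bob = Customer(debt_amount=100)
print(incremental_benefit_of_call(ToyPaymentModel(), bob))  # ~2.0 with these toy weights
```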

 

Optimal Call Allocation

The last step of the problem is choosing who to call, and when. The topic of timing optimization deserves its own write-up, so we will close by discussing whom we call.

Without loss of generality, assume that we would only ever call a customer once. The diagram below shows the percentage of customers called on the x-axis; the y-axis is in dollars and carries two curves:

  • Incremental Benefit — this curve shows the marginal incremental benefit of calling the customer with the next highest IB
  • Avg cost — this horizontal line shows the average cost of an outbound call

[Diagram: incremental benefit and average call cost versus the percentage of customers called.]

There are two very interesting points to discuss:

  • Profit max — calling everyone to the left of the intersection of incremental benefit and avg cost is the allocation that maximizes profit. Every one of these calls brings in more revenue than cost.
  • Conversion max — notice that incremental benefit dips below zero. This is especially true when you remove the assumption that we only call each customer once. The point that maximizes conversion for the client is to call everyone to the left of where incremental benefit intersects with zero.

Our default strategy is to call all customers to the left of the profit-maximizing point. Interestingly, an informal look at the types of customers selected reveals customers at two extremes: we end up calling both very high value customers who have shown a lot of intent to pay (e.g. dropped off from signup after selecting a payment plan) and customers for whom email has been ineffectual (e.g. they keep opening emails without clicking, or never open them at all).
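In code, the profit-maximizing allocation reduces to a simple rule. A minimal sketch follows; the cost figure and the data shape are illustrative assumptions, not our actual numbers.

```python
AVG_COST_PER_CALL = 0.50  # illustrative assumption, not our actual cost per call

def choose_customers_to_call(customers, avg_cost=AVG_COST_PER_CALL):
    """Call everyone whose incremental benefit exceeds the average call cost,
    highest-IB first. `customers` is a list of (customer_id, incremental_benefit).

    For the conversion-maximizing variant, pass avg_cost=0.0 instead.
    """
    ranked = sorted(customers, key=lambda c: c[1], reverse=True)
    return [cid for cid, ib in ranked if ib > avg_cost]

# Only the first two customers are worth a call at $0.50 per call.
print(choose_customers_to_call([("a", 2.00), ("b", 0.80), ("c", 0.30), ("d", -0.10)]))
# ['a', 'b']
```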

 

Conclusion

The world has become increasingly digital, and a multi-channel strategy is the right response. Bringing in the phone, a traditional tool, as just one channel within this strategy forced us to rethink a lot of assumptions and follow where the problem led us. We began by replacing the traditional “propensity to pay” phone metric with incremental benefit, found ways to predict this value, and implemented a phone allocation strategy that maximizes profits for the business.

Live from LendIt: TrueAccord on Breaking Banks

By on March 13th, 2017 in Industry Insights

This week, Breaking Banks host Brett King chatted with Ohad Samet, CEO of TrueAccord, about debt rehabilitation and how machine learning and AI can help people fix their credit situations.

Continue reading “Live from LendIt: TrueAccord on Breaking Banks”

Hear our CEO talk about AI in Fintech at LendIt

By on February 28th, 2017 in Industry Insights, Machine Learning

Our CEO, Ohad Samet, will be part of a panel discussing artificial intelligence uses in Fintech. The panel will be held at 2:15pm Eastern on Tuesday, 3/7.

Continue reading “Hear our CEO talk about AI in Fintech at LendIt”

On American Banker: Real issue for debt collectors is the irrelevance of telephones

By on February 10th, 2017 in Compliance, Industry Insights

In a recent American Banker article, our team argues that the regulatory discussion around phone calls in debt collection is rapidly becoming irrelevant for one very important reason: consumers don’t answer their phones.

Continue reading “On American Banker: Real issue for debt collectors is the irrelevance of telephones”

How Much Testing is Enough Testing?

By on February 2nd, 2017 in Engineering and Data, Product and Technology, Testing

[Photo: the Golden Gate Bridge by night]


One hundred years ago, a proposal took hold to build a bridge across the Golden Gate Strait at the mouth of San Francisco Bay.  For more than a decade, engineer Joseph Strauss drummed up support for the bridge throughout Northern California.  Before the first concrete was poured, his original double-cantilever design was replaced with Leon Moisseiff’s suspension design.  Construction on the latter began in 1933, seventeen years after the bridge was conceived.  Four years later, the first vehicles drove across the bridge.  With the exception of a retrofit in 2012, there have been no structural changes since.  21 years in the making.  Virtually no changes for the next 80.

Now, compare that with a modern Silicon Valley software startup.  Year one: build an MVP.  Year two: funding and product-market fit.  Year three: profitability?…growth? Year four: make it or break it.  Year five: if the company still exists at this point, you’re lucky.

Software in a startup environment is a drastically different engineering problem than building a bridge.  So is the testing component of that problem.  The bridge will endure 100+ years of heavy use and people’s lives depend upon it.  One would be hard-pressed to over-test it.  A software startup endeavor, however, is prone to monthly changes and usually has far milder consequences when it fails (although being in a regulated environment dealing with financial data raises the stakes a bit).  Over-testing could burn through limited developer time and leave the company with an empty bank account and a fantastic product that no one wants.

I want to propose a framework to answer the question of how much testing is enough. I’ll outline six criteria, then apply them to a few examples. Skip to the charts at the end and come back if you are a highly visual person like me. In general, I am proposing that testing efforts be assessed on a spectrum according to the nature of the product under test. A bridge would be on one end of the spectrum, whereas a prototype for a free app that makes funny noises would be on the other.

Assessment Criteria

Cost of Failure

What is the material impact if this thing fails? If a bridge collapses, it’s life and death and a ton of money. Similarly, in a stock trading app, there are potentially big dollar and legal impacts when the numbers are wrong. By contrast, an occasional failure in a dating app would annoy customers and maybe drive a few of them away, but wouldn’t be catastrophic. Bridges and stock trading have higher costs of failure and thus merit more rigorous testing.

Amount of Use

How often is this thing used and by how many people?  In other words, if a failure happens in this component, how widespread will the impact be?  A custom report that runs once a month gets far less use than the login page.  If the latter fails, a great number of users will feel the impact immediately.  Thus, I really want to make sure my login page (and similar) are well-tested.

Visibility

How visible is the component?  How easy will it be for customers to see that it’s broken?  If it’s a backend component that only affects engineers, then customers may not know it’s broken until they start to see second-order side effects down the road.  I have some leeway in how I go about fixing such a problem.  In contrast, a payment processing form would have high visibility.  If it breaks, it will give the impression that my app is broken big-time and will cause a fire drill until it is fixed.  I want to increase testing with increased visibility.

Lifespan

This is a matter of return on effort.  If the thing I’ve built is a run-once job, then any bugs will only show up once.  On the other hand, a piece of code that is core to my application will last for years (and produce bugs for years).  Longer lifespans give me greater returns on my testing efforts.  If a little extra testing can avoid a single bug per month, then that adds up to a lot of time savings when the code lasts for years.

Difficulty of Repair

Back to the bridge example, imagine there is a radio transmitter at the top.  If it breaks, a trained technician would have to make the climb (several hours) to the top, diagnose the problem, swap out some components (if he has them on hand), then make the climb down.  Compare that with a small crack in the road.  A worker spends 30 minutes squirting some tar into it at 3am.  The point here is that things which are more difficult to repair will result in a higher cost if they break.  Thus, it’s worth the larger investment of testing up front.  It is also worth mentioning that this can be inversely related to visibility.  That is, low visibility functionality can go unnoticed for long stretches and accumulate a huge pile of bad data.

Complexity

Complex pieces of code tend to be easier to break than simple code.  There are more edge cases and more paths to consider.  In other words, greater complexity translates to greater probability of bugs.  Hence, complex code merits greater testing.

Examples

Golden Gate Bridge

This is a large, last-forever sort of project. If we get it wrong, we have a monumental (literally) problem to deal with. Test continually, as much as possible.

Criterion scores:
  • Cost of failure: 5
  • Amount of use: 5
  • Visibility: 5
  • Lifespan: 5
  • Difficulty of repair: 5
  • Complexity: 4
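As an aside, one could fold these scores into a single rough guide for how much testing to invest in. Here is a minimal sketch; the simple averaging and the effort bands are my own assumptions layered on top of the framework, which deliberately leaves the weighting to judgment.

```python
CRITERIA = ["cost_of_failure", "amount_of_use", "visibility",
            "lifespan", "difficulty_of_repair", "complexity"]

def testing_effort(scores):
    """Average the six criterion scores (1-5 each) into a rough testing guide."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    avg = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    if avg >= 4:
        return avg, "test continually, as much as possible"
    if avg >= 2.5:
        return avg, "moderate to heavy testing"
    return avg, "light testing"

golden_gate = {"cost_of_failure": 5, "amount_of_use": 5, "visibility": 5,
               "lifespan": 5, "difficulty_of_repair": 5, "complexity": 4}
print(testing_effort(golden_gate))  # (4.83..., 'test continually, as much as possible')
```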

Cat Dating App

Once the word gets out, all of the cats in the neighborhood will be swiping in a cat-like unpredictable manner on this hot new dating app.  No words, just pictures.  Expect it to go viral then die just as quickly.  This thing will not last long and the failure modes are incredibly minor.  Not worth much time spent on testing.

Criterion scores:
  • Cost of failure: 1
  • Amount of use: 4
  • Visibility: 4
  • Lifespan: 1
  • Difficulty of repair: 1
  • Complexity: 1

Enterprise App — AMEX Payment Processing Integration

Now, we get into the nuance. Consider an American Express payment processing integration, i.e. the part of a larger app that sends data to AMEX and receives confirmations that the payments were successful. For this example, let’s assume that only 1% of your customers are AMEX users and they are all monthly auto-pay transactions. In other words, it’s a small group that will not see payment failures immediately. Even though this is a money-related feature, it will not merit as much testing as, say, a VISA integration, since it is lightly used with low visibility.

Criterion scores:
  • Cost of failure: 2
  • Amount of use: 1
  • Visibility: 1
  • Lifespan: 5
  • Difficulty of repair: 2
  • Complexity: 2

Enterprise App — De-duplication of Persons Based on Demographic Info

This is a real problem for TrueAccord. Our app imports “people” from various sources. Sometimes, we get two versions of the same “person”. It is to our advantage to know this and take action accordingly in other parts of our system. Person-matching can be quite complex given that two people can easily look very similar from a demographic standpoint (same name, city, zip code, etc.) yet truly be different people – the toy sketch below gives a flavor of the problem. If we get it wrong, we could inadvertently cross-pollinate private financial information. To top it all off, we don’t know what shape this will take long term and are in a pre-prototyping phase. In this case, I am dividing the testing assessment into two parts: prototyping phase and production phase.
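Here is that toy matcher, combining a few demographic signals into a single score. The fields, weights, and threshold are purely illustrative assumptions, not our production matching logic.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Rough string similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def likely_same_person(p1, p2, threshold=0.85):
    """Toy person-matching score over demographic fields.

    `p1` and `p2` are dicts with "name", "zip", and "dob" keys; the weights
    and the threshold are illustrative, not production values.
    """
    score = 0.6 * name_similarity(p1["name"], p2["name"])
    score += 0.2 * (p1["zip"] == p2["zip"])
    score += 0.2 * (p1.get("dob") == p2.get("dob"))
    return score >= threshold

a = {"name": "Jon Smith", "zip": "94105", "dob": "1980-01-01"}
b = {"name": "John Smith", "zip": "94105", "dob": "1980-01-01"}
print(likely_same_person(a, b))  # True: nearly identical name, same zip and DOB
```

Of course, records this similar can still belong to two genuinely different people, which is exactly why getting the answer wrong is so costly.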

Prototyping

The functionality will be in dry-run mode.  Other parts of the app will not know it exists and will not take action based on its results.  Complexity alone drives light testing here.

Criterion scores:
  • Cost of failure: 1
  • Amount of use: 1
  • Visibility: 1
  • Lifespan: 1
  • Difficulty of repair: 1
  • Complexity: 4

Production

Once adopted, this would become rather core functionality with a wide-sweeping impact.  If it is wrong, then other wrong data will be built upon it, creating a heavy cleanup burden and further customer impact.  That being said, it will still have low visibility since it is an asynchronous backend process.  Moderate to heavy testing is needed here.

Criterion scores:
  • Cost of failure: 4
  • Amount of use: 3
  • Visibility: 1
  • Lifespan: 3
  • Difficulty of repair: 4
  • Complexity: 4

Testing at TrueAccord

TrueAccord is three years old. We’ve found product-market fit and are on the road to success (fingers crossed). At this juncture, engineering time is a bit scarce, so we have to be wise in how it is allocated. That means we don’t have the luxury of 100% test coverage. Though we don’t formally apply the above heuristics, they are evident in the automated tests that exist in our system. For example, two of our larger test suites are PaymentPlanHelpersSpec and PaymentPlanScannerSpec, at 1,500 and 1,200 lines respectively. As you might guess, these are related to handling customers’ payment plans. This is fairly complex, highly visible, highly used core functionality for us. Contrast that with TwilioClientSpec at 30 lines. We use Twilio very lightly, with low visibility and low cost of failure. Since we are only calling a single endpoint on their API, this is a very simple piece of code. In fact, the testing that exists is just for a helper function, not the API call itself.

I’d love to hear about other real world examples, and I’d love to hear if this way of thinking about testing would work for your software startup.  Please leave us a comment with your point of view!