How TrueAccord Thinks About Experimentation

October 16th, 2017, in Data Science, Industry Insights, Machine Learning, Product and Technology, Testing

Experimentation sometimes gets a bad rap in the movies – mad scientists blowing up labs, aliens arriving to probe unsuspecting humans, accidental AI monsters. It leaves the imagination to form an image of experimenters as cold-hearted, calculating, and removed from reality. Real-world experimentation is typically much more mundane, but the stereotypes linger, and that's unfortunate. The primary questions of experimentation (if you're not a mad scientist) are: Does this thing work the way I think it does? Does this feature deliver the results or benefits it's supposed to? If not, why not? That makes it an extremely powerful tool for designing products that work and are actually good for customers.

At TrueAccord we believe that experimentation is an integral part of designing a product that fulfills our mission: to reinvent the debt collections space by delivering great customer experiences that empower consumers to regain control of their financial health and help them better manage their financial future. Whenever possible we launch experiments, not outright features. This strategy has three essential benefits:

  • Tests whether our instincts are right and our models are functional

  • Allows us to gain valuable insights into who our customers are and what they need

  • Mitigates potential negative effects

Test Our Instincts: How do you ensure your team is actually moving the product forward? Only investing energy in features and experiences that will create an effective and positive debt collection experience? Experimentation. The TrueAccord team is full of clever people with clever ideas, but we know better than to build our product on untested hunches. By testing our instincts before taking another step in the same direction, we invest energy where it matters and build up our knowledge base before proceeding in directions we clearly do not yet understand.

Customer Insights: Understanding why your product works is often more important than understanding whether it works. The real benefit of an experimentation infrastructure is that it provides diversified, descriptive data and forces you to stop and take a look. At TrueAccord we know it's essential to ask whether we're looking at the problem the right way and, if not, what we've missed: do we understand our customers' needs?

Example:

We launched a new, “better” email format and rolled it out as a variation across a spread of existing email content. After a three-month run, we confirmed that it was indeed performing significantly better in terms of both average open and click rate. This was surprising: we hadn't changed anything that should have affected opens.

[Figure: the new base template saw an open rate increase of ~10%. First email: new base template; second email: control.]

Upon further investigation, we realized that the new format had unintentionally changed the email preview from displaying the start of our email content to consistently showing a formally worded disclaimer! We then launched another experiment to confirm our findings.
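
For context, “significantly better” is the kind of claim you can check with a standard two-proportion test. Below is a minimal sketch with made-up counts; it is illustrative only, not TrueAccord's actual analysis tooling:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: opens out of total sends for each arm.
opens = [5500, 5000]    # [new template, control]
sends = [50000, 50000]

# Two-proportion z-test: is the new template's open rate
# different from the control's?
z_stat, p_value = proportions_ztest(count=opens, nobs=sends)

relative_lift = (opens[0] / sends[0]) / (opens[1] / sends[1]) - 1
print(f"lift: {relative_lift:.1%}, z = {z_stat:.2f}, p = {p_value:.4f}")
```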

Mitigates Negative Effects: It's easy in any industry to be blinded by simple outcome metrics, especially in debt collection, where the end objective is repayment. At TrueAccord we would consider it a failure if our product worked but for the wrong reasons – if our collections system converted but didn't provide a good experience for the consumer. Experimentation is our first line of defense against treading down this path.

Example:

After researching existing accounts, we realized there was a need for more self-service tools for payment plan management. We developed a new payment plan account page and rolled out an experiment that automatically redirected some customers to this page whenever they visited the website while their plan was active.

We found that this did decrease payment plan breakage and increase liquidation, but because our system was set up to detect other types of impact, we also discovered it increased outreach to our engagement team in the “Website Help” category. Consumers were confused about why they were not landing on the pages they expected when navigating to our website. We had the right idea, but our implementation was not ideal for the consumer.

[Figure: experiment vs. control – % of inbound engagement team communication by category (the total number of inbound communications was approximately the same).]

Experimentation is not foolproof. Getting these benefits requires an infrastructure that lets you assess whether what you built is useful and, if designed correctly, understand why. Indeed, through experimentation we've grown our product to function effectively across diverse areas of debt, and over the past few months alone a few simple experiments improved the number of people who complete their plans by almost 4%. Every small change compounds, and at TrueAccord's scale this means many more people who pay without experiencing any disruption. Check back soon for how we designed an experimentation structure that allows us to reap the benefits described above and drive our collections product forward.

Introducing Account Dashboard To Enable Flexibility and Control for Consumers

October 4th, 2017, in Debt Collection, Product and Technology, Testing, User Experience

We are very excited to announce the release of a new feature in the TrueAccord Collections Platform: the Account Dashboard. The dashboard gives consumers a comprehensive view of their individual account and lets them manage it in real time – viewing their balance, payment plans, and disputes, and accessing financial resources. This significantly improves the consumer experience by giving consumers the control and flexibility to manage their account and their financial obligations according to their needs.

Our product and data science teams are always looking at ways to improve user experience and engagement by A/B testing new ideas and collecting user feedback.  Our machine learning platform is powered by a decision engine that draws upon millions of previous interactions to deliver digital, personalized experiences for each consumer. Sometimes the result is a change in contact strategy for a specific set of accounts, but often we also see an impact on the way we design our user experience. This is one of those times.

A time for shifting paradigms

When we started TrueAccord, we focused on creating a variety of contact strategies and the flexibility to deliver personalized consumer experiences, gathering data we could learn from and act on over time. So we created a wide variety of offers and developed many landing pages with different value propositions, each promoting a particular “offer,” along with ways to attract consumers to look at those offers and act on them. While it was simple for our team to create many pages and A/B test offers, we began to realize we were providing an “e-commerce” experience to consumers.

The “e-commerce” experience created a one-way relationship between TrueAccord and the consumer, one that responded only in a limited way to changing consumer habits such as digital self-service and a desire for individualized products and services. As consumers grew familiar with the TrueAccord brand and our algorithms became more accurate, it was obvious that an ongoing relationship model better serves the consumer and yields better results, because consumers appreciate the transparency and the feeling of control over their financial health.  Counterintuitively, they were starting to trust us, the collection agency. We also started to hear from our engagement team that consumers wanted to take advantage of offers they had previously received via email but could no longer easily access. They were starting to think about TrueAccord like any account-based financial services firm they interact with.

We had to take a step back and ask a few key questions:

  • What was the market demanding from us?
  • What was our vision for the TrueAccord consumer experience?
  • Was the experience we were providing reflective of our vision and market needs?  

We realized that our first goal for the product had been achieved: consumers no longer think of us as “the bad guys who chase me.” They think of us as a service provider that helps them with a part of their financial lives, and they want more: more engagement, more context, more options.  Consumers wanted the ability to log in and view an account page, make payments, make adjustments, and more. Part of TrueAccord's mission is to become a platform for empowering financial health through digital, data-driven, personalized experiences. So a redesign was in order.

Creating a consumer-focused collection experience

It sounds counter-intuitive, though it shouldn't. Debt collection is an activity focused on recouping money that consumers owe but haven't paid, yet it can just as well be focused on helping consumers pay the money they owe. In fact, most consumers want to pay but are unable to for a variety of reasons. Creating a consumer-focused experience means providing a seamless, targeted, customized interface that is easy to manage and works with their day-to-day needs.

The Dashboard allows TrueAccord to show a consumer their available offers and options, while consumers, through their actions and feedback, let TrueAccord know what is or isn’t useful or helpful. It is truly a big step up in realizing our original vision for the product: introduce a system that puts consumers at the helm, in control of their lives and finances, and on the path to financial health.

Mobile

In some cases, more than 70% of traffic to TrueAccord’s web app is from mobile devices. We needed to make sure our new interface is easily accessible and navigable via mobile devices. The new dashboard interface is better optimized for mobile to meet consumer preferences. Consumers can access their account information at any time, from anywhere, giving them a reliable way to stay up to date with their account and to contact us if they have any questions or concerns.

Payment Plans

One of TrueAccord's most popular payment options is our payment plans: 84% of consumers with debt balances over $300 choose to pay via a payment plan. Unfortunately, a large number of consumers set up plans but drop off before completely paying off their debt. Sometimes it's because the payment amounts are too high, or the dates don't line up with the consumer's paydays, when they have money to pay. By developing a relationship with the consumer, TrueAccord is able to mitigate these difficulties and provide solutions to help them get back on track.

Our goal is to be a platform for financial health that empowers consumers to get out of debt by giving them the control and flexibility of paying off their debt in a way that works for them.  This feature is a huge step in that direction.

How Much Testing is Enough Testing?

February 2nd, 2017, in Engineering and Data, Product and Technology, Testing

[Photo: the Golden Gate Bridge by night]


One hundred years ago, a proposal took hold to build a bridge across the Golden Gate Strait at the mouth of San Francisco Bay.  For more than a decade, engineer Joseph Strauss drummed up support for the bridge throughout Northern California.  Before the first concrete was poured, his original double-cantilever design was replaced with Leon Moisseiff’s suspension design.  Construction on the latter began in 1933, seventeen years after the bridge was conceived.  Four years later, the first vehicles drove across the bridge.  With the exception of a retrofit in 2012, there have been no structural changes since.  21 years in the making.  Virtually no changes for the next 80.

Now, compare that with a modern Silicon Valley software startup.  Year one: build an MVP.  Year two: funding and product-market fit.  Year three: profitability?…growth? Year four: make it or break it.  Year five: if the company still exists at this point, you’re lucky.

Software in a startup environment is a drastically different engineering problem than building a bridge.  So is the testing component of that problem.  The bridge will endure 100+ years of heavy use and people’s lives depend upon it.  One would be hard-pressed to over-test it.  A software startup endeavor, however, is prone to monthly changes and usually has far milder consequences when it fails (although being in a regulated environment dealing with financial data raises the stakes a bit).  Over-testing could burn through limited developer time and leave the company with an empty bank account and a fantastic product that no one wants.

I want to propose a framework for answering the question of how much testing is enough.  I'll outline six criteria, then apply them to a few examples.  If you're a highly visual person like me, skip to the charts at the end and come back.  In general, I am proposing that testing efforts be assessed on a spectrum according to the nature of the product under test.  A bridge would be on one end of the spectrum, whereas a prototype for a free app that makes funny noises would be on the other.

Assessment Criteria

Cost of Failure

What is the material impact if this thing fails?  If a bridge collapses, it's life and death and a ton of money.  Similarly, in a stock trading app, there are potentially big dollar and legal impacts when the numbers are wrong.  By contrast, an occasional failure in a dating app would annoy customers and maybe drive a few of them away, but wouldn't be catastrophic.  Bridges and stock trading have higher costs of failure and thus merit more rigorous testing.

Amount of Use

How often is this thing used and by how many people?  In other words, if a failure happens in this component, how widespread will the impact be?  A custom report that runs once a month gets far less use than the login page.  If the latter fails, a great number of users will feel the impact immediately.  Thus, I really want to make sure my login page (and similar) are well-tested.

Visibility

How visible is the component?  How easy will it be for customers to see that it’s broken?  If it’s a backend component that only affects engineers, then customers may not know it’s broken until they start to see second-order side effects down the road.  I have some leeway in how I go about fixing such a problem.  In contrast, a payment processing form would have high visibility.  If it breaks, it will give the impression that my app is broken big-time and will cause a fire drill until it is fixed.  I want to increase testing with increased visibility.

Lifespan

This is a matter of return on effort.  If the thing I’ve built is a run-once job, then any bugs will only show up once.  On the other hand, a piece of code that is core to my application will last for years (and produce bugs for years).  Longer lifespans give me greater returns on my testing efforts.  If a little extra testing can avoid a single bug per month, then that adds up to a lot of time savings when the code lasts for years.

Difficulty of Repair

Back to the bridge example, imagine there is a radio transmitter at the top.  If it breaks, a trained technician would have to make the climb (several hours) to the top, diagnose the problem, swap out some components (if he has them on hand), then make the climb down.  Compare that with a small crack in the road.  A worker spends 30 minutes squirting some tar into it at 3am.  The point here is that things which are more difficult to repair will result in a higher cost if they break.  Thus, it’s worth the larger investment of testing up front.  It is also worth mentioning that this can be inversely related to visibility.  That is, low visibility functionality can go unnoticed for long stretches and accumulate a huge pile of bad data.

Complexity

Complex pieces of code tend to be easier to break than simple code.  There are more edge cases and more paths to consider.  In other words, greater complexity translates to greater probability of bugs.  Hence, complex code merits greater testing.
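
To make the framework concrete, here is a minimal sketch of how the six criteria might be combined into a single testing-effort recommendation.  The plain average and the effort cutoffs are illustrative assumptions, not a formula from any formal process; the scores fed in below come from the example charts that follow.

```python
from dataclasses import dataclass

@dataclass
class TestingAssessment:
    """Each criterion is scored from 1 (low) to 5 (high)."""
    cost_of_failure: int
    amount_of_use: int
    visibility: int
    lifespan: int
    difficulty_of_repair: int
    complexity: int

    def score(self) -> float:
        # A plain average; a real team might weight cost of failure
        # or difficulty of repair more heavily.
        values = [self.cost_of_failure, self.amount_of_use,
                  self.visibility, self.lifespan,
                  self.difficulty_of_repair, self.complexity]
        return sum(values) / len(values)

    def recommendation(self) -> str:
        s = self.score()
        if s >= 4.0:
            return "heavy: test continually, as much as possible"
        if s >= 2.5:
            return "moderate: cover core paths and known failure modes"
        return "light: smoke tests and spot checks"

# Scores taken from the example charts below.
bridge = TestingAssessment(5, 5, 5, 5, 5, 4)
cat_dating_app = TestingAssessment(1, 4, 4, 1, 1, 1)

print(bridge.recommendation())          # heavy: test continually, ...
print(cat_dating_app.recommendation())  # light: smoke tests and spot checks
```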

Examples

Golden Gate Bridge

This is a large last-forever sort of project.  If we get it wrong, we have a monumental (literally) problem to deal with.  Test continually as much as possible.

Criterion Score
Cost of failure 5
Amount of use 5
Visibility 5
Lifespan 5
Difficulty of repair 5
Complexity 4

Cat Dating App

Once the word gets out, all of the cats in the neighborhood will be swiping in a cat-like unpredictable manner on this hot new dating app.  No words, just pictures.  Expect it to go viral then die just as quickly.  This thing will not last long and the failure modes are incredibly minor.  Not worth much time spent on testing.

Criterion Score
Cost of failure 1
Amount of use 4
Visibility 4
Lifespan 1
Difficulty of repair 1
Complexity 1

Enterprise App — AMEX Payment Processing Integration

Now, we get into the nuance.  Consider an American Express payment processing integration i.e. the part of a larger app that sends data to AMEX and receives confirmations that the payments were successful.  For this example, let’s assume that only 1% of your customers are AMEX users and they are all monthly auto-pay transactions.  In other words, it’s a small group that will not see payment failures immediately.  Even though this is a money-related feature, it will not merit as much testing as perhaps a VISA integration since it is lightly used with low visibility.

Criterion Score
Cost of failure 2
Amount of use 1
Visibility 1
Lifespan 5
Difficulty of repair 2
Complexity 2

Enterprise App — De-duplication of Persons Based on Demographic Info

This is a real problem for TrueAccord.  Our app imports “people” from various sources.  Sometimes, we get two versions of the same “person”.  It is to our advantage to know this and take action accordingly in other parts of our system.  Person-matching can be quite complex given that two people can easily look very similar from a demographic standpoint (same name, city, zip code, etc.) yet truly be different people.  If we get it wrong, we could inadvertently cross-pollinate private financial information.  To top it all off, we don’t know what shape this will take long term and are in a pre-prototyping phase. In this case, I am dividing the testing assessment into two parts: prototyping phase and production phase.
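
As a deliberately simplified sketch of why this is hard to get right (hypothetical fields and rules, not our production matcher), demographic equality alone should be treated as a weak signal that needs corroboration before two records are merged:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    name: str
    city: str
    zip_code: str
    ssn_last4: Optional[str] = None  # stronger identifier, when present

def likely_same_person(a: Person, b: Person) -> bool:
    # Weak signal: two genuinely different people can share all three.
    demographic_match = (
        a.name.lower() == b.name.lower()
        and a.city.lower() == b.city.lower()
        and a.zip_code == b.zip_code
    )
    # Require a corroborating strong signal before merging records,
    # since a wrong merge could leak private financial information.
    strong_signal = a.ssn_last4 is not None and a.ssn_last4 == b.ssn_last4
    return demographic_match and strong_signal
```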

Prototyping

The functionality will be in dry-run mode.  Other parts of the app will not know it exists and will not take action based on its results.  Complexity alone drives light testing here.

Criterion Score
Cost of failure 1
Amount of use 1
Visibility 1
Lifespan 1
Difficulty of repair 1
Complexity 4

Production

Once adopted, this would become rather core functionality with a wide-sweeping impact.  If it is wrong, then other wrong data will be built upon it, creating a heavy cleanup burden and further customer impact.  That being said, it will still have low visibility since it is an asynchronous backend process.  Moderate to heavy testing is needed here.

Criterion Score
Cost of failure 4
Amount of use 3
Visibility 1
Lifespan 3
Difficulty of repair 4
Complexity 4

Testing at TrueAccord

TrueAccord is three years old.  We've found product-market fit and are on the road to success (fingers crossed).  At this juncture, engineering time is scarce, so we have to be wise about how it is allocated.  That means we don't have the luxury of 100% test coverage.  Though we don't formally apply the above heuristics, they are evident in the automated tests that exist in our system.  For example, two of our larger test suites are PaymentPlanHelpersSpec and PaymentPlanScannerSpec, at 1500 and 1200 lines respectively.  As you might guess, these test the handling of customers' payment plans: fairly complex, highly visible, highly used core functionality for us.  Contrast that with TwilioClientSpec at 30 lines.  We use Twilio very lightly, with low visibility and low cost of failure.  Since we are only calling a single endpoint on their API, this is a very simple piece of code.  In fact, the testing that exists is just for a helper function, not the API call itself.
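
Our actual suites are Scala specs, but as a hedged illustration of that 30-line end of the spectrum, a thin client wrapper might only warrant tests of its own helper logic.  The function and tests below are hypothetical, just to show the shape:

```python
import re

def normalize_phone(raw: str) -> str:
    """Hypothetical helper: strip formatting, assume US numbers."""
    digits = re.sub(r"\D", "", raw)
    return "+1" + digits[-10:] if len(digits) >= 10 else digits

# pytest-style tests: we cover our helper, not the third-party API call.
def test_normalize_phone_strips_formatting():
    assert normalize_phone("(415) 555-0100") == "+14155550100"

def test_normalize_phone_passes_short_input_through():
    assert normalize_phone("555") == "555"
```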

I'd love to hear about other real-world examples, and I'd love to hear whether this way of thinking about testing would work for your software startup.  Please leave us a comment with your point of view!