Our CEO, Ohad Samet, will be part of a panel discussing Artificial Intelligence Uses in Fintech. The panel will be held at 2:15pm Eastern on Tuesday, 3/7.
Find out how Upwork partners with TrueAccord to resolve customer issues and retain lost relationships. Click here to read the case study.
Ever wonder what the numbers really look like for digital debt collection? As it turns out, pretty good. Click here to read more.
In a recent American Banker article, our team argues that the regulatory discussion around phone calls in debt collection is rapidly becoming irrelevant for one very important reason: consumers don’t answer their phones.
One hundred years ago, a proposal took hold to build a bridge across the Golden Gate Strait at the mouth of San Francisco Bay. For more than a decade, engineer Joseph Strauss drummed up support for the bridge throughout Northern California. Before the first concrete was poured, his original double-cantilever design was replaced with Leon Moisseiff’s suspension design. Construction on the latter began in 1933, seventeen years after the bridge was conceived. Four years later, the first vehicles drove across the bridge. With the exception of a retrofit in 2012, there have been no structural changes since. 21 years in the making. Virtually no changes for the next 80.
Now, compare that with a modern Silicon Valley software startup. Year one: build an MVP. Year two: funding and product-market fit. Year three: profitability?…growth? Year four: make it or break it. Year five: if the company still exists at this point, you’re lucky.
Software in a startup environment is a drastically different engineering problem than building a bridge. So is the testing component of that problem. The bridge will endure 100+ years of heavy use and people’s lives depend upon it. One would be hard-pressed to over-test it. A software startup endeavor, however, is prone to monthly changes and usually has far milder consequences when it fails (although being in a regulated environment dealing with financial data raises the stakes a bit). Over-testing could burn through limited developer time and leave the company with an empty bank account and a fantastic product that no one wants.
I want to propose a framework to answer the question of how much testing is enough. I’ll outline 6 criteria, then throw them at a few examples. Skip to the charts at the end and come back if you are a highly visual person like me. In general, I am proposing that testing efforts be assessed on a spectrum according to the nature of the product under test. A bridge would be on one end of the spectrum, whereas a prototype for a free app that makes funny noises would be on the other.
Cost of Failure
What is the material impact if this thing fails? If a bridge collapses, it’s life and death and a ton of money. Similarly, in a stock trading app, there are potentially big dollar and legal impacts when the numbers are wrong. In contrast, an occasional failure in a dating app would annoy customers and maybe drive a few of them away, but wouldn’t be catastrophic. Bridges and stock trading have higher costs of failure and thus merit more rigorous testing.
Amount of Use
How often is this thing used, and by how many people? In other words, if a failure happens in this component, how widespread will the impact be? A custom report that runs once a month gets far less use than the login page. If the latter fails, a great number of users will feel the impact immediately. Thus, I really want to make sure my login page (and similar components) are well tested.
Visibility
How visible is the component? How easy will it be for customers to see that it’s broken? If it’s a backend component that only affects engineers, then customers may not know it’s broken until they start to see second-order side effects down the road. I have some leeway in how I go about fixing such a problem. In contrast, a payment processing form would have high visibility. If it breaks, it will give the impression that my app is broken big-time and will cause a fire drill until it is fixed. I want to increase testing with increased visibility.
Lifespan
This is a matter of return on effort. If the thing I’ve built is a run-once job, then any bugs will only show up once. On the other hand, a piece of code that is core to my application will last for years (and produce bugs for years). Longer lifespans give me greater returns on my testing efforts. If a little extra testing can avoid a single bug per month, then that adds up to a lot of time savings when the code lasts for years.
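To put rough numbers on that return-on-effort intuition, here is a back-of-envelope sketch in Python. Every figure in it is a hypothetical assumption for illustration, not data from our system:

```python
# Hypothetical back-of-envelope: is extra up-front testing worth it?
extra_testing_hours = 8        # one-time testing investment (assumed)
hours_saved_per_bug = 4        # triage + fix + redeploy per bug (assumed)
bugs_prevented_per_month = 1   # bugs the extra tests catch (assumed)
code_lifespan_months = 36      # core code lives for years

savings = hours_saved_per_bug * bugs_prevented_per_month * code_lifespan_months
net = savings - extra_testing_hours
print(f"net hours saved over lifespan: {net}")  # 136 for these assumptions
```

Change `code_lifespan_months` to 1 (a run-once job) and the same investment goes net negative, which is the whole point of the lifespan criterion.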
Difficulty of Repair
Returning to the bridge example, imagine there is a radio transmitter at the top. If it breaks, a trained technician has to make the several-hour climb to the top, diagnose the problem, swap out some components (if replacements are on hand), then make the climb down. Compare that with a small crack in the road: a worker spends 30 minutes squirting some tar into it at 3am. The point here is that things which are more difficult to repair incur a higher cost when they break. Thus, they are worth a larger up-front investment in testing. It is also worth mentioning that difficulty of repair can be inversely related to visibility: low-visibility functionality can go unnoticed for long stretches and accumulate a huge pile of bad data.
Complexity
Complex pieces of code tend to be easier to break than simple code. There are more edge cases and more paths to consider. In other words, greater complexity translates to greater probability of bugs. Hence, complex code merits greater testing.
Golden Gate Bridge
This is a large, last-forever sort of project. If we get it wrong, we have a monumental (literally) problem to deal with. Test continually and as thoroughly as possible.
| Cost of failure | 5 |
| Amount of use | 5 |
| Difficulty of repair | 5 |
Cat Dating App
Once the word gets out, all of the cats in the neighborhood will be swiping in a cat-like unpredictable manner on this hot new dating app. No words, just pictures. Expect it to go viral then die just as quickly. This thing will not last long and the failure modes are incredibly minor. Not worth much time spent on testing.
| Cost of failure | 1 |
| Amount of use | 4 |
| Difficulty of repair | 1 |
Enterprise App — AMEX Payment Processing Integration
Now, we get into the nuance. Consider an American Express payment processing integration, i.e., the part of a larger app that sends data to AMEX and receives confirmations that the payments were successful. For this example, let’s assume that only 1% of your customers are AMEX users and they are all monthly auto-pay transactions. In other words, it’s a small group that will not see payment failures immediately. Even though this is a money-related feature, it will not merit as much testing as, say, a VISA integration, since it is lightly used with low visibility.
| Cost of failure | 2 |
| Amount of use | 1 |
| Difficulty of repair | 2 |
Enterprise App — De-duplication of Persons Based on Demographic Info
This is a real problem for TrueAccord. Our app imports “people” from various sources. Sometimes, we get two versions of the same “person”. It is to our advantage to know this and take action accordingly in other parts of our system. Person-matching can be quite complex given that two people can easily look very similar from a demographic standpoint (same name, city, zip code, etc.) yet truly be different people. If we get it wrong, we could inadvertently cross-pollinate private financial information. To top it all off, we don’t know what shape this will take long term and are in a pre-prototyping phase. In this case, I am dividing the testing assessment into two parts: prototyping phase and production phase.
Prototyping Phase
The functionality will be in dry-run mode. Other parts of the app will not know it exists and will not take action based on its results. Complexity alone drives light testing here.
| Cost of failure | 1 |
| Amount of use | 1 |
| Difficulty of repair | 1 |
Production Phase
Once adopted, this would become rather core functionality with a wide-sweeping impact. If it is wrong, then other wrong data will be built upon it, creating a heavy cleanup burden and further customer impact. That being said, it will still have low visibility since it is an asynchronous backend process. Moderate to heavy testing is needed here.
| Cost of failure | 4 |
| Amount of use | 3 |
| Difficulty of repair | 4 |
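To make the complexity of person-matching concrete, here is a toy sketch of demographic comparison. The field names, the equal weighting, and the use of simple string similarity are all hypothetical illustrations, not TrueAccord’s actual matching logic:

```python
# Toy demographic matcher: two records can look very similar
# (name, city, zip) yet belong to different people, which is why
# real person-matching needs far more signals than this sketch.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_person_score(p1: dict, p2: dict) -> float:
    """Naive demographic match score; equal weights are an assumption."""
    name = similarity(p1["name"], p2["name"])
    zip_match = 1.0 if p1["zip"] == p2["zip"] else 0.0
    city = similarity(p1["city"], p2["city"])
    return (name + zip_match + city) / 3

a = {"name": "John A. Smith", "city": "Oakland", "zip": "94607"}
b = {"name": "Jon Smith", "city": "Oakland", "zip": "94607"}
c = {"name": "Mary Jones", "city": "Reno", "zip": "89501"}

print(same_person_score(a, b) > same_person_score(a, c))  # True
```

Note that records `a` and `b` score as a near-match here even though they may be different people, which is exactly the cross-pollination risk described above: a scoring threshold alone cannot decide, and a wrong merge leaks private financial information.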
Testing at TrueAccord
TrueAccord is three years old. We’ve found product-market fit and are on the road to success (fingers crossed). At this juncture, engineering time is a bit scarce, so we have to be wise in how it is allocated. That means we don’t have the luxury of 100% test coverage. Though we don’t formally apply the above heuristics, they are evident in the automated tests that exist in our system. For example, two of our larger test suites are PaymentPlanHelpersSpec and PaymentPlanScannerSpec, at 1500 and 1200 lines respectively. As you might guess, these are related to handling customers’ payment plans. This is fairly complex, highly visible, highly used core functionality for us. Contrast that with TwilioClientSpec at 30 lines. We use Twilio very lightly, with low visibility and a low cost of failure. Since we are only calling a single endpoint on their API, this is a very simple piece of code. In fact, the testing that exists is just for a helper function, not the API call itself.
I’d love to hear about other real world examples, and I’d love to hear if this way of thinking about testing would work for your software startup. Please leave us a comment with your point of view!
Our Head of Data Science, Richard Yeung, gave a talk at the Global Big Data conference. The talk focused on the first steps from heuristics to probabilistic models when building a machine learning system based on expert knowledge. This feedback loop is what allowed our automated system to replace the old-school, call center-based model with a modernized, personalized approach.
You can find the slides here.
Many collection agencies and departments use commission-based compensation for their collection agents. This model is perceived as the only way to “make it” in collections: margins are slim to none, agencies themselves are compensated only for dollars collected, and commission-based compensation lets them hire cheap labor and have great performers rise to the top. In fact, this is a broken model, based on a flawed premise that only humans can collect from humans. It’s not only the Wells Fargo case that should alarm collectors and creditors who use them; it’s the conflict of interest that’s inherent to commissions in collections, and the legal and moral risks it introduces. With new technologies maturing and beating traditional call centers, it’s time to reconsider.
Though historically resistant to innovation, the collection industry feels pressured to make changes. Consumer preference, requirements from clients and mounting costs dictate increased use of technology – a welcome trend. Among those new tools, we are starting to see increasing adoption of emails for collections. Agencies have a small selection of vendors to blast out an email. Agencies with large call centers view this as a cost reduction exercise, and another way to get consumers to call in and talk to their agents.
Consumer behavior is changing. As more of us are glued to our mobile phones, email, and social media accounts, it’s clear that the old ways of collecting debt are quickly becoming irrelevant. Still, the market doesn’t offer a multitude of collection solutions aimed at the digital consumer. When we present our machine learning-based solution to prospective customers, we’re often asked about the difference between our solution and a self-service portal. Although both solutions are digital, they could not be more different.
We’ve heard varying sentiments about the November 8th election results. Behind the scenes, many in the debt collection industry are excited and happy about them. They believe a Trump presidency will put an end to regulation in debt collection and put the industry “back in business”. This is a short-sighted view, focused on the wrong drivers of change for the industry. Debt collection and President Trump may not be the great allies some believe they will be.
Recently TrueAccord has grown to the size where our compliance stance requires the addition of photo ID badges. It’s a rite of passage all small-but-growing companies endure and ours is no different.
Since I have previous experience setting up badge systems and dealing with the printers, I volunteered to kick off this process. I’ve evaluated pre-existing badge creation software in the past and found it all significantly lacking. In a previous environment, I wrote my own badge creation software, which fit the needs at the time. The key phrase being “at the time”. For tech startups, it’s not unusual to go from onboarding one person every other week to 10 people a week in a year or two. That means every manual onboarding step will go from “oh well, it’s just once every other week” to “we need to dedicate several hours of someone’s time every week to this process.” Typically, that same growth period also happens to be when your operations organizations (IT, Facilities, and Office Admin) are the most short-staffed and the least likely to have the free time to do that. “Where is this going?” and “How much work does this mean for me?”, you ask? Allow me to share with you how I automated our badge system – Photoshop included.
The CFPB put the full video from their debt collection field hearing on their YouTube channel. Participants were allowed 2 minutes to respond, and our CEO took that opportunity (watch here).
Thank you for the time today. My name is Ohad, I’m CEO of TrueAccord, a company that uses data and machine learning to fundamentally change the consumer experience in debt collection. We’ve been studying the new proposal since yesterday. We believe it is a big step towards improving consumer protection. Weeding out bad actors is going to level the playing field and create a race to the top that will benefit everyone.
When finalizing the rule, we think the CFPB should continue to encourage innovation in this space by providing clear and unambiguous guidelines on how to use new technology in the collections process. As a data-driven startup company, we have empirical evidence showing that using new technologies in the collection space – text, email, social media, digitizing the dispute process – significantly improves consumer protection.
One, it improves protection measured by consumer feedback and a marked reduction in consumer complaints. Consumers understand and react to our personalized, targeted communication.
Two, it significantly reduces communication frequency, cutting call frequency by up to 95%, well under the limitations in this new proposal, while using channels that consumers find much less intrusive.
Finally, it does all of the above while meeting or exceeding traditional performance in liquidation. Nobody is going to go out of business by using new technology (and we’ll add here: versus continuing to insist on hardly-compliant calling tactics).
Again, the CFPB should consider supporting innovation by providing clear guidance for the use of technology. It will improve consumer protection and will help the industry as a whole. We look forward to cooperating with the CFPB and policymakers on this shared goal.