How TrueAccord Thinks About Experimentation

By on October 16th, 2017 in Data Science, Industry Insights, Machine Learning, Product and Technology, Testing

Experimentation in the movies sometimes gets a bad rap – you think of mad scientists blowing up labs, aliens arriving to probe unsuspecting humans, or accidental AI monsters. This leaves us with an image of experimenters as cold-hearted, calculating, and removed from reality. Real-world experimentation is typically much more mundane, but the stereotypes often linger. This is unfortunate. The primary questions of experimentation (if you’re not a mad scientist) are: Does this thing work like I think it does? Does this feature deliver the results or benefits it is supposed to? If not, why not? This makes it an extremely powerful tool for designing products that work and are actually good for customers.

At TrueAccord we believe that experimentation is an integral part of designing a product that fulfills our mission to reinvent the debt collections space by delivering great customer experiences that empower consumers to regain control of their financial health and help them better manage their financial future. Whenever possible we launch experiments, not outright features. This strategy has three essential benefits:

  • Tests whether our instincts are right and our models are functional

  • Allows us to gain valuable insights into who our customers are and what they need

  • Mitigates potential negative effects

Test Our Instincts: How do you ensure your team is actually moving the product forward? Only investing energy in features and experiences that will create an effective and positive debt collection experience? Experimentation. The TrueAccord team is full of clever people with clever ideas, but we know it’s important not to found our product on untested hunches. By testing our instincts before taking another step in the same direction, we make sure we invest energy where it matters and wait to develop our knowledge base before proceeding in directions we clearly do not yet understand.

Customer Insights: Understanding why your product works is often more important than understanding if it works. The real benefits of an experimentation infrastructure lie in its ability to provide diversified, descriptive data, and in the emphasis it places on stopping to take a look. At TrueAccord we know it’s essential to understand whether we’re looking at the problem the right way and, if not, what we’ve missed: Do we understand our customers’ needs?

Example:

We launched a new “better” email format that we rolled out as a variation across a spread of existing email content. After a 3-month run, we confirmed that it was indeed performing significantly better in terms of both average open and click rates. This was surprising. We hadn’t changed anything that should have affected opens.

[Chart] New base template content saw an open rate increase of ~10%! First email: new base template; second email: control.

Upon further investigation, we realized that the new format had unintentionally changed the email preview from displaying the start of our email content to consistently showing a formally worded disclaimer! We then launched another experiment to ensure our findings were correct.

Mitigates Negative Effects: It’s easy in any industry to get blinded by simple outcome metrics, especially in debt collection where the end objective is repayment. At TrueAccord we would consider it a failure if our product worked, but worked for the wrong reasons – if our collections system converted, but didn’t provide a good experience for the consumer. Experimentation is our first line of defense against treading down this path.

Example:

After researching existing accounts, we realized there was a need for more self-service tools in payment plan management. We developed a new payment plan account page and rolled out an experiment that automatically redirected some customers to this page any time they viewed the website while their plan was active.

We found that this did decrease payment plan breakage and increase liquidation, but because our system was set up to detect other types of impact, we discovered it also increased outreach to our engagement team in the “Website Help” category. Consumers were confused about why they were not landing on the pages they expected when navigating to our website. We had the right idea, but our implementation was not ideal for the consumer.

[Chart] Experiment vs. control: % of inbound engagement team communication by category (the total number of inbound communications was approximately the same).

Experimentation is not foolproof; these benefits come from having an infrastructure that allows you to assess whether what you built is useful and, if designed correctly, understand why. Indeed, through experimentation we’ve grown our product to function effectively across diverse areas of debt, and over the past few months alone we’ve improved the number of people who complete their plans by almost 4% with a few simple experiments. Every small change compounds, and at TrueAccord’s scale this means many more people who pay without experiencing any disruption. Check back soon for how we designed an experimentation structure that allows us to reap the benefits described above and drive our collections product forward.

Introducing Account Dashboard To Enable Flexibility and Control for Consumers

By on October 4th, 2017 in Debt Collection, Product and Technology, Testing, User Experience

We are very excited to announce the release of a new feature in the TrueAccord Collections Platform: the Account Dashboard. The dashboard gives consumers a comprehensive view of their individual account, enabling them to manage it in real time, including viewing their balance, payment plans, and disputes, and accessing financial resources. This significantly improves the consumer experience by giving consumers more control and flexibility to manage their account and their financial obligations according to their needs.

Our product and data science teams are always looking at ways to improve user experience and engagement by A/B testing new ideas and collecting user feedback.  Our machine learning platform is powered by a decision engine that draws upon millions of previous interactions to deliver digital, personalized experiences for each consumer. Sometimes, the result is a change in contact strategy for a specific set of accounts, but often we also see an impact on the way we design our user experience. This is one of these times.

A time for shifting paradigms

When we started TrueAccord, we were focused on creating a variety of contact strategies and the flexibility to deliver personalized consumer experiences and gather data that we could learn from and make actionable over time. So, we created a wide variety of offers and developed many landing pages with different value propositions, each of which was promoting a particular “offer”, as well as ways to attract consumers to look at those offers and act on them.  While it was simple for our team to create many pages and A/B test offers, we began to realize we were providing an “e-commerce” experience to consumers.

The “e-commerce” experience created a one-way relationship between TrueAccord and the consumer that responded in a limited way to changing consumer habits such as the use of digital technology to self-serve and a desire for individualized products and services. As consumers started getting familiar with the TrueAccord brand and our algorithms became more accurate, it was obvious that an ongoing relationship model better serves the consumer and yields better results, because consumers appreciate the transparency and feeling of control over their financial health.  Counterintuitively, they were starting to trust us, the collection agency. We also started to get feedback from our engagement team that consumers wanted to take advantage of offers they had previously received via email, but now couldn’t easily access. They were starting to think about TrueAccord like any account-based financial services firm they interact with.

We had to take a step back and ask a few key questions:

  • What was the market demanding from us?
  • What was our vision for the TrueAccord consumer experience?
  • Was the experience we were providing reflective of our vision and market needs?  

We realized that our first goal for the product had been achieved: consumers don’t think of us as “the bad guys who chase me”. They think of us as a service provider that helps them with a part of their financial lives, and they want more: more engagement, more context, more options. Consumers wanted the ability to log in and view an account page, make payments, make adjustments, etc. Part of TrueAccord’s mission is to become a platform for empowering financial health through digital, data-driven, personalized experiences. So, a redesign was in order.

Creating a consumer-focused collection experience

It sounds counterintuitive when it shouldn’t. Debt collection is an activity focused on recouping money that consumers owe but didn’t pay, but it can just as well be focused on helping consumers pay the money they owe. In fact, most consumers want to pay but are unable to for a variety of reasons. Creating a consumer-focused experience means providing a seamless, targeted, customized interface that is easy to manage and works with their day-to-day needs.

The Dashboard allows TrueAccord to show a consumer their available offers and options, while consumers, through their actions and feedback, let TrueAccord know what is or isn’t useful or helpful. It is truly a big step up in realizing our original vision for the product: introduce a system that puts consumers at the helm, in control of their lives and finances, and on the path to financial health.

Mobile

In some cases, more than 70% of traffic to TrueAccord’s web app is from mobile devices. We needed to make sure our new interface is easily accessible and navigable via mobile devices. The new dashboard interface is better optimized for mobile to meet consumer preferences. Consumers can access their account information at any time, from anywhere, giving them a reliable way to stay up to date with their account and to contact us if they have any questions or concerns.

Payment Plans

One of TrueAccord’s most popular payment options is our payment plans: 84% of consumers with debt balances over $300 choose to pay via a payment plan. Unfortunately, a large number of consumers set up plans but drop off before completely paying off their debt. Sometimes it’s because the payment plan amounts are too high, or because payment dates don’t line up with the consumer’s paydays, when they have money to pay. By developing a relationship with the consumer, TrueAccord is able to mitigate difficulties and provide solutions to help them get back on track.

Our goal is to be a platform for financial health that empowers consumers to get out of debt by giving them the control and flexibility of paying off their debt in a way that works for them.  This feature is a huge step in that direction.

The Results Are In: TrueAccord Consumer Satisfaction Survey

By on July 17th, 2017 in Company News, Industry Insights, Product and Technology

Today, 80 million consumers are in debt. They are often not treated well by collectors, and are subjected to harassment, intimidation, and an overall bad user experience that does not encourage or empower resolution. According to a recent CFPB survey, 1 in 4 consumers felt threatened by collectors, 3 in 4 reported that a collector did not honor a request to cease contact, over ⅓ reported being contacted at inconvenient times, and 40% reported being contacted 4+ times per week. These results are quite disheartening, and demonstrate that traditional debt collection agencies have neither adopted user-centric practices and behaviors nor integrated technology into the process to adapt to changing consumer needs. They are stuck making large volumes of phone calls to uninterested consumers who end up complaining.

When we set out to survey our consumers about their experience with TrueAccord, we weren’t quite sure what to expect, or if they would even respond. On one hand, we believe our data-driven, consumer-centric, digital-first experience is reinventing the debt collection process and will replace legacy agencies, and consumers will appreciate that. On the other, we are still talking about debt collection, and most likely a lot of these consumers have been through multiple negative collections experiences and have low expectations of the process. They aren’t likely to recommend a debt collector, and as we’ve seen above, are highly likely to have had a bad experience.

Overall satisfaction

What we found was both exciting and inspiring: 80% of respondents were satisfied with their experience with TrueAccord. That is an unprecedented number in an industry that, for decades, only attracted negative attention. TrueAccord is building a product and brand focused on delivering great user experiences and helping consumers rebuild financial health, and consumers are reacting to that. Traditional agencies’ behaviors have been hurting liquidation and brand reputation and creating a lot of compliance risk, yet they haven’t changed their ways. We show that working differently is possible – and that it yields better results.

Tone

81% of consumers stated that the tone and personalized offers in our messages were appropriate for their individual needs. Our content is personalized and tailored to empower and motivate consumers to pay off their debt, combined with the ability to offer a wide selection of custom payment plans. Consumers’ needs are served and they are treated like customers. Our clients understand that debt collection is part of a natural consumer life cycle; at one point or another, most of us will encounter debt collectors. Unfortunately, traditional agencies lack the technology and best practices to deliver good user experiences, leaving consumers feeling frustrated, angry, and wronged. This does not have to be the case.

User experience

80% of our users had an overall positive experience with TrueAccord and recognized TrueAccord as different and better than other agencies. A large proportion of the other 20% resolved their debt by disputing it, so even though they may not feel great about their experience, they were able to dispute and discharge a debt electronically and with minimum hassle. It’s exciting to see that consumers see our brand the way we see ourselves, as innovators focused on great user experiences. We believe helping people get out of debt has positive impact for everyone involved, even (and sometimes more so) if getting out of debt means it can’t be collected.

What consumers had to say:

“You were easy to work with and the payment plan worked for me. Even when I had to make a small change, it was no problem. I’m glad to have the debt behind me. I appreciate the email correspondence as opposed to numerous phone calls.”

“They worked with me and I needed that.”

“It is always a pleasant experience dealing with True Accord.”

“Wish you could handle all my debts.”

“I love the fact that TrueAccord was kind and polite! I wanted to pay my debt but needed a plan that wouldn’t leave me over spent or struggling every month. TrueAccord was happy to accept the payment plan I requested. Thank you!”

“TrueAccord provided me a way to be true to my word.”

“The agents are all very friendly and accommodating. It doesn’t feel like you are dealing with a collection agency.”

“The best collection agency ever!”

 

Machine Learning Based Liquidation Looks Different – and That’s a Good Thing

By on June 26th, 2017 in Industry Insights, Machine Learning, Product and Technology

TrueAccord serves major issuers, debt buyers, and lenders across the US. We compete with traditional collection agencies and beat them: TrueAccord collects more than 1.5 times what the competition does in a typical 90-day placement period. We use HeartBeat, a machine learning based system that replaces the traditional call-heavy model with digital-first communications that complement consumer behavior and preferred modes of communication – but that makes our liquidation curves look different from traditional agencies’.

Traditional liquidation curves

Traditional liquidation curves typically shoot up in the first 30-45 days, followed by a plateau around days 60-80, with a possible bump towards the end of the placement window. This pattern is driven by several factors.

Routine: agents receive fresh accounts and are eager to call them. They fire up dialers and quickly reach consumers who can either pay or be lightly pressured into paying. After a few weeks of calls, agents are tired of calling the same consumers. They’ve heard what they think are excuses, have driven all the easy payments they could, and are ready for new accounts. “Old” accounts, as young as 30 days, get worse treatment. Collection managers know this, and try to trick collectors into thinking they got fresh accounts by pulling accounts out of the system and re-entering them. This rarely works. Collectors lose focus and, with it, performance.

Net present value: settlements are better than payment plans for collectors – they mean more money now, versus a payment plan that may fail and that requires reminders and additional work from the collector. Collectors opt for more settlements earlier, if they can get the consumer on the line. Under the pressure of a call, the consumer may commit to a payment plan; in this case the collector prefers as high a monthly payment as possible, since they assume the plan will fail early. Consumers, struggling with irregular cash flow and a large payment they shouldn’t have committed to, fail payment plans at a staggering rate: as many as 50% of payment plans fail.

Remorse: consumers who agree to settlements or plans often feel remorse after getting off the call, and tend to charge back payments they made. Chargeback rates in the debt collection industry are so high (rates as high as 2% are not rare) that most payment providers won’t work with collection companies.

The initial bump in liquidation is often enough to beat other phone based agencies. Since all agencies use the same methods, a slight advantage in selecting the right accounts to call first can get an agency ahead of its unsophisticated peers.

The TrueAccord liquidation curve

In contrast to the traditional curve, TrueAccord’s liquidation curve is somewhat linear. It often starts lower than the traditional agency, but continues to rise through the placement period until it crosses and exceeds its competitors. That inflection point can happen as late as day 80 (before the algorithms have been tuned, early in a pilot) and as early as day 15 (once the algorithms have learned how to handle a new product). The difference is driven by several factors.

Data driven treatment at scale: TrueAccord’s system is machine learning based and digital first. Since it starts with an email, it can initiate contact with all consumers easily, without having to call them often – and consumers are much more likely to respond to digital communications than to a phone call: while Right Party Contact rates often hover around 4-5%, email open rates on TrueAccord’s platform reach 65-70% and click-through rates reach 30-35%. Once it sends its first email, the system uses real-time tracking of consumer responses to tailor its next steps, relying on hundreds of millions of historic contact attempts to optimize its contact strategy. If the consumer doesn’t reply, the system can automatically switch between channels (from email to text, call, letter, and so on) to reach the consumer. It also uses data to figure out what time of day to contact the consumer to yield the best response rates. Call centers, by contrast, are limited to making phone calls, which consumers often ignore because they are busy or simply don’t pick up calls from unknown numbers. And since a machine doesn’t get bored, it continues contact attempts (3 a week on average) until it is told to stop. Targeted, consistent communication at scale means that more consumers will interact with our system than with a call center.

Optimizing for liquidation: a data-driven system can use historical data to understand what best fits consumer needs and leads to better liquidation. It doesn’t need to push for early settlements, because its automation lets it serve each consumer according to their needs – making custom-tailored plans viable. Consumers get easier payment terms that fit their needs, and end up paying more. We convinced several of our clients to move from a default payment plan length of 6 months to 12 months. Contrary to call center intuition, these longer plans get more consumers to sign up and don’t cannibalize settlements, in turn leading to an increase in liquidation. The machine learning system can service these plans at scale and reduce failure rates: TrueAccord payment plans complete as much as 89% of the time (as low as 11% breakage). By the time traditional collectors’ payment plans are failing their second payment, TrueAccord’s liquidation rates start soaring.

Best in class user experience: consumers don’t like phone calls or letters. They prefer 24/7, personalized, easy-to-use services – and collections isn’t any different. Using our system they can customize and sign up for settlements or payment plans, or ask for debt verification. With access to their account information and a sense of control over payment options, consumers don’t feel pressured or remorseful after paying. TrueAccord’s chargeback rates are next to nonexistent.

Bottom line

Machine learning based debt collection is different in many ways that benefit creditors and consumers. Our liquidation curve tells the story of how our system behaves differently from call center based collections – serving consumers at scale, using their preferred communication channels, and tailoring payment solutions that work for them.

Using phone in a digital world. A Data Science story.

By on March 16th, 2017 in Data Science, Debt Collection, Machine Learning, Product and Technology

Contributors: Vladimir Iglovikov, Sophie Benbenek, and Richard Yeung

It is Wednesday afternoon and the Data Science team at TrueAccord is arguing vociferously. The whiteboard is covered in unintelligible handwriting and fancy-looking diagrams. We’re in the middle of a heated debate about something the collections industry has had a fairly developed playbook on for decades: how to use the phone for collections.

Why are we so passionately discussing something so basic? As it turns out, phone is a deceptively deep topic when you are re-inventing recoveries and placing phone in the context of a multi-channel strategy.


 

Solving Attribution of Impact

The complexity of phone within a multi-channel strategy is revealed when you ask a simple question: “What was the impact of this phone call to Bob?”

In a world with only one channel, this question is easy. We call a thousand people and measure what percentage of them pay. But in a multi-channel setting where these people are also getting emails, SMS and letters, there is an attribution problem. If Bob pays after the phone call, we do not know if he would have paid without the phone call.

To complicate matters further, our experiments have shown that phone has two components of impact:

  1. The direct effect — the payments that happen on the call.
  2. The halo effect — the remaining impact of phone; for example, the consumer sees a missed call from us and goes back to one of our emails to click and pay.

To solve the attribution problem and capture both components of impact, we define the incremental benefit of a phone call as:

IB(call) = E[payment | call] − E[payment | no call]
         = DebtAmount × (P(pay | call) − P(pay | no call))

Intuitively, the incremental benefit of a phone call is the additional expected value from that customer due to the phone call. For example, assume Bob has a 5% chance of paying his $100 debt. If we know that by calling him, the probability of him paying increases to 7%, then the incremental benefit is $2 (100 * (0.07 – 0.05)).

 

How we calculate incremental benefit

Consider the incremental benefit equation in the last section. It requires us to predict the probability of Bob paying in each scenario: when we call him and when we do not.

Hence we created models that predict the probability of a customer paying. These models take as inputs everything we know about the customer, including:

  • Debt features: debt amount, days since charge-off, client, prior agencies that worked the account, etc
  • Behavioral features: entire email history, entire pageview history, interactions with agents, phone history, etc
  • Temporal features: time of the day, day of the week, day of the month, etc

The output of the model is the probability of payment by the customer given all of this information. We then have the same model output two predictions: probability of payment with the current event history, and probability of payment if we add one more outbound phone call to the event history.

Back to our example of Bob: the model would output probabilities of 7% and 5% of paying, with and without an additional phone call respectively.

(Diagram: a simplification that omits many variables and the actual architecture of our models.)
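
To make the two predictions concrete, here is a minimal sketch in Scala. The PaymentModel trait, the event types, and the helper function are illustrative stand-ins, not the interfaces of our production models:

// Illustrative event type; the real feature set is far richer (see the list above).
sealed trait Event
case object EmailOpened extends Event
case object OutboundCall extends Event

// Stand-in for a trained model that predicts the probability of payment.
trait PaymentModel {
  def probabilityOfPayment(history: Seq[Event]): Double
}

// Incremental benefit: the extra expected dollars collected because of the call.
def incrementalBenefit(model: PaymentModel,
                       history: Seq[Event],
                       debtAmount: Double): Double = {
  val pWithoutCall = model.probabilityOfPayment(history)
  val pWithCall    = model.probabilityOfPayment(history :+ OutboundCall)
  debtAmount * (pWithCall - pWithoutCall)
}

// For Bob: debtAmount = 100, pWithCall = 0.07, pWithoutCall = 0.05,
// so incrementalBenefit = 100 * (0.07 - 0.05) = $2.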

 

Optimal Call Allocation

The last step of the problem is choosing whom to call, and when. The topic of timing optimization deserves its own write-up, so we will close by discussing whom we call.

Without loss of generality, assume that we would only ever call a customer once. The diagram below has the percentage of customers called on the x-axis; the y-axis is in dollars, with two curves:

  • Incremental Benefit — this curve shows the marginal incremental benefit of calling the customer with the next highest IB
  • Avg cost — this horizontal curve shows the average cost of an outbound call

 

There are two very interesting points to discuss:

  • Profit max — calling everyone to the left of the intersection of incremental benefit and avg cost is the allocation that maximizes profit. Every one of these calls brings in more revenue than it costs.
  • Conversion max — notice that incremental benefit dips below zero. This is especially true when you remove the assumption that we only call each customer once. The point that maximizes conversion for the client is to call everyone to the left of where incremental benefit intersects with zero.

Our default strategy is to call all customers to the left of the profit-maximizing intersection. Interestingly, an intuitive investigation of the types of customers selected reveals customers at two extremes: we end up calling both very high value customers who have shown a lot of intent to pay (e.g., dropped off from signup after selecting a payment plan) and customers for whom email has been ineffectual (e.g., keeps opening emails without clicking, or never opens them at all).
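
As a sketch of that selection rule, continuing the illustrative Scala from above (the Candidate type and the cost parameter are assumptions, not our production code):

// One scored candidate call per customer.
case class Candidate(customerId: String, incrementalBenefit: Double)

// Sort by incremental benefit and call everyone "to the left" of the chosen
// threshold: the average call cost for profit max, or zero for conversion max.
def selectCalls(candidates: Seq[Candidate],
                avgCallCost: Double,
                maximizeConversion: Boolean = false): Seq[Candidate] = {
  val threshold = if (maximizeConversion) 0.0 else avgCallCost
  candidates
    .sortBy(c => -c.incrementalBenefit)
    .takeWhile(_.incrementalBenefit > threshold)
}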

 

Conclusion

The world has become increasingly digital, and a multi-channel strategy is the right response. Bringing in the traditional tool of phone as just one channel within this strategy forced us to rethink a lot of assumptions and see where the problem led us. We began by replacing the traditional “propensity to pay” phone metric with incremental benefit, found ways to predict this value, and implemented a phone allocation strategy that maximizes profit for the business.

How Much Testing is Enough Testing?

By on February 2nd, 2017 in Engineering and Data, Product and Technology, Testing



One hundred years ago, a proposal took hold to build a bridge across the Golden Gate Strait at the mouth of San Francisco Bay.  For more than a decade, engineer Joseph Strauss drummed up support for the bridge throughout Northern California.  Before the first concrete was poured, his original double-cantilever design was replaced with Leon Moisseiff’s suspension design.  Construction on the latter began in 1933, seventeen years after the bridge was conceived.  Four years later, the first vehicles drove across the bridge.  With the exception of a retrofit in 2012, there have been no structural changes since.  21 years in the making.  Virtually no changes for the next 80.

Now, compare that with a modern Silicon Valley software startup.  Year one: build an MVP.  Year two: funding and product-market fit.  Year three: profitability?…growth? Year four: make it or break it.  Year five: if the company still exists at this point, you’re lucky.

Software in a startup environment is a drastically different engineering problem than building a bridge.  So is the testing component of that problem.  The bridge will endure 100+ years of heavy use and people’s lives depend upon it.  One would be hard-pressed to over-test it.  A software startup endeavor, however, is prone to monthly changes and usually has far milder consequences when it fails (although being in a regulated environment dealing with financial data raises the stakes a bit).  Over-testing could burn through limited developer time and leave the company with an empty bank account and a fantastic product that no one wants.

I want to propose a framework to answer the question of how much testing is enough. I’ll outline six criteria, then throw them at a few examples. Skip to the charts at the end and come back if you are a highly visual person like me. In general, I am proposing that testing efforts be assessed on a spectrum according to the nature of the product under test. A bridge would be on one end of the spectrum, whereas a prototype for a free app that makes funny noises would be on the other.

Assessment Criteria

Cost of Failure

What is the material impact if this thing fails?  If a bridge collapses, it’s life and death and a ton of money.  Similarly, in a stock trading app, there are potentially big dollar and legal impacts when the numbers are wrong.  In contrast, an occasional failure in a dating app would annoy customers and maybe drive a few of them away, but wouldn’t be catastrophic. Bridges and stock trading have higher costs of failure and thus merit more rigorous testing.

Amount of Use

How often is this thing used and by how many people?  In other words, if a failure happens in this component, how widespread will the impact be?  A custom report that runs once a month gets far less use than the login page.  If the latter fails, a great number of users will feel the impact immediately.  Thus, I really want to make sure my login page (and similar) are well-tested.

Visibility

How visible is the component?  How easy will it be for customers to see that it’s broken?  If it’s a backend component that only affects engineers, then customers may not know it’s broken until they start to see second-order side effects down the road.  I have some leeway in how I go about fixing such a problem.  In contrast, a payment processing form would have high visibility.  If it breaks, it will give the impression that my app is broken big-time and will cause a fire drill until it is fixed.  I want to increase testing with increased visibility.

Lifespan

This is a matter of return on effort.  If the thing I’ve built is a run-once job, then any bugs will only show up once.  On the other hand, a piece of code that is core to my application will last for years (and produce bugs for years).  Longer lifespans give me greater returns on my testing efforts.  If a little extra testing can avoid a single bug per month, then that adds up to a lot of time savings when the code lasts for years.

Difficulty of Repair

Back to the bridge example, imagine there is a radio transmitter at the top.  If it breaks, a trained technician would have to make the climb (several hours) to the top, diagnose the problem, swap out some components (if he has them on hand), then make the climb down.  Compare that with a small crack in the road.  A worker spends 30 minutes squirting some tar into it at 3am.  The point here is that things which are more difficult to repair will result in a higher cost if they break.  Thus, it’s worth the larger investment of testing up front.  It is also worth mentioning that this can be inversely related to visibility.  That is, low visibility functionality can go unnoticed for long stretches and accumulate a huge pile of bad data.

Complexity

Complex pieces of code tend to be easier to break than simple code.  There are more edge cases and more paths to consider.  In other words, greater complexity translates to greater probability of bugs.  Hence, complex code merits greater testing.
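
As a minimal sketch (and only a sketch — the post deliberately keeps these heuristics informal), here is one way the six criteria could be combined into a rough effort score in Scala. The 1-5 scale matches the example charts below; the simple averaging and thresholds are assumptions for illustration:

// Illustrative only: one way to turn the six criteria into a testing-effort score.
case class TestingAssessment(
    costOfFailure: Int,      // 1 (trivial) to 5 (catastrophic)
    amountOfUse: Int,
    visibility: Int,
    lifespan: Int,
    difficultyOfRepair: Int,
    complexity: Int) {

  private val scores =
    Seq(costOfFailure, amountOfUse, visibility, lifespan, difficultyOfRepair, complexity)

  // Place the component on the testing spectrum by averaging its scores.
  def effort: Double = scores.sum.toDouble / scores.size

  def recommendation: String =
    if (effort >= 4.0) "test continually, as much as possible"
    else if (effort >= 2.5) "moderate to heavy testing"
    else "light testing"
}

// The Golden Gate Bridge scores below average ~4.8 and land at the heavy end:
// TestingAssessment(5, 5, 5, 5, 5, 4).recommendation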

Examples

Golden Gate Bridge

This is a large last-forever sort of project.  If we get it wrong, we have a monumental (literally) problem to deal with.  Test continually as much as possible.

Criterion              Score
Cost of failure        5
Amount of use          5
Visibility             5
Lifespan               5
Difficulty of repair   5
Complexity             4

Cat Dating App

Once the word gets out, all of the cats in the neighborhood will be swiping in a cat-like unpredictable manner on this hot new dating app.  No words, just pictures.  Expect it to go viral then die just as quickly.  This thing will not last long and the failure modes are incredibly minor.  Not worth much time spent on testing.

Criterion              Score
Cost of failure        1
Amount of use          4
Visibility             4
Lifespan               1
Difficulty of repair   1
Complexity             1

Enterprise App — AMEX Payment Processing Integration

Now we get into the nuance.  Consider an American Express payment processing integration, i.e., the part of a larger app that sends data to AMEX and receives confirmations that payments were successful.  For this example, let’s assume that only 1% of your customers are AMEX users and that they are all monthly auto-pay transactions.  In other words, it’s a small group that will not see payment failures immediately.  Even though this is a money-related feature, it does not merit as much testing as, say, a VISA integration, since it is lightly used with low visibility.

Criterion              Score
Cost of failure        2
Amount of use          1
Visibility             1
Lifespan               5
Difficulty of repair   2
Complexity             2

Enterprise App — De-duplication of Persons Based on Demographic Info

This is a real problem for TrueAccord.  Our app imports “people” from various sources.  Sometimes, we get two versions of the same “person”.  It is to our advantage to know this and take action accordingly in other parts of our system.  Person-matching can be quite complex given that two people can easily look very similar from a demographic standpoint (same name, city, zip code, etc.) yet truly be different people.  If we get it wrong, we could inadvertently cross-pollinate private financial information.  To top it all off, we don’t know what shape this will take long term and are in a pre-prototyping phase. In this case, I am dividing the testing assessment into two parts: prototyping phase and production phase.

Prototyping

The functionality will be in dry-run mode.  Other parts of the app will not know it exists and will not take action based on its results.  Complexity alone drives light testing here.

Criterion              Score
Cost of failure        1
Amount of use          1
Visibility             1
Lifespan               1
Difficulty of repair   1
Complexity             4

Production

Once adopted, this would become rather core functionality with a wide-sweeping impact.  If it is wrong, then other wrong data will be built upon it, creating a heavy cleanup burden and further customer impact.  That being said, it will still have low visibility since it is an asynchronous backend process.  Moderate to heavy testing is needed here.

Criterion              Score
Cost of failure        4
Amount of use          3
Visibility             1
Lifespan               3
Difficulty of repair   4
Complexity             4

Testing at TrueAccord

TrueAccord is three years old.  We’ve found product-market fit and are on the road to success (fingers crossed).  At this juncture, engineering time is a bit scarce, so we have to be wise in how it is allocated.  That means we don’t have the luxury of 100% test coverage.  Though we don’t formally apply the above heuristics, they are evident in the automated tests that exist in our system.  For example, two of our larger test suites are PaymentPlanHelpersSpec and PaymentPlanScannerSpec, at 1,500 and 1,200 lines respectively.  As you might guess, these are related to handling customers’ payment plans – fairly complex, highly visible, highly used core functionality for us.  Contrast that with TwilioClientSpec at 30 lines.  We use Twilio very lightly, with low visibility and low cost of failure.  Since we are only calling a single endpoint on their API, this is a very simple piece of code.  In fact, the testing that exists is just for a helper function, not the API call itself.

I’d love to hear about other real world examples, and I’d love to hear if this way of thinking about testing would work for your software startup.  Please leave us a comment with your point of view!

Applying Machine Learning to Reinvent Debt Collection

By on January 24th, 2017 in Product and Technology, Uncategorized

Our Head of Data Science, Richard Yeung, gave a talk at the Global Big Data Conference. The talk focused on the first steps from heuristics to probabilistic models when building a machine learning system based on expert knowledge. This feedback loop is what allowed our automated system to replace the old-school call center based model with a modernized, personalized approach.

You can find the slides here.

Skipping Photoshop: How we made ID Badge creation 10x faster by using facial recognition

By on November 1st, 2016 in Engineering and Data, Product and Technology

Recently TrueAccord has grown to the size where our compliance stance requires the addition of photo ID badges. It’s a rite of passage all small-but-growing companies endure and ours is no different.

Since I have previous experience setting up badge systems and dealing with the printers, I volunteered to kick off this process. I’ve evaluated pre-existing badge creation software in the past and found it all significantly lacking. In a previous environment, I wrote my own badge creation software, which fit the needs at the time. The key phrase being “at the time”. For tech startups, it’s not unusual to go from onboarding one person every other week to 10 people a week in a year or two. That means every manual onboarding step will go from “oh well, it’s just once every other week” to “we need to dedicate several hours of someone’s time every week to this process.” Typically, that same growth period also happens to be when your operations (IT, Facilities, and Office Admin) organizations are the most short-staffed and the least likely to have the free time to do that. “Where is this going?” and “How much work does this mean for me?”, you ask? Allow me to share with you how I automated our badge system – Photoshop included.


Repos: How we use MySQL as a key-value store

By on July 21st, 2016 in Engineering and Data, Product and Technology

When we started TrueAccord in 2013, we used MySQL to store our data in a pretty traditional way. As business requirements came in, we found ourselves continuously migrating our table schemas to add more columns and more tables. Before MySQL 5.6, these schema changes would lock down the database for the entire duration of the change, causing brief downtime. When the company was smaller and just starting out, this was tolerable, but as we grew, the increase in schema complexity was getting harder to manage via SQL migration scripts.

We were looking for an alternative – something like Bigtable, the key-value store I used back at Google. A key-value store lets you store an entire document as a value, eliminating the need for migrations. We investigated several publicly available key-value stores, but none of them met our major requirements at the time. As a small engineering team, we wanted a hosted, fully managed database solution, so that backups and server migrations would be taken care of for us. Additionally, we wanted security features like encryption at rest. DynamoDB came the closest to matching our requirements, but was missing encryption at rest.

We came across this old post from FriendFeed that describes, at a high level, a design that meets our requirements, and it inspired our implementation. First, we chose MySQL (now Aurora) managed by Amazon RDS as our backing datastore. This satisfies the requirement for a hosted, managed, encrypted database, and it is a battle-tested database. Then, for the key-value interface (to avoid schema migrations), we built a thin library called Repos that provides a key-value interface on top of MySQL. Now we have something that allows us to move quickly on top of a reliable datastore.

Enter Repos

Each repo represents a map from a UUID (the key) to an arbitrary array of bytes representing the value. Each repo is stored in MySQL using two tables. The first table is the log table. Every time we want to insert or update an entity, we insert a new row into this table.

Column name   Type         Description
pk            bigint(20)   Auto-incremented primary key
uuid          binary(16)   Unique id for each entry
time_msec     bigint(20)   Time inserted
format        char(1)      Describes the format of the entry_bin column
entry_bin     longblob     The value

We always append to this table, never updating an existing row. By doing so, we get the full history of every object. This has proven to be really handy for debugging why a change has occurred, and when.

The format column can take two possible values: ‘1’ means the value in entry_bin is a serialized protocol buffer, and ‘2’ means it is a serialized protocol buffer compressed with Snappy (a compression scheme that aims for high speed and reasonable compression).
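
As a sketch, reading a value back then depends only on the format flag. This assumes the snappy-java library; the helper itself is illustrative, not the actual Repos code:

import org.xerial.snappy.Snappy

// Returns the raw serialized protocol buffer for an entry_bin value,
// decompressing first when the format flag marks it as Snappy-compressed.
def decodeValue(format: Char, entryBin: Array[Byte]): Array[Byte] =
  format match {
    case '1' => entryBin                     // stored as a plain serialized protobuf
    case '2' => Snappy.uncompress(entryBin)  // Snappy-compressed serialized protobuf
    case other => sys.error(s"unknown format: $other")
  }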

To optimize look-ups, we have another table, the “latest” table, with the following format:

Column name   Type         Description
parent_pk     bigint(20)   pk of the corresponding entry in the log table
uuid          binary(16)   The unique id of the entry (here it is the primary key)
format        char(1)      Describes the format of the entry_bin column
entry_bin     longblob     The value

 

Whenever we insert an element into the log table, we also upsert it into this table, so it always holds the latest inserted element. We do both writes in a single transaction to ensure the tables are always in sync.
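
A minimal sketch of that write path over plain JDBC (which we use under HikariCP); the table and column names follow the description above, but the method itself is illustrative rather than the actual Repos code:

import java.sql.{Connection, Statement}

def put(conn: Connection, uuid: Array[Byte], format: String, value: Array[Byte]): Unit = {
  conn.setAutoCommit(false) // both writes commit together, keeping the tables in sync
  try {
    val log = conn.prepareStatement(
      "INSERT INTO log (uuid, time_msec, format, entry_bin) VALUES (?, ?, ?, ?)",
      Statement.RETURN_GENERATED_KEYS)
    log.setBytes(1, uuid)
    log.setLong(2, System.currentTimeMillis())
    log.setString(3, format)
    log.setBytes(4, value)
    log.executeUpdate()
    val keys = log.getGeneratedKeys
    keys.next()
    val parentPk = keys.getLong(1)

    // Upsert into the latest table so it always holds the newest version.
    val latest = conn.prepareStatement(
      "INSERT INTO latest (uuid, parent_pk, format, entry_bin) VALUES (?, ?, ?, ?) " +
        "ON DUPLICATE KEY UPDATE parent_pk = VALUES(parent_pk), " +
        "format = VALUES(format), entry_bin = VALUES(entry_bin)")
    latest.setBytes(1, uuid)
    latest.setLong(2, parentPk)
    latest.setString(3, format)
    latest.setBytes(4, value)
    latest.executeUpdate()

    conn.commit()
  } catch {
    case e: Throwable => conn.rollback(); throw e
  }
}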

Secondary Index Implementation

The first hurdle when going this route is secondary indexes. For example, if your repo maps a user id to account information (email, hashed password, full name), how would you look up an account by email? To do so, we implemented index tables. An index table maps the values in the key-value store to a primitive value that MySQL can index. A single repo may have multiple indexes, and each one gets its own table. Index tables have the following layout:

Column name   Type         Description
parent_pk     bigint(20)   pk of this entry in the log table
uuid          binary(16)   The id of the entity this index entry points to
value         (varies)     The indexed value (for example, the email address of the user)

 

We only ever insert into the secondary index, so over time the index will contain stale values. To solve that, when querying we join uuid and parent_pk against the latest table and return a result only if there is a match.

For example, if we have a person with id “idA” who changed their email, the log table would look like this:

pk    uuid   time_msec   value (format, entry_bin)
501   idA    t1          {"user": "john", "email": "john@example.com"}
517   idA    t2          {"user": "john", "email": "john@domain.com"}

 

The latest table would have only the updated row:

parent_pk   uuid   value (format, entry_bin)
517         idA    {"user": "john", "email": "john@domain.com"}

 

The email index table would have the email value for each version of the object:

parent_pk   uuid   value
501         idA    john@example.com
517         idA    john@domain.com

 

Now, to find an account whose latest email value is “john@domain.com”, the Repos library would build a query similar to this:

SELECT l.uuid, l.format, l.entry_bin FROM latest AS l, email_index AS e
  WHERE e.value = "john@domain.com" AND
        e.uuid = l.uuid AND e.parent_pk = l.parent_pk

Our Repos library provides a nice Scala API for querying by index. For example,

accountsRepo.byEmail.all("john@domain.com")

would return all the accounts that have this email address.

Using Table Janitor to Manage Our Tables and Indexes

The table janitor is a process implemented as an Akka actor that runs on our JVMs. This actor is responsible for two main tasks:

  1. Ensuring that the underlying MySQL tables are created. It does this by reflecting over all of the repos and indices defined in the code and then creating the corresponding MySQL tables. This makes adding a new repo or a new index as simple as defining it in the code (see the sketch after this list).
  2. Ensuring that the indices are up to date. This is necessary because when a new index is added, there may still be servers running an old version of the code that do not write into the new index. The table janitor regularly monitors the log tables and (re-)indexes every new record. Adding an index to an existing repo is easy – we just declare it in the code.
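
Purely as an illustration of what “just declare it in the code” looks like (the real Repos API isn’t shown in this post, so the names below are hypothetical):

// Hypothetical mini-version of the Repos API, for illustration only.
abstract class Repo[T](val tableName: String) {
  case class Index[V](name: String, extract: T => V)
  protected def index[V](name: String, extract: T => V): Index[V] = Index(name, extract)
}

case class Account(id: java.util.UUID, email: String, fullName: String)

// The table janitor reflects over objects like this one, creates the backing
// log/latest/index tables, and backfills any newly declared index.
object AccountsRepo extends Repo[Account]("accounts") {
  val byEmail = index("email", _.email)
}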

How we do Analytics

We use AWS Data Pipeline to incrementally dump our log tables into S3. We then use Spark (with ScalaPB) for big data processing. We also upload a snapshot to Google’s BigQuery. Since all our repos use protocol buffers as their value type, we can automatically generate BigQuery schemas for each repo.

Pros and Cons of Our Approach

By writing Repos and having all our database access go through it, we get a lot of benefits:

  • Uniformity: having all our key-value maps be repos means that every optimization and every improvement applies to all our tables. For example, when we built a view that shows an object’s history, it worked for all of our repos.
  • Schema evolution is free when using protocol buffers as values. We can add optional fields, rename existing fields, or convert an optional field to a repeated one, and it just works.
  • Security: storing data securely on RDS is a breeze. Encryption at rest? Click a checkbox. Require data encryption in transit? SSL is supported by default.
  • Reliability: we have never had the RDS MySQL (later Aurora) instances go down (besides rare scheduled maintenance windows, which require the instances to be rebooted), and we have never lost data. Additionally, RDS lets us recover the database to any snapshot in time by replaying binary logs on top of a snapshot.
  • Ease of use: adding a repo or an index is trivial. All of our ~60 repos work in exactly the same way and are accessed through the same programmatic interface, so our engineers can easily work with any of them.
  • Optimization/monitoring/debugging: since MySQL is a mature and well-understood technology, there is a plethora of documentation on how to tune it and how to debug problems. In addition, AWS provides a lot of metrics for monitoring how an RDS instance is doing.

However, there are also downsides:

  • Storing binary data in MySQL limits what can be done from the command line MySQL client. We had to write a command line tool (and a UI) to look up elements by key so we can debug. For more complex queries, we use Spark and BigQuery for visibility into our data.
  • Being a homegrown solution, we occasionally had to spend time tuning our SQL queries as our repos grew in size. On the positive side, scaling up due to business growth is a good problem to have, and fixing it for one repo improved all the others.
  • Multiple JDBC layers (JDBC/HikariCP/MySQL connector): we had quite a few issues where it was tricky to pinpoint which layer was the source of the problem.

Alternatives: What the Future Looks Like

As much as we like our homegrown solution, we are continuously thinking about what our next storage solution will look like.

  • Current versions of both MySQL and Postgres come with built-in support for indexing JSON documents.
  • Google now offers a publicly hosted version of Bigtable.
  • We are moving towards having our data represented as a stream of events which may benefit from a different data store.

Success

The Repos implementation has enabled our engineering team to quickly develop a lot of new functionality, as well as iterate on the data schema. By implementing on top of RDS, we have the peace of mind that our data is safe and our servers are up to date with all the security patches. At the same time, having full control over the implementation details of Repos allowed us to quickly implement additional security measures to satisfy the stringent requirements of card issuers and other financial institutions, without sacrificing development speed.

Put your alerts in version control with DogPush

By on July 7th, 2016 in Engineering and Data, Product and Technology

At TrueAccord, we take our service availability very seriously. To ensure our service is always up and running, we are tracking hundreds of system metrics (for example, how much heap is used by each web server), as well as many business metrics (how many payment plans have been charged in the past hour).

We set up monitors for each of these metrics in Datadog that, when triggered, page an on-call engineer. The trigger is usually based on some threshold for that metric.

As our team grew and more alerts were added, we noticed three problems with Datadog:

  1. Any member of our team can edit or delete alerts in Datadog’s UI. The changes may be intentional or accidental, but our team prefers to review changes before they hit production. In Datadog, the review stage is missing.
  2. Because of the previous problem, an engineer would sometimes add a new alert with uncalibrated thresholds to Datadog to get some initial monitoring for a newly written component. As Murphy’s law would have it, the new alert would fire at 3am, waking up the on-call engineer, even when it indicated a miscalibrated threshold rather than a real production issue. A review system could better enforce best practices for new alerts.
  3. Datadog does not expose a way to indicate that an alert should only be sent during business hours. For example, it is okay if some of our batch jobs fail during the night, but we want an engineer to address them first thing in the morning.

To solve these problems, we made DogPush. It lets you manage your alerts as YAML files that you check in to your source control. You can use your existing code review system to review them, and once they’re approved, they get automatically pushed to Datadog — voilà! In addition, it’s straightforward to set up a cron job (or a Jenkins job) to automatically mute the relevant alerts outside business hours.
DogPush is completely free and open source – check it out here.