Photo by Deva Darshan on Unsplash
Enhancing Billing System Reliability: Harnessing the Efficiency of Inbox & Outbox Patterns
Architecture evolution for the Billing system
It is imperative to maintain data integrity, consistency, and reliability when working with the Billing service. Any missing or duplicated data could lead to incorrect invoices, wrong charges to customers, and incorrect numbers for the finance department.
On the other hand, building a billing system from scratch is expensive. It requires not only technical effort but also integration with banks, working with authorities on taxes and legal requirements, etc. Fortunately, many payment providers offer full-fledged payment and subscription functionality. Therefore, instead of reinventing the wheel, the billing team can focus on its core business, for example, how to collect usage and calculate amounts.
Using third-party payment providers has obvious advantages. However, each new integration increases the system's complexity and adds one more point of failure.
In this article, we will see how the Inbox & Outbox patterns are applied to the billing service.
The simplest version of the architecture is straightforward to understand. The Billing service sends requests to Stripe to record subscriptions, usage, etc. It also needs to subscribe to webhook events from Stripe to know when an invoice is finalized or paid, and when a payment succeeds or fails.
Ingress problems
Problem 1: Webhook event order
Stripe doesn't guarantee the order in which webhook events are delivered. Even if it did, the order in which they arrive could still change because of network latency, failed nodes, etc. Let's take a look at the following example:
In theory, Stripe will send these 3 messages in order, Message 1 => Message 2 => Message 3, and the Billing service will handle them as follows:
Invoice created => persist the invoice
Invoice finalized => mark the invoice as finalized; expects the invoice to already exist in the system
Invoice paid => mark the invoice as paid; expects the invoice to already exist in the system with status finalized
If message 3 reaches Billing before the other two, whether because of an unknown issue on Stripe's side or network latency, Billing's business logic will reject it because the invoice does not exist in the system yet.
A simple solution is to make our system tolerant of out-of-order messages and able to enrich data. For example, when Billing receives message 3 and the invoice does not exist in the system, it requests the missing data from Stripe, records the invoice, and then marks it as paid.
Fortunately, the message payload from Stripe contains all the information needed to persist the invoice in the Billing system. Otherwise, the Billing service would need to be able to fetch the invoice from Stripe.
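To make this concrete, here is a minimal Python sketch of an order-tolerant handler. It assumes a simplified event shape loosely modeled on Stripe's (a `type` plus a snapshot of the invoice under `data.object` with `id`, `customer`, and `amount_due`), uses the article's three statuses rather than Stripe's own invoice statuses, and keeps invoices in memory; `InvoiceStore` and `handle_invoice_event` are hypothetical names. The two key ideas are creating the invoice from the payload when it is missing and only ever moving its status forward.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical, simplified invoice lifecycle from the example above.
# A higher rank means a later stage; an invoice never moves backwards.
STATUS_RANK = {"created": 0, "finalized": 1, "paid": 2}

# Event types named like Stripe's invoice events, mapped to the status they imply.
EVENT_TO_STATUS = {
    "invoice.created": "created",
    "invoice.finalized": "finalized",
    "invoice.paid": "paid",
}


@dataclass
class Invoice:
    id: str
    customer: str
    amount_due: int
    status: str


class InvoiceStore:
    """In-memory stand-in for the Billing database (hypothetical)."""

    def __init__(self) -> None:
        self.invoices: Dict[str, Invoice] = {}


def handle_invoice_event(store: InvoiceStore, event: dict) -> Invoice:
    # Assumes the event carries a full snapshot of the invoice under data.object.
    payload = event["data"]["object"]
    target = EVENT_TO_STATUS[event["type"]]
    invoice = store.invoices.get(payload["id"])
    if invoice is None:
        # Enrich instead of reject: record the invoice from the payload.
        # If the payload were incomplete, this is where Billing would fetch it from Stripe.
        invoice = Invoice(payload["id"], payload["customer"], payload["amount_due"], target)
        store.invoices[invoice.id] = invoice
    elif STATUS_RANK[target] > STATUS_RANK[invoice.status]:
        # Only move forward; a late "created" or "finalized" never downgrades "paid".
        invoice.status = target
    return invoice
```

With this handler, delivering the paid event before the created and finalized events still leaves a single invoice with status `paid`; the late events are simply absorbed.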
Problem 2: Idempotency
We should not assume that Stripe will send exactly one webhook request for each event. Idempotency is an essential capability for a service in a distributed system.
A message can reach Billing more than once for several reasons, e.g. Stripe retrying a delivery it considers failed.
A simple approach is for the Billing service to store the IDs of the messages it has already received and skip any message whose ID it has seen before.
Problem 3: Timeout
If Billing takes too long to respond to Stripe, the webhook request times out. This can happen when Billing needs a long-running process to handle a message from Stripe.
For example: Invoice generated => parse the message payload => check the corresponding account => check the corresponding subscription => update the invoice table in the database.
The response to Stripe should not depend on Billing's business logic and processes. Instead, Billing can simply ingest the message and respond immediately once it has been stored successfully. A separate process in Billing then handles the message.
This approach requires Billing to persist message payloads to the database and read them back later for handling. The preceding diagram illustrates this approach, which is called the Inbox pattern.
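The Inbox pattern can be sketched as follows, again with `sqlite3` and hypothetical table and function names. `receive_webhook` stands in for the HTTP endpoint: it only stores the raw event (ignoring duplicate event IDs) and acknowledges immediately. `process_inbox` runs separately, reads unprocessed messages in arrival order, and marks each one processed in the same transaction as its handling.

```python
import json
import sqlite3
from typing import Callable


def create_inbox(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS inbox ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # arrival order
            " event_id TEXT UNIQUE NOT NULL,"  # dedup on the provider's event ID
            " payload TEXT NOT NULL,"
            " processed_at TEXT)"
        )


def receive_webhook(conn: sqlite3.Connection, event: dict) -> int:
    """Hypothetical endpoint body: persist the raw event, then acknowledge at once."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, payload) VALUES (?, ?)",
            (event["id"], json.dumps(event)),
        )
    return 200  # no business logic runs before responding


def process_inbox(
    conn: sqlite3.Connection, handler: Callable[[dict], None], batch_size: int = 10
) -> int:
    """Handle pending messages in arrival order; return how many were processed."""
    rows = conn.execute(
        "SELECT seq, payload FROM inbox WHERE processed_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    done = 0
    for seq, payload in rows:
        with conn:  # handling and the "processed" mark commit (or roll back) together
            handler(json.loads(payload))
            conn.execute(
                "UPDATE inbox SET processed_at = datetime('now') WHERE seq = ?", (seq,)
            )
        done += 1
    return done
```

If a handler raises, its transaction is rolled back and the message stays unprocessed, so the next run picks it up again. Stopping at the first failure preserves ordering, at the cost of blocking the messages behind it.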
Egress problems
Problem 1: Transaction
In the diagram, subscribing a specific account to a particular plan requires 2 actions:
Persist the relation (subscription) between the account and the plan to the local database
Create a corresponding subscription on Stripe
They must be done in the same transaction, which means that if one action fails, we must roll back the other.
However, such a transaction is hard to achieve because these are 2 separate actions in 2 systems: the database and Stripe.
One approach to this problem is to take advantage of database transactions. The idea is to persist both the subscription and a "job" to create the corresponding subscription on Stripe in a single database transaction. A background handler then picks up the job and sends the request to Stripe. This is called the Outbox pattern.
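Here is a minimal sketch of the Outbox pattern with `sqlite3`. `subscribe` writes the subscription and the outbox job in one transaction, and `relay_outbox` plays the role of the background handler. The `send` callable is a hypothetical placeholder for the actual Stripe call (no Stripe SDK function is assumed), and the table names and job kind are illustrative.

```python
import json
import sqlite3
from typing import Callable


def create_tables(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS subscriptions ("
            " id INTEGER PRIMARY KEY, account_id TEXT NOT NULL, plan_id TEXT NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT NOT NULL,"
            " payload TEXT NOT NULL, sent_at TEXT)"
        )


def subscribe(conn: sqlite3.Connection, account_id: str, plan_id: str) -> int:
    with conn:  # both rows commit together, or neither does
        cur = conn.execute(
            "INSERT INTO subscriptions (account_id, plan_id) VALUES (?, ?)",
            (account_id, plan_id),
        )
        sub_id = cur.lastrowid
        job = {"subscription_id": sub_id, "account_id": account_id, "plan_id": plan_id}
        conn.execute(
            "INSERT INTO outbox (kind, payload) VALUES (?, ?)",
            ("create_stripe_subscription", json.dumps(job)),
        )
    return sub_id


def relay_outbox(conn: sqlite3.Connection, send: Callable[[str, dict], None]) -> int:
    """Background handler: push unsent jobs in order, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, kind, payload FROM outbox WHERE sent_at IS NULL ORDER BY id"
    ).fetchall()
    for job_id, kind, payload in rows:
        send(kind, json.loads(payload))  # the external call; may raise
        with conn:
            conn.execute("UPDATE outbox SET sent_at = datetime('now') WHERE id = ?", (job_id,))
    return len(rows)
```

Note that this relay gives at-least-once delivery: if the process crashes after `send` succeeds but before `sent_at` is written, the job is sent again. Stripe's API accepts an idempotency key on requests, so deriving that key from the outbox job ID is one way to make such a resend harmless.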
Problem 2: Uncontrolled failure reasons
Another advantage of the Outbox pattern is that it decouples the Billing system from Stripe, so Billing can respond to clients as soon as the transaction is committed. Outbox requests can also fail for many reasons other than Stripe rejecting them, e.g. rate limits or network errors.
With the outbox processor, we can implement a retry mechanism and set alerts when the number of attempts reaches or exceeds a retry threshold.
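A retry policy for outbox jobs could look like the sketch below: exponential backoff between attempts and an alert callback once the attempt count reaches a threshold. `OutboxJob`, `TransientError`, `process_job`, and the default values of `max_attempts` and `base_delay` are assumptions for illustration. Non-retryable failures (for example, Stripe rejecting the request) are not caught here and would need their own handling. The current time is passed in as `now` to keep the logic deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OutboxJob:
    id: int
    payload: dict
    attempts: int = 0
    next_attempt_at: float = 0.0
    done: bool = False
    last_error: Optional[str] = None


class TransientError(Exception):
    """Hypothetical error type for retryable failures (rate limit, network)."""


def process_job(
    job: OutboxJob,
    send: Callable[[dict], None],
    alert: Callable[[OutboxJob], None],
    now: float,
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> None:
    if job.done or now < job.next_attempt_at:
        return  # finished, or still backing off
    try:
        send(job.payload)
        job.done = True
    except TransientError as exc:
        job.attempts += 1
        job.last_error = str(exc)
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        job.next_attempt_at = now + base_delay * 2 ** (job.attempts - 1)
        if job.attempts >= max_attempts:
            alert(job)  # threshold reached or crossed; the job is still retried later
```

The alert fires on every failed attempt past the threshold, matching the idea of alerting when attempts reach or cross it, while the job itself keeps being retried.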
Challenges when using Inbox & Outbox patterns
From the previous sections, we know how the Inbox & Outbox patterns make the system more reliable. However, as mentioned at the beginning of the article, they come at the cost of extra complexity and additional points of failure.
One challenge is processing inbox messages and outbox jobs sequentially. In a distributed system in particular, scaling the Billing service to multiple nodes requires a queueing mechanism that ensures the inbox and outbox processors pick up jobs in order.
Another challenge is concurrent handlers. Imagine you have a huge number of inbox messages: handling them one at a time would create a bottleneck for the system. Therefore, we would like the ordering guarantee to apply not globally but at the account level. This introduces another complexity: running concurrent handlers for different accounts.
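To illustrate the ordering requirement itself (not a full solution), here is a single-process `asyncio` sketch: messages are grouped by a hypothetical `account_id` field, each account's messages are handled strictly in order, and different accounts are handled concurrently. It deliberately ignores coordination across multiple nodes, which is the harder part of the problem.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List


async def process_by_account(
    messages: List[dict], handle: Callable[[dict], Awaitable[None]]
) -> None:
    # Messages are assumed to arrive in inbox order; group them per account.
    queues: Dict[str, List[dict]] = defaultdict(list)
    for msg in messages:
        queues[msg["account_id"]].append(msg)

    async def drain(account_msgs: List[dict]) -> None:
        for msg in account_msgs:
            await handle(msg)  # strictly sequential within one account

    # One worker per account; workers for different accounts run concurrently.
    await asyncio.gather(*(drain(q) for q in queues.values()))
```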
Within the scope of this article, we won't explore solutions to these challenges yet. However, if you have encountered them and implemented solutions, feel free to leave a comment!
In conclusion, Inbox & Outbox are powerful patterns for handling integrations with other services in distributed systems. However, consider the trade-offs before introducing them into your systems.
I hope you enjoyed reading, and any feedback is warmly welcome!
References