The journey to microservices is focused on breaking down monolithic applications into composable, domain-driven services. There are many valuable aspects to creating smaller composable services. Smaller services are easier to deploy and scale. The overall system is more resilient because failures can be isolated to individual services. Smaller services also support more flexibility and reusability across the enterprise. However, the microservice architecture has drawbacks as well, making it more of a tradeoff scenario.
One of those drawbacks is the splitting of data across multiple data stores. The monoliths we are breaking up are often backed by equally monolithic relational databases. These huge schemas can contain hundreds of relationships and large amounts of database-specific implementation detail such as complex stored procedures, views, or indexing jobs. Breaking down the monolithic database can sometimes be the most complex part of the journey.
The Joy of the Foreign Key
The foreign key in a relational database is a very powerful tool. A foreign key is a field in one table that references the primary key of another table; it is called a “foreign” key because the value it points to lives in that other table. Foreign keys are used to establish and enforce relationships between tables in a database, and their real power is their ability to ensure clean data. We can build select-field drop-downs on user interfaces from the foreign table’s entries to ensure users can only input the values we want them to. We also use the foreign key as a last resort in validating the data, ensuring that whatever is getting inserted into the primary tables isn’t something unexpected.
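To make that concrete, here’s a small sketch using SQLite from Python. The table and column names are just placeholders, but the behavior is the point: the foreign key constraint on the trade table rejects any row that references a counterparty that was never set up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this pragma is on

conn.executescript("""
CREATE TABLE counterparty (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE trade (
    id              INTEGER PRIMARY KEY,
    commodity       TEXT NOT NULL,
    counterparty_id INTEGER NOT NULL REFERENCES counterparty(id)
);
INSERT INTO counterparty (id, name) VALUES (1, 'Acme Energy');  -- placeholder counterparty
""")

# References an established counterparty: accepted.
conn.execute("INSERT INTO trade (commodity, counterparty_id) VALUES ('NG', 1)")

# References a counterparty that was never set up: the foreign key rejects it.
try:
    conn.execute("INSERT INTO trade (commodity, counterparty_id) VALUES ('NG', 999)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # FOREIGN KEY constraint failed
```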
In the energy trading world, there are many examples of foreign key relationships. The best one for discussion is the trade-to-counterparty relationship. A trade is an agreement to buy or sell some commodity (or to receive a financial settlement based on a commodity price) either now or at some time in the future. When a trade is executed, there are legal entities that take on the now-contractual obligation to either provide the commodity or to take delivery of it at a specific time and place for a specific price. These legal entities are known as counterparties.
When a new trade is executed, these counterparties need to already be established with the company. A trading company can’t just start trading with some other company: it first needs to run a KYC (Know Your Customer) procedure, establish the underlying contracts governing the relationship, and set credit limits, among other things. This means that when a new trade is created and that trade references a counterparty, the business demands that the trade be validated to ensure it is only executed with established counterparties in good standing with the company.
In a trading system world, this means the trade capture system executes this validation when someone attempts to create a new trade in the system. In a monolithic ETRM, the process will probably do a counterparty lookup, check the status of the counterparty and then save the trade – with that foreign key constraint being one of the checks to ensure the trade tables only reference the entities already in the counterparty table.
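Inside the monolith, that save path might look roughly like the sketch below. The `status` column and the 'APPROVED' value are hypothetical, and `conn` is assumed to be a sqlite3-style connection to the single monolithic database: look up the counterparty, check its standing, and let the foreign key constraint act as the final backstop on the insert.

```python
def save_trade(conn, trade):
    """Monolith-style trade save: validate the counterparty, then persist the trade.

    `trade` is assumed to be a dict with 'commodity' and 'counterparty_id' keys;
    the 'status' column and 'APPROVED' value are illustrative, not a real schema.
    """
    row = conn.execute(
        "SELECT status FROM counterparty WHERE id = ?",
        (trade["counterparty_id"],),
    ).fetchone()

    # Business-level check: the counterparty must exist and be in good standing.
    if row is None or row[0] != "APPROVED":
        raise ValueError("trade references an unknown or suspended counterparty")

    # The foreign key on trade.counterparty_id remains the last line of defense.
    conn.execute(
        "INSERT INTO trade (commodity, counterparty_id) VALUES (?, ?)",
        (trade["commodity"], trade["counterparty_id"]),
    )
    conn.commit()
```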
Beginning the Split
In this example, the domains are pretty well established: trade and counterparty. They each have unique characteristics. The counterparty is a legal entity data structure and might have naming information, physical addressing, logical addressing, contact information and more details about the company. In the context of the trade, very little of this is relevant other than knowing that a particular side of a leg of a trade is with a particular counterparty.
Some of the positive aspects of microservices quickly come into view with the splitting. The trade service is going to be much more volatile in terms of change, both from new trade instruments and functionality and from bug fixes. The trade service will also be much larger in scope, as it will need to support many more data validations than just those associated with a counterparty. The counterparty service could now be used by multiple other systems, such as the logistics management system, to manage the data associated with contractual events involving these legal entities that are not trades. However, these benefits are not without tradeoffs.
What Are the Complications?
The first tradeoff is going to be related to data maintenance across independent systems. In the monolith, the entire trade could be validated from top to bottom in a single system. In a standalone trade service, we don’t want all of those capabilities to be present, as they are not a core part of the domain at hand. We will want a slimmed-down set of data validations, but there are some choices that need to be considered.
Choices to Consider
The trade system will have an API that accepts a trade object for persistence. As part of this process, the trade will need to be checked for both structural and compositional validity. In the case of a counterparty, we want to check not only that the trade contains references to a counterparty in the correct places, but also that it is a counterparty we have already contractually agreed to trade with.
While the check on structural validity can be offloaded to the data format’s validation tooling, checking the actual counterparty is where we start to have to make choices and where some additional complexity arises. The quick answer may be for the trade service to pull the counterparty’s identifying information from the trade and ask the counterparty service whether the entity exists and is in good standing.
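That quick answer might look something like the sketch below, assuming the counterparty service exposes a simple lookup endpoint. The URL, the `status` field, and the 'APPROVED' value are all invented for illustration.

```python
import requests

# Hypothetical counterparty service endpoint; the real URL and payload will differ.
COUNTERPARTY_SERVICE_URL = "http://counterparty-service/counterparties"

def counterparty_is_valid(counterparty_id: str) -> bool:
    """Ask the counterparty service whether this entity exists and is in good standing."""
    resp = requests.get(f"{COUNTERPARTY_SERVICE_URL}/{counterparty_id}", timeout=2)
    if resp.status_code == 404:
        return False               # unknown counterparty
    resp.raise_for_status()        # service down or erroring: the validation fails loudly
    return resp.json().get("status") == "APPROVED"   # field name and value are assumptions
```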
Four Options
- Check Service Every Time – in this scenario, for every trade submitted to the trade service, the validation step grabs the counterparty identifiers and checks them against the counterparty service. The issues: if the counterparty service is down, the validation will fail, and if there is a lot of traffic on the trade service, the IO between the trade and counterparty services might become a problem. Not to mention that the counterparty service will need to be able to handle whatever load the validation process produces.
- Local Cache Plus Changes – in this option, when the service starts, it reaches out to the counterparty service, fetches all of the counterparties, and stores the relevant information in either a local cache or a standalone table. When a trade is submitted, the service does not have to do a remote check for the counterparty data; it uses the local cache instead. There are a couple of tradeoffs here. First, the trade service now has to create and maintain the caching mechanisms. Second, loading the cache at startup may slow startup times and, depending on the implementation, possibly make the service no longer suitable to be “serverless” due to its startup times. Last but not least, if you have a cache, then you have to deal with one of the two hardest things in computer science – cache invalidation!
  - Cache Updates via Polling – one way to deal with cache staleness is to poll the counterparty service and update the local data on a regular basis. The counterparty service might even provide a luxury endpoint that returns all the counterparties that have been updated since a given timestamp.
  - Cache Updates via Events – another way: if the counterparty service emits events, the trade service can listen to those events and update the cache. The trade service is probably going to be emitting and consuming events anyway, so the additional code here may not be significant.
- Local Cache Plus Changes Plus Read-Through – this is my favorite option, as it’s the most robust. In this case, we still create a local cache, but the validation process reads through the cache to the counterparty service on a cache miss and, upon return, fills in the cache. This way the service has no startup delay, because the cache loading runs in a background thread and trade consumption doesn’t require the cache to be loaded. The cache updates can also be best effort, since the read-through handles the validation check on any miss. There is a sketch of this approach just after the list.
- Don’t Check – the fourth option is going to make anyone who has spent any time in the energy trading world shudder a bit. If the counterparty fields are present and well-formed, then it’s really a question of who is sending trades into the service.
  - Exchanges – if the trades are coming from an exchange, then the event processing mechanisms in that data flow will *probably* be substituting the correct counterparty information.
  - ETRMs – if an ETRM is sending a trade it captured to the trade service, then the counterparty information it sends is *probably* correct already because to enter the trade into the ETRM, it would need the counterparties.
  - One-Off Trade Capture UI – many trading organizations have quick, one-off trade blotter capabilities where some trader threw together a web application to capture trades. If this system is already using the counterparty service to get its data, then again, it’s *probably* valid.
  - Insert Cache Miss – one other option in this case: if the data is well-formed and structurally complete, could the trade service submit a new counterparty to the counterparty service? This would require a fairly rudimentary interface for submitting barebones counterparty data, along with a fairly robust alert to all interested parties that a new trade just happened with a counterparty that’s not in our system. That alert should be accompanied by a horn blast in the risk department.
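To make the read-through option concrete, here is a minimal sketch of what that validation path could look like. The lookup callable, the field names, the TTL, and the 'APPROVED' status are all assumptions; the point is just the shape of the logic: consult the local cache first, read through to the counterparty service on a miss, and populate the cache on the way back.

```python
import time
from typing import Callable, Optional

class CounterpartyValidator:
    """Read-through cache in front of a counterparty lookup (an HTTP call, for instance)."""

    def __init__(self, fetch_status: Callable[[str], Optional[str]], ttl_seconds: int = 300):
        self._fetch_status = fetch_status   # remote lookup: returns a status string or None
        self._ttl = ttl_seconds
        self._cache = {}                    # counterparty_id -> (status, fetched_at)

    def is_valid(self, counterparty_id: str) -> bool:
        entry = self._cache.get(counterparty_id)
        if entry is not None and time.time() - entry[1] < self._ttl:
            status = entry[0]                               # fresh cache hit
        else:
            status = self._fetch_status(counterparty_id)    # cache miss: read through
            if status is not None:
                self._cache[counterparty_id] = (status, time.time())
        return status == "APPROVED"                         # placeholder status value
```

Cache entries written ahead of time by polling or by counterparty events slot straight into this structure; the read-through path covers whatever they miss.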
Conclusion
Breaking down a monolith is not an easy task. There are often complications and tradeoffs that will need to be addressed, especially in the beginning when the new components only provide additional headaches and little relief from the pain of the monolith. There are also some ideas about what a microservice is that may need to be adjusted, both in terms of scope and other characteristics. In this case, we may end up with duplicated counterparty data in two different databases, which isn’t ideal, but it also allows us to make changes to our trade system almost daily, so the tradeoff may be very worthwhile. Stick with the process, and don’t be afraid to write throwaway code to get something across the goal line, as long as your product manager promises to give you time next quarter to clean it up.