
Trade-offs Are Inevitable in Software Delivery - Remember the CAP Theorem

In the world of financial services, the integrity of data systems relies fundamentally on non-functional requirements (NFRs) such as reliability and security. Despite their importance, NFRs often receive only secondary consideration during project scoping, typically being reduced to a generic checklist aimed more at compliance than at genuine functionality. Regrettably, these initial NFRs are seldom met at delivery, yet because the original specifications were so vague and unrealistic, this rarely prevents deployment to production.

This common scenario results in significant end-user frustration as the system does not perform as expected, often being less stable or slower than anticipated. This situation underscores the need for better education on how to articulate and define NFRs, i.e. demanding only what is truly necessary and feasible within the given budget. Early and transparent discussions can lead to system architecture being tailored more closely to realistic non-functional requirements, potentially saving costs and focusing efforts where they are most needed.

Stringent NFRs are generally complex and costly to implement, so they should only be enforced when truly necessary. Additionally, some NFRs conflict with one another and require a compromise, so stakeholders should be aware of the impact of certain choices. For example, a highly flexible system is naturally less optimized for a specific task and typically also less performant.

Comparisons between the NFR achievements of financial institutions and Big Tech companies like Google or Meta are often unfair and misleading. Big Tech’s ability to meet high standards in NFRs — such as high availability, perfect scalability, and high performance despite enormous volumes — results from several key differences:

  • Cost: With their vast scale, Big Tech companies can justify the extensive tuning of software by dozens of engineers, where even minor efficiencies translate into significant cost savings.

  • Technology Stack: Unlike banks that may rely heavily on standard one-size-fits-all technology, Big Tech companies utilize a complex stack of technologies. For example, where most banks predominantly use traditional SQL databases like Oracle for storing data, Big Tech firms deploy a diverse array of database technologies (SQL databases, time-series databases, key-value stores, column-based stores, document-based stores, graph-based stores) each tailored to specific use cases.

  • Legacy Constraints: Banks often deal with legacy systems and infrastructure not designed to meet the performance, scalability, and availability required for modern, continuous 24/7 usage. As a system is only as strong as its weakest link, NFRs should be adapted to these constraints. As the financial sector gradually replaces its legacy software with a modern, open-source-based stack and also shifts to the cloud, more ambitious NFR targets can be set.

  • Infrastructure and Platforms: Big Tech firms have dedicated engineering teams that build and maintain robust underlying technology platforms, abstracting non-functional complexity away from the development teams that focus on business features. Banks, by contrast, often require their feature development teams to build systems from scratch, which leads to inefficiencies because these teams are not specialized in NFR tuning.

  • Data Consistency: In banks, data consistency usually defaults to strong consistency to prevent issues like double spending or incorrect account balances. However, many scenarios might be adequately served by eventual consistency, which could simplify other NFRs. In Big Tech, many features operate with eventual consistency and in the majority of cases, end-users never notice this. This less stringent approach towards consistency allows Big Tech firms to deliver much better results on NFRs like performance and availability, which are often more visible to end-users.
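
To make this last difference concrete, here is a minimal, purely illustrative Python sketch. The ReplicatedStore class and its methods are invented for this example (they do not refer to any real banking system or database API); the point is simply that a strongly consistent read and an eventually consistent read can return different answers while replication is still in flight.

```python
import time
import threading

class ReplicatedStore:
    """Toy key-value store with one authoritative copy and one async replica."""

    def __init__(self, replication_delay=0.5):
        self.primary = {}                   # authoritative copy
        self.replica = {}                   # asynchronously updated copy
        self.replication_delay = replication_delay

    def write(self, key, value):
        self.primary[key] = value
        # Replicate in the background, as an eventually consistent system would.
        threading.Timer(self.replication_delay,
                        self.replica.__setitem__, args=(key, value)).start()

    def read_eventual(self, key):
        # Eventual consistency: serve whatever the replica has right now,
        # which may be stale while replication is still in flight.
        return self.replica.get(key)

    def read_strong(self, key):
        # Strong consistency: only answer from the authoritative copy.
        return self.primary.get(key)

store = ReplicatedStore()
store.write("balance:acc-42", 100)
print(store.read_strong("balance:acc-42"))    # 100 immediately
print(store.read_eventual("balance:acc-42"))  # may still be None (stale)
time.sleep(1)
print(store.read_eventual("balance:acc-42"))  # 100 once replication has caught up
```

Because the replication window is usually short, end-users rarely notice an eventually consistent read; for something like an account balance, however, a bank will typically insist on the strongly consistent path.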

Understanding trade-offs and compromises is crucial. As financial institutions increasingly adopt distributed systems to enhance their scalability and resilience, understanding the underlying principles that govern these systems becomes essential.

Distributed systems consist of numerous smaller, replicated servers, allowing the use of cheaper hardware and improving reliability by avoiding single points of failure. However, they also introduce complexities, such as more failure-prone individual servers (nodes) and the challenge of ensuring data consistency across servers.

This is where the CAP theorem, formulated by Eric Brewer, comes into the picture: it proves that trade-offs are inevitable. The theorem states that a distributed data system can provide at most two out of three critical guarantees simultaneously:

  • Consistency: Every read (on any node) retrieves the most recent and correct information.

  • Availability: Every active node can always process requests and update others.

  • Partition Tolerance: The system continues to function even when parts of it lose communication.

Financial institutions, operating under a zero-tolerance policy for data loss, generally prioritize consistency (i.e. ensuring that all transactions are accurately recorded), but must understand that during network failures a choice may be necessary between high availability and guaranteed consistency.

The CAP theorem is often misunderstood as necessitating a constant trade-off among all three guarantees. In reality, the choice only has to be made when a network partition or failure occurs; at all other times, no trade-off is required. The real choice is then between Guaranteed Consistency and High Availability.

  • High Availability (AP system): This type of system allows reads before all nodes are updated. This means the query is always processed, and the system will try to return the most recent available version of the information, even if it cannot guarantee it is up-to-date due to network partitioning.
    In such a system, eventual consistency can be achieved. This is managed through a background process that replicates all information. When a node is not available or the network connection is disrupted, the updates are buffered until the network is restored. Thus, consistency will eventually be reached, but during the time it takes to achieve consistency, the system may not provide the most recent information.

  • Guaranteed Consistency (CP system): In this system, all nodes are locked for reading—that is, the system will return an error or a timeout if it cannot guarantee that the information is up-to-date due to network partitioning—until all updates are processed.
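
As a rough illustration of this behaviour, here is a small, hypothetical Python sketch (the Node class and PartitionError are made up for the example) contrasting how an AP-style node and a CP-style node might answer the same read while cut off from the rest of the cluster.

```python
class PartitionError(Exception):
    """Raised by a CP node that refuses to answer with possibly stale data."""

class Node:
    def __init__(self, mode):
        self.mode = mode                                # "AP" or "CP"
        self.local_copy = {"balance:acc-42": 100}       # last value seen locally
        self.partitioned = False                        # True while cut off from peers

    def read(self, key):
        if not self.partitioned:
            return self.local_copy[key]   # no partition: no trade-off needed
        if self.mode == "AP":
            # Availability first: return the most recent locally known value,
            # even though it may be outdated.
            return self.local_copy[key]
        # Consistency first: refuse to answer rather than risk a stale read.
        raise PartitionError("cannot guarantee up-to-date data during partition")

ap_node, cp_node = Node("AP"), Node("CP")
ap_node.partitioned = cp_node.partitioned = True
print(ap_node.read("balance:acc-42"))    # 100, possibly stale
try:
    cp_node.read("balance:acc-42")
except PartitionError as e:
    print("CP node:", e)                 # error/timeout instead of a stale value
```

Once the partition heals, the AP node catches up in the background (eventual consistency), while the CP node simply resumes answering; the design choice is about which failure mode, stale answers or refused answers, is more acceptable for the business.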

The CAP Theorem is a prime example of the trade-offs and design choices system architects need to make. System design is, however, a constant balancing act between conflicting requirements. It is crucial to educate all employees about these limitations and trade-offs, so that choices can be made which align as closely as possible with the business requirements.


Comments: (2)

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune, 02 May 2024, 12:20

There was an FI that spec'ced 2400 TPS for its onramp system to an A2A RTP rail, expecting to hit that throughput in Year 3. After 15 years, the entire scheme itself hasn't reached half of that throughput! But I know exactly what would have happened if we'd moderated the customers' expectations upfront.

While I agree with almost all your points, it's virtually impossible to win deals when the CEO / CIO has just come back fresh from attending a tradeshow where they were told by the Gartners and McKinseys and SamAs that e.g. ChatGPT can do everything faster, better and cheaper than humans. Against that backdrop, any vendor who tells them that they can get only 2 out of 3 because some Eric Brewer / CAP Theorem said so would be branded as pessimistic and thrown off the vendor shortlist for being incompetent, unenthusiastic, or both. C'est la vie.

Joris Lochy - Capilever - Brussels, 02 May 2024, 20:23

Thanks for the feedback. Indeed, you are absolutely right: as long as top management is not aware of certain necessary trade-offs, it will be impossible for analysts & architects to do their job correctly. But I hope my blog can plant at least a little bit of the idea (without having too many illusions either :-)) that technology is always a matter of making trade-offs.
