What the Liskov Principle Saved Me From

Of all the SOLID principles, the Liskov Substitution Principle is the one with the scariest name and the simplest idea. Today I want to explain it the way I wish someone had explained it to me twenty years ago, with a production story attached, because principles without scars are just trivia.

The idea in plain words

Barbara Liskov presented the idea in her 1987 keynote, Data Abstraction and Hierarchy, and it deserves a lot of respect, because it still saves money every day in codebases she never saw. The formal definition talks about subtypes and provable properties. The plain version is this.

If your code works with a parent type, it must also work with any child of that type, without knowing the difference and without surprises.

That is all. A subclass must keep the promises of its parent. It can do more, but it cannot do less, and it cannot change the meaning of what the parent promised. Robert Martin, Uncle Bob, made this the L in his SOLID framing, and he defends it in Solid Relevance on his blog. His way of saying it is that a subtype must be substitutable for its base type. Same idea, different words.

It sounds obvious. Everybody nods in the meeting. Then everybody violates it on Tuesday.

The story

Years ago I worked on a system that processed payments from several providers. The design looked clean. There was a base class, let me call it PaymentProvider, with a method to refund a transaction. The method returned a refund confirmation object. Every provider had its own subclass. The rest of the system did not care which provider it was talking to. Beautiful, right? That is the whole point of inheritance.

Then we added a new provider. This provider did not support instant refunds through their API. Refunds with them were a manual process on their side, taking up to two days. The developer who integrated it, under deadline pressure like all of us, made a decision that looked small. The refund method in the new subclass did not actually refund. It created an internal ticket for the finance team and returned a confirmation object anyway, with a fake confirmed status, so nothing upstream would break.

[Figure: The child keeps the signature but breaks the promise.]

Read that again. The subclass kept the method signature and broke the promise. The parent said “when this returns, the customer got the money back”. The child said “when this returns, somebody will look at a ticket eventually”.
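Here is roughly what that looked like, as a minimal sketch. Only PaymentProvider and the refund method come from the real story. The other names (RefundConfirmation, InstantProvider, ManualProvider, notify_customer) are hypothetical, and the real API calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundConfirmation:
    transaction_id: str
    status: str  # upstream reads "confirmed" as "the customer has the money"


class PaymentProvider:
    """Base type. Promise: when refund() returns, the money has been sent back."""

    def refund(self, transaction_id: str) -> RefundConfirmation:
        raise NotImplementedError


class InstantProvider(PaymentProvider):
    def refund(self, transaction_id: str) -> RefundConfirmation:
        # Hypothetical: a real provider API call that moves the money goes here.
        return RefundConfirmation(transaction_id, "confirmed")


class ManualProvider(PaymentProvider):
    """Same signature, different meaning: this only opens a finance ticket."""

    def __init__(self) -> None:
        self.tickets = []  # transaction ids waiting for the finance team

    def refund(self, transaction_id: str) -> RefundConfirmation:
        self.tickets.append(transaction_id)  # nobody has been paid yet
        return RefundConfirmation(transaction_id, "confirmed")  # the lie


def notify_customer(provider: PaymentProvider, transaction_id: str) -> str:
    # Upstream code trusts the base-type promise and cannot tell the difference.
    confirmation = provider.refund(transaction_id)
    return f"Refund {confirmation.status} for {confirmation.transaction_id}"
```

Both providers produce the same customer message, and nothing in the types says that only one of them moved any money.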

Nothing failed in testing. The types matched, the compiler was happy, the unit tests for that class passed because they tested the ticket creation. The code shipped.

Three weeks later customer support started getting angry emails. Customers were told by our own system, on the screen, that their refund was confirmed. The email confirmation went out too, because the upstream code trusted the confirmation object. The money arrived days later, and for a few customers, after a holiday, even later. Some of them filed complaints with their card companies. Each complaint had a real fee attached, and each angry customer was a customer we paid marketing money to get.

The debugging was painful in a special way. We were searching for a bug in the refund flow, and the refund flow had no bug. Every class worked exactly as written. The bug was in a broken promise between a parent and a child, and no stack trace points to a broken promise. It took two developers most of a week to even understand what was happening, because the system was lying with a straight face.

What LSP would have told us

The Liskov principle, taken seriously, would have stopped this at design time. The moment someone says “this subtype cannot really do what the parent promises”, the answer is not to fake it. The answer is that this thing is not a true subtype. Maybe the abstraction is wrong. Maybe we needed two concepts, instant refund and delayed refund, and the upstream code needed to know the difference, because the customer experience is genuinely different.
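One possible shape for that redesign, as a sketch and not what we actually shipped: weaken the base promise to "refund reports what really happened" and put the difference into the return value, so upstream code has to look at it. RefundStatus, RefundResult, the provider class names and the message wording are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RefundStatus(Enum):
    COMPLETED = "completed"  # money has been sent back
    PENDING = "pending"      # request accepted, money not sent yet


@dataclass(frozen=True)
class RefundResult:
    transaction_id: str
    status: RefundStatus


class PaymentProvider:
    """Honest promise: refund() reports what actually happened."""

    def refund(self, transaction_id: str) -> RefundResult:
        raise NotImplementedError


class InstantProvider(PaymentProvider):
    def refund(self, transaction_id: str) -> RefundResult:
        # Hypothetical: a real provider API call that moves the money goes here.
        return RefundResult(transaction_id, RefundStatus.COMPLETED)


class ManualProvider(PaymentProvider):
    def __init__(self) -> None:
        self.tickets = []  # transaction ids waiting for the finance team

    def refund(self, transaction_id: str) -> RefundResult:
        self.tickets.append(transaction_id)
        return RefundResult(transaction_id, RefundStatus.PENDING)


def customer_message(result: RefundResult) -> str:
    # Upstream code now has to handle both outcomes explicitly.
    if result.status is RefundStatus.COMPLETED:
        return "Your refund has been issued."
    return "Your refund request was received. It can take up to two days."
```

Now both subclasses keep the same promise, and the customer is told the truth in each case. Two separate interfaces for instant and delayed refunds would work too. The point is that the difference becomes visible to the caller.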

That is the part people miss. LSP is not a rule about class diagrams. It is a rule about honesty between parts of a system. When every subtype keeps the promises of its base type, you can reason about the system locally. You read the calling code and you know what happens, for every implementation, including the ones written next year by someone you never met. When subtypes lie, you have to read every implementation to know what your own code does, and nobody has time for that. So nobody does it. So bugs ship.

The money angle

Let me translate the principle into the language I care most about, which is cost.

An LSP violation is a bug that hides from the type system, hides from most unit tests, and waits for production. That means you pay for it at the most expensive possible moment, with support tickets, with engineers context-switching into panic mode, and with customer trust, which you cannot buy back at any price. In our case, one shortcut that saved maybe two days of design discussion cost weeks of cleanup and a group of upset customers who lost trust in the product.

Compare that with the cost of respecting the principle. It is mostly the cost of one uncomfortable conversation. “This provider does not fit our abstraction, we need to change the design.” That conversation feels expensive when the deadline is close. It is the cheapest thing in this whole story.

This is a principle I always keep in mind when I review a pull request. When I see a subclass throwing NotSupportedException, or returning empty results where the parent returns real data, or quietly doing something different from its siblings, I stop everything and ask about it. Because I have seen what the alternative costs, and the company should not pay that bill twice.
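Review catches some of this, but a shared contract test can back it up: write the parent's promise once and run it against every subclass, instead of letting each subclass test only its own behavior. A minimal sketch, assuming a hypothetical FakeBank test double that records money actually sent back. Every name here except PaymentProvider and refund is invented for illustration.

```python
class FakeBank:
    """Hypothetical test double: records which transactions were really refunded."""

    def __init__(self) -> None:
        self.refunded = set()


class PaymentProvider:
    """Base type. Promise: "confirmed" means the money has been sent back."""

    def __init__(self, bank: FakeBank) -> None:
        self.bank = bank

    def refund(self, transaction_id: str) -> str:
        raise NotImplementedError


class InstantProvider(PaymentProvider):
    def refund(self, transaction_id: str) -> str:
        self.bank.refunded.add(transaction_id)  # money really moves
        return "confirmed"


class TicketProvider(PaymentProvider):
    def refund(self, transaction_id: str) -> str:
        # Only a ticket would be created here; the bank is never touched.
        return "confirmed"


def honors_refund_contract(provider_cls) -> bool:
    """The parent's promise, written once and run against any subclass."""
    bank = FakeBank()
    status = provider_cls(bank).refund("tx-1")
    # "confirmed" is only allowed if the money actually moved.
    return status != "confirmed" or "tx-1" in bank.refunded
```

Run against every subclass, honors_refund_contract passes for InstantProvider and fails for TicketProvider, which is exactly the failure our per-class unit tests never produced.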

Keep your promises in code like you keep them with people. That is the whole principle.

Pax et bonum.