It's all very well to say we could simplify caps management by looking at what is actually supported by real hardware, then bucketing everything into just a few of the more common combinations. But there is an obvious flaw in this logic: we can only examine the caps of hardware that exists today! What if we baked this into our API, then a new GPU with entirely different caps was released tomorrow?
Enter DirectX 10.
DX10 has no caps. All hardware is required to support the entire DX10 feature set*. So caps management is trivially simple.
But we must still support the huge install base of DX9 hardware, right?
Here's the awesome part: DX10 has taken over the PC market, even way down into low-power integrated laptop graphics, to the point where nobody is designing new DX9 hardware any more. The DX9 chips that exist today are all we will ever have.
The arrival of DX10 has frozen DX9 at a single moment in time. This lets us design APIs based on what combinations of caps exist right now, and be confident these decisions will remain valid in the future. Nice!
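To make the bucketing idea concrete, here is a minimal sketch of what "a few common combinations" might look like once the set of hardware is frozen. Everything in it is made up for illustration: the `Cap` flags, the profile names, and `pick_profile` are hypothetical and do not correspond to real Direct3D caps bits or to any actual API.

```python
from enum import Flag, auto


class Cap(Flag):
    # Hypothetical capability bits, invented for this sketch.
    SHADER_MODEL_2 = auto()
    SHADER_MODEL_3 = auto()
    NON_POW2_TEXTURES = auto()
    FLOAT_RENDER_TARGETS = auto()


# A fixed list of buckets, ordered from most to least demanding.
# Because no new DX9 hardware is being designed, this list never has to grow.
PROFILES = [
    ("high", Cap.SHADER_MODEL_2 | Cap.SHADER_MODEL_3
             | Cap.NON_POW2_TEXTURES | Cap.FLOAT_RENDER_TARGETS),
    ("baseline", Cap.SHADER_MODEL_2),
]


def pick_profile(device_caps):
    """Return the name of the most demanding profile the device fully supports."""
    for name, required in PROFILES:
        # The device qualifies only if every required bit is present.
        if (device_caps & required) == required:
            return name
    return None  # Device does not meet even the lowest bucket.
```

A chip that supports some extras but not the whole "high" set simply lands in "baseline", so game code only ever has to ask which bucket it is in rather than checking individual caps.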
* Note for pedants: OK, some format support is allowed to vary. But the number of optional features is extremely low.