Monday, September 8, 2014

Backward compatibility sins

Maintaining backward compatibility is one of the most important values for every software library, tool, or system that other systems use via its API. However, as a system evolves, maintaining compatibility gets harder, and sometimes it becomes impossible to improve the system in a desired way, because that would mean breaking compatibility. At those points a tough decision has to be made: maintain compatibility or break it.
The list below is not complete by any means, but it shows a few examples where I doubt that staying backward compatible was the right decision. I also add what I think the right decision would have been and what we can learn from the mistakes made.

  1. WinMain [dead parameter in a rarely used function]

    In the Windows API, the program's entry point is as follows:
    int CALLBACK WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow);
    Note the second argument: it is always NULL!
    The idea is that this argument had a meaning in 16-bit Windows, but that meaning was completely removed in 32-bit Windows. So the parameter is in effect meaningless and is there just for backward compatibility. While that seems to make sense at first, consider that Win16 and Win32 are not entirely compatible anyway! Applications had to be migrated from one to the other. And an application has exactly one WinMain.
    As a consequence, what you see here is short-term backward compatibility (Win16 died quite soon after Win32 appeared) at the cost of long-term API pollution. All for something as trivial as the application entry point (which could have been handled via a preprocessor macro).
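    To make the "always NULL" part concrete, here is a minimal sketch of a Win32 program (the message box is just filler) that checks the claim at runtime:
    #include <windows.h>
    #include <assert.h>

    /* The second parameter only had meaning on 16-bit Windows; on Win32 it
       is documented to always be NULL, yet every program must declare it. */
    int CALLBACK WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                         LPSTR lpCmdLine, int nCmdShow)
    {
        assert(hPrevInstance == NULL); /* holds on every Win32 system */
        MessageBoxA(NULL, "hPrevInstance was NULL, as always", "Demo", MB_OK);
        return 0;
    }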

  2. WPARAM ["hungarian compatibility"]

    In Windows, the signature of the window procedure looks like this:
    LRESULT WINAPI DefWindowProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);
    and the parameter in question is wParam - its datatype, to be exact.
    The point here is that "W" stands for "WORD" in both the datatype and the parameter name. This was true in Win16 (a WORD is 2 bytes), but not anymore since Win32 (the WORD datatype is still 2 bytes, but WPARAM is now 4 bytes).
    There are two issues apparent here:

    • Hungarian notation is a bad idea, and one of the reasons is right in front of you (if you haven't noticed: the parameter name LIES to you)
    • Generic datatypes (like int) are redefined under new names to make it possible to change them later. That was done in this case too, except that the new name encoded the original underlying type, so it LIES to us now (see the check below).
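    A two-line check (compilable as C on Windows) makes the lie visible; note that on 64-bit Windows WPARAM is pointer-sized, so the gap is even wider:
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* The "w" prefix promises a WORD, but the types diverged in Win32 */
        printf("sizeof(WORD)   = %u\n", (unsigned)sizeof(WORD));   /* 2 */
        printf("sizeof(WPARAM) = %u\n", (unsigned)sizeof(WPARAM)); /* 4 (8 on Win64) */
        return 0;
    }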
  3. Double.parseDouble(null) vs. Integer.parseInt(null) [bug-compatibility]

    In Java:
    Integer.parseInt(null) // throws NumberFormatException
    Double.parseDouble(null) // throws NullPointerException

    This inconsistency originated in old versions of Java and is kept for backward compatibility. It is documented behavior, so it's "a feature".
    What this actually is, is called bug-compatibility. The funniest part is that both methods are declared to throw NumberFormatException, so the fix is quite simple and would hardly break anything badly. I mean, if you handle the exception properly, your code will just keep working; otherwise you probably have quite a buggy system anyway, and one bug more or less doesn't make a difference...
    Most importantly, these two methods are very old. Double.parseDouble() dates back to Java 1.2; the other carries no version tag, so it is at least as old. YOU REALLY REALLY COULD HAVE FIXED THIS BACK THEN! Instead, Sun maintained backward, sorry, bug-compatibility, just to watch the bug get harder to fix later.
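    A minimal demonstration of the inconsistency (the outcomes below match the documented behavior of both methods):
    public class ParseNullDemo {
        public static void main(String[] args) {
            try {
                Integer.parseInt(null);
            } catch (NumberFormatException e) {
                // parseInt(null) throws NumberFormatException
                System.out.println("parseInt:    " + e.getClass().getSimpleName());
            }
            try {
                Double.parseDouble(null);
            } catch (NullPointerException e) {
                // parseDouble(null) fails earlier, with a different exception
                System.out.println("parseDouble: " + e.getClass().getSimpleName());
            }
        }
    }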

  4. Java generics vs. C# generics [focus on past, not on future]

    Both languages started out with collections of Object, only to find out that they had destroyed a lot of type safety in exchange for more verbose coding (a lose-lose situation, that is). How did they later add generics without breaking the languages' backward compatibility?
    Java went the hard way by turning the existing non-generic collections into generic ones. They faced three issues:

    • both non-generic and generic collections have to be available (so that old code still compiles)
    • convertibility between generic and old non-generic variants (to mix old and new code)
    • behavioral compatibility (non-generic accepts anything, generic is limited)
    It was easy and correct to default to Object for non-generic collections. Problems arose for generic collections that are more specific than Object. The solution was to make the generic argument syntactic sugar that exists only at compile time: the collection is still what it was before, and the compiler just inserts the casts automatically (see the demo below). This was done because the old (existing) collections always accepted anything, so if an exception were introduced for an incompatible type, existing code would break. A non-generic collection was made convertible to any generic collection (whatever the argument). That in turn added two new issues:
    • what happens if a non-generic collection is converted to a generic one with an incompatible argument?
    • how is type safety enforced in new code, when under the hood there is still the old non-generic collection?
    Java's creators went the easy way in both cases: they accepted ClassCastException for the first and completely forbade direct conversion of one generic collection to another (i.e. List<Integer> can't be directly cast to List<Number>).
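    A two-line demonstration that the generic argument really is compile-time-only:
    import java.util.ArrayList;
    import java.util.List;

    public class ErasureDemo {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<>();
            List<Integer> numbers = new ArrayList<>();
            // Erasure: at runtime both are plain ArrayList, the argument is gone
            System.out.println(strings.getClass() == numbers.getClass()); // true
        }
    }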
    What went wrong here? Four issues:
    • a generic collection can be passed to old non-generic code, which is free to insert anything - and that will only explode later, in new code (see the demo below)!
    • a generic type still can't be cast to another one even when the arguments are compatible; you have to work around that with a cast through the non-generic type
    • generics exist only at compile time; there is no runtime type checking
    • you can't force new code to be generics-only: anything generic can still be used raw, with Object assumed
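    Here is the first issue in action - a sketch of old raw-type code corrupting a generic collection:
    import java.util.ArrayList;
    import java.util.List;

    public class HeapPollutionDemo {
        // Old pre-generics code: a raw List accepts anything
        @SuppressWarnings({"unchecked", "rawtypes"})
        static void legacyCode(List anything) {
            anything.add("surprise"); // perfectly legal for a raw List
        }

        public static void main(String[] args) {
            List<Integer> numbers = new ArrayList<>();
            numbers.add(42);
            legacyCode(numbers); // compiles fine
            // the bomb goes off far away from the buggy insertion:
            Integer n = numbers.get(1); // ClassCastException here
        }
    }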
    What could they have done instead? Make collections aware of their generic argument and throw an exception when an incompatible object is inserted. That would accomplish the following:
    • type safety of generic collections - such a collection would simply never contain incompatible objects
    • casting of references would simply work, as the protection is there at runtime
    • passing a generic collection to old code would reveal bugs (an object of the wrong type inserted) or expose invalid assumptions about it ("oops, it's not a String-only collection")
    Yes, this approach could break old code. But the alternative that was chosen made all new code suck. Looking forward, new code will slowly outnumber old code, and Java as a language will have inferior generics to what it could have had!
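    Interestingly, the JDK itself ships this behavior as an opt-in wrapper, Collections.checkedList - it remembers the element type at runtime and rejects bad insertions immediately; it just isn't the default:
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class CheckedListDemo {
        @SuppressWarnings({"unchecked", "rawtypes"})
        public static void main(String[] args) {
            List<Integer> numbers =
                    Collections.checkedList(new ArrayList<>(), Integer.class);
            // fails HERE, at the buggy insertion - not later in innocent code
            ((List) numbers).add("surprise"); // ClassCastException
        }
    }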

    C# took a different approach here. It simply added generics as something completely new, not compatible with the old collections in any way. Not ideal, as interoperability between old and new code is troublesome. But looking forward, the old code will die out. So IMO it's a better approach than Java's.

  5. C++ compatibility with C [not quitting in time]

    So, C++ is designed to be compatible with C - that is, "a valid C program is a valid C++ program", as they say... Well, not really, for several reasons:

    • the compatibility is lost with the first new keyword introduced (*cough* class *cough*) - what used to be a valid identifier now is not (see the snippet below)
    • C++ has different linkage because of name mangling, which makes it incompatible with C. Worse, the C libraries are now forced to add extern "C" markers under the __cplusplus define to make themselves consumable from C++
    • enums and structs have tag names in C, but these are real type names in C++
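    For illustration, here is a small program that is valid C but is rejected by any C++ compiler:
    /* C++ keywords were ordinary identifiers in C, and struct tags
       are not automatically type names there */
    struct point { int x, y; };

    int main(void)
    {
        int class = 4;                   /* "class" is a keyword in C++ */
        int new = 2;                     /* so is "new" */
        struct point p = { class, new }; /* C requires the "struct" tag */
        return p.x + p.y;
    }
    And this is the boilerplate every C header carries just to stay usable from C++ (my_c_function stands in for any library function):
    #ifdef __cplusplus
    extern "C" {    /* disable C++ name mangling for these declarations */
    #endif

    void my_c_function(int x);

    #ifdef __cplusplus
    }
    #endif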
    What could they do? Well, actually they did the right thing, just for far too long. If C++'s goal was to completely replace C, it failed to do that. And it's long past time for it to become an independent language and throw some of the old C junk away (well, a few constructs for accessing C from C++ could be introduced; we have so many of them that a few more don't really matter).
    What breaking ties with C would achieve:
    • string literals could become real std::string objects with all their functionality (like concatenation using "+")
    • arrays could be std::array by default (being assignable is the first win)
    • a lot of the standard C library could be wrapped in C++ functions accepting C++ types (imagine printf() accepting std::string - see the sketch below)
    • forget extern "C"; you could just have something like #cinclude for C headers
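    As a taste of the wrapper idea, here is a minimal sketch (print_line is a hypothetical name, not a standard function) of a C++-typed front end over C stdio:
    #include <cstdio>
    #include <string>

    // Hypothetical wrapper of the kind the wished-for library could provide
    inline int print_line(const std::string& s)
    {
        return std::printf("%s\n", s.c_str());
    }

    int main()
    {
        std::string greeting = std::string("Hello, ") + "C++"; // "+" just works
        print_line(greeting);
        return 0;
    }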
Lessons that can be learned

  • A compatibility break that is almost guaranteed to have a very small impact is worth doing (WinMain)
  • If you redefine some type via typedef, make the new type more generic, so that you can change it later (i.e. "an integer whose size is at least X")
  • Hungarian notation is a bad idea; the full-blown variant is ten times worse
  • Bugs should be fixed! A fix that breaks something of minor importance will earn you a few rants from people who are exactly the ones to ignore (I can't imagine a good developer complaining about a fixed bug, even if the fix broke something in his buggy code).
  • New code or a new file format will gradually outnumber the old one by a large margin, so look forward, not backward
  • If you fail to maintain full compatibility, use it as an opportunity to break things for a better future
  • The number of "breaks" doesn't matter; what matters is the overall pain introduced by the compatibility break. If you broke something important, keeping minor things compatible won't help much.
  • Bonus: it's not possible to maintain backward compatibility forever, so plan the break in advance and don't give false promises.