Monday, July 14, 2014

The lost war against duplicate code

From what I've seen so far, duplicate code is impossible to avoid in any large project. There are multiple reasons why duplicate code gets created. It is typically assumed that duplicate code is bad, but this is not always the case.

Why duplicate code is bad

  • Duplicate bugs - this one is obvious: if a bug is discovered in the code, the same bug exists everywhere that code was copied. So there are many places to fix instead of one.
  • Hard to maintain - pretty much the same as the previous point, but broader. You don't only fix bugs. You also add features, optimizations and other improvements, and each of them has to be applied to every copy. Worse, duplicated code diverges over time, which makes the copies harder to spot.
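To make the first point concrete, here is a minimal Python sketch. The function names and the empty-list bug are hypothetical and were chosen only for illustration. The same averaging logic is pasted into two places, and both copies carry the same bug, so fixing it means finding and patching every copy. Extracting a shared helper puts the fix in exactly one place.

```python
# Before: the same logic pasted into two places (hypothetical names).
# Both copies share a bug: an empty list raises ZeroDivisionError.
def average_order_value_copy(orders):
    return sum(orders) / len(orders)


def average_session_length_copy(sessions):
    return sum(sessions) / len(sessions)


# After: one shared helper, so the empty-input fix is made exactly once.
def mean(values, default=0.0):
    if not values:
        return default  # the fix lives here, not in every caller
    return sum(values) / len(values)


def average_order_value(orders):
    return mean(orders)


def average_session_length(sessions):
    return mean(sessions)
```

For example, `average_order_value([])` returns `0.0`, while `average_order_value_copy([])` still raises `ZeroDivisionError`. Any other copy nobody remembered to patch would fail the same way.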

What "justifies" code duplication

  • Easier to maintain - we usually claim the opposite, but there is some truth in this one. If you copy code written by someone else, you are free to change it any way you want. Changing common code is harder and often requires agreement across multiple involved parties. Bust: it looks that way, but copying makes the code base larger, which in turn makes it harder to maintain.
  • More freedom to change - common code has to remain common, which means you can't add your own specific features to it. The biggest problem here is that it's an organizational issue. If code is duplicated to gain freedom to change it, that points to a problem with management or company culture.
  • Faster to develop - anything that requires the involvement of multiple parties takes more time. Bust: this is a short-term gain, and you usually lose in the long run (unfortunately, short-term gains are all that many managers care about).

How duplicate code happens

  • Incompetence - it's sad, but there are a lot of bad developers. Many of them write code by copy-paste, and, as always, abusing copy-paste results in duplicates. This is what people usually assume when talking about duplicate code, and yes, this is what we should fight.
  • Forgot to refactor - this one is trickier. It's like the first one, except that the developer is actually not bad. It's fine to use copy-paste to get things working, but you have to refactor at the end, and remembering to do that is the hardest part. There is a gray area between this and the first one. Code review might be the answer here.
  • Too much trouble - sometimes avoiding code duplication is more trouble than it's worth. A place for common code might not even exist! Should you create a library just for a couple of functions? Don't forget that this brings the whole maintenance hell of that library with it. There is also often code ownership, and the shared code may be owned by someone else. In short, we avoid code duplication to reduce problems, not to add new ones. When that is not the case, duplicating code can be acceptable.
  • Created naturally - it's entirely possible for two developers to write almost identical code independently. In large projects with many people this does happen. It may take a while to discover that two people from completely different teams wrote nearly identical helper functions.

So, to summarize: next time, before blaming someone for incompetence, think twice.