2016-03-26

Micro Libraries and Monoliths

The internet is abuzz with the kik controversy and the 11 lines of code from left-pad that, when removed, 'broke the internet'. This whole thing is fascinating to me, and I feel compelled to talk about it.

Kik is a registered trademark, and trademarks are a thing for a reason. Also, as I understand it, companies have to actively defend their trademarks or risk losing them. I'm not a lawyer, so that's about all I can say. Now, suppose you are not at all related to Kik and you make an NPM package called kik. Then you are asked to rename it because Kik wants their name. How would you respond? Personally, I'd recognize that I'm in over my head already and just go ahead and pick a different name. The legalities of the situation are beyond my comprehension, but I have a strong feeling they are not in my favor. That's not what happened, though. The real kicker? None of this would have happened if packages were namespaced under their owner's name (what I consider to be 'sane'). "someperson/kik" and "Kik/kik" could have happily coexisted and nobody would have ever complained.
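To make the namespacing point concrete, here is a minimal sketch of the two models. Everything in it is made up for illustration: the Map-based registries and the publishFlat and publishNamespaced functions are hypothetical and are not how NPM is actually implemented.

```javascript
// Hypothetical in-memory registries, for illustration only.

// Flat model: one global name, first come first served.
function publishFlat(registry, name, owner) {
  if (registry.has(name)) return false; // somebody already holds this name
  registry.set(name, owner);
  return true;
}

// Namespaced model: the key is "owner/name", so names only collide per owner.
function publishNamespaced(registry, owner, name) {
  const key = `${owner}/${name}`;
  if (registry.has(key)) return false;
  registry.set(key, owner);
  return true;
}

const flat = new Map();
console.log(publishFlat(flat, "kik", "someperson")); // true
console.log(publishFlat(flat, "kik", "Kik"));        // false: the name is taken

const namespaced = new Map();
console.log(publishNamespaced(namespaced, "someperson", "kik")); // true
console.log(publishNamespaced(namespaced, "Kik", "kik"));        // true: both coexist
```

In the flat model somebody has to lose the name. In the namespaced model there is nothing to fight over.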

The end result is that NPM gave Kik their name, and the other guy reacted by removing all his code from NPM. Among that code was left-pad, an 11-line snippet that is now duplicated everywhere on the internet and probably etched in stone somewhere, so I will refrain from perpetuating it here as it isn't too impressive. What is it? It's a micro-library that lets you pad the left side of a string. Removing it from NPM broke the builds of several projects that directly depended on it and caused a resonance cascade through everything that depended on those projects in turn. Everything that depended on it in some form or fashion broke. Until this point most people didn't even know it existed or that they depended on it.

The whole situation has sparked several discussions about the various parts in play. The way I see it, there are three main discussion points:

  • Who was in the right, Kik or the other guy?
  • Is it a good or bad idea to have so many dependencies like this?
  • Should left-pad really even have existed in the first place?

I'm more interested in the last two. Let's start with micro-libraries. There's an excellent GitHub comment about the benefits of micro-libraries that sums up my own thoughts on the subject pretty well:

"People get way too easily caught up in the LOC (Lines Of Code). LOC is pretty much irrelevant. It doesn't matter if the module is one line or hundreds. It's all about containing complexity. Think of node modules as lego blocks. You don't necessarily care about the details of how it's made. All you need to know is how to use the lego blocks to build your lego castle. By making small focused modules you can easily build large complex systems without having to know every single detail of how everything works. Our short term memory is finite. In addition, by having these modules as modules other people can reuse them and when a module is improved or a bug is fixed, every consumer benefits."

The difference between a 'normal' library and a 'micro' library is scale, but there isn't much agreement on what that scale should be. The general consensus, at least, is that it should not be at the scale of a single function. Let's consider left-pad. My first thoughts are: what about right-pad? Middle-pad? Vertical-pad? Shouldn't this really be string-pad, which contains several functions including left-pad and right-pad? I want the level of granularity to be more than just a single function. The fact that left-pad exists and was used in so many projects is mind-boggling to me. It doesn't make sense for it to exist on its own without other string-related functions to go with it. It doesn't make sense to use it on its own when you could easily write something better yourself in less time than it takes to find it. I find myself wishing it were part of a larger collection of functions.
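To show the granularity I have in mind, here is a rough sketch of what a string-pad module could look like. This is not the original left-pad code. The names padLeft, padRight and padBoth are my own invention, and the sketch assumes the fill argument is a single character.

```javascript
// Hypothetical "string-pad" module; none of these names come from a published package.
// Assumption: `ch` is a single character. Longer fill strings would overshoot `length`.

// Build the filler needed to grow `str` to `length` characters.
function filler(str, length, ch) {
  const missing = length - str.length;
  return missing > 0 ? ch.repeat(missing) : "";
}

function padLeft(value, length, ch = " ") {
  const str = String(value);
  return filler(str, length, ch) + str;
}

function padRight(value, length, ch = " ") {
  const str = String(value);
  return str + filler(str, length, ch);
}

// "Middle-pad": split the filler between both sides, odd leftover goes on the right.
function padBoth(value, length, ch = " ") {
  const str = String(value);
  const total = Math.max(length - str.length, 0);
  const left = Math.floor(total / 2);
  return ch.repeat(left) + str + ch.repeat(total - left);
}

console.log(padLeft(5, 3, "0"));     // "005"
console.log(padRight("ab", 4, ".")); // "ab.."
console.log(padBoth("hi", 5, "*"));  // "*hi**"
```

Three closely related functions, one shared helper, one dependency instead of three.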

But then I see huge monolithic monstrosities like Boost, a massively bloated project with dozens of actual libraries that should really be independent projects in their own right but which are instead clumped together and treated as one. Because they are grouped together they have a tendency to depend on each other rather strongly, making it hard to get what you want without pulling in two or three other components. They actually had to create a special tool to let you isolate the stuff you want because of how non-trivial it is. It also takes a long time to compile when you just want to use a single function from it. That doesn't sound good to me either, and to be honest, I think I'd rather take left-pad.

But what about the language? They say PHP is a fractal of bad design, but at least it actually was designed. JavaScript was made in such a short time that I find it hard to believe any real thought went into it at all. In JavaScript, design is an overstatement. Worse, fixing problems in newer versions of the language doesn't really work because everyone still wants to support older browsers anyway, which as a result requires sticking to older versions of the language. The whole mess has resulted in the necessity for entire packages just to determine if a value is a positive integer. You may scoff at that, because in any other language it would be trivial, but it's downright dangerous in JavaScript and the chances you will make a mistake when doing it yourself are high enough to warrant relying on external code.

The problem is scale. Why does an entire package need to be fully dedicated to just seeing if a value is a positive integer? Couldn't you fit in other things like, you know, seeing if a number is a negative integer? Seeing if a number is equal to or close to 0? My point is, splitting such closely related questions into such separate answers is unhealthy because it results in more dependencies. More dependencies require more trust, and more trust means more risk.
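Here is a small sketch of both halves of that argument: why the naive check is a trap, and how the related checks could live together. The function names and the default tolerance are my own assumptions, not taken from any existing package.

```javascript
// Hypothetical "number-checks" module; names and tolerance are illustrative.

// The obvious one-liner, which silently coerces its argument.
const naiveIsPositiveInteger = (v) => v > 0 && v % 1 === 0;

console.log(naiveIsPositiveInteger("5")); // true, although "5" is a string
console.log(naiveIsPositiveInteger([7])); // true, although [7] is an array

// Shared base check: a real number value that is finite and has no fractional part.
function isInteger(value) {
  return (
    typeof value === "number" &&  // rejects "5", [7], null, ...
    Number.isFinite(value) &&     // rejects NaN and +/-Infinity
    Math.floor(value) === value   // rejects 1.5
  );
}

function isPositiveInteger(value) {
  return isInteger(value) && value > 0;
}

function isNegativeInteger(value) {
  return isInteger(value) && value < 0;
}

// "Equal to or close to 0", within a caller-supplied tolerance.
function isNearZero(value, tolerance = 1e-9) {
  return typeof value === "number" && Math.abs(value) <= tolerance;
}

console.log(isPositiveInteger("5")); // false
console.log(isPositiveInteger(5));   // true
console.log(isNegativeInteger(-3));  // true
console.log(isNearZero(1e-12));      // true
```

All four checks share the same guard against coercion, which is exactly the part people get wrong when they write it themselves.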

It's clear there are two ends of the spectrum: massive all-in-one megalibraries like Boost which aim to provide everything you could ever want or need in one neat package whether you like it or not, and a gazillion micro-libraries all owned and operated by different people to perform simple tasks that are trivial enough to be one-liners but complex enough that you could screw them up if you did them yourself. Thankfully, I think most libraries fall in the middle of the spectrum and that most people would agree the middle is better.

So if most people would agree the middle is better, why do so many libraries still go the extreme route? Well, micro-libraries are easy to write, easy to reason about, and easy to use. Seems like win-win-win! We should totally all start making micro-libraries that are just one function with no more than 11 lines of code! Except you're missing a huge glaring problem: the dependency. This entire incident highlights the catastrophic flaw: remove one block from the Jenga tower and the whole thing collapses. Sure, it can be repaired, but only if you notice it.
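The Jenga effect is easy to sketch. The dependency graph below is entirely made up (only left-pad is a real package name), and brokenByRemoval is a hypothetical helper, but it shows how one missing block takes out packages that never mention it directly.

```javascript
// Hypothetical dependency graph: package name -> packages it depends on directly.
const dependencies = {
  "my-app": ["build-tool", "http-client"],
  "build-tool": ["line-formatter"],
  "line-formatter": ["left-pad"],
  "http-client": [],
  "left-pad": [],
};

// Return every package whose build breaks once `removed` disappears.
// A package breaks if any direct dependency is removed or already broken,
// so we keep sweeping the graph until a full pass adds nothing new.
function brokenByRemoval(graph, removed) {
  const broken = new Set();
  let changed = true;
  while (changed) {
    changed = false;
    for (const [name, deps] of Object.entries(graph)) {
      if (name === removed || broken.has(name)) continue;
      if (deps.some((dep) => dep === removed || broken.has(dep))) {
        broken.add(name);
        changed = true;
      }
    }
  }
  return [...broken].sort();
}

console.log(brokenByRemoval(dependencies, "left-pad").join(", "));
// build-tool, line-formatter, my-app
```

Note that my-app never lists left-pad anywhere. It breaks anyway, two levels removed.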

There are other more subtle ways to do bad things to a Jenga tower. You could add malicious code to one of the blocks, or change the behavior of one of them in a not-so-obvious way. As long as it doesn't break builds, you can do a lot of damage without anyone even noticing. Remember when I mentioned trust and risk? To quote that GitHub comment again:

"In addition, by having these modules as modules other people can reuse them and when a module is improved or a bug is fixed, every consumer benefits."

So what happens when a module is made worse or a bug is added? What happens when malicious code is introduced? Micro-libraries pose a problem through their sheer number. Whereas you can easily examine the practices and behavior of the maintainers of a single library like Boost, it becomes an overwhelming task to look at each and every developer of each and every micro-library, and so on for each of their dependencies recursively. It takes a lot of trust to use so many dependencies from so many different people.
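To put a rough shape on "recursively", here is a sketch that walks a dependency tree and collects everyone you end up trusting. The registry data, package names and maintainer names are all invented for illustration, and trustSurface is a hypothetical helper.

```javascript
// Hypothetical registry metadata: who maintains each package and what it depends on.
const packages = {
  "my-app": { maintainers: ["me"], dependencies: ["big-lib", "tiny-a"] },
  "big-lib": { maintainers: ["alice", "bob", "carol"], dependencies: [] },
  "tiny-a": { maintainers: ["dave"], dependencies: ["tiny-b"] },
  "tiny-b": { maintainers: ["erin"], dependencies: ["tiny-c"] },
  "tiny-c": { maintainers: ["frank"], dependencies: [] },
};

// Walk every direct and indirect dependency of `root` and collect the
// packages and people you are implicitly trusting by installing it.
function trustSurface(registry, root) {
  const seen = new Set();
  const people = new Set();
  const stack = [...registry[root].dependencies];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue; // already visited (also guards against cycles)
    seen.add(name);
    const pkg = registry[name];
    if (!pkg) continue; // unknown package: nothing to inspect in this sketch
    pkg.maintainers.forEach((person) => people.add(person));
    stack.push(...pkg.dependencies);
  }
  return { packages: seen.size, maintainers: [...people].sort() };
}

const surface = trustSurface(packages, "my-app");
console.log(surface.packages);               // 4
console.log(surface.maintainers.join(", ")); // alice, bob, carol, dave, erin, frank
```

Two direct dependencies turn into four packages and six people, and this toy tree is only three levels deep.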

Each micro-library is typically maintained by a single person, whereas each megalibrary is typically maintained by many people. If we say that most people are good people, then megalibraries are generally immune to evil code because the other project members can spot it and stop it. But micro-libraries don't have a self-check; they just have a single person to do with them what they please, and as we have already seen, it only takes one.

At the end of the day, though, micro-libraries aren't in general a bad idea, and neither are megalibraries. I just think we could avoid the problems of both with middle-sized libraries, even if it means we partially sacrifice some of the benefits. So I leave you to answer the non-trivial question of how related various things need to be to be in the same library and how unrelated they need to be to be in separate libraries. I'm sure ninjas can totally answer that.