Seeing ML Clearly

I went to school at Carnegie Mellon, where the computer science department is big on research. For systems-level programming, they use C. But for almost all theoretical or type-related topics, they use ML, in particular Standard ML.

Since I was really interested in compilers and type theory, I became very familiar with ML: first how to use it, then how to use it to write interpreters, then how compilers work, and eventually how to compile ML code. I became a relative expert in ML.

While at CMU, I was thoroughly trained in the benefits of strongly typed languages, the pitfalls of weakly typed languages, and why static typing can result in more efficient code than dynamic typing. I was also introduced to the idea of typed intermediate languages: compilers have multiple phases that translate code into entirely different languages, each of which is strongly typed, getting closer and closer to the target language after each phase. In other words, SourceLang => IL1 => IL2 => … => ILn => TargetLang. And after I got the hang of it, I thought ML was great! Oh, how I came to loathe writing code in other languages. Look! Look how easy and beautiful the code would be in ML.

But recently, for the first time, I’ve had a real reason to write some code in Scheme. Scheme is similar to ML in that it’s functional, but it’s dynamically typed. Moreover, some flavors have “features” like dynamic scope that make it very difficult to look at a piece of code and determine whether it will compute gracefully or result in an error. One of the biggest benefits of static typing is that it reveals type errors in your programs as early as possible: at compile time. Dynamic typing, on the other hand, reveals them as late as possible: only when the offending code actually runs. If a branch of code is never taken, you’ll never know whether that piece of code will fail, possibly until you ship your code and your users break it, losing millions (even lives) in the process.
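To make the untaken-branch problem concrete, here is a minimal sketch. It uses Python rather than Scheme, purely as a convenient dynamically typed language, and the function `describe` and its `verbose` flag are hypothetical names invented for the illustration.

```python
def describe(value, verbose=False):
    """Return a short description of value."""
    if verbose:
        # Bug: this adds a str to an int. Nothing checks it ahead of time,
        # so it only fails when this branch actually executes.
        return "value has length " + len(value)
    return "value: " + str(value)


# The common path works, so the bug can go unnoticed for a long time.
print(describe("abc"))

# The rarely taken path fails only at run time.
try:
    describe("abc", verbose=True)
except TypeError as err:
    print("late failure:", err)
```

A static type checker would reject the `verbose` branch before the program ever ran, which is exactly the early detection described above.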

So all through school, I was on one side. I was very, very close with ML. But now that I’ve been using Scheme (and also toying with Qi), I’ve been on the other side. And now, for the first time ever, I can judge ML for what it truly is. And here’s what I’ve found.

I still think ML’s a great language. Early bug detection and the invariants that are captured in types are so utterly essential to writing correct code that I can’t believe it’s still being done the other way. I literally can’t write more than a couple of Scheme functions before I have to write down the types in comments, because otherwise I have to try to hold all this stuff in my head. Types are not something in addition to writing the code; they are something I implicitly think about when I create code. When I look at a piece of code to see if it makes sense, I type-check it in my head, in the same way that I execute it in my head when debugging, for example.

However, I’ve realized there are a few very powerful abstractions that are missing from ML (or that it has only in a limited form).

  1. Macros
  2. Reader Macros
  3. Unrestricted Execution of Types

…Maybe I’ll go into the details later.

Fitting Inadequate Tools

As an afterthought to yesterday’s post, I remembered Kyle (another person who prefers Emacs over Eclipse) showing me how to double up loop iterator variable names, as in ii instead of i or jj instead of j. This allows you to search for loop iterator variables without results coming up everywhere the letter “i” is used in your file.

What he was basically suggesting was that I change the code I write in order to accommodate the inadequacies of a tool, namely, textual searching.

This does not make sense to me at all. If I had a tool that understood the scoping rules of my language (e.g. Eclipse editing Java), I could select a variable and the tool would show me all the references to that variable, regardless of its name. Granted, this doesn’t help if you’re searching for all the loops that use the variable name, but (again, maybe this is just me) this is not something I ever do. And if I needed to, a regex would handle that (or a whole-word search).
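As a rough sketch of the regex route, the snippet below contrasts a plain search for the letter "i" with a whole-word search using the `\b` word-boundary anchor. It is written in Python for illustration, and the `source` string is a made-up sample, not code from any real project.

```python
import re

# Hypothetical sample file contents with a single-letter loop variable.
source = """\
for i in range(len(items)):
    print(i, items[i])
"""

# Plain search: matches every letter "i", including those inside
# "in", "items", and "print".
plain_hits = [m.start() for m in re.finditer("i", source)]

# Whole-word search: \b only matches at word boundaries, so only the
# standalone variable name i is found.
word_hits = [m.start() for m in re.finditer(r"\bi\b", source)]

print("plain matches:", len(plain_hits))
print("whole-word matches:", len(word_hits))
```

The whole-word search finds only the uses of the variable, so doubling the name to ii buys nothing that the tool could not already provide. It is still purely textual, though: it cannot tell two different variables named i apart.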

Changing your work, no matter how slightly, to fit your tools is an indicator that you need a better tool. Because basically, you’ve found a pattern. And whenever you find a pattern, you’ve found an opportunity for improvement. Changing your work to fit your tools is equivalent to optimization. You should only do it if you have to, in other words, if it is a bottleneck. And there are two ways to optimize for a bottleneck: from the top down, or from the bottom up. Doing it from the top down means you must always think about it. Doing it from the bottom up, abstracting the optimization away in a tool, means you don’t have to think about it anymore. It is equivalent to pushing a process down to a lower level in the hierarchy. And the top level is the only level you have to think about.

…This brings up an interesting question. What is above me in the hierarchy? It would be silly to think I was at the very top.

Tool Power and Why I Use Eclipse

I was talking with Kyle the other day about programming languages. When you look up at languages more powerful than the ones you know, all you see is what you already know, and the supposedly more powerful languages seem not so great. However, when you look down at less powerful languages, it’s obvious that they are less powerful. I’m not talking about computational power, as all Turing-complete languages can express all the same programs. It’s more about abstractions. Paul Graham describes this in more depth, calling it the Blub Paradox.

That is all very well and interesting in itself. But if you accept it, you must also accept its implications. Namely, it makes sense to learn and use more powerful languages.

But it’s not just about languages. It’s about tools. Yesterday at work, I had a discussion with two co-workers about the differences between Emacs and Eclipse as IDEs. Since both of them are advocates of Emacs, I figured I could get the lowdown on it from them, as I only use it occasionally and am not an expert. For writing Java (unfortunately, yes, I must do this at my current job), I prefer Eclipse. But I’m open to using other, more productive tools. So I thought to myself, if Emacs is so great, maybe I can find out from one of these guys why, and then perhaps I’ll switch.

Basically, their argument came down to two things. First, Emacs can be extended more easily than Eclipse, which has a clumsy plugin system. Second, only Emacs allows you to re-map the key bindings, which is extremely helpful for writing code, and writing code is where you spend the majority of your effort compared to the other tasks you’d use an IDE for.

However, there’s one feature of Eclipse in particular (there are others too, but I’ll not go into them) that Emacs does not have, and neither of the guys I talked to knew of a standard extension out there that already does this. It is the set of code refactoring features, specifically renaming variables, classes, and packages.

I admit, any programmer is going to spend ten times as much time writing code as renaming things. Renaming is a rare task by comparison. However, when it must be done in Emacs or other editors that don’t support it, you must use regex replacing, and this is both tedious and error-prone. Textual find and replace will always be error-prone, since it does not take into account the scoping rules of the language.
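Here is a small sketch of how a textual rename goes wrong. It is in Python for illustration, and the sample code in `source` (with its functions `total` and `report`) is hypothetical. The goal is to rename the local variable `count` inside `total` to `subtotal`, leaving the unrelated parameter of `report` alone.

```python
import re

# Hypothetical sample code: two unrelated variables that share a name.
source = """\
def total(prices):
    count = 0
    for p in prices:
        count += p
    return count

def report(count):
    return "items: " + str(count)
"""

# A whole-word textual replace has no notion of scope, so it renames
# every occurrence of the word, including the parameter of report().
renamed = re.sub(r"\bcount\b", "subtotal", source)
print(renamed)
```

In this case the result still runs, but `report` has been changed behind my back, and in a larger code base the same kind of replace can just as easily capture or collide with another name. A scope-aware rename like the one in Eclipse would touch only the occurrences inside `total`.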

The fact that Eclipse allows you to do this correctly every time means that the cost of renaming drops to practically zero. This means you no longer have to avoid it, which in turn means you no longer have to plan ahead to try to avoid it in the future. And refactoring and renaming are inevitable. In fact, the more often you refactor the better, because it prevents code complexity and messiness from creeping in. But say you wanted to think of the right name from the beginning, to avoid having to change it later. It is impossible to know the correct name for a thing when you create it, because code is fluid: it never stops changing. Requirements change, goals change, scope changes. And so too must the code. Moreover, as you design a solution to a problem, your understanding of the problem changes. So even if the actual problem doesn’t change, your understanding of how to most efficiently solve that problem will. Thus refactoring, including renaming, is absolutely inevitable if you want to keep the code as close as possible to the model in your mind.

So why should I use an IDE that is oblivious to such things, forcing me to think about more than I have to, namely the details that are solely a result of code being expressed in text?

It’s true that such a feature could probably be made as an Emacs extension. I do not doubt this at all. But the fact that it is not already done means that, like Lisp over C or C over machine code, Eclipse is more powerful than Emacs. And this is obvious looking down the power continuum.

…And as for the whole key bindings thing, personally I almost never find my keystroke rate to be the bottleneck. Writing code is usually limited by my conceptual understanding and the translation of ideas into the language I’m writing in. But maybe that’s just me.

So I don’t think the guys I had this conversation with got this out of it, but because of this conversation, I realized that Eclipse is a more powerful tool than Emacs (for editing Java), and I don’t plan on switching any time soon.

Post Post: Since writing this I’ve realized I’ve fallen into the same trap as others, namely the trap of not seeing the power of something you don’t understand. I’ve done the exact same thing that users of less powerful tools and languages do when they look at something more powerful. I’ve disregarded the importance of re-mapping key bindings without actually getting to know them. It’s possible this is a huge gain. This doesn’t change the fact that Eclipse is more powerful in other respects. So it is really a value judgment of mine that the cost of typing slowly is less than the cost of having to refactor things manually. But I won’t know for sure until I learn and use the more powerful features of Emacs.