Ode to Map (Pulling the String) | Letters to an Unknown Audience

Ode to Map (Pulling the String)

March 26, 2006

What follows is a brief explanation of what programming is, to someone like me. I am a minimalist, and I see programming as mainly a process of removing unnecessary moving parts and features—not removing the essential behavior, of course, just elements of the code. This is for people who don't know much about programming, but also for people who do.

Most people who have done a little programming have seen the famous for loop. A for loop typically looks something like this:

  for i := 1 to 25 do
    do something with array[i]

That says, "let the variable i take on the values from 1 to 25, and for each value, do something with the ith element of the array." 'The array' is some ordered list of objects, such as blog posts, files on disk, celestial observations, income receipts, or frames of porn video. A loop like the one above would typically be written if you knew the array actually had 25 elements in it. You certainly wouldn't want to do this if there were fewer than 25 elements in the array, because then the code would end up trying to work with an element called array[25] even though it doesn't exist. So you have to know how many elements are in the array, and you have to keep the first line of the for loop in sync with that.
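Here is a minimal sketch of that hazard in Python, assuming a hypothetical list called receipts that holds only 24 entries. Python counts from 0, so the loop visits indices 0 to 24 rather than 1 to 25, but the hard-coded 25 goes wrong in the same way.

```python
# Hypothetical data: 24 receipts, one fewer than the loop expects.
receipts = [10.0] * 24


def total_with_hard_coded_bound(items):
    """Sum the first 25 items, with the 25 written into the program."""
    total = 0.0
    for i in range(25):
        # When len(items) == 24, items[24] does not exist.
        total += items[i]
    return total


try:
    total_with_hard_coded_bound(receipts)
    outcome = "ok"
except IndexError:
    # Python reports the missing element instead of reading garbage.
    outcome = "index out of range"

print(outcome)
```

With exactly 25 entries the same function runs without complaint, which is the trouble: nothing in the code says where the 25 came from.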

This example is so deeply embedded in the history and culture of programming that it's hard to imagine that anyone who has studied at least a half a semester of programming has not seen this construct. "Looping" seems to be the simplest non-trivial example of anything in computer science.

Yet, speaking as a professional programmer, I almost never use this form. It's almost never the right thing for my needs and my tastes.

In this example, note that I'm just operating on each element of the array once. The number of operations I do should really always be the same as the number of elements of the array. A more direct way of expressing this would be something that means "Do this operation on each element of the array." This is easy to write down, and most modern programming languages have some standard way to do this. Here's what it looks like in Haskell, Perl, Scheme, Python, Ruby, and Standard ML, respectively:

  map ( \theThing -> do something with theThing ) array
  map { do something with $_ } @array
  (map (lambda (theThing) (do something with theThing)) array)  # see [1]
  map(lambda theThing: do something with theThing, array)
  array.map { |theThing| do something with theThing }
  map (fn theThing => do something with theThing) array

"Map" is a word inherited from mathematical vocabulary, but I like to think of it as constructing an ordinary sort of map—a streetmap, if you will—from the input array. The streetmap has all the parts of the city, yet only in miniature: by analogy, each element in the result—or each little bit of action that this instruction performs—corresponds to one of the elements in the original array. You can't make a map without surveying the whole city—and it would be some horrible kind of map if you mixed up the streets of the city, repeating some or re-arranging them. In the same way, the "map" command in programming languages makes a faithful map, containing an image of all the parts of the thing mapped, and keeping them arranged the same way.

[1] The keyword "lambda" in this context comes, oddly enough, from the history of logic, and it still lingers in a couple of today's popular languages, even though it has no everyday meaning that evokes its function. In fact, it has no special function except to highlight the part of your program that assigns a name to the individual array element that's being processed. Since its only function is to call out a specific point in the line, many languages do away with it, such as Ruby, which uses pipes (|theThing|) to surround the name.

The map form is a tighter form than the for loop, and a minimalist should prefer it. In the original "for" loop, things were kept in order by means of the variable "i". But we never really cared about the "i" itself—we weren't printing it out or storing it anywhere (I mean, when programming, we typically don't). The "map" form dispenses with that distasteful number and expresses exactly what the programmer wants: do something for each element of this list. "Map" doesn't bother with numbers; map simply knows what's what in the array, and makes an image of each part, according to your given code. "Map" is like a child, who can easily turn over every coin on a table without having to count them.
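To make the comparison concrete, here is a small Python sketch with the two forms side by side. The list of titles and both function names are made up for illustration. In Python 3, map returns a lazy iterator, so the sketch wraps it in list() to get the results back as a list.

```python
# Hypothetical input: a few blog-post titles.
titles = ["ode to map", "pulling the string", "least shape"]


def shout_with_index(items):
    """The for-loop form: the index i exists only to reach each element."""
    result = []
    for i in range(len(items)):
        result.append(items[i].upper())
    return result


def shout_with_map(items):
    """The map form: no index, just 'do this to each element'."""
    return list(map(lambda title: title.upper(), items))


print(shout_with_index(titles))
print(shout_with_map(titles))
```

Both give one result per input element, in the original order. The second one never mentions a number.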

As soon as you get comfortable with the "map" form, you realize just how extraneous the numbers were—the numbers 1 and 25, which are written into the program, as well as the intermediate numbers that "i" takes on. In fact, being extraneous, these circumlocutions begin to seem kind of dangerous: after all, we had to keep track of that limit number 25, and make sure that it always showed the size of the array. If something changed, we might forget to update the 25, in which case the program could be totally broken. Since the numbers can be wrong, why bother with them at all?
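A rough Python sketch of how a stale limit can go wrong quietly, assuming a hypothetical word list that started with three entries and later grew to four. The hard-coded version does not crash here. It just skips the new element without saying so.

```python
def lengths_hard_coded(words):
    """Written back when the list had exactly 3 elements."""
    result = []
    for i in range(3):  # nobody updated this 3 when the list grew
        result.append(len(words[i]))
    return result


def lengths_mapped(words):
    """No limit to keep in sync: map visits whatever is there."""
    return list(map(len, words))


words = ["coin", "table", "child"]
grown = words + ["string"]  # the array changes; the loop bound does not

print(lengths_hard_coded(grown))  # silently ignores "string"
print(lengths_mapped(grown))
```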

Many people may think of programming as a task of logical deduction, and it does call on that skill at times. But in the process of programming, you need to take time to rough things in, to think creatively, make experiments and strange intuitive forays, side-trips which don't have a perfect logic and may not be quite right. When you're finished, though, when you've discovered what you were trying to discover, you need to make it "right" and taut. You need to grab the little string that hangs out of the program and pull it tight so that all the superfluous bits come away, and your program is just what it needs to be. I like to call this the "principle of least shape." When you look at the program, you can see its shape, and the shape helps you understand the program quickly, or else obscures the program, if the shape is too ornate, too complicated. The principle of least shape ensures that you can read the program quickly, and that there are no unnecessary moving parts to break. In the end, the form of the code (in a sense) closely follows its function.

When I do this, pulling the string tight, I nearly always find that the numbers go away. Numbers are not very useful, it turns out!—at least not in programming.

The goal of my brand of programming, and an important goal in language design, is to find these places where programs are too complex, and to find the construct that does just what they need with no extra. In some languages you are constantly putting in unnecessary bits of string (complex shapes) because the language itself relies on unnecessary shapes. Some languages, for example, make it difficult to use "map" as I discussed above, and push you towards using that circumlocution of the index "i".

Forthwith: other examples.

Comments

Eventually, someone, somewhere must worry about the indices of the array. Either the language designer who implements the map operation, or the API developer who implements the getEnumerator() for some collection class.

Similarly, someone at some point needs to concern themselves with the address of the code for your lambda.

The numbers are always there, but the abstraction frees the programmer not to have to think about them. Until you hit a corner case that the programmer of the abstraction didn't think of, and that code breaks, spilling the numbers out into your consciousness again (array index out of bounds exception, or null pointer exception).

—posted by Jim at March 26, 2006 12:08 PM

I don't buy all the fearmongering around "leaky abstractions." Which is more likely, that I make an error in coding up the bounds of the for loop, or that the compiler handles "map" incorrectly? The former by far.

Some abstractions are better than others; some are quite sucky and provide a false sense of security.

The best abstractions are the ones that stay close to the underlying mechanisms anyway. For example, on linked lists there's no need for array indices, and map can be implemented in a quite straightforward way.
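As an illustration of the linked-list point, here is a minimal Python sketch. The Node class and the helpers map_list, from_values, and to_values are hypothetical names, not taken from any library. The map itself is defined by following each cell to the rest of the list, with no index anywhere.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Node:
    """One cell of a singly linked list: a value plus the rest of the list."""
    value: Any
    rest: Optional["Node"] = None


def map_list(f: Callable[[Any], Any], node: Optional[Node]) -> Optional[Node]:
    """Build a new list holding f(value) for each cell, in the same order."""
    if node is None:
        return None  # the empty list maps to the empty list
    return Node(f(node.value), map_list(f, node.rest))


def from_values(values):
    """Helper: build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_values(node):
    """Helper: flatten a linked list back into a Python list."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.rest
    return out


doubled = to_values(map_list(lambda x: x * 2, from_values([1, 2, 3])))
print(doubled)
```

The recursive form is chosen for clarity. In Python it would hit the recursion limit on very long lists, so a loop would be the practical choice there.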

I'm not saying programmers should never learn about lower-level systems; just that, once an algorithm is settled and solid, you can often simplify it, removing unnecessary parts, which are likely to cause trouble when you maintain the code. It's easier to make mistakes in 110 lines of code than in 100.

—posted by Ezra at March 26, 2006 1:17 PM

http://ezrakilty.net/ezlog/archives/000904.html

ah!! thanks very much! i ran across "map" for the first time recently in the bowels of combinatorial arrays in comp.lang.python (i recently had a crack at python for the first time -- it's lovely), but couldn't work out from the syntax examples exactly what it was doing. thank you-- you've made it crystal clear.

i call this sort of thing "semi-declarative" or "set-based" code. essentially the data is a parameter to a block of code --arbitrarily sized but iteratively identical in tuple, and self-bounding-- and it's second-nature to anyone with a strong relational background: define a set/predicate, transform it, persist the transformation. in your example languages you've used "map"; in SQL you'd (mostly) use 'update'.

closed-set handling vs arbitrary looping. not a new concept by any manner of means, nor should it stress infrastructure. the original bourne shell had the arbitrarily-sized no-array-numbers construct "for ThisData in ClosedList" from the outset back in, what, 1977?

your "principle of least shape", when taken to its extreme, i call "no moving parts" -- the code is as unmessy as possible and never need change no matter how the inputs ("numbers") vary in future. kinda like a rules-based engine has no moving parts, merely changing values in parameters.

Harry Hawker's (the legendary aeronautical engineer) oft-repeated maxim to his engineers -
"simplify and add lightness"

—posted by Saltation at March 27, 2006 3:31 PM

it occurs to me you might enjoy this: Coding for Fun

—posted by Saltation at March 27, 2006 4:08 PM
