Solving problems recursively is often convenient and elegant, but many people find recursion less intuitive than plain iterative looping. Many schools teach programming by focusing primarily on solving problems and implementing algorithms in a procedural style, and it takes a bit of a leap to go from C or Java to a language like Scheme, which uses recursion rather than for or while loops.
My article on callback functions worked out pretty well and got a positive reaction, so I’m going to continue with the theme of programming/comp-sci tutorials and write out an introduction to tail recursion, how/why it works, why we use it, and how to translate regular recursion into tail recursion.
Overview of tail recursion
One of the most important forms of recursion is tail recursion. A function is said to be tail recursive when the last thing it does is call itself. Here’s an example, using C-like pseudo-code:
int foo (int n) {
    print n;
    if (n == MAX) {
        return something;
    } else {
        return foo(n+1);
    }
}
This simple code takes a number and increments it until it reaches MAX, then returns something. Earlier I said that a tail recursive function has the recursive call at the end, and here (for the sake of clarity) I wrote it as the last thing in the function, but strictly speaking it doesn't have to be the last thing written in the code; it needs to be the last thing that gets run (not counting the base case, the case that stops the recursion).
It would have been easy to write the above function using a loop, but many complex problems are easier to express in terms of recursion. Furthermore, some languages, especially functional programming languages, have little or no support for traditional loops (if you cannot change the value of existing variables, how can you have a control variable for a loop?). Fun fact: notice that the above function never explicitly changes the value of any variable. The expression n+1 does not change n.
If your compiler or interpreter is clever enough, it can optimize tail recursive functions so that they behave like a loop in a procedural language. To understand why this is significant, we need to know how function calls work.
Functions and the call stack
Imagine the computer keeps track of a list of the functions you’ve called. But it’s not an ordinary list: when you call one, the computer makes a note of it (and also jots down the local variables of the function and the other little details it needs to remember), and when that function returns it is removed from the list. What I’m describing is of course a stack, and the things put on the stack are called activation records. Activation records contain state information about different procedures, and the stack they’re put on is called the call stack.
Regular recursive functions grow the size of the call stack. Each time the function calls itself, another entry has to be pushed onto the stack. Here’s an example:
int factorial (int n) {
    if (n == 1) {
        return n;
    } else {
        return n * factorial(n-1);
    }
}
Imagine you’re a computer trying to run this code. The function is called, and you set n to whatever number was passed. The only thing the function has to do before returning is multiply n with the result of factorial(n-1), assuming n is greater than 1. This may look like a tail recursive call, but it isn’t. The computer has to evaluate n * factorial(n-1), and in order to do that it must evaluate factorial(n-1). For now, let’s treat the call to factorial(n-1) as a black box: it just returns a value. We need to wait for that call to complete so we can multiply the value it returns with n. So, the last thing done here is actually the multiplication, not the recursive call.
This is more obvious when the code is written in Scheme:
(define (fact n)
  (if (= n 1)
      1
      (* n (fact (- n 1)))))
So, every time this function runs, it has to keep all the old information around (such as the value of local variable n) when it does the recursive call, because it needs all that information afterward. This means that the call stack must grow on every recursive call, because the new activation record is pushed onto the stack on top of the current one.
What if the recursive call was the last thing we did? We wouldn’t have any use for the old information. Because there’s nothing to do after the recursive call, we can throw out all of our old information and simply run the function again with the new parameters (imagine running functions in sequence, rather than nested inside each other). This means that we don’t need to add a new entry to the call stack, which means better performance and lower memory usage.
Visualizing functions
Another way to illustrate this is to think about what the computer needs to compute when calculating a factorial using this method. Here it is in Scheme:
(fact 5)
(* 5 (fact 4))
(* 5 (* 4 (fact 3)))
(* 5 (* 4 (* 3 (fact 2))))
(* 5 (* 4 (* 3 (* 2 (fact 1)))))
(* 5 (* 4 (* 3 (* 2 1))))
(* 5 (* 4 (* 3 2)))
(* 5 (* 4 6))
(* 5 24)
120
It would be similar looking in a C-like language:
factorial(5)
5 * factorial(4)
5 * 4 * factorial(3)
5 * 4 * 3 * factorial(2)
5 * 4 * 3 * 2 * factorial(1)
5 * 4 * 3 * 2 * 1
5 * 4 * 3 * 2
5 * 4 * 6
5 * 24
120
Tail recursion (when it is supported by the language or the compiler) allows us to avoid this. Earlier I said that with tail recursion the function calls don’t expand the call stack; another way to put it is that the calls are not nested, so the function doesn’t need to hang around until after its recursive call has completed.
Here’s an example of a tail recursive factorial function, written in C-like pseudo-code:
int factorial (int n, int p) {
    if (n == 0) {
        return p;
    } else {
        return factorial(n-1, p*n);
    }
}
And the same idea expressed in Scheme:
(define (factorial i prod)
  (if (= i 1)
      prod
      (factorial (- i 1) (* i prod))))
We can use the same method to visualize the computation of the factorial of 5.
factorial(5,1)
factorial(4,5)
factorial(3,20)
factorial(2,60)
factorial(1,120)
120
As you can see, the lines didn’t have to get longer on each iteration like they did with the non-tail recursive function.
Accumulators
What do you do if you already have a recursive function and want to turn it into a tail recursive function? The simplest way often involves the addition of an accumulator, or a parameter that has an operation done to it on each iteration, like the ongoing product of the factorial in the above example. Then, on the base case, we return the accumulator.
The basic idea is that we need some information to persist between function calls, and the easiest way to do that is to pass it around as a parameter. So, each individual function call doesn’t need to know anything about the other ones: factorial(4,5) doesn’t care where its 5 came from; it just knows to call factorial(3,20).
Here’s another basic example. We’ll write a recursive function for finding the sum of all numbers from 0 to n.
int arbitrary (int n) {
    if (n == 0) {
        return n;
    } else {
        return n + arbitrary(n-1);
    }
}
We can add an accumulator to this function to make it tail recursive.
int arbitrary (int n, int acc) {
    if (n == 0) {
        return acc;
    } else {
        return arbitrary(n-1, acc+n);
    }
}
Of course this is not the only way to write tail recursive functions, and many times you’ll have to make more significant changes to your logic if you want to re-write your functions to take advantage of tail recursion.
How it works
How does the computer do all this? There’s no single answer to that, but I can describe a general approach that a computer might take.
Tail recursive functions are often easy to write as loops in a procedural language. Imagine that, instead of running the recursive call, you look at the parameters, set the local variables of your function equal to them, and jump back to the top. Something like this:
int arbitrary (int n, int acc) {
    while (true) {
        if (n == 0) {
            return acc;
        } else {
            // make copies of n and acc before changing them
            int n2 = n-1;
            int acc2 = acc+n;
            n = n2;
            acc = acc2;
        }
    }
}
Tail calls
As I said, that’s only one way of implementing tail recursion. There’s another, more general technique usually used in implementations of functional languages. It’s called tail-call elimination, and it has an added benefit of working for any number of mutually recursive functions, rather than a single function that calls itself recursively.
To give you an intuition about how it works, imagine a machine that is executing a simple program. The program is a list of machine instructions which are grouped into procedures. When a procedure is invoked, the computer saves its state and “jumps” into the procedure, and when the procedure finishes (invoking a “return” instruction), the machine will restore its old state. This corresponds with the pushing/popping on the stack that a C program would do when calling functions.
Now, imagine what happens when there’s a procedure call at the very end of a procedure (right before the “return”). When the machine reaches that call, it pushes its state and jumps into the new procedure. That procedure finishes, pops off the saved state (jumping back to the old procedure), then immediately hits another “return” instruction, causing another pop! So, if we have a procedure call right before a return, it’s redundant to save the state beforehand, since it will be popped immediately before the return anyway.
Avoiding this redundant stack push involves changing the procedure call statement into a jump (in x86, “call” statements push the current instruction pointer on the stack, but “jmp” statements don’t). This has a side effect of turning all tail recursion into loops (directly, using goto/jmp), and even doing the same to groups of mutually recursive functions.
Wrap-up
I only covered the basics of this topic, but by now you should be able to take advantage of tail recursion optimization in compilers that support it (as well as in languages that guarantee it, like Scheme). It’s not universally supported (Java, for example, doesn’t support it), but when it is, it can be very beneficial.