On stack growth direction

I was working on an LLVM IR to Java bytecode translator, and found myself in need of a virtual stack. Now, most CPUs grow the stack downwards, which means that pushing to the stack decrements the stack pointer. But why do they do so? And would growing it upwards (there are a few systems that do that) have any advantages?

From what I can tell, there isn’t actually any clear and concise answer. Many attribute it to sheer convention, but there are some minor differences. I thought I’d write a summary of what I found, to consolidate the information on this topic.

Whilst I’m not an expert on embedded systems (where the minor differences become at least somewhat apparent), I’ve attempted to compile a list of all such differences. I’d be really interested to hear about any other details or examples I missed, so don’t hesitate to contact me <3

Note: this article assumes the reader knows what a stack is, and has a rudimentary understanding of CPU architectures.

Convention

This is the least interesting explanation. In the early days of computing, memory was limited. You wanted to make the most of it, whether you used the heap or the stack. Putting the heap and the stack at either end of memory allows you to grow both towards the middle. This means that a program that does mainly stack allocations can grow the stack to take up all the memory it wants, whilst keeping the heap small. Or a program can keep the stack small and grow the heap.

Of course, this argument goes both ways. You can swap the heap and the stack, and the strategy still works. But it does mean that one of them necessarily needs to grow downwards.

The argument here is that the stack was arbitrarily chosen to occupy the high memory regions, and that this convention just stuck around.1

Note that this is no longer relevant on modern computers. We now have multithreaded CPUs, which need a separate stack for each thread.2 Growing from the two ends of RAM no longer works if you have more than two things growing. Luckily, modern CPUs accommodate this with their larger address space and use of virtual address translation.3 If backwards compatibility and existing standards weren’t a concern, you could now easily make both the heap and the stack grow upwards. That doesn’t mean it’s a meaningless choice, though; there are some other small details to take into account. Let’s get into those.

Where does the stack pointer point

What happens if you access the memory at the stack pointer? In a downwards stack, you’re accessing the last value you pushed. In an upwards stack, you’re accessing the next free slot. This is a fundamental asymmetry between the two. Neither is bad per se. In a downwards stack you have to decrement the stack pointer and then write; in an upwards stack you write and then increment.
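
To make the asymmetry concrete, here is a minimal C sketch that simulates both conventions inside a single array. All names are made up for illustration, and the stack pointer is modeled as an ordinary pointer:

    #include <assert.h>
    #include <stdint.h>

    enum { STACK_WORDS = 64 };
    static uint32_t mem[STACK_WORDS];

    /* Downwards stack: sp_down points at the last value pushed. */
    static uint32_t *sp_down = mem + STACK_WORDS;

    void push_down(uint32_t v) { *--sp_down = v; }    /* decrement, then write */
    uint32_t pop_down(void)    { return *sp_down++; } /* read, then increment */

    /* Upwards stack: sp_up points at the next free slot. */
    static uint32_t *sp_up = mem;

    void push_up(uint32_t v) { *sp_up++ = v; }    /* write, then increment */
    uint32_t pop_up(void)    { return *--sp_up; } /* decrement, then read */

    int main(void) {
        push_down(1);
        assert(*sp_down == 1);   /* the stack pointer addresses the value itself */
        push_up(2);
        assert(sp_up[-1] == 2);  /* the value sits one slot behind the pointer */
        return 0;
    }

Note how reading back the value you just pushed needs no offset on the downwards stack, but a negative offset on the upwards one. That’s exactly the awkwardness the next paragraph runs into.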

Dedicated PUSH and POP instructions are a bit more awkward in an upwards stack. In a downwards stack you can PUSH and then operate on the value you pushed, since the stack pointer will be pointing to it. With an upwards stack, the same PUSH instruction leaves your stack pointer pointing at uninitialized memory, and any operations on the pushed value need an offset. Really, you’d want to write to the address, do your math on it, and then move the stack pointer, which doesn’t work well with instructions that combine writing with moving the stack pointer.

An advantage of an upwards stack is that you can write to it immediately, without needing to do math on the stack pointer first. This may be relevant on niche systems that don’t have stack-based instructions built in. Let’s examine a CPU architecture without a jump-subroutine instruction. If you have a downwards stack, you must first move the stack pointer, then write your current address (for the subroutine to return to), and then jump. With an upwards stack, you can instead do the “move stack pointer” part at the start of each subroutine. So now you’re writing your current address, jumping to the subroutine, and then moving the stack pointer. This is the same number of instructions, but since a single subroutine will have multiple call sites, it saves space.4 A leaf function might get away with not moving the stack pointer at all.
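
As a rough model of that trick, here’s a C sketch in which the “machine” is reduced to an explicit array and the return address to a dummy token. The frame layout and all names are assumptions for illustration, not any real ABI:

    #include <stdint.h>

    static uintptr_t stack[64];
    static uintptr_t *sp = stack;    /* upwards stack, points at the next free slot */

    /* The callee claims its own frame: one slot for the return address,
       two slots for locals. This adjustment exists once, inside the
       subroutine, instead of once per call site. */
    void subroutine(void) {
        sp += 3;
        uintptr_t *frame = sp - 3;   /* frame[0] is the return address slot */
        frame[1] = 42;               /* locals live at frame[1] and frame[2] */
        sp -= 3;                     /* release the frame on return */
    }

    int main(void) {
        /* A call site only writes the return address and jumps; here the
           "jump" is a plain C call and the address is a dummy token. */
        *sp = 0x1234;
        subroutine();
        return 0;
    }

On a downwards stack, the sp adjustment would have to happen at every call site, before the return address can be written.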

Whether any of this has practical effects depends on your CPU architecture. Most allow for addressing modes that simply encode the needed offsets. It’s also highly dependent on how your stack is actually being used: it’s a question of whether you’ll make more use of the next available slot or of the last pushed value.

It is of note that you can choose some esoteric frankenstein options: a downwards stack with the stack pointer pointing to the next available address, or an upwards stack that points to the last pushed value. The downwards franken-stack has the same properties as a regular upwards stack, and vice versa. What matters is whether the pointer points to the next available address or to the last pushed value. But franken-stacks come with an additional downside. Instead of each push/pop being self-contained, an upwards franken-stack needs to remember the size of the previous allocation in order to properly increment the stack pointer. Similarly, a downwards franken-stack needs to predict the size of the next stack allocation in order to make enough room. As such, these esoteric stacks are simply a plain downgrade.5
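
To see why, here’s a minimal sketch of an upwards franken-stack (pointer at the last pushed value) with variable-sized pushes; prev_size is the extra bookkeeping it’s forced to carry, and all names are made up:

    #include <stddef.h>
    #include <stdint.h>

    static uint8_t mem[256];
    static uint8_t *fsp = mem;    /* points at the last pushed value */
    static size_t prev_size = 0;  /* extra state the sane layouts don't need */

    void franken_push(const uint8_t *data, size_t size) {
        fsp += prev_size;         /* step over the previous allocation... */
        for (size_t i = 0; i < size; i++)
            fsp[i] = data[i];
        prev_size = size;         /* ...and remember this one for next time */
    }

With fixed one-byte pushes, prev_size is a constant and the bookkeeping vanishes, which is how the 6502 gets away with its franken-stack (see footnote 5).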

Pointer offsets

This is one you see quite often, and it is once again more of a historical artefact. If your stack grows downwards, everything on the stack sits at a positive offset from the stack pointer. This is relevant because most architectures allow you to add some (constant) offset to any pointer you dereference. In older architectures, this offset was sometimes limited to being unsigned/positive. Thus, a downwards stack allowed you to more easily reference things on the stack.
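
In terms of a simulated stack (a hypothetical setup, not any particular architecture), the difference in offset signs looks like this:

    #include <stdint.h>

    int main(void) {
        uint32_t mem[8] = {0};

        /* Downwards stack with three locals pushed: the stack pointer
           points at the last one, so locals sit at sp[0..2] -- all
           positive offsets. */
        uint32_t *sp_down = mem + 5;
        uint32_t a = sp_down[0] + sp_down[2];

        /* Upwards stack with the same three locals: the stack pointer
           points at the next free slot, so locals sit at sp[-3..-1].
           On an ISA whose load/store offsets must be unsigned, each of
           these accesses needs extra pointer arithmetic first. */
        uint32_t *sp_up = mem + 3;
        uint32_t b = sp_up[-3] + sp_up[-1];

        return (int)(a + b);
    }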

This might have been a contributing factor in popularizing the downwards stack. Modern architectures allow for a wider range of offsets, so this is no longer relevant there.6 For embedded systems, it may still be a factor.

Buffer overflow

This one is actually an argument in favor of an upwards stack. It’s mentioned quite often that upwards stacks are more resilient to buffer overflows. This makes sense: arrays are always indexed with a positive offset7, no matter the direction of the stack. If you write past the end of an array on an upwards stack, you’re writing into memory that isn’t in use yet. If you do the same thing on a downwards stack, you’re writing over all the data previously pushed onto the stack, most notably the return pointer.
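
Here’s a sketch of why the direction matters, again with the machine reduced to a C array; the “return address” is just a marked slot, and the overflow is deliberate:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t mem[16] = {0};

        /* Downwards stack: the caller pushed a return address, then the
           callee allocated a 4-word buffer below it. */
        uint32_t *ret_slot = mem + 12;
        *ret_slot = 0xCAFE;          /* stand-in for the return address */
        uint32_t *buf = mem + 8;     /* buf[0..3] is the buffer */

        for (int i = 0; i <= 4; i++) /* off-by-one: i == 4 writes past the end */
            buf[i] = 0xDEAD;

        printf("return slot: 0x%X\n", *ret_slot);  /* prints 0xDEAD */

        /* On an upwards stack, the same buffer would sit above the return
           address, and buf[4] would spill into not-yet-used memory instead. */
        return 0;
    }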

Note that buffer overflows are far from gone, even with an upwards stack. After all, it’s unlikely that your buffer is at the top of the stack. More often, you’d allocate a buffer and then call into a different function to fill it. On an upwards stack, that function’s frame, return pointer included, sits past your buffer. This means you can still overwrite return pointers regardless of stack direction.

Still, there is an argument to be made that you have fewer options on an upwards stack, especially when you consider out-of-bounds read/write attacks that don’t target return pointers. There are simply fewer variables you can read or write. For out-of-bounds reads on an upwards stack, you do get to read past the stack pointer, which may hold juicy old data.

Dynamic stack alignment

This is an interesting one, because I haven’t seen anyone mention it. For optimization purposes, the stack tends to be aligned to a multiple of 8 or 16 bytes. This is specified by the ABI, and it’s the caller’s responsibility to have the stack aligned whenever a function is called. Generally, the size of anything pushed to or popped from the stack is known at compile time, and it’s easy for the compiler to adjust those allocations by fixed amounts to comply with the ABI. However, on some very rare occasions the alignment of the stack is actually not known, and it may need to be dynamically realigned.8 It’s slightly easier to round down than up: a downwards stack just needs stack_pointer &= 0xFFFF_FFF0, whilst an upwards stack needs stack_pointer += 15; stack_pointer &= 0xFFFF_FFF0. This absolutely does not have any real impact whatsoever, but I thought it was neat.
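
In C terms, for 16-byte alignment (a sketch; real dynamic realignment also saves the original pointer so it can be restored, as the Godbolt link in footnote 8 shows):

    #include <stdint.h>

    /* Downwards stack: allocating means subtracting, so aligning is a
       single round-down (the ~15 mask is the 0xFFFF_FFF0 from above). */
    uintptr_t align_down(uintptr_t sp) {
        return sp & ~(uintptr_t)15;
    }

    /* Upwards stack: allocating means adding, so we must round up,
       which costs the extra `+ 15` before masking. */
    uintptr_t align_up(uintptr_t sp) {
        return (sp + 15) & ~(uintptr_t)15;
    }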


  1. Some people have argued that this might make intuitive sense because the program counter already counts upwards, therefore the stack should be the one counting downwards. See here and here. This is quite a weak argument though; would it not be just as natural to have the stack count upwards right after the code section, and have the heap count downwards? ↩︎

  2. Multithreading isn’t the only reason. If you’re running multiple programs on the same CPU, you might also need multiple stacks. ↩︎

  3. Virtual address translation (wikipedia) means the addresses in your program might not correspond to addresses in your RAM. The OS dynamically gives you segments of RAM. As far as the program is concerned, all addresses are part of one big contiguous span, but that span may actually be backed by completely different segments of RAM. This completely solves the problem of growing any number of heaps and stacks: the OS can simply grab a segment of RAM and use it to expand whichever heap or stack needs it. ↩︎

  4. I’ve taken this optimization from this page, published by Dieter Mueller. ↩︎

  5. The 6502 actually uses a downwards franken-stack. It can do so without issue because all of its push/pop operations operate on a single byte. ↩︎

  6. Interestingly, it’d technically be optimal to use both negative and positive offsets, as noted here. You can do so either by placing some local variables beyond the stack pointer, or by adding a constant offset to your stack pointer. The latter has the disadvantage that you now always need an offset, which may have implications for the size of the encoded instruction. The former does see actual use: the area beyond the stack pointer is commonly referred to as the red zone. See this article for some further reading. ↩︎

  7. Or at least, all compilers that are relevant today will generate code that uses positive offsets. Of course, there’ll be exotic systems that don’t. Apparently, Fortran on the IBM 704 stored arrays in descending order, for example (source, p. 11). ↩︎

  8. Here’s a sample on Godbolt which uses a special attribute to force the compiler to assume the stack pointer was not aligned properly before the function was called. As you can see, the program temporarily stores the original rsp and rbp, then aligns rsp using an and instruction. The rsp and rbp registers are restored before returning. I can’t get Clang or GCC to generate and instructions when using alloca, however. ↩︎