Sunday, November 13, 2011

Smuggling data in pointers

While reading up on The ABA Problem I came across a fantastic hack.  The ABA problem, in a nutshell, results from the inability to atomically access both a pointer and a "marked" bit at the same time (read the wikipedia page).  One fun, but very hackish solution is to "smuggle" data in a pointer.  Example:

#include "stdio.h"
void * smuggle(void * ptr, int value){
  return (void *)( (long long)ptr | (value & 3) );
}

int recoverData(void * ptr){
  return (long long)ptr & 3;
}

void * recoverPointer(void * ptr){
  return (void *)( (long long)ptr & (~3) );
}


int main(){
  int a = 90;
  int b = 2;
  int * c = &a;
  void * d = smuggle((void *)c, b);
  printf("The value of a is %d\n", *( (int *) recoverPointer(d) ) );
  printf("The value of b is %d\n", recoverData(d) );
}

The above code outputs:

The value of a is 90
The value of b is 2

So what happened? On a Unix system like OS X or Linux (as well as almost every operating system), memory is aligned to specific addresses. Since a pointer is nothing more than an address, the alignment has an interesting effect on the value of the pointer.

If I run the above code, the integer a has an address of 0x68537518.  This isn't set in stone, and will be different each time I run the program.  If we look at the addresses in binary it looks like this:

0110 1000 0101 0011 0111 0101 0001 1000

Run the code again and we get 0x66601518:

0110 0110 0110 0000 0001 0101 0001 1000

If we look at the address of integer b in the same run we get 0x66601514:

0110 0110 0110 0000 0001 0101 0001 0100

There is a lot of similarity in those addresses, but I want to point out the last two binary bits.  In all addresses, they are zero.  This is because of alignment.  The compiler will align the bits so they always start on a value that is a multiple of four, since integers are 4 bytes (32 bits) long.  This means all integer addresses will always end in 00.

So the smuggle method above takes advantage of that fact.  Since we know the values are always going to be 00, we can replace them with an arbitrary value between 0 and 3.  When we want to retrieve the smuggled data, we only look at the lowest two bits.  When we want to dereference the pointer, we zero those bits out and voila, we have our original address.

In almost all applications, this would be a pretty useless trick, but in the case of the ABA problem, one can provide a couple of flags in addition to the pointer to better describe the value it is pointing to.  This allows instructions such as compare-and-set to atomically access both pieces of data.

Pretty cool little trick, eh?

EDIT:  After a number of comments condemning this practice, I wanted to state that, for the record, you don't want to do this.  There are some corner cases where this is very useful, but generally should be avoided.

12 comments:

  1. NEVER USE THIS IN PRODUCTION CODE. FOR THE LOVE OF ANYTHING THAT IS SACRED IN COMPUTER SCIENCE FORGET THAT YOU KNOW YOU CAN DO THIS AND NEVER DO IT AND SAVE HURTING POOR KITTENS!... kthxbye.

    ReplyDelete
  2. zbowling: I've actually seen it used "in the wild". It is used in QEMU to return a pointer some generated code and a flag to go with it.

    http://git.savannah.gnu.org/cgit/qemu.git/tree/cpu-exec.c

    search for next_tb

    You'll see it used all over the place as I describe above, like on line 594

    It is actually very useful, in the right circumstances. It solves an unsolvable problem, in regards to the ABA problem. The x86 ISA does not have a Load-Locked/Store-Conditional instruction like ARM or PowerPC, and this trick is one of the only ways to get around that hardware limitation.

    ReplyDelete
  3. And this is why the javascript standard specifies that integral numbers are only 31 bits long...

    ReplyDelete
  4. No, Adam, the reason it says they are 31 bits long is due to the pervasiveness of 32-bit architectures when the standard was written. The last bit is used for signage...

    ReplyDelete
  5. @Adam I sincerely hope you are joking. The restriction is almost certainly due to the pervasive 32-bit word size used in most modern processors. Since javascript does not support unsigned integer, 1 bit must be reserved to indicate if the integer is positive or not.

    ReplyDelete
  6. Yeah, Apple used to do this on 24-bit-address 68000 Macs. They got burned when that code had to move 32-bit addresses on the 68020. Lots of tyros thinks the world's an x86, that x86 alignment and endian-ness and other quirks are universal, but every time you make that assumption you're pissing on programmers who write code for non-x86 devices. You probably use some of those devices yourself, so the extra development "drag" affects you too. Write code that will work on other architectures, instead of this "neat trick" that breaks stuff.

    ReplyDelete
  7. More comments on Hacker News: http://news.ycombinator.com/item?id=3233156

    ReplyDelete
  8. Dear b4c87dfa-0eb9-11e1-a847-000bcdcb2996,

    I'm afraid it is you who is entirely confused and incorrect here. What Adam said is completely correct. Here it is right from the horse's mouth, from the V8 (Chrome's Javascript VM) source:

    http://code.google.com/p/v8/source/browse/branches/bleeding_edge/src/objects.h#123

    Note that "smi" is a small integer, and it is represented by a 31 bit signed value with the LSB set to zero. The sign bit is included in the 31. This is called tagging and it means that by checking the LSB of an "Object *" you can immediately tell whether it's a small integer or not without having to de-reference it. This is precisely what Adam was referring to in his comment.

    ReplyDelete
  9. @Platypus: Apple was kind of doing the opposite on 68000, though -- they were stuffing flags into the *most* significant bits, not the *least* significant bits. This happened to work on the 68000 because it only had 24 address lines, so the upper 8 bits of an address were silently ignored on all memory access. Apple's public APIs generally handled this cleanly; where the trouble started was with some third-party applications that manipulated the flags directly, causing havoc on true 32-bit systems (which no longer stored flags there).

    Storing data in the least significant bits of a pointer is less of a big deal, as it's something that you have to explicitly strip off whenever you use the affected pointer. It's still ugly, but at least it's a well-defined kind of ugly.

    ReplyDelete
  10. > The x86 ISA does not have a Load-Locked/Store-Conditional instruction like ARM or PowerPC

    No. The x86 has cmpxchg8b, which does exactly that. Newer ones have cmpxchg16b to compensate for larger pointers on 64-bit machines.

    More realistic though, your ISA should tell you whether or not those assumptions hold.

    - If you have a pointer to a char array in the middle of a struct, it's not going to be 32-bit aligned, so your pointer would get mauled.
    - If you have an ARM on which thumb code is executing, those last bits aren't 00 but 10. If you change it to 00, the software will just crash.

    Please stop abusing pointers; there are proper ways to do what you're trying to hack in.

    ReplyDelete
  11. @Candy: cmpxchg is not the same as LL/SC. It is a CAS instruction which is subject to the ABA problem. Read the wikipedia page I link to and http://en.wikipedia.org/wiki/Compare-and-swap#ABA_Problem to understand what I mean.

    In case you don't want to read it, here's the difference: CAS (cmpxchg) will only store the value if the comparison returns true. LL/SC will only store the value if the contents have not been updated. So in the ABA problem, the contents of the address are initially A. They are changed to B and back to A again. CAS won't care that the value was changed, because the values it is comparing (A and A) match. LL/SC will care that the value was changed, because if it is at all updated between LL and SC being called then SC will fail.

    It's not a hack. It's not an abuse of pointers. This is an established solution to solving the ABA problem on x86. There are plenty of other valid uses. Read the hacker news thread for both good (and bad) examples of using this technique.

    That being said, this should only be used if necessary. There is almost always a better way to do it. But in the rare corner cases where a tagged pointer is required, it does the job.

    ReplyDelete
  12. Apple uses something like this in their 64-bit Obj-C 2.0 runtime on Lion:
    http://objectivistc.tumblr.com/post/7872364181/tagged-pointers-and-fast-pathed-cfnumber-integers-in

    ReplyDelete