Smuggling data in pointers

While reading up on The ABA Problem I came across a fantastic hack. The ABA problem, in a nutshell, results from the inability to atomically access both a pointer and a "marked" bit at the same time (read the wikipedia page). One fun, but very hackish solution is to "smuggle" data in a pointer. Example:

#include "stdio.h"
void * smuggle(void * ptr, int value){  
  return (void *)( (long long)ptr | (value & 3) );
}

int recoverData(void * ptr){  
  return (long long)ptr & 3;
}

void * recoverPointer(void * ptr){  
  return (void *)( (long long)ptr & (~3) );
}

int main(){  
  int a = 90;
  int b = 2;
  int * c = &a;
  void * d = smuggle((void *)c, b);
  printf("The value of a is %d\n", *( (int *) recoverPointer(d) ) );
  printf("The value of b is %d\n", recoverData(d) );
}

The above code outputs:

The value of a is 90  
The value of b is 2  

So what happened? On a Unix system like OS X or Linux (as well as almost every operating system), memory is aligned to specific addresses. Since a pointer is nothing more than an address, the alignment has an interesting effect on the value of the pointer.

If I run the above code, the integer a has an address of 0x68537518. This isn't set in stone, and will be different each time I run the program. If we look at the addresses in binary it looks like this:

0110 1000 0101 0011 0111 0101 0001 1000

Run the code again and we get 0x66601518:

0110 0110 0110 0000 0001 0101 0001 1000

If we look at the address of integer b in the same run we get 0x66601514:

0110 0110 0110 0000 0001 0101 0001 0100

There is a lot of similarity in those addresses, but I want to point out the last two binary bits. In all addresses, they are zero. This is because of alignment. The compiler will align the bits so they always start on a value that is a multiple of four, since integers are 4 bytes (32 bits) long. This means all integer addresses will always end in 00.

So the smuggle method above takes advantage of that fact. Since we know the values are always going to be 00, we can replace them with an arbitrary value between 0 and 3. When we want to retrieve the smuggled data, we only look at the lowest two bits. When we want to dereference the pointer, we zero those bits out and voila, we have our original address.

In almost all applications, this would be a pretty useless trick, but in the case of the ABA problem, one can provide a couple of flags in addition to the pointer to better describe the value it is pointing to. This allows instructions such as compare-and-set to atomically access both pieces of data.

Pretty cool little trick, eh?

EDIT: After a number of comments condemning this practice, I wanted to state that, for the record, you don't want to do this. There are some corner cases where this is very useful, but generally should be avoided.

comments powered by Disqus