C language explanations

Regarding this little quiz, here I will write down a proper explanation of the code snippet which accesses an array using the “2[p]” syntax instead of the usual “p[2]” form. To properly understand what is going on under the hood, I will try to explain things from the ground up, starting with some basic language issues which may surprise the less seasoned C programmer.

First of all, what is an array? I bet a coffee someone told you arrays are “containers” which can hold a fixed number of elements of a given type, and that the contained elements may be directly accessed by using their indexes. Conceptually this is right, but remember this: in C things are how the compiler treats them, not how you think the compiler will treat them (this is axiom number zero of the C programmer). So you definitely want to look at things the way the compiler does… at least sometimes. So let’s rewrite the question: what the heck does the compiler think an array is? From the language point of view, an array is no more than a contiguous memory area capable of holding a number of elements of the given type, so declaring size_t foo[10] will reserve sizeof(size_t) * 10 bytes somewhere. Period. So what is the array name “foo”? In almost every expression it is converted to a pointer to the first element of that space (the main exceptions are when it is the operand of sizeof or of the unary & operator). The array syntax is handy, but it adds very little on top of the core language.

Then, when one writes p[2] to access the third element of the “array”, as with the previous issue, it is no more than syntactic sugar for *(p+2). The language already knows that p+2 does not add two bytes to “p”: it yields an address that lies two times the size of the pointed-to element type past “p”. If you ever wondered why pointer arithmetic works like that, now you have the answer: this behaviour allows array syntax to be mapped directly onto pointer arithmetic.

When I was a lot younger I had to learn that “2+3” is exactly the same as “3+2”: addition is commutative. If we are dealing with pointers, why wouldn’t we maintain that property? The C language designers did, so you can safely write p+2 or 2+p, whichever makes you feel more comfortable. This looks natural, so let’s follow a simple chain of transformations:

    p[2] = *(p + 2) = *(2 + p) = 2[p]

And we ended up with the bizarre-looking array syntax I presented in my quiz. By now you probably think it must be some quirk in the parser of GCC, or some other gotcha. You are dead wrong. If you take a look at a serious standard (read: ISO C99) you’ll find that array subscripting is specified in a way that allows writing “2[p]”: one of the two expressions must have a pointer type and the other an integer type, but the standard does not say which one goes outside the brackets. It then defines E1[E2] as identical to (*((E1)+(E2))). No conversion between integers and pointers is involved: the literal 2 stays an integer, and the addition of a pointer and an integer works in either order.

I hope you learned something and liked the explanation 😉

