My place in space

Magical initialization in for loop

Nov. 30, 2016, 4:45 p.m.

Recently one of my friends had a strange problem: he wanted to initialize several variables of different types in a single for loop. Stranger still, he wanted to declare only one of them in the loop header, even though the two variables have different types.
Declaring variables in the loop header is forbidden in C standards older than C99, so let’s assume we are using C99. An obvious question is “why is your friend so crazy that he would want to do that?” Basically, he wanted to create a macro which would produce the head of a loop for him, very similar to what STAILQ_FOREACH does. He also had a special foo function which would initialize the newly declared variable through its argument and return some magic value. Let’s assume the foo function looks something like this:

const char *
foo(int *val)
{
     *val = 5;
     return ("cool text bro");
}

The first reaction is of course “dude, what’s the problem, just use the loop initialization”. The problem is that C gives you no way to declare and initialize several variables of different types in one declaration. So the construction below will not work in C: it declares a new variable val (of type int) which shadows the outer variable val of type const char *, and we end up with an error saying that we are trying to convert const char * to int.

const char *val;
for (int i = 0, val = foo(&i); i < size; i++)
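To see the shadowing in isolation, here is a minimal sketch. The function name shadow_demo and the values are made up for illustration. Inside the loop header both i and val are declared as int, so the inner val hides the outer pointer instead of assigning to it:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns the value seen through the inner val. */
int
shadow_demo(void)
{
    const char *val = "outer";
    int seen = 0;

    /* One declaration, one base type: i and val are both int here. */
    for (int i = 0, val = 7; i < 1; i++)
        seen = val;

    /* The outer pointer was never touched by the loop. */
    assert(strcmp(val, "outer") == 0);
    return (seen);
}
```

Calling shadow_demo() gives back the inner integer, while the outer string is left exactly as it was.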

One of the simplest solutions I came up with is to just declare the i variable before the loop:

#define some_macro(arg)    \
    int i;                 \
    for (arg = foo(&i); i < size; i++)

Of course a construction like that was not enough for my friend, because he could use the macro only once in a function; otherwise he would get an error saying that he is trying to redeclare the i variable. Furthermore, he can’t use the construction you can see below:

#define some_macro(arg)                    \
    {                                      \
        int i;                             \
        for (arg = foo(&i); i < size; i++) \

Why? Because after preprocessing of code as follows:

const char *str;
some_macro(str) {
    printf("nice text\n");
}

we would end up with code which does not look exactly like what we wanted:

const char *str;
{
    int i;
    for (str = foo(&i); i < size; i++) {
        printf("nice text\n");
    }

In the end we are left with some juicy errors, because the brace opened by the macro is never closed. So are we doomed?

Structures to the rescue!
One crazy idea was to use a structure inside the declaration part of the loop (yes, you can declare a new anonymous structure inside a for loop). Let’s see our new macro:

#define some_macro(arg) \
        for (struct {int i; const char *f;} s = {0, arg = foo(&s.i)}; s.i < size; s.i++)

We declare a special structure type and a new variable s of that unnamed type.
One of its members is our new i, which we want to use to drive the loop. The second member, f, is never used, but it is useful because its initializer lets us call our foo function. Because we are using an assignment, not only the arg variable will be set: the f member is set to exactly the same value (the initializer is equivalent to s.f = arg = foo(&s.i)).
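Here is a reduced sketch of the same idea: an unnamed struct in the loop header carrying two variables of different types. It deliberately leaves out the self-referencing initializer, because the C standard leaves the order of side effects among initializer expressions unspecified and I did not want the example to depend on it. The helper name count_chars is hypothetical, and the sketch assumes a C99 compiler that accepts an unnamed struct type in the for declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to the terminating '\0'. */
int
count_chars(const char *text)
{
    int n = 0;

    assert(text != NULL);

    /* s.i is an int index and s.p is a pointer, both scoped to the loop. */
    for (struct { int i; const char *p; } s = { 0, text };
        s.p[s.i] != '\0'; s.i++)
        n++;

    return (n);
}
```

Both members disappear together with s when the loop ends, so the construction can be repeated in the same function without name clashes.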

So another thing I learned is that you can refer to a structure variable inside the initializer of one of its own members (sic!), so we are able to use the i member of structure s while initializing the f member of the same structure.

This solution is very ugly, but hey, it works!
This method was inspired by a post on Stack Overflow.

Oh my dear, here is that comma!
I gave my friend the structure solution, but I was not satisfied with it: it was very, very hackish, and I didn't like it. After a few minutes it struck me: if we can use that weird assignment, maybe we could use the comma operator as well? Basically, if we have multiple expressions separated by a comma, the left one will be “evaluated as a void expression”, which means that its value is discarded, and the value of the right one becomes the result. So if we had an expression such as:

int s = (bar(), 19 + 20);

in C we would first call the bar() function and drop its result, and then the result of 19 + 20 would be assigned to the s variable. The parentheses are needed: without them the comma would be parsed as a separator between declarators, not as the comma operator. It is worth noting that if the bar function is very small and has no side effects, the compiler can be smart and remove the call to it, but let’s assume it didn’t do that. So I tried a macro definition as follows:
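A small sketch of that behavior, with a bar function that has a visible side effect so the call cannot be optimized away. The counter calls and the function comma_demo are assumptions made for this example. Note that in a declaration the comma expression has to be parenthesized, otherwise the comma separates declarators.

```c
#include <assert.h>

static int calls;   /* counts how often bar() ran */

static int
bar(void)
{
    calls++;
    return (100);
}

/* Hypothetical demo: the value of bar() is dropped, 19 + 20 is kept. */
int
comma_demo(void)
{
    int before = calls;
    int s = (bar(), 19 + 20);

    /* bar() really was called, exactly once, before the sum was used. */
    assert(calls == before + 1);
    return (s);
}
```

The value returned by bar() never reaches s; only its side effect on calls remains.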

#define some_macro(arg)    \
        for (int i = (arg = foo(&i), i); i < size; i++)

Now, knowing how the comma works, we can go step by step. First we evaluate the left side of the expression in the parentheses, arg = foo(&i): inside foo some value is assigned to i, then the value returned by foo is assigned to the arg variable, and the value of that expression is dropped because it’s on the left side of the comma. It doesn't even matter that arg has a different type than the i variable! The next step is the right side of the comma expression; here we don’t need to calculate anything and we just yield i. So the result of the expression in the parentheses is i. In the end we just assign the i variable to itself, which means that we didn’t change anything.
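Putting the pieces together, the following sketch runs the comma macro end to end. The value of size (8) and the wrapper function run_loop are assumptions for the example; foo is the function from the beginning of the post. Since foo stores 5 in i, the loop body should run for i = 5, 6 and 7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const int size = 8;      /* assumed loop bound */

static const char *
foo(int *val)
{
    *val = 5;
    return ("cool text bro");
}

#define some_macro(arg)    \
        for (int i = (arg = foo(&i), i); i < size; i++)

/* Hypothetical wrapper: returns the number of iterations, or -1 on mismatch. */
int
run_loop(void)
{
    const char *str = NULL;
    int iterations = 0;

    some_macro(str) {
        iterations++;
    }

    /* arg received the string even though i is an int. */
    assert(str != NULL);
    if (strcmp(str, "cool text bro") != 0)
        return (-1);
    return (iterations);
}
```

Because i lives only in the loop header, the macro can be used several times in one function without redeclaration errors.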

So a careful reader may ask: is that undefined behavior? Do we have any certainty that the left side will be evaluated before the right side? It turns out that this behavior of the comma operator is well described in the C standard, and we can be certain that the left side will be evaluated before the right side, as described in section 6.5 of the C standard:
The grouping of operators and operands is indicated by the syntax. Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
And section 6.5.17, which describes the comma operator:
The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined.

EXAMPLE As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call f(a, (t=3, t+2), c) the function has three arguments, the second of which has the value 5.

Anonymous strikes back!
I asked Gynvael to proofread this text, and I was not disappointed :). He pointed out to me (with this link) that there is a tricky way to define a pseudo-anonymous variable, and that we could even use that method with C89. The compiler provides a special macro, __LINE__, which the preprocessor replaces with the number of the source line it appears on. We can use that number to distinguish the calls to our macro.

#define _CONCA(a, b)   a ## b
#define CONCA(a,b)     _CONCA(a, b)
#define AVAR           CONCA(anonymous, __LINE__)
#define some_macro(arg)                         \
    int AVAR;                                   \
    for (arg = foo(&AVAR); AVAR < size; AVAR++)

So when we call our macro, the name of the anonymous variable will be suffixed with the number of the line on which we called the macro. You can see that by using the -E option, which tells the compiler to stop just after preprocessing and print the resulting code.


 int anonymous17; for (anonymous17 = 0; anonymous17 < 10; anonymous17++)

 int anonymous22; for (anonymous22 = 0; anonymous22 < 10; anonymous22++)
 return (0);
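As a sketch of why the line-based name matters, the function below uses the macro twice in one scope. The bound size (8) and the function two_loops are assumptions for the example. The sketch is meant to be compiled as C99 or later, because the second expansion puts a declaration after a statement, and both macro calls have to sit on different source lines.

```c
#include <assert.h>
#include <stddef.h>

static const int size = 8;      /* assumed loop bound */

static const char *
foo(int *val)
{
    *val = 5;
    return ("cool text bro");
}

#define CONCA_(a, b)   a ## b
#define CONCA(a, b)    CONCA_(a, b)
#define AVAR           CONCA(anonymous, __LINE__)
#define some_macro(arg)                         \
    int AVAR;                                   \
    for (arg = foo(&AVAR); AVAR < size; AVAR++)

/* Hypothetical demo: two expansions, two differently named counters. */
int
two_loops(void)
{
    const char *a = NULL;
    const char *b = NULL;
    int n = 0;

    some_macro(a) { n++; }
    some_macro(b) { n++; }

    assert(a != NULL && b != NULL);
    return (n);
}
```

Each loop starts at 5 and stops at 8, so the two loops together run the body six times.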

As you probably noticed, we need two macros for the concatenation. If we did not do that, the b argument in the macro would be just the text ‘__LINE__’, which would not give us a unique suffix. So for code as below:

#define CONCA(a, b)   a ## b
#define AVAR          CONCA(anonymous, __LINE__)

The preprocessor output would be: anonymous__LINE__, which is not what we wanted.
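The difference between one and two levels of expansion can also be checked at run time by turning the pasted name into a string. All macro and function names in this sketch (PASTE_DIRECT, PASTE, STR, direct_name, indirect_name) are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

#define PASTE_DIRECT(a, b)  a ## b              /* pastes the raw tokens */
#define PASTE_INNER(a, b)   a ## b
#define PASTE(a, b)         PASTE_INNER(a, b)   /* expands arguments first */
#define STR_INNER(x)        #x
#define STR(x)              STR_INNER(x)

/* One level: __LINE__ is pasted before it gets a chance to expand. */
const char *
direct_name(void)
{
    return (STR(PASTE_DIRECT(anonymous, __LINE__)));
}

/* Two levels: __LINE__ becomes a number, then the number is pasted. */
const char *
indirect_name(void)
{
    const char *name = STR(PASTE(anonymous, __LINE__));

    assert(strncmp(name, "anonymous", 9) == 0);
    return (name);
}
```

direct_name() yields the literal text anonymous__LINE__, while indirect_name() yields anonymous followed by the number of the line on which the macro was used.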

The only problem with that method is that if you would like to use the variable in the body of the loop, it could be a little hard to guess its name. You will not have this problem if you use the previous method.

These methods are mostly hackish, so I doubt you will find a lot of code which uses them. If I had a problem which required one of those techniques, I would use the method with anonymous variables if I needed to be compatible with C89; in any other case I would use the solution with the comma. The solution with structures is interesting, but it’s really useful only if you have more than two variables of different types, and then you really should rethink whether you are doing the right thing.
If you know other interesting methods to combine for loop initialization and macros, don’t hesitate to share them with me.