Structs and Unions

Structs

A structure (struct) is a programmer-defined type composed of fields (or members). The structure definition groups a set of variables under a common name. As the Arduino IDE uses a C++ compiler, the structures we define are implemented as C++ classes, although classes have additional properties that will be explored in a later section of this book. For the moment, we should probably start with an example.

This first example represents an RGB colour, which has three colour components, as a structure. We can start with that as it might be helpful if you get an opportunity to play with an RGB LED.

We could declare the required struct type as follows:

struct RGB {
  byte red;
  byte green;
  byte blue;
};

which creates a sort of template for a structure containing three named byte variables (one for each of the colour components) with a type name of RGB. We could take the conventional path and declare an instance of that type and then set one or more values. Individual members of a struct type can be accessed using “dot” notation.

RGB ledBlue;
ledBlue.blue = 255; // note the . between the instance name
// and the member name

or we could create an instance of that type and initialise all of the values in one go by providing suitable values within curly braces:

RGB ledColour = {255, 255, 255}; // white

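Note that any members left out of the curly braces are set to zero. An instance declared inside a function with no initialiser at all has members with an indeterminate value, so avoid reading them before setting them.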
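As a small illustration of initialising with curly braces, the sketch below supplies fewer values than the struct has members. The helper name makeGreenOnly is made up for this example, and uint8_t (the same unsigned 8-bit type as the Arduino byte) is used so that the code also compiles away from an Arduino.

```cpp
#include <stdint.h>

struct RGB {
  uint8_t red;
  uint8_t green;
  uint8_t blue;
};

// Hypothetical helper: only red and green are listed in the braces,
// so the language sets the remaining member (blue) to zero.
RGB makeGreenOnly() {
  RGB colour = {0, 255};
  return colour;
}
```

A struct declared inside a function with no braces at all gets no such help, which is why its members should be set before they are read.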

We can declare one or more instances of a structure alongside the struct template:

struct RGB {
  byte red;
  byte green;
  byte blue;
}ledRed, ledGreen;

which declares two instances of the RGB type (named ledRed and ledGreen).

We could also have declared them and initialised one or more of the instances with code like the following.

struct RGB {
  byte red;
  byte green;
  byte blue;
}ledRed, ledGreen = {0, 255, 0};

We can even declare and initialise an instance of an anonymous struct type. The next struct instance has a name (ledBright) but the struct type from which it is derived has no name – which is why it is called an anonymous type.

struct { 
  byte r;
  byte g;
  byte b;
  byte bright;
} ledBright = {255, 255, 255, 128};

Anonymous structs are fine, provided you are not intending to create any additional instances of that anonymous type or pass one to a function as an argument or define a pointer for the type.

[OK this is C and C programmers can do (almost) anything. Thus we can create new instances of an anonymous type using typeof(), which is an extension provided by the GCC compiler rather than part of standard C++ (standard C++11 offers decltype() for the same job). Thus typeof(ledBright) newBright; would create a new instance of that type named newBright based upon the structure of the ledBright instance. Aren’t you glad you are a C programmer?]
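For comparison, here is a minimal sketch of the same trick using decltype, the standard C++11 way to name the type of an existing variable. It assumes a compiler running in C++11 mode or later (the Arduino IDE does this by default in recent versions); copyBrightness is a made-up helper.

```cpp
#include <stdint.h>

// An instance of an unnamed struct type, as in the text above.
struct {
  uint8_t r;
  uint8_t g;
  uint8_t b;
  uint8_t bright;
} ledBright = {255, 255, 255, 128};

// decltype(ledBright) means "whatever type ledBright has",
// so this declares a second instance of the same unnamed type.
decltype(ledBright) newBright = {0, 0, 255, 64};

// Hypothetical helper: because both instances share one type,
// their members can be copied across without any conversion.
void copyBrightness() {
  newBright.bright = ledBright.bright;
}
```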

Yet another way to create a struct type would be to use the typedef keyword.

typedef struct {
  byte red;
  byte blue;
} TwoColour;

That code creates the named type TwoColour from which we can create one or more instances:

TwoColour twoColour = {0, 128};

A struct can be constructed that includes any other data type including char arrays, pointers and even other struct types. Any inner struct would be addressed using a dot notation chain.

struct {
  RGB rgb;
  byte bright;
} newLed;

That struct includes the RGB struct type we started with and we could address the red value with:

newLed.rgb.red = 128;

or an instance might be initialised in one go:

struct {
  RGB rgb;
  byte bright;
} newLed, newNew = {{255, 128, 128}, 255};

The newNew instance is initialised with a set of values for the inner RGB object defined within an inner set of curly braces.

A struct “template” can contain one or more anonymous struct templates.

struct compo { 
  struct {
    byte red;
    byte green;
    byte blue;
  } colours;
  byte bright;
};

with the colours, in that struct, being accessed using dot notation and the anonymous type instance name (colours).

compo newComp;
newComp.colours.red = 127;

You could also write something like:

struct compo { 
  struct {
    byte red;
    byte green;
    byte blue;
  } ;
  byte bright;
};

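but the inner anonymous struct declaration (that has no instance name) just becomes transparent. Its members are accessed as if they had been declared directly in compo (newComp.red rather than newComp.colours.red), leaving what is in effect a single struct with 4 named byte members. In C++ this is a compiler extension, although one that the GCC compiler used by the Arduino IDE accepts.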

I think we have now run out of different ways to declare and/or define a struct. The flexibility on offer rather begs you to make use of it. Grouping related data elements together into logical blocks can only assist in writing clear, easy-to-maintain programs. A struct is stored as a single block of memory with each member at a fixed offset from the start, which is some assurance that these data objects are implemented efficiently in the compiled code that runs on an Arduino.

A struct is passed by value to a function, although if an array of structs is presented to a function as an argument then this will be in the form of a pointer to the first element, as you would expect. If a struct contains an array, it is still passed by value to a function (the whole array is copied), but wise programmers will avoid using this mechanism to pass anything but the smallest of arrays as arguments. You can manage arrays as arguments just fine via pointers. Passing an array by value using a struct would probably make a heavy demand upon the available Arduino data memory – nearly always in short supply.
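To make the cost concrete, here is a rough sketch that contrasts the two approaches. The Samples struct and both function names are invented for the illustration.

```cpp
#include <stddef.h>

// Hypothetical struct wrapping a 16 element array.
struct Samples {
  int readings[16];
};

// By value: the whole struct, array included, is copied for each call.
long sumByValue(Samples s) {
  long total = 0;
  for (size_t i = 0; i < 16; i++) {
    total += s.readings[i];
  }
  return total;
}

// By pointer: only an address and an element count are passed.
long sumByPointer(const int* data, size_t count) {
  long total = 0;
  for (size_t i = 0; i < count; i++) {
    total += data[i];
  }
  return total;
}
```

Both functions return the same total, but each call to sumByValue() needs room for a copy that is sizeof(Samples) bytes long. That is 16 times the size of an int, or 32 bytes on an 8-bit AVR board where an int is 2 bytes.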

An array of struct types can be defined in the usual two ways:

RGB myLeds[5];

for a 5 element array of the RGB type or

TwoColour colourArray[] = {{125, 245}, {200, 190}, {54, 87}};

for an initialised 3 element array of the TwoColour type.

You address an individual member of a struct array element using the normal dot notation.

myLeds[3].red = 128;

A struct can be the return type of a function:

RGB getRed() {
  RGB newRed = {255, 0, 0};
  return newRed;
}

The getRed() function above returns a new initialised RGB struct.

Structure Scope

Struct type instances have the same scope as any other variable type. If they are declared outside of a function then they have global scope. If they are declared within a function or code block then they have local scope. A struct can also be declared as static.
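A quick sketch of the static case follows. The function name stepRed is made up, and uint8_t stands in for the Arduino byte type.

```cpp
#include <stdint.h>

struct RGB {
  uint8_t red;
  uint8_t green;
  uint8_t blue;
};

// Hypothetical helper: 'fade' can only be seen inside this function
// (local scope) but, being static, it is initialised once and keeps
// its member values from one call to the next.
uint8_t stepRed() {
  static RGB fade = {0, 0, 0};
  fade.red += 5;
  return fade.red;
}
```

The first call returns 5 and the second returns 10. Without the static keyword, fade would be created afresh each time and every call would return 5.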

Pointers to a struct

Defining a pointer to a struct works just as you would expect but it is worth noting how members of a structure are accessed using a structure pointer. This little code sample should cover it:

struct compo { 
  struct {
    byte red;
    byte green;
    byte blue;
  } colours;
  byte bright;
};
void setup() {
  Serial.begin(115200);
  compo nextComp;
  nextComp.colours.red = 255;
  compo* comptr = &nextComp;    // defines the pointer
  Serial.println((*comptr).colours.red); // uses the pointer to access
                                         // the struct instance member
}

Note the brackets around the dereference operator * and pointer name combination before the dot notation used to access the struct member. They are needed because the dot operator binds more tightly than the * operator. If you forget the brackets then the compiler will complain that the member is not part of the pointer (which, of course, it is not: it is part of the struct being pointed to). That should remind you to add the brackets to resolve the pointer to a struct that does have the required member.

Arrow Notation

Now you know how to reference a struct member using a dereferenced pointer, I am going to tell you that you will never use that format. You need to know about it so that you will recognise it when you meet it in other people’s code but you are going to use “arrow notation”. Arrow notation can be used to access any member of a struct (plus union or class) from a pointer to that struct. It also looks cool and makes code easier to read.

The last line of the little program above can be changed to:

Serial.println(comptr->colours.red);

The format is: pointer name, arrow, member name. Members that are sub-members continue to be addressed using the familiar “dot” notation as above.

Now because we just love pointers we had better have a pointer to a struct member just in case you come across one.

compo* comptr = &nextComp; // creates a pointer to the struct instance

byte* grnptr = &nextComp.colours.green; //creates a byte pointer to a 
                                        //member
Serial.println(*grnptr);          // uses that pointer to access value

byte* redptr = &(*comptr).colours.red;  // creates member pointer via 
                                        //struct pointer
Serial.println(*redptr);         // and again uses that member pointer

As before, we can also access the value of a struct using a pointer and the -> operator. Thus (assuming the comptr pointer from before) we can assign a value to a member:

comptr->colours.green = 33;

Serial.println(comptr->colours.green); // and use the value

Note a * is not required. The arrow operator (->) is used to access the value via the pointer but if you want a pointer to the value then you have to explicitly create that pointer. We could create the byte pointer to one of the inner members with:

byte* blueptr = &comptr->colours.blue;

A pointer to a struct member would allow you to pass a member into a function by reference to allow that function to directly update the individual member. That could result in some very efficient code although it might need some comments to clarify things.
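Here is a minimal sketch of that idea. The brighten() function is an invented example, and it neither knows nor cares that the byte it is given lives inside a struct.

```cpp
#include <stdint.h>

struct RGB {
  uint8_t red;
  uint8_t green;
  uint8_t blue;
};

// Hypothetical helper: raises one colour channel by 'step',
// stopping at 255 rather than wrapping round to 0.
void brighten(uint8_t* channel, uint8_t step) {
  if (*channel > 255 - step) {
    *channel = 255;            // would overflow, so clamp
  } else {
    *channel += step;          // updates the caller's member directly
  }
}
```

A call such as brighten(&led.green, 10); changes only the green member of an RGB instance named led. The other members are untouched and no copy of the struct is made.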

Unions

A union type can be used to store one of a set of values with different data types. The actual memory area storing the value has no distinct type, it is just the number of bytes required to hold the largest type size included in the union “template”. In some respects, a union acts as a buffer. An example would help clarify.

union {
  int tempInt;
  float tempFloat;
  long tempLong;
} tempVar;

declares an instance, called tempVar, of an anonymous union type that is just 4 bytes in size on an 8-bit AVR board such as the Uno. This is the size of the largest included variable type (in this instance two of them, float and long, are 4 bytes in size while the int is 2 bytes).

You can store (and retrieve) any one of an int value or a float value or a long value in the union instance tempVar. However, you would normally have to retrieve the same value type you assigned to it as the union has no “knowledge” of which value type was set. This is a partial weakness and, as we will see, a great strength of the union data type.

You can initialise a union instance when it is declared or you can have a named union type that can be used to declare many instances. Time for a short program.

union {
  int tempInt;
  float tempFloat;
  long tempLong;
} tempVar = {tempFloat: 67.8}; // initialises a float value in tempVar

union vars{
  int intVar;
  float fVar;
};                  // declares a named union type vars

void setup() {
  Serial.begin(115200);
  Serial.println(tempVar.tempFloat);    // displays 67.80
  vars myVar;            // creates an instance of the vars union type
  myVar.fVar = 8.9876;      // assigns a float value to that instance
  Serial.println(myVar.intVar);     // result is ??
  vars myOtherVar = {intVar: 327};  // creates and initialises new 
                                    //instance of vars
}

Note the two ways of setting a value in a union type. There is the {} initialiser approach that separates the member name from the new value using a colon “:” and the dot notation used to set the value in myVar in the above code. The colon form is an older GCC extension; C99 and C++20 write the same initialiser as {.tempFloat = 67.8}.

If you have run the code (or a version of your own) you will have seen that the compiler was quite happy for us to “use” an int type extracted from the myVar union that had had a value set as a float.
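The odd looking number arises because nothing is converted: the bytes that were stored as a float are simply read back as if they were an integer (and on an AVR board, where an int is 16 bits, only two of the four bytes are read). The sketch below shows the effect with a 32-bit integer so that all of the float is visible. It assumes the usual IEEE 754 32-bit float format, and the type and function names are made up. Strictly, the C++ standard does not promise that reading a different union member works, although GCC documents it as supported, so a memcpy() version is shown alongside.

```cpp
#include <stdint.h>
#include <string.h>

// A float and a 32-bit unsigned integer sharing the same storage.
union FloatBits {
  float f;
  uint32_t bits;
};

// Reads the float's bit pattern through the other union member.
// Supported by GCC, but not guaranteed by the C++ standard.
uint32_t bitsViaUnion(float value) {
  FloatBits fb;
  fb.f = value;
  return fb.bits;
}

// Portable alternative: copy the raw bytes into an integer.
uint32_t bitsViaMemcpy(float value) {
  uint32_t bits;
  memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

With IEEE 754 floats, both functions return 0x3F800000 for the value 1.0, which is the float's bit pattern and nothing like the integer 1.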

Unions can contain any data type (including structs and arrays) and are passed to functions by value. It is perfectly possible to pass one by reference using a pointer or to pass a pointer to a single data type within a union.

The very fact that the programmer has to keep track of the current value type in a union is usually the reason that many just shrug and move on – perhaps muttering “not much to see here”. In fact, I am going to suggest there is a lot to see here.

Suppose we put a union type inside a struct that has an additional enum value member that does keep track of the current data type in the union.

enum ValueType {INTS, FLOATS, LONGS, UINTS, ULONGS, BYTE};
struct variant {
  ValueType v_type;
  union {
    int intVar;
    float floatVar;
    long longVar;
    unsigned int uintVar;
    unsigned long ulongVar;
    byte byteVar;
  };
};

OK, you might think that is a little profligate when it comes to potential data types but, on an 8-bit AVR board, it declares a struct type just 6 bytes long that can contain any of the listed value types. That is the size of the enum (which defaults to the size of an int) plus the size of the largest member of the union. Note that the inner union is anonymous and has no declared instance name as this simplifies data access.

If a number of these types were needed in a program then a further byte could be saved by declaring v_type as a byte which would be fine to store the maximum enum value of 5. However, I was keen to demonstrate the use of the enum name in the struct as this usage is one of the rare instances when the enum name is used in code.

If we take care to set the enum along with a given value type then we have something rather interesting. We could use one of these structs to pass any of the value types into a function as a single argument. Plus, a function could return one of these types containing any of the possible value types. We could, at the very least, avoid a number of overloaded functions while providing a lot of flexibility.
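As a rough sketch of how that might look, the code below repeats the variant declaration and adds three invented functions: fromInt() and fromFloat() set the tag and the value together, and asFloat() checks the tag before reading. None of these names come from a library, and unsigned char is used in place of byte so that the code also compiles away from an Arduino.

```cpp
enum ValueType {INTS, FLOATS, LONGS, UINTS, ULONGS, BYTE};

struct variant {
  ValueType v_type;
  union {
    int intVar;
    float floatVar;
    long longVar;
    unsigned int uintVar;
    unsigned long ulongVar;
    unsigned char byteVar;   // same 8-bit unsigned type as Arduino's byte
  };
};

// Hypothetical helpers that always set the tag along with the value.
variant fromInt(int value) {
  variant v;
  v.v_type = INTS;
  v.intVar = value;
  return v;
}

variant fromFloat(float value) {
  variant v;
  v.v_type = FLOATS;
  v.floatVar = value;
  return v;
}

// Hypothetical consumer: one function accepts any of the value types.
// It reads only the union member that matches the tag.
float asFloat(variant v) {
  switch (v.v_type) {
    case INTS:   return (float)v.intVar;
    case FLOATS: return v.floatVar;
    case LONGS:  return (float)v.longVar;
    case UINTS:  return (float)v.uintVar;
    case ULONGS: return (float)v.ulongVar;
    case BYTE:   return (float)v.byteVar;
  }
  return 0.0f;  // only reached if the tag holds an invalid value
}
```

A call such as asFloat(fromInt(77)) then does the job that would otherwise need one overloaded function per value type.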

[I named the above struct a variant type in rueful memory of a similar object that was part of “classical” Visual Basic (not the later .NET VB) that could accept any one of a range of value types. VB variants were themselves inherited from Pascal but were slightly easier to use than the originals.]

Pointers to unions

Just as you would expect, a pointer to a union is created in the same way as any other pointer. The code below creates an instance of the composite struct/union, initialises a value, creates a pointer to the struct and uses dot and arrow notation to access the inner value.

enum ValueType {INTS, FLOATS, LONGS, UINTS, ULONGS, BYTE};
struct variant {
  ValueType v_type;
  union {
    int intVar;
    float floatVar;
    long longVar;
    unsigned int uintVar;
    unsigned long ulongVar;
    byte byteVar;
  };
};
void setup() {
  Serial.begin(115200);
  variant arg = {INTS, {intVar: 77}};
  Serial.println(arg.intVar);
  variant* vptr = &arg;
  vptr->intVar = 45;
  Serial.println(vptr->intVar);
}

We can create pointers to our composite struct inner data types using both the dot and arrow notations:

long* lptr = &arg.longVar;

float* fptr = &vptr->floatVar;

with both pointers now pointing to the same 32 bits within the inner union – but only one of them (or neither of them) may be pointing to a properly constructed value.

Let us leave aside the potential of our variant type and return to the union. A union allows us to treat a single chunk of memory as if it contained a number of different data types – that is the whole point after all. There are a number of ways that a programmer might make use of that.

Using overlapping union data types

A simple union like:

union {
  long lVal;
  byte bytes[4];
} gb;

can be used to access each of the four bytes in a long data type individually (as gb.bytes[0] to gb.bytes[3]).

Why might that be useful? Well, we could send each byte sequentially through a serial interface to be reconstituted at the other end as a long integer. The alternative would be to convert the long into a string, transmit each string character individually and then convert the string back into a long at the other end. If the number was (say) 4875289, that would require passing 7 or 8 bytes through the serial interface and would require some processing time at each end to manage the conversions. Passing just the 4 bytes between two similar union types would allow faster data transmission and no explicit conversions would be required. Speed and simplicity.
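A sketch of both ends of such a transfer might look like the following. The names LongBytes, packLong and unpackLong are invented, and int32_t is used so that the value is 4 bytes everywhere (on an AVR Arduino that is the same size as a long). A plain buffer stands in for the serial link so that the idea can be tried without any hardware.

```cpp
#include <stdint.h>

// A 4 byte integer and its individual bytes sharing the same storage.
union LongBytes {
  int32_t lVal;
  uint8_t bytes[4];
};

// Hypothetical sender side: copies the 4 raw bytes into 'out'.
// On an Arduino the same bytes could be handed to Serial.write().
void packLong(int32_t value, uint8_t out[4]) {
  LongBytes lb;
  lb.lVal = value;
  for (int i = 0; i < 4; i++) {
    out[i] = lb.bytes[i];
  }
}

// Hypothetical receiver side: rebuilds the value from 4 received bytes.
int32_t unpackLong(const uint8_t in[4]) {
  LongBytes lb;
  for (int i = 0; i < 4; i++) {
    lb.bytes[i] = in[i];
  }
  return lb.lVal;
}
```

This only works unchanged when both ends store their integers in the same byte order.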

On a related front, different computer systems store the bytes forming an integer in different orders. Intel CPUs and Arduinos, for instance, store the least significant byte of an integer in the lowest memory address used by the integer. This is known as little-endian. Motorola microprocessors, many RISC based computers and (in ancient times) IBM 370 mainframes reversed this and are deemed to be big-endian. Moving data between computing environments can require adjusting the byte order of integers and a union type would make that a very straightforward operation (although other methods exist).
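One possible way to do that adjustment with a union is sketched below. Word32 and swapByteOrder are made-up names for the illustration.

```cpp
#include <stdint.h>

// A 32-bit value and its four bytes sharing the same storage.
union Word32 {
  uint32_t value;
  uint8_t bytes[4];
};

// Hypothetical helper: returns 'input' with its four bytes reversed,
// converting little-endian to big-endian or the other way round.
uint32_t swapByteOrder(uint32_t input) {
  Word32 in;
  Word32 out;
  in.value = input;
  out.bytes[0] = in.bytes[3];
  out.bytes[1] = in.bytes[2];
  out.bytes[2] = in.bytes[1];
  out.bytes[3] = in.bytes[0];
  return out.value;
}
```

Because the function simply mirrors the bytes, it gives the same answer whichever byte order the host uses, and applying it twice returns the original value.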

Fun bit twiddling

union {
  struct{
    byte b0:1;
    byte b1:1;
    byte b2:1;
    byte b3:1;
    byte b4:1;
    byte b5:1;
    byte b6:1;
    byte b7:1;
  }bits;
  byte bt;
} binb;
void setup() {
  Serial.begin(115200);
  binb.bt = 'a';                 // set the union byte value
  Serial.println(binb.bt, BIN);  //display the significant bits 
                                 //of the byte
  Serial.println(binb.bits.b0);  // display individual bits
  Serial.println(binb.bits.b1);
  Serial.println(binb.bits.b2);
  Serial.println(binb.bits.b3);
  Serial.println(binb.bits.b4);
  Serial.println(binb.bits.b5);
  Serial.println(binb.bits.b6);
  Serial.println(binb.bits.b7);
  binb.bits.b1 = 1;              // set one of the bits
  Serial.println((char)binb.bt); // review the result
}

That little demonstration defined a union that contained a struct and a byte. The struct is created to address each bit in the union independently. The :n modifier (as in byte b0:1; ) applied to the bytes in the struct tells the compiler that we are only using a set number of bits. In this case we have specified :1 which is just one bit but we could (say) use :4 for 4 bits (in ancient times, amusingly called a nibble).

The code sets the byte to the value of char ‘a’ (ASCII decimal 97). The code then displays the byte value via the Serial Monitor using the BIN format. Then, just to confirm, the bits in the inner struct (which shares the same memory as the byte) are displayed individually. The second bit in the byte is then set on (incrementing the byte decimal value by 2) and then the byte value is sent as a char to the serial monitor where it is shown, as expected, as ‘c’ (ASCII 99).
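The :4 variation might be sketched as follows, splitting a byte into its two nibbles. The names are invented for the example. The order in which a compiler allocates bit fields is left to the implementation; the comments assume the behaviour of GCC on little-endian targets (which includes AVR Arduinos), where the first field declared occupies the lowest bits.

```cpp
#include <stdint.h>

union NibbleView {
  struct {
    uint8_t low : 4;    // assumed to be bits 0 to 3 (GCC, little-endian)
    uint8_t high : 4;   // assumed to be bits 4 to 7
  } nibbles;
  uint8_t whole;
};

// Hypothetical helpers that return one half of a byte each.
uint8_t highNibble(uint8_t value) {
  NibbleView v;
  v.whole = value;
  return v.nibbles.high;
}

uint8_t lowNibble(uint8_t value) {
  NibbleView v;
  v.whole = value;
  return v.nibbles.low;
}
```

Under that assumption, the byte 0xA5 splits into a high nibble of 0xA and a low nibble of 0x5.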

The Arduino environment provides functions for reading and writing bits within any integer type. The bit setting function is called bitWrite() and uses bitwise logical operators but it could have used a method like this.
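For comparison, here is a sketch of the bitwise operator approach. The function writeBit() is my own stand-in to show the idea and is not the Arduino implementation of bitWrite().

```cpp
#include <stdint.h>

// Hypothetical stand-in for the idea behind bitWrite(): returns 'value'
// with bit number 'bit' (0 is the least significant) set or cleared.
uint8_t writeBit(uint8_t value, uint8_t bit, bool on) {
  uint8_t mask = (uint8_t)(1u << bit);   // a single 1 in the chosen position
  if (on) {
    return value | mask;                 // OR switches the bit on
  }
  return value & (uint8_t)~mask;         // AND with the inverse switches it off
}
```

Setting bit 1 of 'a' (ASCII 97) this way gives 'c' (ASCII 99), which matches the result from the union version above.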

Operator Overloading

One of the great features we have access to, because the Arduino IDE uses a C++ compiler, is operator overloading. This allows us to define an operator to carry out a specific task when applied to a specified struct (or C++ class).

Suppose we had a struct that represented a three dimensional position – perhaps in a game. It might need position values for the horizontal, vertical and depth dimensions. In school maths lessons introducing graphs, these values would often be called x, y and z. If the game involved movement, then a “vector” might be applied to those three dimensions to effect a given location change. Both the position of our notional game object and the vector could be represented by a struct with 3 int variables. Having an operator to add two such struct types together would combine an initial position with a movement to arrive at a new 3D location.

We can define our own + addition operator and apply it to the struct definition:

struct Position{
  int x;
  int y;
  int z;
  Position operator+(const Position& p){
    Position pos;
    pos.x = this->x + p.x;
    pos.y = this->y + p.y;
    pos.z = this->z + p.z;
    return pos;
  };
};

The variable named this in the above code is an inbuilt pointer to the current struct instance. It is always available to you and does not need declaring.

If you also wanted to use += in your code for this struct then += would need to be defined as a separate operator just like + was. It does not come for free.
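A sketch of what that separate definition might look like is shown below. Returning a reference to the updated instance is the conventional shape for += in C++. As a design choice (not the only one), the + operator is written here as a free function built on top of += so that the arithmetic lives in one place.

```cpp
struct Position {
  int x;
  int y;
  int z;

  // Adds p to this instance in place and returns a reference to it.
  Position& operator+=(const Position& p) {
    x += p.x;
    y += p.y;
    z += p.z;
    return *this;
  }
};

// 'a' is passed by value, so it is already a copy that can be
// updated with += and returned as the sum.
inline Position operator+(Position a, const Position& b) {
  a += b;
  return a;
}
```

With both defined, pos += vect; updates pos directly while pos + vect leaves both operands unchanged and returns a new Position.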

We have seen how to add an operator to an object we have created. There is a very similar mechanism to add an operator to a pre-existing struct, maybe supplied by a library. In such a case we would be unwilling to make changes to the library code itself as this could be a long-term maintenance issue. The following code demonstrates the alternate format by adding a ++ operator to our Position struct. Not terribly clear if you would use one in this context but a good demonstration of the technique.

inline Position& operator++(Position& p) {
  p.x++;
  p.y++;
  p.z++;
  return p;   // hand back the updated instance
}

The inline keyword is used to reduce the overhead in calling the related function. It asks the C++ compiler to insert the required code where (in this case) the ++ operator is used rather than generating a function call. For a function as small as this one, that can also reduce the size of the resulting machine code after compilation.
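It is worth knowing that C++ treats ++pos (prefix) and pos++ (postfix) as two different operators. An operator++ that takes just the struct is the prefix form, and the postfix form is marked by an extra, unused int parameter. A sketch of the pair as free functions:

```cpp
struct Position {
  int x;
  int y;
  int z;
};

// Prefix form (++p): increment, then return the updated instance.
inline Position& operator++(Position& p) {
  p.x++;
  p.y++;
  p.z++;
  return p;
}

// Postfix form (p++): the unused int parameter is how the compiler
// tells the two forms apart. It returns a copy of the value held
// before the increment.
inline Position operator++(Position& p, int) {
  Position before = p;
  ++p;
  return before;
}
```

If only the prefix form is defined, a strict C++ compiler rejects pos++. The Arduino IDE has traditionally built sketches with the -fpermissive option, which lets it through with a warning by falling back to the prefix version.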

The following program represents a pretty minimal test of these two new operators (well new in the context of our struct) and demonstrates the two ways of applying an operator. Starting with those operators.

struct Position{
  int x;
  int y;
  int z;
  Position operator+(const Position& p){
    Position pos;
    pos.x = this->x + p.x;
    pos.y = this->y + p.y;
    pos.z = this->z + p.z;
    return pos;
  };
};
inline Position& operator++(Position& p) {
  p.x++;
  p.y++;
  p.z++;
  return p;   // hand back the updated instance
}

And then the setup() function code to use them.

void setup() {
  Serial.begin(115200);
  Position pos = {23, 45, 46};
  Position vect = {5, 10, 0};
  pos = pos + vect;
  Serial.print("x: ");
  Serial.print(pos.x);
  Serial.print(", y: ");
  Serial.print(pos.y);
  Serial.print(", z: ");
  Serial.println(pos.z);
  ++pos;   // prefix form, matching the operator++ defined above
  Serial.print("x: ");
  Serial.print(pos.x);
  Serial.print(", y: ");
  Serial.print(pos.y);
  Serial.print(", z: ");
  Serial.println(pos.z);
}

It would be nice to tidy up all of those Serial.print() statements and to be able to use something closer to the << syntax of the conventional C++ cout stream that you might have come across in example code online. Well, we can add an operator to the Print class that the Serial object is built upon to do just that. Add the new line that starts “template” and then simplify the setup() function you just ran, as below.

template<class T> inline Print &operator <<(Print &obj, T arg) {
  obj.print(arg); return obj; }

void setup() {
  Serial.begin(115200);
  Position pos = {23, 45, 46};
  Position vect = {5, 10, 0};
  pos = pos + vect;
  Serial << "x: " << pos.x << ", y: " << pos.y << ", z: " << pos.z << '\n';
  ++pos;   // prefix form, matching the operator++ defined earlier
  Serial << "x: " << pos.x << ", y: " << pos.y << ", z: " << pos.z << '\n';
}

That << is an operator that is going to be used elsewhere in this book and one you might like to add to your own programs when there is repeated output to the Serial Monitor. Notice though that the lines sending values to the Serial object ended with a newline character ('\n') to get much the same result as the Serial.println() method (which sends a carriage return followed by a newline).

The template phrase needs some sort of acknowledgement before we leave this topic. It is the way that a generic function can be defined in C++. The name T is a placeholder for the data type that will be used by the function. This is putting the C++ cart a bit before the C horse but here is a quick demonstration of a generic function to return the minimum of two values.

template<class T> inline T const& Min (T const& a, T const& b) 
{ 
   return (a > b) ? b : a; 
} 
void setup() {
  Serial.begin(115200);
  Serial.print("Min of 8 or 9 is: ");
  Serial.println(Min(8, 9));
  Serial.print("Min of 76.9 or 4.5: ");
  Serial.println(Min(76.9, 4.5));
}

You will have spotted that the function Min() is quite happy to accept different data types. What happens if you pass in two strings (char arrays) as arguments? Give it a try and why not use the Print operator from above instead of the Serial.print() statements.

Chapter Review

That’s it, the whole language and we even trespassed a bit into C++. Some might argue that macros with arguments are a fundamental language feature but they are featured through examples in the next chapter. Most of the rest of this book is about applying C to the Arduino environment which provides plentiful opportunities to write and run lots of C programs. It is writing code that will allow a newcomer to C to develop competence and then confidence in their skills.

Code downloads for this chapter are available here.