Data Structures Part 2

A Generic Queue

The FiFo queue introduced in this chapter is fast and has a minimal memory footprint. Used correctly, it should clean up after itself returning all memory used for later reuse by your program. The only snag with that queue format is that it needs at least some code changes to re-implement another queue for a different data type or structure. It is perfectly possible to write a generic version that can contain any arbitrary data type at the cost of a little extra memory when in use and using a few more clock cycles to complete the tasks of adding and removing queue entries. This may often be a price worth paying, so I am going to explore the creation of a C library (rather than a C++ class based library discussed later in this book) that could be added to any program.

C libraries (like C++ libraries) are normally based upon two files. The first. YourName.h (header file) contains the function prototypes and the second, YourName.c holds the executable code. We will start with the C code and then look at the function prototypes and associated declarations in the header file. We can then knock up a quick test program to try our queue out.

Create a new sketch in the Arduino IDE. Worth making a point of clicking the save icon straight away and naming the program – in this instance I called mine GenericQueue. Then click the “pull down” icon on the tab bar and select the “New Tab” option from the menu that is displayed. You will be invited to enter a file name – go for FiFoQ.h. Then repeat the exercise to create a new tab with a file name of FiFoQ.c. We can the start adding functions to the C file. These are going to seem reasonably familiar so I will comment the changes from the custom FiFo Queue we wrote for the Morse reading program.

First, we need to include the Arduino.h header file and the new (as yet unwritten) FiFiQ.h file. The include statements are followed by two stucts.

#include "Arduino.h"
#include "FiFoQ.h"

struct FiFoNode 
{
  void *data;
  struct FiFoNode* next;
};
struct FiFoQ
{
  struct FiFoNode* head;
  struct FiFoNode* tail;
  int count;
  size_t dataSize;
};

The FiFoQ struct is similar to the SignalQueue struct used before with an added dataSize member (size_t is a synonym for an unsigned int). This value will hold the size of the items that will be added to the queue at run time.

A FiFoNode struct will be added to the queue for each data item added. Its function is to act as a surrogate queue member and to hold a pointer to the actual data that the program wants to store. It is therefore similar to the SignalInt struct used before but rather than holding the individual data elements it just holds a pointer to another memory location where the data is stored. The pointer is declared as a void pointer because void pointers can be cast to any other pointer type.

A new queue is created with the newQueue() function which returns a pointer (effectively) to the new instance.

struct FiFoQ* newQueue(size_t memSize){
  struct FiFoQ* p = malloc(sizeof(struct FiFoQ));
  p->head = p->tail = NULL;
  p->dataSize = memSize;
  p->count = 0;
  return p;      
}

The function requests some memory to store a FiFoQ struct and initialises the relevant members. Note that the value dataSize is set from an argument passed to the function. This would be set at runtime to the size of the data items that are going to be stored in the queue. The function returns a FiFoQ pointer.

The queueAdd function has to do a bit more than our original custom queue equivalent. In this instance the function is passed the all-important pointer to the queue (you might be managing more than one queue in the same program) and a pointer to the data element to be stored. This could be a pointer to a struct or union or just a regular data type.

bool queueAdd(struct FiFoQ* fq, void* dta) {
  struct FiFoNode* node = malloc(sizeof(struct FiFoNode));
  if(node == NULL) {
    return false;
  }
  node->next = NULL;
  void* dat = malloc(fq->dataSize);
  if(dat == NULL) {
    free(node);
    return false;
  }
  memcpy(dat, dta, fq->dataSize);
  node->data = dat;
  if(fq->head == NULL) {
    fq->head = fq->tail = node;
  } else {
    fq->tail->next = node;
    fq->tail = node;
  }
  fq->count++;
  return true;
}

The function allocates memory for a new FiFoNode struct but then also uses malloc() to reserve and return a pointer to a memory area big enough to hold the data that needs to be stored. The memcpy() function is used to copy the data into that new memory allocation. The data pointer is set in the FiFoNode struct and then the queue values for “next” plus “head” and “tail”, as required, are set just as before. This function returns a bool value which can be false if memory is not available to store either the FiFoNode struct or the data. The data has been stored in the queue without the code needing to know anything more about the data than the number of bytes needed to store it.

The readFirst() function now returns a void pointer to the stored data associated with the first element in the queue. When required, that void pointer can be cast to a pointer of the required data type by the calling program as we shall see.

void* readFirst(const struct FiFoQ* fp) {
  struct FiFoNode* p = fp->head;
  if(p) {
    return p->data;
  } else {
    return NULL;
  }
}

The deQueue() function again has to do a little more than the equivalent function used by the Morse code reading program. Memory has to be freed for the data and the FiFoNode struct associated with the first queue entry.

bool deQueue(struct FiFoQ* fp){
  if(fp->head == NULL) {
    // already empty
    return false;
  }
  struct FiFoNode* node = fp->head;
  struct FiFoNode* nxt = node->next;
  free(node->data);
  free(node);
  fp->head = nxt;
  if(nxt == NULL) {
    fp->tail = NULL;
  }
  fp->count--;
  return true;
}

I have also added a getQCount() function that can be used to track the number of current entries in the queue. Accessing the count through a function is preferred as there is less danger of accidentally changing the value from non-queue code.

int getQCount(const struct FiFoQ* fp) {
  return fp->count;
}

Plus a zapQueue() function that will empty any queue of any remaining entries and then recover the memory associated with the FiFoQ struct.

void zapQueue(struct FiFoQ* fp){
  bool res = deQueue(fp);
  while(res){
    res = deQueue(fp);
  }
  free(fp);
}

The FiFoQ.h file contains the function prototypes and one extra item.

#ifndef FiFoQ_h
#define FiFoQ_h

struct FiFoNode;
struct FiFoQ;
typedef struct FiFoQ* FiFoQPointer;

struct FiFoQ* newQueue(size_t memSize);
bool queueAdd(struct FiFoQ* fq, void* dta);
void* readFirst(const struct FiFoQ* fp);
bool deQueue(struct FiFoQ* fp);
void zapQueue(struct FiFoQ* fp);
int getQCount(const struct FiFoQ* fp);
#endif

It is normal to start a header file with two lines similar to the #ifndef and #define lines seen here and end the file with #endif. These preclude difficulties that might arise if the header file is included into a program more than once (perhaps by another library file that is included).

The “extra” is the typedef that creates a synonym for a FiFoQ pointer as I thought that this would be easier to remember when creating a new queue using this library.

Now we can turn our attention to the GenericQueue program tab and add some code to test the queue. This has to start with a line that includes the FiFoQ header file. The format for this is slightly different for a C library file as we need to make it clear that we are not including a C++ class. So the program starts:

extern "C"{
  #include "FiFoQ.h"
}
template<class T> inline Print &operator <<(Print &obj, T arg) { obj.print(arg); return obj; } 
struct MyData {
  int myVal;
  int yourVal;
};
FiFoQPointer myQ;

The MyData struct is what we are going to use to test the queue. We can create instances of this struct and pass a pointer to those instances as items to be added to the queue. We also have an instance of the FiFoQPointer (which is a synonym for “struct FiFoQ*”).

Do please, think about writing your own test but my setup() function looked like this:

void setup() {
  Serial.begin(115200);
  MyData myData;
  MyData* mDta;
  myQ = newQueue(sizeof(MyData));
  for(int i = 1; i < 21; i++) {
    myData.myVal = i << 2;
    myData.yourVal = i;
    queueAdd(myQ, &myData);
  }
  Serial << "Queue has " << getQCount(myQ) << " items loaded\n";
  mDta = readFirst(myQ);
  while(mDta) {
    Serial << "Read myval: " << mDta->myVal << " yourVal: " << 
           mDta->yourVal << "\n";
    deQueue(myQ);
    mDta = readFirst(myQ);
  }
  Serial << "Queue now has " << getQCount(myQ) << " items\n";
  zapQueue(myQ);
}

Once the code is compiling OK and the test has run to your satisfaction then you might want to install the code as a library for future use. Start by locating the Arduino\libraries folder for your Arduino IDE installation. Mine was on my C: drive at C:\Program Files (x86)\Arduino\libraries. In the libraries folder create a new folder called FiFoQ. Then copy the FiFoQ.c and FiFoQ.h files from where they are stored alongside the GenericQueue.ino file to the newly created folder.

[My files were at C:\Users\mike\Documents\Arduino\GenericQueue\ but your user name will probably be different.]

Once the files have been copied to the library location then they can be included into any program with the following lines. Notice that you can now use rather than "FiFoQ.h" which was used by the GenericQueue.ino program file to access the header file stored locally in the same folder.

extern "C"{
  #include <FiFoQ.h>
}

When you reach the next chapter that looks at creating a C++ class library, we will re-visit this library and create a C++ version. This would allow the programmer to manage multiple queues in a program using class “instances” rather than (perhaps) multiple pointers as we used here. However, this C library will deliver generic queue functionality to any Arduino program that requires it and a conversion to C++ is not a necessity – although that exercise should prove interesting.

Before that though, we could take a look at a C binary tree as our last effort used an array rather than the heap.

A C Binary Tree

Start a new program and then add a tab – let’s call it BinaryTree.h. Going for a minimalist approach, start the code for the new tab with a struct called btNode.

struct btNode{
  uint16_t key;
  void* data;
  btNode* left;
  btNode* right;
};

We can then add a function called btInsert() designed to add a new entry (or leaf) to our tree. btInsert() is a recursive function. Whats this? Adding a function to the header file, is that right? It is a bit of a cheat and we can get away with putting all of the code in the same file without function prototypes if each function is defined before it is used by another function.

bool btInsert(uint16_t key, btNode** leaf, void* dta, size_t dtaSize) {
  if(*leaf == NULL) {
    *leaf = (btNode*)malloc(sizeof(btNode));
    if(!*leaf) {return false;}
    (*leaf)->key = key;
    (*leaf)->left = (*leaf)->right = NULL;
    (*leaf)->data = malloc(dtaSize);
  if(!(*leaf)->data) {return false;}
    memcpy((*leaf)->data, dta, dtaSize);
    return true;
  } 
  if (key < (*leaf)->key) {
    return btInsert(key, &(*leaf)->left, dta, dtaSize);
  }
  return btInsert(key, &(*leaf)->right, dta, dtaSize);
}

The function is passed a numeric key (in this instance) together with a pointer to the root node (coming shortly), a pointer to the data to be stored and a value for the data item size. The process walks through any pre-existing tree nodes following the pointer to the “left” when the new key value is less than the current nodes key and otherwise to the “right”. Once a NULL leaf pointer is located then a new tree node is created in memory and the data copied to another location in memory. Data handling is pretty similar to the way data is managed in the FiFo queue we have just written.

The search function btSearch() follows a similar pattern, walking the tree recursively looking for a match to the supplied key. Where a match is found then the function returns the data pointer value.

void* btSearch(uint16_t key, btNode* leaf){
  if(leaf) {
    if(leaf->key == key) {
      return leaf->data;
    }
    if( key < leaf->key) {
      return btSearch(key, leaf->left);
    }
    return btSearch(key, leaf->right);
  } else {
    return NULL;
  }
}

Enjoying the recursion? Recursion works really well with Binary trees because each junction (or leaf) in the tree has an identical structure to all of the others.

How about some very useful utility functions that also use recursion to navigate the tree.

int btCount(btNode* leaf) {
  if(leaf) {
    return 1 + btCount(leaf->left) + btCount(leaf->right);
  }
  return 0;
}
int btMax(int a, int b){
  return (a <= b) ? b : a;
}
int btHeight(btNode* leaf) {
  if(leaf) {
    return 1 + btMax(btHeight(leaf->left), btHeight(leaf->right));
  }
  return 0;
}

btCount() counts the total number of elements (leaves) in the tree while btHeight calculates the maximum height of a tree. The height of a tree is the number of steps from the root to the most left or most right node. The tree height can be used to tell us how well balanced a tree is. The optimal height is Log2N where N is the number of elements in the tree. The tree height also tells us the maximum number of reads required to find any given tree entry.

Finally (for the moment) a function to delete the tree and release all of the memory used if it is no longer required.

void btDeleteTree(btNode* leaf) {
  if(leaf) {
    btDeleteTree(leaf->left);
    btDeleteTree(leaf->right);
    free(leaf->data);
    free(leaf);
  }
}

Now turn your attention to the main program tab. It needs to start by including the content of our BinaryTree.h file and then follow that with a struct to hold test data and a btNode pointer to act as the root of our tree.

#include "BinaryTree.h"

struct observed{
  float temp;
  float airPress;
  float humidity;
};

btNode* root = NULL;
template<class T> inline Print &operator <<(Print &obj, T arg)
                                                        { obj.print(arg); return obj; }

I also added our << Print operator to pretty up the output code a bit.

We can also add a short function to calculate the available free SRAM on our Arduino device. We can use this to check that any ram consumed by our tree can ultimately be released. This function will not work on 32 bit devices but if you refer to the chapter on SRAM optimisation you will find an alternative version there.

int freeRam (){
  extern int __heap_start, *__brkval;
  int f;
  return (int) &f - (__brkval == 0 ? (int) &__heap_start : 
                                     (int) __brkval);
}

We can now start in on the setup() function.

void setup() {
  Serial.begin(115200);
  int keys[] = {5, 3, 8, 2, 7, 6, 4, 9, 1};
  //int keys[] = {1,2,3,4,5,6,7,8,9}; // an unbalanced insertion
  Serial << "Free SRAM before tree: " << freeRam() << '\n';
  for(int i = 0; i < 9; i++) {
    observed obs = {random(1,10), random(11, 20), random(30, 40)};
    btInsert(keys[i], &root, &obs, sizeof(observed));
  }
  Serial << "Free SRAM with tree: " << freeRam() << '\n';
  observed * found = btSearch(6, root);
  if(found) {
    Serial << "Key 6 = Temp: " << found->temp << " Press: " << 
           found->airPress << " Hum: " << found->humidity << '\n';
  } else {
    Serial << "Failed\n";
  }
  btDeleteTree(root);
  Serial << "Free SRAM at end: " << freeRam() << '\n';
}

The code inserts 9 instances of the observed struct (filled with random data) but indexed by the list of integers in the keys[] array into a tree. The function btSearch() is used to test that an element with a selected key can be located in the tree and the content displayed. The tree is then deleted. At each major step the available SRAM is displayed.

Give it a run and check that all is well.

You will have noted the commented out line of alternate key values in the setup() function – we will use those later so do add that line. Now add three new lines to setup() just before the call to btDeleteTree().

  Serial << "B-Tree height: " << btHeight(root) << '\n';
  Serial << "Node Count: " << btCount(root) << '\n';
  Serial << "Optimal height: " << 
             (int)(log(btCount(root)) / log(2) + 0.5) << '\n';

Run the code again and you should see that the tree height is 4 and that the calculated optimum height is 3. Being within 1 of the optimum value is an excellent result. Note that the Arduino environment does not have a function for Log2 but you can calculate a Log2 value by taking the Log10 value and dividing it by Log10 of 2, which is what the code above has done (rounded up).

Now it is time to comment out the current line that starts int keys[] and replace it with the line below. If you run the code again, you should see that we now have an unbalanced tree with a height of 9. This is because each key added to the tree was greater than the previous key and it was thus added to the “right” in every instance. A tree as badly balanced as this will have poor performance as a search for a key value will be no more efficient than running through a linear list.

Balance a Binary Tree

We may not be able to control the order that values are added to our tree so we need a straightforward way of balancing the tree once it is built. We are going to use the simplest method available. The process is to extract the keys and data pointers from the binary tree in ascending order of keys into a list and then rebuild the tree from that list. To do that we need to tackle the grandparent of data structures – a linked list. Just like it sounds, a linked list is just a sequence of memory locations where each element has a pointer to the next. Our FiFo queue was an adapted linked list. A double linked list, if you run across one, is where the individual items have a pointer to the previous as well as the next item in the sequence.

Just so that you can easily find the code again if you look for it, start a new tab in the IDE – perhaps called LinkedList.h. Then add a struct to that tab with a couple of pointers to that struct type.

struct lNode{
  uint16_t key;
  void* data;
  lNode* next;
};
lNode *lStart = NULL, *lEnd = NULL;

The struct has members for out binary tree key, data pointer and a pointer to the next instance in the list.

We then need a function to add items to the linked list. The code here should look pretty familiar.

bool appendToList(uint16_t key, void* data) {
  lNode* nNode = (lNode*)malloc(sizeof(lNode));
  if(!nNode) {return false;}
  nNode->key = key;
  nNode->data = data;
  nNode->next = NULL;
  if(!lEnd) {
    lStart = lEnd = nNode;
  } else {
    lEnd = lEnd->next = nNode;
  }
  return true;
}

The process for reorganising the binary tree will need to find particular items in the list by position (very similar to an array index). A function called readListItem() manages that. Not fancy but functional.

lNode* readListItem(int item) {
  lNode* rNode = lStart;
  for (int i = 0; i < item; i++) {
    rNode = rNode->next;
  }
  return rNode;
}

Finally we need to be able to clear down the linked list and recover the memory used.

void deleteList() {
  lNode* dNode = lStart;
  while(dNode){
    lStart = dNode->next;
    free(dNode);
    dNode = lStart;
  }
  lStart = lEnd = NULL;
}

Now we need to add some functions to the BinaryTree.h tab to make use of the linked list. Start by adding an include statement at the top of the code (#include "LinkedList.h"). Follow that with a function to extract items from the binary tree and insert them into the linked list.

void btExtractKeys(btNode* leaf) {
  if(leaf) {
    btExtractKeys(leaf->left);
    appendToList(leaf->key, leaf->data);
    btExtractKeys(leaf->right);
    free(leaf); // delete the tree leaf
  }
}

If we start that function by passing it the root node then it will recursively walk through the tree in ascending key order freeing the memory used at each step. We will be re-using the memory when the tree is re-built.

The next requirement is for a function that is a modified version of btInsert(). This does not need to reserve memory for the data as that has never been disturbed. We just need to pass in an item from the linked list to set the key and data pointer back into the binary tree. So here we have btReset() which is again recursive so that it can locate the correct insertion point and, yes, that is a pointer to a pointer.

bool btReset(btNode** leaf, lNode* rNode) {
  if(*leaf == NULL) {
    *leaf = (btNode*)malloc(sizeof(btNode));
    (*leaf)->key = rNode->key;
    (*leaf)->left = (*leaf)->right = NULL;
    (*leaf)->data = rNode->data;
    return true;
  } else if (rNode->key < (*leaf)->key) {
    return btReset(&(*leaf)->left, rNode);
  }
  return btReset(&(*leaf)->right, rNode);
}

The next function manages the trick of picking the linked list items in the correct order to rebuild the binary tree.

void resetTree(btNode** leaf, int iFrom, int iTo) {
  if(iFrom > iTo) {return;}
  int iMid = (iFrom + iTo) / 2;
  lNode* rNode = readListItem(iMid);
  btReset(leaf, rNode);
  resetTree(leaf, iFrom, iMid -1);
  resetTree(leaf, iMid + 1, iTo);
}

The resetTree() function is recursive and should look a bit familiar. The purpose is to start at the mid-point of the linked list and insert the item found there into the binary tree. It will then look for the list item half way between the start of the list and that mid-point and insert that as this should be added to the left of the first item. The next item from the list is half way between the mid-point and the end point and that should be inserted into the tree to the right. This is exactly parallel with the binary search algorithm that was introduced in the chapter on sorts. Each recursive call is passed a revised from and to value to populate a further segment of the tree.

Finally, all we need as a function to control the process of tree reconstruction.

void reBuildTree(btNode* root) {
  int nCount = btCount(root);
  btExtractKeys(root);
  root = NULL; //restart the tree
  resetTree(&root, 0, nCount-1);
  deleteList(); //recover the linked list memory;
}

To test the rebuild out, just add two lines to the setup() function before the call to btDeleteTree().

  reBuildTree(root);
  Serial << "B-Tree height after rebuild: " << 
             btHeight(root) << '\n';

A test run should show that the binary tree has been optimized and can now be used as a fast lookup structure. Of course, using a linked list to rebuild the binary tree has a short term memory overhead. Alternatives do exist as you might expect given that binary tree structures have been in the hands of programmers for many years. One alternative to research is the DSW algorithm. That is most effective when, like our linked list, it is applied once to a tree that has been constructed. There are alternate binary trees that can be used when items are likely to be added or deleted during the lifetime of a tree. Then the “AVL Tree” or “Red Black Tree” could be good choices as they maintain a near optimal structure as the tree content changes. Point is, there are a lot of data structures out there and a little research could pay dividends in tricky data storage and retrieval situations.

Print a Binary tree

One last thing for our binary tree. Suppose we wanted to process or display the tree contents in ascending key order. We could directly access the tree structure or we could write a function to return each key and data pair in order. The latter approach is preferable and it uses a call-back function which is itself a useful technique.

First of all a function needs to be prepared to (in this case) print the values from a tree item. The obsPrint() function accepts two arguments; the item key and the item data struct.

void obsPrint(int key, observed* obs) {
  Serial << "Key: " << key << " Temp: " << obs->temp << 
         " Press: " << obs->airPress << " Hum: " << 
         obs->humidity << '\n';
}

The next step is to add a new function to the BinaryTree.h tab that can be passed the root node and a pointer to a function of the same type as obsPrint(). Chapter 3 covered pointers to functions and you might just recall the slightly odd way that a parameter name would be defined when it is a function pointer.

void btIterate(btNode* leaf, void(*callback)(int, void*)) {
  if(leaf) {
    btIterate(leaf->left, callback);
    callback(leaf->key, leaf->data);
    btIterate(leaf->right, callback);
  }
}

To make use of this function, add a new line to the setup() function on the main tab (again before the btDeleteTree() call.

  btIterate(root, obsPrint);

The btIterate() function calls itself recursively to move down the “left” branch before calling the designated callback function passing it the current leaf key and a pointer to the data. The callback function (in this case obsPrint() but any function with a similar set of arguments and return type would work just fine) then carries out the required process and when it completes the btIterate() function can continue by calling itself again to process whatever might be found down the “right” branch.

Code downloads for this chapter are available here.