2012-09-04

Detecting Loops in Linked Lists


I was interviewing at a company and one of the questions was how to detect loops in a linked list.

A linked list is where there are a bunch of objects... perhaps bits of memory that have been allocated... and each one points to the next one in the list. Usually, the last one points to NULL. Sometimes the last one points to the "first" one. But then, in such a list, what does "first" really mean? (Perhaps how you refer to the list as a whole.)

Sometimes you may have a non-circular list where the last element points to an item elsewhere in the list, rather than NULL. Or perhaps you think you have a circular list but the last element does point to NULL. That would be an error. You'd like to be able to detect it... and quickly. To do this, you need to be able to determine if a list has a loop, but you can't just scan through it looking for NULL or the start value, because the "last" item might not point to the "first"; the last node could point to the node just before it. Or the one before that...
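For concreteness, the code below assumes a minimal node type along these lines (the article never shows the struct; this exact shape is my assumption):

```cpp
// A minimal singly linked list node, as the algorithms below assume.
struct Node
{
    Node const * next = nullptr;   // nullptr (NULL) marks the end of a non-looping list
};
```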

This is the sort of thing I probably saw in college but didn't remember. On the spot, I came up with the following algorithm:

Jumping by Powers of Two Algorithm


 bool DoesListLoop( Node const * cur )
 {
  Node const * base = cur;   // node we compare against
  int jump = 1;              // current power-of-two interval
  int remain = 1;            // steps left before resetting base
  while (cur) 
  {
   cur = cur->next;
   if (cur == base) return true;   // met the base again: loop
   if (! --remain)
   {
    jump *= 2;       // double the interval each time it is exhausted
    remain = jump;
    base = cur;      // restart comparisons from the current node
   }
  }
  return false;      // reached NULL: no loop
 }

In short, this will race a pointer along the list and every time it has traversed a number of nodes equal to a power of two it resets the base pointer to the current pointer. Each time it moves the "current" pointer ahead it compares it to the base pointer. A match indicates a loop. Finding a NULL means it ended and there is no loop.

More fundamentally, any solution to this problem must advance a pointer through the list forever. If the end of the list is found then there is no loop. If it loops then you will never reach the end, but in order to discern this from an arbitrarily long list you need to match one node with another node previously seen. In order to match loops that loop back to a point arbitrarily far into the list you need to look arbitrarily far ahead of the start, but also arbitrarily far behind the node you are checking in the front. You need to let the "cur" pointer get farther and farther from both what you are comparing against and the start of the list. I did this by resetting the "base" pointer at larger and larger intervals.

Back at the interview, they asked me "do you mean, that you have two pointers, and you advance one pointer two steps for every one step that you advance the other pointer?"

"That sounds like it would work, but, no, I don't mean that." I explained my idea in more detail. They said they thought that it would work, but they wanted to talk about their version some more. Presumably they thought it was a better algorithm (but perhaps they just thought it was simpler and made for a better interview discussion). I think they were wrong. As I've since learned, they meant Floyd's Cycle-Finding Algorithm, published in 1967, also known as "The Tortoise and the Hare Algorithm":

"Two Steps for the Forward Pointer, One Step for the Back Pointer"


 bool DoesListLoop( Node const * cur )
 {
  Node const * base = cur;          // the slow "tortoise"; cur is the fast "hare"
  if (!cur) return false;
  while (true)
  {
   cur = cur->next;                 // hare: first of two steps
   if ( ! cur ) return false;       // found the end: no loop
   if ( base == cur ) return true;  // hare caught the tortoise: loop

   cur = cur->next;                 // hare: second step
   if ( ! cur ) return false;
   if ( base == cur ) return true;

   base = base->next;               // tortoise: one step
  }
 }

How do we measure performance?

As far as performance is concerned, we want to minimize the time (or number of operations) required to determine whether a list is looping or not. We might want to optimize for the non-looping case since that is probably the most likely case (if loops are errors), and we'd like to be able to identify those quickly. We also may have short or long lists, the looping part might be shorter or longer, and the size of the nodes might vary.

This graph shows the number of operations to solve the various cases. Here I take an operation to be a memory fetch to look at the next node. I don't count branching as a cost. (Branch prediction should work very well in this case.) This certainly isn't a perfect estimate, but it does capture what I think proves to be the major cost.

Theoretical cost of the two algorithms. The "Half-Speed" algorithm is in green. My "Jumping by Powers of Two" algorithm is in red. The Y axis counts "operations" to solve the problem. An operation is a memory fetch. The X axis is the number of elements in the list. Lighter shades indicate lists where the loop goes back closer to the beginning of the list. Darker shades indicate looping more towards the end of the list. The single lightest shade indicates no loop. The blue points indicate the benefit of the jumping algorithm over the half-speed algorithm.


Theoretical cost of the two algorithms, as above, but with a log-log scale and without the blue difference points.


From this simulation we can see that the jumping algorithm is always at least as good as the half-speed algorithm. It should be restated, though, that this only models the memory-fetch cost. For small lists that fit entirely into local cache, the half-speed algorithm's true cost is only about 2/3 of the modeled value, so there would be some cases where half-speed comes out ahead.

Lighter shades (looping goes back closer to the beginning) generally appear to be harder to solve.

Also note that the performance curves of the two algorithms don't diverge... they always stay pretty close, at least on a log scale. This is consistent with both being of the same order, O(n).

It is nice to see that the theory confirms our expectations, but models are no proof of reality, so let's move on to actual recorded metrics.

Demonstrated Performance

I've implemented the algorithms in C++ and timed them on a Linux desktop using low-end hardware with a vintage around 2010.




Actual performance. Red is my "jumping" algorithm, green is the half-speed algorithm. The X axis is the number of nodes in the list. Shade indicates where the list loops back to... lighter shades loop back closer to the beginning, but the lightest shade does not loop.






The difference between the two algorithms. Positive numbers reflect an advantage to my jumping algorithm.



These results track well with the theoretical results (note the log-log graph). It appears to be more or less a wash until the size reaches well over a million and then (when it no longer fits in cache?) the jumping algorithm takes off.

Here's a graph of the algorithms when looking only at a non-looping list.




Performance with different sizes of nodes. Theoretically, cache effects might be exposed at earlier list sizes. The differences are not terribly obvious, although jumping still seems to perform better than half-speed.






The difference between the two algorithms. Positive numbers reflect an advantage to my jumping algorithm. There doesn't seem to be much pattern to the distribution of node size (shade).



Finally, here's the most important graph: relative performance for verifying a non-looping list, as a function of the size of the list. This would normally be the typical use case. If a list did loop, then we wouldn't be as concerned about the performance, since we're probably about to report an error and stop the program.




Performance of verifying that a list does not loop.






The difference between the two algorithms for non-looping lists. The scale is percentage of the half-speed algorithm's time to completion. The value indicates how much less time the jumping case took.



As it turns out, the jumping algorithm isn't really much better than the half-speed algorithm in my theoretically most differentiating test case. From the difference graph it appears that the benefit does continue increasing as the length of the list increases. This indicates that the benefit is not kicking in for a while, presumably because of cache.

Theoretically, not traversing the list twice should yield significant benefits once the memory that is used exceeds the cache size. (Branch prediction would work very well in this case, but it still needs to get the memory in before it can fetch the next memory location.) This seems to be what we are seeing.

Parting Thoughts

If there are n elements in the list, both of these algorithms will terminate before running through about 2n nodes. They are both O(n). Can we do better? While we certainly can't improve on O(n), perhaps we can run through fewer nodes.

Ideally we'd be able to quickly check if we'd ever seen a node before, and therefore terminate after n nodes when we either find the end or find a node which we've seen before. However, the naive implementation would have us checking each node against each node we've already seen, which makes the algorithm O(n²).
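A sketch of that naive check, assuming a `Node` with a `next` pointer as in the listings above: it terminates after at most n+1 visits, but the inner scan makes it quadratic.

```cpp
#include <vector>

struct Node { Node const * next = nullptr; };  // minimal assumed node type

// Naive loop detection: remember every node visited and compare each new
// node against all of them. O(n) node visits, O(n^2) comparisons.
bool DoesListLoopNaive( Node const * cur )
{
    std::vector<Node const *> seen;
    while (cur)
    {
        for (Node const * p : seen)      // linear scan of everything seen so far
            if (p == cur) return true;   // seen before: loop
        seen.push_back(cur);
        cur = cur->next;
    }
    return false;                        // hit NULL: no loop
}
```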

Theoretically, a hash table would let us check whether we'd already seen a node in constant time, but the hash inserts and lookups would probably cost more than the 2x factor that we're trying to remove.
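For what it's worth, the hash-based version is short; whether it actually beats the 2x constant in practice is an empirical question I haven't measured:

```cpp
#include <unordered_set>

struct Node { Node const * next = nullptr; };  // minimal assumed node type

// Hash-set loop detection: one visit per node, plus a hash insert per step.
bool DoesListLoopHashed( Node const * cur )
{
    std::unordered_set<Node const *> seen;
    while (cur)
    {
        if (!seen.insert(cur).second)  // insert failed: node already seen -> loop
            return true;
        cur = cur->next;
    }
    return false;                      // hit NULL: no loop
}
```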

I think this is the best that can be done, except, of course, if you can mark flags to indicate whether a node has been visited before. There is a page linked below that has details on this method.

A silly digression in which an O(1) algorithm is presented

Interestingly, there is an algorithm that technically qualifies as O(1) on modern computers. Specifically, you can race a pointer along the list, counting how many nodes you've seen. If the count exceeds the number of nodes that could possibly fit in the available memory, then you must have a loop. This is worst-case O(1) because the number of operations is constant with respect to the list size... it varies only with the size of the memory available to contain the nodes, which is presumably constant. But if you consider the typical size of a linked list against even a 32-bit address space, you can quickly see that this is also about the most inefficient algorithm imaginable. And therefore we have an O(1) algorithm which is probably the least efficient algorithm.

Of course, it all depends on what your independent variable "n" represents. If it represents the number of nodes in your list, then this is O(1), since the operation count is bounded by the memory size regardless of how long the list is. If it represents memory size, or maximum possible nodes, then this is O(n), and probably much less efficient than the other algorithms listed. Always use an "n" which is meaningful for the analysis.
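As a sketch, with `max_nodes` standing in for (available memory / sizeof(Node)); the value passed in is the caller's problem:

```cpp
#include <cstdint>

struct Node { Node const * next = nullptr; };  // minimal assumed node type

// "O(1)" loop detection: if we have walked more nodes than could possibly
// exist in memory, the list must loop.
bool DoesListLoopByCounting( Node const * cur, std::uint64_t max_nodes )
{
    std::uint64_t count = 0;
    while (cur)
    {
        if (++count > max_nodes) return true;  // more nodes than memory can hold
        cur = cur->next;
    }
    return false;                              // hit NULL: no loop
}
```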

Hypothetical Case

Performance often depends upon the use case. For instance, if we are usually dealing with a non-looping list and just want to verify that quickly, then a better algorithm might be to track the expected length of the list and just race to the end. If it hasn't ended, then a more exhaustive search could be done. Or, if it is expected to loop, and always to the beginning (circular linked list), then the search can optimize for that.
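A sketch of that fast path, assuming the expected length is tracked somewhere else (the function name and interface are my own):

```cpp
#include <cstddef>

struct Node { Node const * next = nullptr; };  // minimal assumed node type

// Race at most expected_length steps toward the end. Returns true if the
// list terminated in time; false means a loop OR a longer-than-expected
// list, and warrants a full (slower) check.
bool TerminatesWithin( Node const * cur, std::size_t expected_length )
{
    for (std::size_t i = 0; i <= expected_length; ++i)
    {
        if (!cur) return true;   // found NULL within the budget
        cur = cur->next;
    }
    return false;                // budget exhausted without reaching the end
}
```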

Suppose there is a prioritized queue of elements. The elements are allocated in a std::vector because resizing is infrequent and the count only ever grows. The elements refer to each other through offsets. The order rearranges as the priorities of the various items shift around. For whatever reason, we want to be sure that the list terminates and doesn't loop. (Perhaps we are implementing the priority queue for Dijkstra's algorithm.) What is the most efficient way to do this? I suspect that the answer is in the previous (silly) section.
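For the vector-backed case the counting idea stops being silly and becomes exact: the vector's size bounds the number of reachable elements. A sketch, with field names and the sentinel of my own invention:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical offset-linked element; 'next' is an index into the vector,
// with kEnd (size_t max) as the terminator.
struct Elem
{
    std::size_t next;
};

constexpr std::size_t kEnd = static_cast<std::size_t>(-1);

// Counting check specialized to the vector case: there can be at most
// elems.size() reachable elements, so any longer walk proves a loop.
bool OffsetListLoops( std::vector<Elem> const & elems, std::size_t head )
{
    std::size_t steps = 0;
    for (std::size_t i = head; i != kEnd; i = elems[i].next)
        if (++steps > elems.size()) return true;  // walked past every element
    return false;                                 // reached the terminator
}
```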

Links:


Creative Commons License
Detecting Loops in Linked Lists by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.