2012-10-11

My Development Tools






Whether developing on my desktop or laptop, Linux or Windows, my tools of choice are pretty bare-bones. I edit my files in vim and I incrementally develop my program by using my "kr" script. "kr" looks for changes in whatever files you give it, and then reruns a command when it detects a change. For example:



> kr *.py -c 'tests.py'


Would look for a change in any .py file and rerun the "tests.py" script.



> kr *.c *.h makefile -c 'make ; application param1 param2'


Would look for a change in any .c/.h file, or the makefile, and when a change is detected it remakes and runs "application" with the two params "param1" and "param2".



Developing is easy: just save the file I'm editing and look in the other window to see what happens with the new code. Usually I use this for re-running a Python program, which I develop a bit at a time. I fix an error, save, and quickly see the next error. It works best, of course, when the program runs quickly.



I've also used it to check for a data file change, at which point it regenerates an image. In another window a separate "kr" script looks for a change in the image file, and runs a program to render it. (The two "kr" scripts had to run on different machines, or else they could have been combined.)


Find the current version here.


#Copyright 2012 Mark Santesson
#
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

import sys
import time
import re
import stat
import os
import glob
import operator
import optparse


if __name__ == "__main__":

    parser = optparse.OptionParser (
                        usage = '%prog -c command files ...',
                        description = 'Repeatedly run a command whenever any one of a list of files changes.')
    parser.add_option('-c', '--command', dest='command', type="string",
            help='Specify command to run when a change is detected')
    options, args = parser.parse_args()

#    print "Found args: " + repr(args)
#    print "Found command: " + repr(options.command)

    if not args:
        print '\n' + os.path.basename( sys.argv[0] ) + ' requires some files to monitor.'
        sys.exit(1)

    if not options.command:
        print '\nError: A command to run is required.'
        sys.exit(1)


    file_times = dict()
    globbed = sum( [ glob.glob(x) for x in args ], [] )
    for x in globbed:
        file_times[x] = 0.0

    print file_times

    while True:
        rerun = False

        for f in file_times.keys():
            attempts_left = 5
            t = None
            while attempts_left and t is None:
                try:
                    t = os.stat(f)[ stat.ST_MTIME ]
                except OSError, e:    # OSError also covers WindowsError, so this works on Linux too
                    time.sleep(0.1)
                    attempts_left -= 1

            if t > file_times[f]:
                rerun = True
                file_times[f] = t

        if rerun:
            print time.strftime('\n\n--- Rerunning at %H:%M:%S :') + repr(options.command)
            ret = os.system( options.command )

            if ret:
                print '\n\n--- ERRORED %r' % (ret,) + time.strftime(' at %H:%M:%S.')

            else:
                print time.strftime('\n\n--- Done at %H:%M:%S.')

        else:
            time.sleep(0.5)

else:
    raise Exception( 'This module is not meant to be imported.' )








Creative Commons License
My Development Tools by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.





2012-09-04

Detecting Loops in Linked Lists






I was interviewing at a company and one of the questions was how to detect loops in a linked list.

A linked list is where there are a bunch of objects... perhaps bits of memory that have been allocated... and each one points to the next one in the list. Usually, the last one points to NULL. Sometimes the last one points to the "first" one. But then, in such a list, what does "first" really mean? (Perhaps how you refer to the list as a whole.)

Sometimes you may have a non-circular list where the last element points to an item elsewhere in the list, rather than NULL. Or perhaps you think you have a circular list but the last element does point to NULL. That would be an error. You'd like to be able to detect it... and quickly. To do this, you need to be able to determine if a list has a loop, but you can't just scan through it looking for NULL or the start value, because the "last" item might not point to the "first"; the last node could point to the node just before it. Or the one before that...

This is the sort of thing I probably saw in college but didn't remember. On the spot, I came up with the following algorithm:

Jumping by Powers of Two Algorithm


 bool DoesListLoop( Node const * cur )
 {
  Node const * base = cur;
  int jump = 1;
  int remain = 1;
  while (cur) 
  {
   cur = cur->next;
   if (cur == base) return true;
   if (! --remain)
   {
    jump *= 2;
    remain = jump;
    base = cur;
   }
  }
  return false;
 }

In short, this will race a pointer along the list and every time it has traversed a number of nodes equal to a power of two it resets the base pointer to the current pointer. Each time it moves the "current" pointer ahead it compares it to the base pointer. A match indicates a loop. Finding a NULL means it ended and there is no loop.

More fundamentally, any solution to this problem must advance a pointer through the list forever. If the end of the list is found then there is no loop. If it loops then you will never reach the end, but in order to discern this from an arbitrarily long list you need to match one node with another node previously seen. In order to match loops that loop back to a point arbitrarily far into the list you need to look arbitrarily far ahead of the start, but also arbitrarily far behind the node you are checking in the front. You need to let the "cur" pointer get farther and farther from both what you are comparing against and the start of the list. I did this by resetting the "base" pointer at larger and larger intervals.

Back at the interview, they asked me "do you mean, that you have two pointers, and you advance one pointer two steps for every one step that you advance the other pointer?"

"That sounds like it would work, but, no, I don't mean that." I explained my idea in more detail. They said they thought that it would work, but they wanted to talk about their version some more. Presumably they thought it was a better algorithm (but perhaps they just thought it was simpler and made for a better interview discussion). I think they were wrong. As I've since learned, they meant Floyd's Cycle-Finding Algorithm, published in 1967, also known as "The Tortoise and the Hare Algorithm":

"Two Steps for the Forward Pointer, One Step for the Back Pointer"


 bool DoesListLoop( Node const * cur )
 {
  Node const * base = cur;
  if (!cur) return false;
  while (true)
  {
   cur = cur->next;
   if ( ! cur ) return false;
   if ( base == cur ) return true;

   cur = cur->next;
   if ( ! cur ) return false;
   if ( base == cur ) return true;

   base = base->next;
  }
 }

How do we measure performance?

As far as performance is concerned, we want to minimize the time (or number of operations) required to determine whether a list is looping or not. We might want to optimize for the non-looping case since that is probably the most likely case (if loops are errors), and we'd like to be able to identify those quickly. We also may have short or long lists, the looping part might be shorter or longer, and the size of the nodes might vary.

This graph shows the number of operations to solve the various cases. Here I take an operation to be a memory fetch to look at the next node. I don't count branching as a cost. (Branch prediction should work very well in this case.) This certainly isn't a perfect estimate, but it does capture what I think proves to be the major cost.

Theoretical cost of the two algorithms. The "Half-Speed" algorithm is in green. My "Jumping by Powers of Two" algorithm is in red. The Y axis counts "operations" to solve the problem. An operation is a memory fetch. The X axis is the number of elements in the list. Lighter shades indicate lists where the loop goes back closer to the beginning of the list. Darker shades indicate looping more towards the end of the list. The single lightest shade indicates no loop. The blue points indicate the benefit of the jumping algorithm over the half-speed algorithm.


Theoretical cost of the two algorithms, as above, but with a log-log scale and without the blue difference points.


From this simulation we can see that the jumping algorithm is always as good as or better than the half-speed algorithm. That said, this is just a simulation of the memory fetching cost. For small lists that fit entirely into local cache, the half-speed algorithm's cost would be only 2/3 of this estimate, and in that situation there would be some cases where using the half-speed algorithm was better.

Lighter shades (looping goes back closer to the beginning) generally appear to be harder to solve.

Also note that the performance of the two algorithms doesn't diverge... the curves always stay pretty close, at least on a log scale. This confirms that they are of the same order, O(n).

It is nice to see that the theory confirms our expectations, but models are no proof of reality, so let's move on to actual recorded metrics.

Demonstrated Performance

I've implemented the algorithms in C++ and timed them on a Linux desktop using low-end hardware with a vintage around 2010.




Actual performance. Red is my "jumping" algorithm, green is the half-speed algorithm. The X axis is the number of nodes in the list. Color indicates where the list loops back to... lighter colors loop back closer to the beginning, but the lightest color does not loop.






The difference between the two algorithms. Positive numbers reflect an advantage to my jumping algorithm.



These results track well with the theoretical results (note the log-log graph). It appears to be more or less a wash until the size reaches well over a million and then (when it no longer fits in cache?) the jumping algorithm takes off.

Here's a graph of the algorithms when looking only at a non-looping list.




Performance with different sizes of nodes. Theoretically, cache effects might be exposed at earlier list sizes. The differences are not terribly obvious, although jumping still seems to perform better than half-speed.






The difference between the two algorithms. Positive numbers reflect an advantage to my jumping algorithm. There doesn't seem to be much pattern to the distribution of node size (shade).



Finally, here's the most important graph: relative performance for verifying a non-looping list, as a function of the size of the list. This would normally be the typical use case. If a list did loop, then we wouldn't be as concerned about the performance, since we're probably about to report an error and stop the program.




Performance of verifying that a list does not loop.






The difference between the two algorithms for non-looping lists. The scale is percentage of the half-speed algorithm's time to completion. The value indicates how much less time the jumping case took.



As it turns out, the jumping algorithm isn't really much better than the half-speed algorithm in my theoretically most differentiating test case. From the difference graph it appears that the benefit does continue increasing as the length of the list increases. This indicates that the benefit is not kicking in for a while, presumably because of cache.

Theoretically, not traversing the list twice should yield significant benefits once the memory that is used exceeds the cache size. (Branch prediction would work very well in this case, but it still needs to get the memory in before it can fetch the next memory location.) This seems to be what we are seeing.

Parting Thoughts

If there are n elements in the list, both of these algorithms will terminate before running through about 2n nodes. They are both O(n). Can we do better? While we certainly can't improve on O(n), perhaps we can run through fewer nodes.

Ideally we'd be able to quickly check if we'd ever seen a node before, and therefore terminate after n nodes when we either find the end or find a node which we've seen before. However, the naive implementation would have us checking each node against each node we've already seen, which makes the algorithm O(n²).

Theoretically, a hash map would let us find whether we'd already seen a node in constant time, but the hash adds and lookups would probably take longer than the 2x multiple that we're trying to remove.
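
For reference, here is a minimal sketch of that hash-based variant (my own illustration, using std::unordered_set; the Node type is the same sort of bare singly linked node as in the snippets above):

 // Sketch of the hash-set variant: O(n) time, but O(n) extra memory.
 #include <unordered_set>

 struct Node { Node* next; };

 bool DoesListLoopHashSet( Node const * cur )
 {
  std::unordered_set< Node const * > seen;
  while (cur)
  {
   if ( ! seen.insert(cur).second ) return true; // already seen this node: loop
   cur = cur->next;
  }
  return false; // reached NULL: no loop
 }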

I think this is the best that can be done, except, of course, if you can set flags to indicate whether a node has been visited before. There is a page linked below that has details on this method.

A silly digression in which an O(1) algorithm is presented

Interestingly, there is an algorithm that technically qualifies as O(1) and works on modern computers. Specifically, you can race a pointer along the list, counting how many nodes you've seen. If the count reaches a number higher than the number of nodes that could possibly fit in the memory available, then you must have a loop. This is worst-case O(1) because the number of operations is constant with respect to the list size... it varies only with the size of memory available to contain the nodes, which is presumably constant. But if you consider the typical size of linked lists in comparison to even a 32-bit address space, you can quickly see that this is also about the most inefficient algorithm imaginable. And therefore we have an O(1) algorithm which is probably the least efficient algorithm.

Of course, it all depends on what your independent variable "n" represents. If it represents the number of nodes in your list, then this is O(1) (since the number of operations is bounded by a constant, the memory size, regardless of how large the list is). If it represents memory size, or the maximum possible number of nodes, then this is O(n), and probably much less efficient than the other algorithms listed. Always use an "n" which is meaningful for the analysis.
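
Purely as an illustration, a sketch of that counting check might look like this (the memory limit is a made-up constant, not something a real program should hard-code):

 // Sketch of the "count until impossible" check: if we have walked more nodes
 // than could possibly fit in memory, the list must loop.
 struct Node { Node* next; };

 bool DoesListLoopByCount( Node const * cur )
 {
  unsigned long long const kMaxBytes = 1ull << 32;              // hypothetical 4 GiB of RAM
  unsigned long long const kMaxNodes = kMaxBytes / sizeof(Node);
  unsigned long long count = 0;
  while (cur)
  {
   if ( ++count > kMaxNodes ) return true; // more nodes seen than could possibly exist
   cur = cur->next;
  }
  return false; // found the NULL terminator: no loop
 }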

Hypothetical Case

Performance often depends upon the use case. For instance, if we are usually dealing with a non-looping list and just want to verify that quickly, then a better algorithm might be to track the expected length of the list and just race to the end. If it hasn't ended, then a more exhaustive search could be done. Or, if it is expected to loop, and always to the beginning (circular linked list), then the search can optimize for that.

Suppose there is a prioritized queue of elements. The elements are allocated in a std::vector because changing the size is infrequent and the number only ever grows. The elements refer to each other through offsets. The order rearranges as the priorities of the various items shift around. For whatever reason, we want to be sure that the list terminates and doesn't loop. (Perhaps we are implementing the priority queue for Dijkstra's algorithm.) What is the most efficient way to do this? I suspect that the answer is in the previous (silly) section.

Links:


Creative Commons License
Detecting Loops in Linked Lists by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.

2012-01-15

Finding the Highest M of N


How do I find the top M of N? The hundred largest of a million?

One day my boss came out of his office and said "I've got to get the highest hundred values of a bunch of data. I tried sorting it, but it's just taking too long."

He had many Megabytes of floats, and was trying to sort the whole thing in order to find the highest 100 values. There are two immediate approaches to this problem:

  • Sort N and Pick M (QSort)- Sort the whole list (of N items), pull out the first M (100) items.
  • Scan N and Sort M (Insertion Sort) - Create a list of M (100) items, initially filled with the first M items of the input list of N (1 million) items, sorted. Then walk through the rest of the "N" list and for each item put it in the "M" list if it is larger than the smallest value already in the "M" list. This is like an insertion sort, but where you don't let it grow beyond M items.

Both of these are obviously unacceptable. In Big O notation the first option is (with typical QSort, MergeSort, et c.) an O(N*log(N)) operation, and my boss was already trying it. With the example numbers, we're looking at O(N*log2(N)) = O(1×10⁶ * 20) = O(2×10⁷) operations.

The second option can be optimized a bit, but you need to be prepared to do an insertion sort for every one of your one million items. In practice, you would compare each item from the N list against the smallest value in your M list, and if it was smaller and the M list was already full, then you'd skip the insertion. In Big O notation, this is O(N * M) = O(1×10⁸) operations.
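
Here is a rough sketch of that second approach (my illustration, not the original code), keeping a descending-sorted buffer of the best M values:

    #include <algorithm>
    #include <functional>
    #include <vector>

    // Returns the M largest values of "input", sorted in descending order.
    std::vector<float> topM_insertion( const std::vector<float>& input, size_t M )
    {
        // Seed the buffer with the first M values, sorted descending.
        std::vector<float> best( input.begin(),
                                 input.begin() + std::min( M, input.size() ) );
        std::sort( best.begin(), best.end(), std::greater<float>() );
        if ( best.empty() )
            return best;
        best.reserve( M + 1 );

        for ( size_t i = best.size(); i < input.size(); ++i )
        {
            if ( input[i] > best.back() )   // beats the smallest of the current M?
            {
                // Insert in place, then drop the value that fell off the end.
                best.insert( std::upper_bound( best.begin(), best.end(),
                                               input[i], std::greater<float>() ),
                             input[i] );
                best.pop_back();
            }
        }
        return best;
    }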

Which one is worse depends on the values of N and M.

As an interesting corner case, what if the list is already sorted, or nearly so? The first option (Sort N and Pick M) can possibly fall apart, since the simplest Quicksort implementations have trouble with this case. The second option (Scan N and Sort M) will also fall apart, since it will always have to insert each new entry into the M list (unless the M elements are positioned at the end, in which case it falls prey to a reverse sorted list).

The Third Way: nth_element

Another way to do this is to do something like a quicksort, but without recursing on the side of the pivot that doesn't contain the element of interest. This is called Quickselect, and generally has time O(N), but worst case O(N²). For sufficiently evenly distributed elements this would first process all elements, then half the elements, then a quarter, et c. The total number of elements processed converges to 2*N.

There is an STL algorithm to do this: nth_element. There is also a similar function, partition, which partitions around a pivot you supply. partial_sort will sort values up to a specific position in the list.
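
Here is a minimal sketch of the nth_element approach (sizes and values are made up for illustration):

    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <vector>

    int main()
    {
        // Fill a vector with some arbitrary values.
        std::vector<float> data( 1000000 );
        for ( size_t i = 0; i < data.size(); ++i )
            data[i] = static_cast<float>( ( i * 2654435761u ) % 1000003 );

        const size_t M = 100;

        // After this call the M largest values occupy data[0..M-1] (in no particular order).
        std::nth_element( data.begin(), data.begin() + M, data.end(),
                          std::greater<float>() );

        // Sort just the top M if they are needed in order.
        std::sort( data.begin(), data.begin() + M, std::greater<float>() );

        std::printf( "Largest: %f, 100th largest: %f\n", data[0], data[M - 1] );
        return 0;
    }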


Performance!

I wrote a Python script to draw some pretty graphs of performance, and also a C++ program to provide the metrics. I also have the specific results file graphed here.

First a note on how to read the graphs. Time is the y axis, so lower values are better. N is the X axis, so values to the right are operating on larger sets of data. I've color coded datapoints by algorithm (red, green, blue), and shaded them by the value of "M". Lighter shades correspond to smaller areas being sorted.

I've also produced some theoretical performance graphs. These are not calibrated to actual performance. They should not match up with the actual metrics except that they should have the same sort of shape.


First up: Performance of Sort N Pick M (qsort)

"Sort N Pick M" sorts the entire sequence of N items, then picks out the first M items. I've provided results for two implementations. One uses the STL and one uses qsort. The qsort variety must call a user-provided function to compare the items, whereas the STL version can inline that call. The STL version seems to be doing better across the board, with the STL clocking at 0.000738 milliseconds versus 0.002162 for a datapoint near 10,000. Since these are log graphs you can see that the STL seems to be consistently about thrice as fast as using qsort, regardless of N. Yay STL!

Regardless of how many items are picked out (M) it takes about the same amount of time to run this algorithm. Since the results are all contiguous after the sort, that range can be returned without requiring a copy of the M elements. Even if this was necessary, the time to do this, O(M), is much less than the time to sort, O(N*log2N), and would not be particularly noticeable in the timing.

Notice the odd jog near N equal to one million. I looked into that, but the results aren't relevant to this discussion. (I may post details in the future.)


Second chance: Performance of Scan N and Sort M (insertion sort).

The second algorithm looks much different. It is essentially an insertion sort that doesn't bother inserting if the new value is known not to fit in the M "best" items. Imagine that M is one: how fast would it be? Pretty fast, given that it is essentially doing min() on N items. That is an O(N) operation, and a fast one at that. What about if M is near N? Pretty slow, since insertion sort is an O(N²) operation. Unless M is really near to N, in which case the initial qsort on the M items has already done just about all the work. That explains the initial low values on the graphs for the higher values of M (notice how they track the "Sort All" time from the graphs combining multiple algorithms).

There is a graph above that compares all the algorithms. It shows that for small values of M this algorithm can be faster than any of the others, but for the general case it is much slower.

How Many Inserts Are Required?

How often are inserts required? The likelihood of an insert is M/i, where i is the position of the element. How many inserts are required in total, then? Sum M/i for i = M to N; approximating the sum with the integral of M/x from M to N gives M*ln(N) - M*ln(M) = M*ln(N/M).

Below is a graph of the expected time to do an insertion sort to find the M lowest items of N items, where the M items are initially sorted, and then the remaining items are insertion sorted. This graph, however, has M as the x axis. Increasing M increases the total time, except near the end when M is nearly N and most items are efficiently dealt with by the O(M*log2M) sort. I have plotted the time to do a qsort on the entire list of N items for comparison, and each value of N impacts the qsort line when M reaches N.

However, that graph is a bit misleading since it appears that only very large values of M as compared to N will yield a performance gain. This is because the x axis is presented with logarithmic progression. Actually the worst case is somewhere around half to two-thirds, as can be seen by the graph below which shows the same data but displays the x axis linearly.


Separating the Men From the Boys: Partitioning with nth_element

The graph of the nth_element algorithm is not too surprising. It operates mostly without regard to the value of M, as expected. It is noticeably faster than the Sort N Pick M algorithm. This algorithm is O(N), so it should be considerably faster than the O(N*log2N) Sort N Pick M algorithm.

But why does it appear to have the same slope as the Sort N Pick M algorithm? Consider the difference implied by the additional multiple of log2N. log2(N) equals 16.6 for N = 100,000 and 19.9 for N = 1,000,000. The increase of the red lines should be 20% more than for the green. That 20% should correspond to the width of the fifth largest tick marks on the left hand side. That is a fairly small tick. In other words, after the numbers get into the millions the log part doesn't affect the result very much incrementally. If the graph went to smaller numbers, or if the y axis were linear instead of logarithmic, then the curve would likely be more obvious.


Coda: Heaps Better

By far, insertion sort is the best performing algorithm. Well, at least for small values of M. It's too bad I can't get the best of both worlds... Insertion Sort time for small M, and Partition time for larger M. Of course, I could always compare M to N and decide which algorithm to use, but that seems so hacky.

Why am I doing an insertion sort at all, when I don't really care what the sorted order is, but just what the highest values are?

Heaps are neat. They are a not-quite-sorted,-but-maybe-sorted-enough type of data structure. Heaps can insert in O(log2N) time. A Heap can be constructed in O(N) time! (Knuth, The Art of Computer Programming , volume 3, section 5.2.3, page 155.)

Heaps always know what their highest item is, because it is in position 0. Heaps work by having the property that the element at index i is always greater than the elements at indices 2*i+1 and 2*i+2 (for the first element being at index 0). They can be thought of as a nearly balanced binary tree where each node is greater than its children. It is not a red/black tree or any other type of sorted list.

To insert you figure out where the next item goes (index i, where i is the current length of the list) and put the new item there. Then you percolate that point up:

  • If the current element is larger than the parent (at index floor((i-1)/2), with 0-based indices) then swap the two.
  • If there is no swap necessary, then the insert is complete.
  • If a swap was necessary, then repeat the check at the parent's location.

To pop the highest item, you remove the last item from the list and work it back in from the bottom: it goes in either as its own sibling or as the parent, whichever a comparison with the sibling calls for. If the sibling is displaced, it moves up to replace the common parent, which in turn gets to replace its own sibling or parent, and so on. Eventually you reach the root of the heap, which is taken away as the highest value just before its location is overwritten.

The Heapsort algorithm takes an unordered list, turns it into a heap, then burns through the heap, popping off the highest item and placing it in the spot vacated by the last item. The result is a sorted list. Heap construction takes O(N) time, but the popping takes O(N*log2N) time, the same as other sorts. The advantage of Heapsort is that it neither takes additional memory (like merge sort) nor has a worst-case scenario of O(N²) (like Quicksort). Typical run time is worse than Quicksort, however.

Heaps are ideal for priority lists that will be modified while in use. They can insert and remove the highest-priority item in O(log N) time (and peek at it in O(1)). This makes them a good choice for Huffman encoding.

What can't heaps do? Well, you can't traverse a heap in sorted order very easily... you basically have to pop off all the items in turn. They are most useful when you just need the highest item, or when the number of inserts and deletes is significantly larger than the size of the heap for each time it is walked in a sorted order.

Instead of using an insertion sort, if I just maintain the lowest M items as a heap, then I can easily compare to the highest value, even after an insertion. For evenly distributed numbers, the likelihood of needing to insert is M/i, where i is the index of the current point. If an insertion is needed it will take O(log2M) for the heap instead of O(M) for an insertion sort.

If I do this then the time required for the inserting part of the insertion sort is cut down. The result will need a final sort pass if we need the items in the correct order instead of just identified (this is also true of the nth_element version).
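
Here is a minimal sketch of that bounded-heap idea, phrased for the "highest M of N" goal from the introduction (my illustration using std::priority_queue as a min-heap of the M best values; it is not the code behind the graphs):

    #include <functional>
    #include <queue>
    #include <vector>

    // Returns the M largest values of "input" (ascending: smallest of the M first).
    std::vector<float> topM_heap( const std::vector<float>& input, size_t M )
    {
        // priority_queue with std::greater is a min-heap: top() is the smallest
        // kept value, i.e. the value a newcomer has to beat.
        std::priority_queue< float, std::vector<float>, std::greater<float> > heap;

        for ( size_t i = 0; i < input.size(); ++i )
        {
            if ( heap.size() < M )
            {
                heap.push( input[i] );
            }
            else if ( M > 0 && input[i] > heap.top() )
            {
                heap.pop();             // discard the smallest of the current M...
                heap.push( input[i] );  // ...and keep the new, larger value instead
            }
        }

        // Drain the heap; a final sort pass would be needed for descending order.
        std::vector<float> result;
        while ( !heap.empty() )
        {
            result.push_back( heap.top() );
            heap.pop();
        }
        return result;
    }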

Yet Another Heap Based Algorithm

Another possible use of a heap would be to heap the entire list, then pull out the top M items. The performance of this is quite simple: O(N) to heap, then M*log(N) to pull out the items. (Note that the depth of the tree decreases, thereby reducing the log(N) component as items are pulled out. Given that the log(N) value holds for half the items, even if the other half takes zero time it would still be the same O(M*log(N)).)
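
A sketch of this heap-everything-then-pop variant (again just an illustration, not the measured code):

    #include <algorithm>
    #include <vector>

    // Returns the M largest values of "data" in descending order.
    // Takes a copy so the caller's data is left untouched.
    std::vector<float> topM_heapAll( std::vector<float> data, size_t M )
    {
        std::make_heap( data.begin(), data.end() );    // max-heap over all N values: O(N)

        std::vector<float> result;
        for ( size_t i = 0; i < M && !data.empty(); ++i )
        {
            std::pop_heap( data.begin(), data.end() ); // move the current largest to the back
            result.push_back( data.back() );           // collect it...
            data.pop_back();                           // ...and shrink the heap
        }
        return result;
    }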

However, as can be seen from the graph below, this new algorithm never appears to be better than using a heap for an insertion sort. It only becomes competitive well after using nth_element would have beaten both heap algorithms.

Heap Performance

The performance is quite reminiscent of the insertion sort, except that the performance does not degrade so much for larger M. Notice the dipping first value is still present. I.e., if M is equal to N then the algorithm is just doing a large qsort, no insertions. And yet, the line for "sort all" is actually above these lower first values. I wonder why that is.

The performance does still get worse than the nth_element case. But instead of being about 100x as slow (or more) the Heap implementation seems to max out at about ten times as slow. (Below is a graph of all the implementations.)

Theoretical Results

Here are the complexities of the various algorithms:
Algorithm                            Complexity
Sort N and Pick M (QSort)            O(N*log2N)
Scan N and Pick M (Insertion sort)   O(M*log2M + (N-M)*M)
nth_element                          O(N)
Heap insertion sort                  O(M + (N-M) + log2(M)*ln(N/M))
Heap All and Pop                     O(N + M*log2(N))

Here is a graph of the theoretical complexity of the algorithms.

The theoretical analysis of the complexity of the algorithms indicates that the Heap insertion sort is the fastest for small M, but that its cost does grow without bound in M. While it may always be faster for any N at small values of M, there will be values of M large enough to justify using nth_element instead.


Actual Results

For low M, the Heap and Insertion sort methods have the same performance. This makes sense, since the benefit of using a heap over a straight sorted list is log2(M) versus M when inserting and M versus M*log2(M) for the initial sort. As M approaches one these approach equivalence.

Bottom line: the Heap insertion sort is always at least as good as the regular insertion sort, but it can still be beaten by nth_element for larger values of M.


Creative Commons License Finding the Highest M of N by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.

2012-01-08

When to make a Destructor Virtual


When do I make my destructors virtual?

I've interviewed several people for C++ positions and one question we always ask is what a virtual destructor is and when to use one. For some reason, a surprising number of people say that you should make your destructor virtual whenever you have anything else in the class that is virtual. That is not the correct answer. (Although it is suggested by Bjarne Stroustrup in the ARM (page 278), and I suppose that is a pretty good basis for the claim...) There may be a correlation between the two, but there is absolutely no causative relationship.

There are three conditions that must all be met(*) before making a destructor virtual.

  1. The class might be derived from.
  2. A pointer to this type might be used to destroy instances of a derived class.
  3. Either of (*):
    • Someone might define custom delete functions (that might be different for the base and derived classes, or might be sensitive to the object's real size).
    • Some derived classes might need their destructors called.


(*) Unless you want to follow The Standard... (which, you know, maybe you should...)

Technically, you should ignore that last one. The standard says:

5.3.5 Delete
[...]
3 In the first alternative (delete object), if the static type of the operand is different from its dynamic type, the static type shall be a base class of the operand's dynamic type and the static type shall have a virtual destructor or the behavior is undefined. In the second alternative (delete array) if the dynamic type of the object to be deleted differs from its static type, the behavior is undefined.

(Emphasis mine.)

Which means: if you're going to delete an object of class B through a pointer to class A (where B is derived from A), then A had better have a destructor that is virtual. A doesn't necessarily have to declare it virtual itself: if any of A's base classes declares the destructor virtual, then A's destructor is virtual too; but if none of them does, then A does need to declare it as virtual. And if you intend to rely on a higher base class declaring it virtual, you should probably just declare it so as well... in case someone does something utterly foolish like making it no longer virtual.
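
To make the rule concrete, here is a minimal, self-contained example (my illustration):

 #include <cstdio>

 class A {
 public:
  virtual ~A() { std::printf("~A\n"); }  // virtual: deleting a derived object through A* is well-defined
 };

 class B : public A {
 public:
  int* data;
  B() : data(new int[10]) {}
  ~B() { std::printf("~B\n"); delete [] data; }  // runs only because ~A is virtual
 };

 int main()
 {
  A* p = new B;
  delete p;   // prints "~B" then "~A"; without the virtual, this delete would be undefined behavior
  return 0;
 }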

If that doesn't convince you, see what some learned elders have to say on it:

  • Scott Meyers, Effective C++ Second Edition (Addison Wesley, 1997) 60-62. Item 14 discusses the problem of virtual destructors in detail.
  • Bjarne Stroustrup, The C++ Programming Language Third Edition (Addison Wesley, 1997) 319. Section 12.4.2 discusses abstract classes and mentions rather offhandedly: "Many classes require some form of cleanup for an object before it goes away. Since the abstract class [the base class] cannot know if a derived class requires such cleanup, it must assume that it does require some. We ensure proper cleanup by defining a virtual destructor [base class]::~[base class]() in the base and overriding it suitably in the derived classes."
  • Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual, (Addison Wesley, 1990) 277-278. (Also known as "The ARM".) The commentary at the end of the page rather tepidly states "It is often a good idea to declare a destructor virtual".
  • Marshall Cline, C++ FAQ (http://www.parashift.com/c++-faq-lite/virtual-functions.html#faq-20.7), A good description of the problem and the rule.

However, if you feel like living dangerously, or even if you don't but you really need that extra performance, then you might be able to get away with it.

Now, normally I'm really a fairly honest person. Really! And I'm a big advocate of not lying to the compiler. (Generally, it takes its revenge.) But sometimes lying is just communicating something in a different way. In this case, I'm saying to the compiler: "I know you really want to dot your I's and cross your T's, but believe me, in this case you really don't need to. Just call the destructor of the type of the pointer."

That said, you do still have to be really careful. There are a few reasons why you really would need to have the derived class's destructor called:

  • The derived class has data members (such as raw pointers), or other logical ownerships such as open log files, that must be cleaned up.
  • You have data member items in the derived class and they need their destructors to be called. (Such as an auto_ptr.)
  • You have explicitly defined the derived class's destructor code and need it to be called.
  • A derived class, or one of its derived classes, might provide a custom delete operator. If so, the size passed to the delete function is determined by the dynamic type of the class if the destructor is virtual, but by the static type of the pointer if the destructor is not virtual.
  • A derived class might be a different size, and you may be deleting an array of them.

If all of those reasons are absent, then you might be able to get away with not having your class use virtual destructors.

When can I not delete through a pointer to the base class?

Related to the discussion about when a destructor needs to be virtual is the question of when an array of a derived class can be deleted through a pointer to a base class. If the sizes are the same (and assuming you don't have to worry about the destructor not being virtual), then the answer is whenever you like. If they aren't the same, then never.

To see why, imagine that you have classes A and B, where B derives from A, and both hold an integer. A always sets it to 1, and B sets it to 2. Something like (source here):


 class A {
 public:
  int nDataA;
  A() : nDataA(1) {}
 };

 class B : public A {
 public:
  int nDataB;
  B() : nDataB(2) {}
 };
If I have it create an array of three Bs and delete them as Bs, then (after instrumenting the classes) I see:

 Creating B[3] dynamically, deleting through B*
 Constructing at 0xd915c
 Constructing at 0xd9168
 Constructing at 0xd9174
 Dest of 0xd9174. I am now B :(size is 12 bytes)  B83F0400 01000000 02000000
 Dest of 0xd9174. I am now A :(size is 8 bytes)  08400400 01000000
 Dest of 0xd9168. I am now B :(size is 12 bytes)  B83F0400 01000000 02000000
 Dest of 0xd9168. I am now A :(size is 8 bytes)  08400400 01000000
 Dest of 0xd915c. I am now B :(size is 12 bytes)  B83F0400 01000000 02000000
 Dest of 0xd915c. I am now A :(size is 8 bytes)  08400400 01000000
If I do the same, but delete it through an A pointer, then I see:

 Creating B[3] dynamically, deleting through A*
 Constructing at 0xd915c
 Constructing at 0xd9168
 Constructing at 0xd9174
 Dest of 0xd916c. I am now A :(size is 8 bytes)  08400400 02000000
 Dest of 0xd9164. I am now A :(size is 8 bytes)  08400400 B83F0400
 Dest of 0xd915c. I am now A :(size is 8 bytes)  08400400 01000000

(These results were obtained with the DJGPP compiler, which uses gcc.)

Notice how it deletes backwards; it starts at 0xd916c and goes down to 0xd915c. Also notice that those aren't the addresses that were constructed, unlike the first case where we destroyed through a B pointer. Notice what happens when you destroy through the B pointer: each object starts with its vptr at 00043FB8 when ~B is called, and the vptr is then set to 00044008 for when ~A is called. This is as expected, of course, since during construction and destruction, virtual functions resolve to the class currently being constructed or destructed, not the class that the object will eventually be.

Now look at the sequence of destructors when the deletion is done through the A* pointer. The first object destroyed is destroyed through a pointer that is actually pointing at the "1" of the second item. Before the ~A code runs, the vptr for A's vtable is written on top of that "1". The same goes for the second destruction, which starts out pointing at the "2" of the first item.

Clearly this is bad. Unfortunately, even making the destructor of A virtual will not solve this problem. Theoretically, if the class being pointed to has a vtable it would be possible to encode the sizeof in the table and use that when iterating through the array. Or, the compiler could store the sizeof in the table, as it stores the number of entries in the array. Gcc doesn't do this, for better or worse. The bottom line is that if you want to destroy an array of Bs with a pointer to As, then sizeof(B) must equal sizeof(A), in addition to the other rules stated above, regardless of whether or not the destructor is virtual.

Hacky "Fixes"

You could ensure that class A is not ever deleted in array form by overloading that operator and throwing an exception, but then how do you know that you aren't just deleting a bunch of "A"s, which is fine? (If you overload the delete array operator for "B" (in order to throw), then it will only get called in the instance where it is being deleted properly, which is actually worse than doing nothing.)

You could implement a custom destructor for arrays of A, and test in the destructor if it is seeing an object of class B in order to handle it specially. But you would still have the wrong pointer to the class... you'd have to fix that up with pointer math after you figure out what your base is. This is so terrible that I refuse to think about it anymore.

Why not always make it virtual?

There is a performance penalty to making and calling a virtual destructor. Making a destructor virtual will force the class to store a pointer (called a vptr) to the class's virtual table. If there are no other virtual functions in the class, then this is a little bit of extra overhead (whatever the pointer size is: usually 32 bits, but sometimes 64 bits or even something else). This is probably where the typical erroneous answer comes from. Presumably people think that if you are already paying the space cost, then you might as well make the destructor virtual.

There is also a performance penalty to calling a virtual function, including virtual destructors. The vptr must be dereferenced and the resulting function called. This can cause memory cache misses, et c.
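
A quick way to see the space cost is to compare sizes (a minimal sketch; the exact numbers depend on the compiler and platform):

 #include <cstdio>

 class Plain    { int n; public: ~Plain() {} };
 class WithVptr { int n; public: virtual ~WithVptr() {} };

 int main()
 {
  // On a typical implementation the second class is larger by exactly one pointer (the vptr).
  std::printf( "sizeof(Plain) = %u, sizeof(WithVptr) = %u\n",
               (unsigned)sizeof(Plain), (unsigned)sizeof(WithVptr) );
  return 0;
 }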

Use at Your Own Risk

There you go. Be careful. Or just play it safe and follow the standard. Or did I miss something?


Creative Commons License
When to make a Destructor Virtual by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.

2012-01-01

Generating Anagrams the Efficient and Easy Way


I was interviewing at a company that gives take home problems before the phone interview. One of the problems was to write a program that would generate anagrams. The question specifically asked for all combinations to be generated.

// Returns a list of all words that use the same
 // letters as the input string.
 list < string > generateAnagrams(const string& input);

(Note the ambiguity: is the function returning multiple words per list element, or a single one?)

The easiest solution is to strip out non-alphabetic characters from the input, and then call the STL's next_permutation function repeatedly. Check each permutation against the entries in the dictionary and you are done. Except that this version won't return multi-word anagrams, which is rather important in that it makes for a much more challenging problem.

It is also very slow. For an N-letter long input (assuming no repeat letters), the number of permutations (not counting that we only have 26 letters and ordering is unimportant for multiples of the same letter) is n factorial, n!:

N N!
1 1
2 2
3 6
4 24
5 120
6 720
10 3628800
15 1307674368000
20 2432902008176640000
25 15511210043330985984000000
30 265252859812191058636308480000000

I use a dictionary with 45,424 words (a spell-checking file from Linux, but with single letter words removed). That is equivalent to about eight letters of permutations. Of course, you'd need to have access to all the letters before you could hope to make every word in the dictionary. This is to show how explosive the possibilities become with only a few letters. The possibilities can be severely curtailed by refusing to continue permuting when the word so far does not match any existing words.

After about seven letters, the number of possible words that can be made out of the permutation actually starts to go down. Without limiting constructions to valid starts of words, the number of permutations being handled would still be going up.

I tried to get a metric on how much this improved the performance, but the slow version was too slow to run on a letter length that was long enough to get a non-zero number from the fast version. For "percentage speed boost" I'll jot that one down as "infinite".







Phrase            Number Produced   Slow Version      Fast Version
two words         14                1 second          0 seconds
two words a       326               16 seconds        0 seconds
two words ab      522               319 seconds       0 seconds
three words hu    21,122            too slow          3 seconds
SETEC Astronomy   1,800,508         did not attempt   161 seconds

I could have made it even faster by trimming from the dictionary any words containing letters that are not in the input string. Just think about how much smaller the dictionary would be for a fifteen-letter input when we can take out every word that uses even one letter not in the input! However, this optimization has a diminishing benefit as the inputs get larger (and more of the alphabet is covered).

There is a much faster way to do this than permuting the input. Instead, sort the letters of the input, and of each word in the dictionary, to obtain the "key" for an arbitrary series of letters. Then you can find the dictionary entries that share the input's key. Ideally, store the dictionary of words as a map (or Python dict) keyed on it. That way you can easily turn an input string's key into all the words in the dictionary that use it.







Word   Key
part   aptr
prat   aptr
tarp   aptr
trap   aptr
rapt   aptr

However, this technique doesn't really turn out to be much easier for the multi-word case.

But there is an even faster way. I've never seen this explained anywhere, although a long time ago I did read a science fiction story with a similar element. I'm going to try to ease into it so that readers can have their own "a-ha!" or "Eureka!" moment.

The concept of keys is useful. It makes the order of the letters unimportant, which it is... completely unimportant, both in the input and output words. Only the counts of the letters matter. Why then, do I store a string at all? Perhaps I'd prefer to have an array of counts of each letter:



a b ... o p q r s t u ...
1 0 ... 0 1 0 1 0 1 0 ...

But checking one versus another would be so difficult. For each entry in the word dictionary I'd have to iterate along all 26 characters just to verify that everything the word requires is in my source anagram. That might not be an improvement. What might work better? Have a think about that.


But if I store it as a bitfield, then I can just do a bit operation to verify the presence of all the letters! Each letter becomes a bit in a 32-bit word. The characters available in my source string are set in one 32-bit integer, and each entry in the word dictionary has a 32-bit key as well. To determine if I have all the letters necessary for a word in the dictionary, I simply AND the two keys together and compare the result to the word's key. Of course, "sap" and "pass" are not anagrams of one another, so there have to be some additional tests. How can I solve that one?
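
Here is a small C++ sketch of that bit-mask test (just an illustration; the full implementation at the end of this post is in Python and uses the prime-number scheme developed below):

    #include <cctype>
    #include <string>

    // Build the 26-bit mask: one bit per distinct letter present in the word.
    unsigned int letterMask( const std::string& word )
    {
        unsigned int mask = 0;   // assumed to be at least 32 bits wide
        for ( size_t i = 0; i < word.size(); ++i )
        {
            int c = std::tolower( static_cast<unsigned char>( word[i] ) );
            if ( c >= 'a' && c <= 'z' )
                mask |= 1u << ( c - 'a' );
        }
        return mask;
    }

    // True if every distinct letter of "word" appears somewhere in "source".
    // Counts are ignored, which is exactly the flaw discussed next: "pass"
    // would be accepted against the source "sap" even though it needs two s's.
    bool lettersAvailable( const std::string& word, const std::string& source )
    {
        unsigned int w = letterMask( word );
        unsigned int s = letterMask( source );
        return ( w & s ) == w;
    }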

We could expand the bitfield, so that each location was able to store a count of letters. This is not infinitely scalable, and somewhat hacky. So the next problem becomes, how can I store the word in an easily manipulatable form that will store both the characters, and their counts? I can't just add the key for "pas" and "s", because then I'd have two "s"s, which sum to the key for the next letter "t". How can we fix that?


If you assign a number to each letter of the alphabet and add all of the numbers for a word together, then you could take out letters as you constructed anagram words and still have the rest of the letters represented. This is a very poor representation, however, since you wouldn't be able to tell the difference between "aa" (1+1) and "b" (2). No matter how far apart you spaced the numbers for the letters, enough copies of a smaller letter could still add up to a larger one. You could pick a maximum word length and start at a sufficiently high number. But how do you know if an arbitrary letter is part of the number without finding all the letters?


If you multiply instead of add the numbers, then you can try to divide each letter's number into the product of the input letters, and if the division leaves no remainder then that letter went in. But then, how can you tell the difference between "ab" (2*3) and "g" (6)? You need to choose numbers for the letters such that no letter's number can be produced by any product of the others. What is the set of numbers with this property? Prime numbers.

Here's the algorithm, and there is a copy at the end of this post as well. (A few performance steps are marked with an asterisk.):

  • Assign prime numbers to each letter.
    • * Read in the dictionary and count the number of times each letter is seen.
    • For the most common letter, assign it to "2".
    • For each next most common letter, assign the next prime number.
    • For each word in the dictionary, calculate the product of the primes associated with each letter of the word. Store all words in a list, where each list element starts with a key, and is followed by the words that use that key.
    • * Sort the list by the key, so that the smallest values are first.
  • For each input to be turned into anagrams:
    • Calculate the product of the input letters.
    • * Filter the list of words to remove any keys which cannot divide into the input product.
    • Look at each key for words.
      • If it evenly divides the input number, then add that list of words to a temporary list of "matched" words, reduce the input string, and recursively continue processing later keys in the list.
      • If the key ever matches whatever is left of the input key, then a complete multi-word-list match is found. Print out the various lists of words that have been assembled and continue to look for other matches.
      • When the dictionary is run through, pop back to the state before the previous match.

The result will be a list of lists of lists of words. For instance the result for "lumberjack" is:

lumberjack:
Result: [22, 1437443, 2359823] - re  jack  Blum
Result: [94, 331639, 2393863] - em,me  bark  Cluj
Result: [94, 552299, 1437443] - em,me  blur,burl  jack
Result: [118, 264187, 2393863] - be  mark  Cluj
Result: [329, 155782, 1456061] - am  jerk  club
Result: [1739, 40061, 1071202] - mu  jab  clerk
Result: [2162, 7469, 4621411] - elm,Mel  jar  buck
Result: [2162, 24013, 1437443] - elm,Mel  rub  jack
Result: [3358, 31913, 696377] - elk  jam  curb
Result: [4669, 102601, 155782] - Lac  bum  jerk
Result: [31913, 2338433966L] - jam  buckler
Result: [163009, 457805662] - rack  jumble
Result: [1437443, 51916106] - jack  lumber,rumble
Result: [2188978, 34091911] - Merck  Jubal
Result: [2393863, 31174066] - Cluj  embark
Timings:
0.753687 - Assign Primes to the alphabet.
1.065416 - Create signatures dictionary.
0.068219 - Solve anagrams for "lumberjack"
Done.

Each line indicates a family of anagrams that can be produced. The commas separate words which are anagrams of each other and are interchangeable within a produced phrase. The order of the groups that make up a family are unimportant. They are shown with the lowest number first.

For instance, the second line can produce "em bark Cluj" or "me bark Cluj". It can also produce any rearrangement of those words, such as "Cluj bark me".

The basic algorithm shown above, without limiting the input keys, took 932 seconds to produce "SETEC Astronomy" anagrams. The time was reduced to 21 seconds with the code change to filter the dictionary to include only words which can evenly divide the input phrase. Another change to re-filter at every level of recursion took the time down to 4.9 seconds.

This handily beats my best submission for the interview, which was a C++ program that took 161 seconds on the same anagram input. Of course, I could implement this in C++ and it would probably be faster yet.

But, it must be noted, the C++ implementation did not have the dictionary reducing optimization, so it would likely be even faster. The C++ method worked fundamentally differently. It would permute the input string, recursively adding new letters so long as there was a word in the dictionary that could match.

Finally, a practical use for prime numbers! (Well, assuming one day I can find a practical use for Anagrams...) The most recent version can be found here.

#!/usr/bin/python

#Copyright 2011, 2012 Mark Santesson
#
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.



'''
Anagram generator. Use prime numbers to represent the phrase given.
Each letter has a unique prime number. A phrase's signature is
the product of all the prime numbers assigned to the component
letters.  The signatures for any set of phrases which are anagrams
of each other will all be the same number.
By Mark Santesson.
'''

import time, os, sys, collections


timings = []
last_time = time.clock()

def elapsedTime():
    '''Return the number of seconds that has passed since the last time this
    function was called. The first time it is called it will return the time
    since the function was defined.'''
    global last_time
    this_time = time.clock()
    elap_time = this_time - last_time
    last_time = this_time
    return elap_time

def addTiming(desc):
    timings.append( ( elapsedTime(), desc, ) )



def nextprime(x):
    """Quick prime number calculator. Pass in a number and it will return
    the smallest prime strictly larger than the input. It is very
    inefficient."""

    # Create a function which returns the next prime larger than the
    # passed value. A faster way to do this is to keep track of all
    # primes seen so far and then start trying primes incrementally
    # higher than that, dividing only by the other primes already
    # found. You can also stop at sqrt(x). I'm only going to generate
    # 26 of them, so this easy method is fine.

    while True:    # On the theory that there are an infinite number of primes...
        x += 1
        for pot_fac in range(2,x):    # Not even using the sqrt(x) or even numbers optimizations!
            if x % pot_fac == 0:
                break
        else:
            return x



def prime_generator():
    '''Returns a generator that produces the next largest prime as
    compared to the one returned from this function the last time
    it was called. The first time it is called it will return 2.'''
    lastprime = 1
    while True:
        lastprime = nextprime(lastprime)
        yield lastprime






class AnagramGenerator:
    '''The AnagramGenerator class produces all anagram combinations
    that are possible from the input phrase using the given dictionary.
    If no dictionary is given to the constructor (as a filename)
    then it assumes "words" is the filename.'''
    def __init__(self, dictionary_filename='words'):
        '''Make a AnagramGenerator object. There is no way to
        switch dictionaries; create a new instance instead.'''
        self.dictionary_filename = dictionary_filename
        self.__loadDictionary()

    def __getPrime(self, c):
        '''Returns the prime number that represents the passed
        in character. The character must be lower case.'''
        # assert c==c.lower()
        return self.chars_list[ ord(c) - ord('a') ]

    def __getSignature(self, s):
        '''Returns the integer that represents the characters in
        a string.  The integer is the product of the prime number
        representation of each character in the string.'''
        s = s.lower()
        sig = reduce(lambda x,y: x * self.__getPrime(y),
            [ch for ch in s if ord('a') <= ord(ch) <= ord('z')], 1)
        assert sig >= 1, repr( ('Product of primes produced non-positive number',s, sig, [ (z,self.__getPrime(z)) for z in s if ord('a') <= ord(z) <= ord('z') ]) )
        return sig

    def __loadDictionary(self):
        '''Load a dictionary from the given input file.  The file
        should have one word per line.'''

        addTiming('Start of loading dictionary from "%s"' % (self.dictionary_filename,))

        # Count chars in the dictionary. Create a dict of lower
        # case character to the number of instances found in the
        # words dictionary. It might be useful to break this
        # function into two parts to enable custom dictionaries
        # passed in as a list/string instead of just specifying
        # a dictionary filename. However, since I'm testing
        # the performance of this script, I need to combine the
        # counting and the line loading to prevent a second
        # loop. There are ways around that, but it isn't a
        # focus for now. If loading from something other than
        # a dictionary file is needed then this is the place
        # to make that change.

        all_lines = list()
        counts = dict([ (chr(x),0) for x in range(ord('a'), ord('z')+1) ])
        for line in open( self.dictionary_filename ).readlines():
            all_lines.append(line.strip())
            for c in line.lower():
                if ord('a') <= ord(c) <= ord('z'):
                    counts[ c ] += 1


        # Construct a dict mapping from lower case character to the
        # prime number for it. Initialize it in the order of most
        # used character first, so that the letters most frequently
        # used get lower numbers in the hopes that the products
        # will be smaller numbers.

        primes = prime_generator() # get the prime number generator
        chars_map = dict( [ (x,primes.next()) for x,y in \
            sorted( counts.items(), key=lambda i:i[1],
                reverse=True ) ] )

        # Recast the dict as a list, where the index is the index
        # in the alphabet. The value is the prime number for that
        # character.

        self.chars_list = [ chars_map[chr(x)] for x in range(ord('a'), ord('z') + 1) ]

        addTiming( 'Assign Primes to the alphabet.' )


        # Insert all the dictionary words into the list
        self.sigs = collections.defaultdict( list )
        for word in all_lines:
            sig = self.__getSignature( word )
            if sig > 1:    # ignore lines without letters
                self.sigs[ sig ].append( word )


        # We need the keys in list form so we can have it sorted.
        # Lookups into the signatures dictionary are pretty rare.
        self.sigs_keys = sorted( self.sigs.keys() )
        addTiming( 'Create signatures dictionary.' )

    def __getSignatureKeys(self):
        '''Returns the sorted list of all signature keys. This is
        used to populate a list of all available words.'''
        return self.sigs_keys

    def formulateAnagramPhraseCombo(self, res):
        '''Render an anagram "phrase" into text for display. Each key
        is separate by double spaces, and each key displays as a comma
        separated list of the words choices.'''
        return "  ".join( [ ','.join(self.sigs[s]) for s in res ] )



    def __solveAnagramPhrase(self, letters, unreduced_keys, start=0, \
                so_far=[], display_progress=False):
        '''This is the recursive function that helps produce anagrams.
        It should not be called directly.'''

        # letters: Product of signatures of letters remaining.
        # unreduced_keys: Keys representing items in the dictionary
        #    that cannot necessarily be reached by the remaining
        #    letters.
        # start: The index of unreduced_keys to start considering.
        #    keys before this do not need to be considered in
        #    combination with this pick because earlier keys
        #    will be considered separately and they will consider
        #    combining with this pick as part of their recursion.
        #    The start value should be considered again in case
        #    the same key can be used multiple times.
        # so_far: A list containing the keys that have been picked
        #    so far in this chain of recursions.
        # display_progress: Flag, if true then results are displayed
        #    as they are discovered.

        if letters == 1:
            # There are no letters left.
            if display_progress:
                print "Result: %s - %s" % \
                    ( repr(so_far),
                    self.formulateAnagramPhraseCombo(so_far) )
            return [ so_far ]

        # Filter list of keys to remove any that can no longer
        # be constructed using some of the input letters.
        reduced_keys = [ x for x in unreduced_keys[start:] if letters % x == 0 ]
        result = []

        # Recurse on all items remaining in the dictionary.
        for index,sig in enumerate(reduced_keys):
            remaining_letters = letters // sig
            result += self.__solveAnagramPhrase(
                    letters        = remaining_letters,
                    unreduced_keys    = reduced_keys,
                    start         = index,
                    so_far        = so_far + [ sig ],
                    display_progress= display_progress
                    )

        return result


    def anagrams( self, phrase_string, display_progress=False ):
        '''This function takes an input phrase string and returns
        a list of all anagrams that can be generated from it. The
        return value is a list of lists of lists. The inner lists
        represent the individual words which are interchangeable
        in an anagram. For instance, "tarp" and "trap" are
        anagrams of each other. Whenever one can be used, the
        other can substitute. They will be presented together
        in the innermost list. An example output for "lumberjack"
        might be:
        [ [ ['me'], ['bark'], ['Cluj'] ],
          [ ['me'], ['blur','burl'], ['jack'], ],
          [ ['be'], ['mark'], ['Cluj'] ]
          [ ['am'], ['jerk'], ['club'] ]
          [ ['elm','Mel'], ['jar'], ['buck'] ]
          ...
          ]
        '''
        all_keys = self.__getSignatureKeys()
        if options.dumpdict:    # note: relies on the module-level 'options' set in __main__
            for k,v in sorted(self.sigs.items(), key=lambda x: x[0]):
                print repr(k),'->',repr(v)

        r = self.__solveAnagramPhrase(
                self.__getSignature(phrase_string),
                all_keys,
                display_progress=display_progress )

        r = [ [ self.sigs[s] for s in row ] for row in r ]
        addTiming( 'Solve anagrams for "%s": %d rows' % (phrase_string, len(r),) )
        return r






if __name__ == "__main__":
    import optparse

    description = '''\
Anagram generator by Mark Santesson. This program rearranges letters
in any non-option command line parameters and outputs the anagrams
that can be generated from them. The potential words are filtered
through a dictionary. To generate anagrams for more than one phrase,
separate the phrases with (escaped) semicolons.
'''

    parser = optparse.OptionParser(
                usage='usage: %prog [options] args',
                description=description )
    parser.add_option('-t', '--time', action='store_true', default=False,
            dest='timing', help='Display timing information but not anagrams generated.')
    parser.add_option('-d', '--dictionary', default='words',
            dest='dictionary_name',
            help='Specify a location for the dictionary. Default is "%default".')
    parser.add_option('--dumpdict', action='store_true', default=False,
            dest='dumpdict',
            help='Dump dictionary after load, for debug purposes.')

    options, test = parser.parse_args()

    if len(test) >= 1:
        test = ' '.join(test).split(';')
    else:
        test = [ "face", "astronomy", "SETEC Astronomy" ]

    ag = AnagramGenerator( options.dictionary_name )
    for x in test:
        print "%s:" % x
        ag.anagrams( x, not options.timing )

    if options.timing:
        print "Timings:"
        for et,desc in timings:
            print "%f - %s" % (et, desc,)
    print "Done.\n"


# 932 seconds to solve SETEC Astronomy without limiting dictionary.
# The C version took 161 secs.
# 21 seconds after reducing the keys from the dict.
# With dictionary reduction at every recursion, it goes down to 4.25 seconds.


Creative Commons License Generating Anagrams the Efficient and Easy Way by Mark Santesson is licensed under a Creative Commons Attribution 3.0 Unported License.