2015-06-29

Things I don't like about python: bools indexing into a list

Things I don't like about python

Bools indexing into a list

This is valid code, and does what you expect:

list_ = [ 'A', 'B', 'C' ]
print list_[False]  # 'A'
print list_[True]   # 'B'

... which is neat, but it upsets me a little bit. You index into things with integers, not floats, not strings, and certainly not bools. This smells more like Javascript than Python. If I want to use a bool to look up a value, I can do that with a dict.
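This works because bool is a subclass of int (True == 1, False == 0), so list indexing happily accepts it. A dict makes the intent explicit; a quick sketch:

print isinstance(True, int)     # True -- bool subclasses int, hence the indexing
results = { False: 'A', True: 'B' }
print results[False]            # 'A'
print results[True]             # 'B'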

2015-06-23

A Python object to detect when an imported module has had the source code changed

Detecting changes to loaded modules

I'm reworking the web server for my irrigation project. I've already got "kr", which restarts a program whenever it detects that the source files have been updated, but first I need the existing web service to shut down when there is a change to the source code. For this I implemented the code below, which defines a class. Calling an instance of the class checks the loaded modules (with optional custom filtering) for changes to their source code; if a change is found, a callback is invoked, and that callback can shut down your web server. The only tricky part is getting the event loop to call the check function periodically. That works fine for my architecture, which never blocks indefinitely. Later I'll probably add an option to have a background thread do the monitoring.

The file is available on github here.
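In miniature, the intended usage pattern looks like this (the callback name, filter string, and loop body are placeholders; the full class and a small demo server follow below):

def quit_server(changed_filename):
    # Called by the checker when a watched source file changes; shut the
    # web service down here (or just sys.exit()).
    print 'Source changed:', changed_filename

checker = ImportedModulesTimestampChecker(quit_server, 'my_project')
while True:
    checker()                         # let it compare timestamps periodically
    handle_one_request_with_timeout()     # placeholder for your non-blocking loop body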


#!python

#Copyright 2015 Mark Santesson
#
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.


# Test with: kr.py *.py -c watch_modules.py


import logging
import os
import re
import sys


class ImportedModulesTimestampChecker(object):
    '''
       A class to help detect when one of the source files for a
    running process has changed.
       The returned object should be called periodically to let it check
    the timestamps of all files. There is, as of yet, no multi-threaded
    version.
       Typical use of this module would be to have it track the files
    being actively developed and to call a function resulting in program
    exit (perhaps sys.exit) when a change is detected. This works well
    when combined with "kr" to relaunch the program.

    '''
    def __init__(self, on_difference_fn, filename_filter_fn=None):
        '''
        Takes two parameters:
          on_difference_fn: a function taking the name of the file which
                      changed, which is called when the change is
                      detected.
          filename_filter_fn: a function taking a module's filename and
                      which should return True if the file's timestamp
                      should be observed. If the filename filter is a
                      string, then any filenames containing that string
                      will be observed. If not present, then all modules
                      will be tracked.
        '''
        self._onDifferenceFn = on_difference_fn

        if isinstance(filename_filter_fn, basestring):
            # Capture the substring first; the lambda must not close over the
            # name it is itself assigned to.
            substring = filename_filter_fn
            filename_filter_fn = lambda x: substring in x
        self._filenameFilterFn = filename_filter_fn or (lambda x: True)

        self._timestamps = {}

    def __call__(self):
        '''
        Call this periodically to do a check of the timestamps.
        '''
        # Get the timestamps on all matching files in this module's
        # directory. If any have changed, quit.
        all_files = [ x.__file__ for x in sys.modules.values()
                      if isinstance(x,type(sys))
                         and hasattr(x, '__file__')
                         and self._filenameFilterFn(x.__file__)
                    ]
        for module_name in sorted(all_files):
            module_name = re.sub(r'\.pyc$', r'.py', module_name)
            try:
                ts = os.stat(module_name).st_mtime
            except OSError:     # WindowsError is a subclass of OSError
                logging.exception(module_name)
            else:
                if ts > self._timestamps.setdefault(module_name, ts):
                    self._onDifferenceFn(module_name)
                    self._timestamps[module_name] = ts


def main():
    import os.path
    import SocketServer
    import logging
    # From an example in the documentation for SocketServer.
    class TinyHandler(SocketServer.StreamRequestHandler):
        def handle(self):
            # Get one line.
            self.data = self.rfile.readline().strip()
            logging.info('Received: %r', self.data)
            out = self.data.lower()
            self.wfile.write(out)
            logging.info('Sending : %r', out)

    class TinySocketServer(SocketServer.TCPServer):
        def __init__(self):
            self._address = ("localhost", 9999)
            SocketServer.TCPServer.__init__( self
                                           , self._address
                                           , TinyHandler )
            self.timeout = 0.5
            self.quit = False
            self._timestampChecker = ImportedModulesTimestampChecker\
                    ( self.on_module_modification
                    , self.module_name_filter
                    )

        @staticmethod
        def module_name_filter(module_name):
            this_mod_name = os.path.basename(__file__).split('.')[0]
            return this_mod_name in module_name

        def on_module_modification(self, module_name):
            logging.info('Module %s was modified, exiting.', module_name)
            self.quit = True

        def run(self):
            logging.info('Listening at: %s', self._address)
            while not self.quit:
                self._timestampChecker()
                self.handle_request()
            logging.info('Quitting.')

    logging.basicConfig(level=logging.INFO)
    tws = TinySocketServer()
    tws.run()

if __name__ == "__main__":
    main()



2015-03-16

Anecdote: That time I had an idea that contributed to the game design

I used to work at Pipeworks Software. We did really cool things like the original XBox demos. I didn't work on Desk Toys, but I did most all of the Ping Pong Ball and Mousetrap room. I did the butterfly flight in Butterfly Garden. I also did part of the boot screen that was used for the original generation XBox. I did the fog, camera paths, and blobby goo.

For Godzilla: Destroy All Monsters Melee I worked on special effects, rigid body physics, &c. At one point I was coding the results of a monster being thrown into a building. Whenever you destroy a building it is supposed to make the people a bit more angry at you. Unfortunately, when a monster was thrown into a building we had no record of who threw the monster into it. As I contemplated how much work would need to be done to track that information I realized that I could just attribute the destruction to the monster that was thrown. After all, if someone throws me onto an ant pile, the ants are going to get mad at me.

I walked over to a designer and ran the idea past him. It would let the player force some human anger onto their opponent. This wasn't a big deal, but could affect the game as the humans would attack the monster that was doing the most damage to their city. Humans could never kill you, but they could take you down to about 10% health through machine gun fire from helicopters. It seemed like the change would work okay, so I went back and implemented it.

That saved me a few minutes of work and spared the data structures some added complexity. And I thought it was a nice feature.

2015-03-11

New Page: My Irrigate Project

I just added a page to the blog introducing my major side project. I've built an Arduino based flow sensor that monitors my irrigation water usage. (It also keeps an eye on my automatic pool filler.) It mails me graphs of water usage every morning so that I can easily see if a sprinkler head has been broken off, thereby saving me money and worry. It also emails me if flow stays on for longer than a set time (in case I've forgotten to turn off water filling the pool). I can check the flow rate in real time from a web page.

I haven't yet packaged up the source, but I'll do that soon. Check it out and leave me comments.


2015-03-09

Xmldump: Dumping Django data state to XML

The problem: Migrating Django model data across model versions

I've been working on learning Django. There's probably a better way of doing this that I don't know about, but as near as I can tell there is no built-in way to back up the state of the db in a form that can migrate across model versions.

There is a Stack Overflow question on the topic here. It is an old question, with a few third-party solutions. A more recent answer references Django's newer built-in support for migrations. However, migrations require that you write significant code for non-trivial changes.

I wanted a solution that would let my classes load themselves from older representations. Preferably automatically, but with the option of custom code for non-trivial cases. I searched around a while and could not find anything, so I wrote it myself.

XMLDump

An example project that illustrates using xmldump is here. The only file you need to use it yourself is serializable.py, which defines a mixin. Include it as a base class for any models that you want to be able to dump to XML. The file also includes documentation on how to use it, which I've included below.


The serializable module can save django data to xml format and
reimport. The intent is that in most cases no customization should
be required. However, I expect that there are still many cases where
the code is not yet able to handle the model relationships.
For instance, I have made no particular effort to implement many to
many relationships.
To generate xml from the models currently in memory:
      import serializable
      root = serializable.models_to_xml([TopLevelModelClass1, Class2])

The value returned is an instance of the xml.etree.ElementTree.Element
class. To display it in a readable format:
      from xml.dom.minidom import parseString
      from xml.etree.ElementTree import tostring
      doc = parseString( tostring(root) )
      xml_string = doc.toprettyxml('  ')

In order to delete all the data in the db (in preparation for reloading it,
presumably):
      delete_all_models_in_db([TopLevelModelClass1, Class2])

The list of class names that must be passed to the functions need not
contain all the model classes. They should contain the highest level
classes only. The module will discover contained classes, whether they
be models which are referred to by a foreign key, or models which refer
to the high level model class through a foreign key.
However, it may be advisable to overload the "owned_models" function
in order to restrict the default discovery. In some cases it makes
for a nicer xml nesting if references are not followed all the way
down.
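To make the example output below concrete, here is a hedged sketch of what the models behind it might look like. SerializableMixin is my placeholder name (check serializable.py for the actual class it exports); the fields mirror the XML below, and Order/OrderEntry would follow the same pattern.

# Sketch only: "SerializableMixin" is a placeholder for whatever class
# serializable.py actually exports as the mixin.
from django.db import models
import serializable

class Menu(serializable.SerializableMixin, models.Model):
    name = models.CharField(max_length=100)

class MenuItem(serializable.SerializableMixin, models.Model):
    menu = models.ForeignKey(Menu)          # the foreign key the dumper follows
    name = models.CharField(max_length=100)
    price = models.FloatField()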

Example output

<?xml version="1.0" ?>
<ModelData>
  <xmldump.models.Menu>
    <id type="int">1</id>
    <name type="unicode">Breakfast</name>
    <___owned>
      <xmldump.models.MenuItem>
        <menu to_type="xmldump.models.Menu" type="reference">1</menu>
        <price type="float">4.0</price>
        <id type="int">1</id>
        <name type="unicode">Spam and Eggs</name>
      </xmldump.models.MenuItem>
      <xmldump.models.MenuItem>
        <menu to_type="xmldump.models.Menu" type="reference">1</menu>
        <price type="float">4.5</price>
        <id type="int">2</id>
        <name type="unicode">Eggs and Spam</name>
      </xmldump.models.MenuItem>
      <xmldump.models.MenuItem>
        <menu to_type="xmldump.models.Menu" type="reference">1</menu>
        <price type="float">5.0</price>
        <id type="int">3</id>
        <name type="unicode">Spammity Spam</name>
      </xmldump.models.MenuItem>
      <xmldump.models.MenuItem>
        <menu to_type="xmldump.models.Menu" type="reference">1</menu>
        <price type="float">3.0</price>
        <id type="int">4</id>
        <name type="unicode">Spam</name>
      </xmldump.models.MenuItem>
    </___owned>
  </xmldump.models.Menu>
  <xmldump.models.Order>
    <customer type="unicode">Brian</customer>
    <date type="date">2015-03-04</date>
    <id type="int">1</id>
    <___owned>
      <xmldump.models.OrderEntry>
        <order to_type="xmldump.models.Order" type="reference">1</order>
        <count type="int">1</count>
        <menuitem to_type="xmldump.models.MenuItem" type="reference">1</menuitem>
        <id type="int">1</id>
      </xmldump.models.OrderEntry>
      <xmldump.models.OrderEntry>
        <order to_type="xmldump.models.Order" type="reference">1</order>
        <count type="int">1</count>
        <menuitem to_type="xmldump.models.MenuItem" type="reference">2</menuitem>
        <id type="int">2</id>
      </xmldump.models.OrderEntry>
      <xmldump.models.OrderEntry>
        <order to_type="xmldump.models.Order" type="reference">1</order>
        <count type="int">2</count>
        <menuitem to_type="xmldump.models.MenuItem" type="reference">4</menuitem>
        <id type="int">3</id>
      </xmldump.models.OrderEntry>
    </___owned>
  </xmldump.models.Order>
</ModelData>


(Thanks to Free Online XML Escape Tool - freeformatter.com.)

2015-03-04

Date Time and Datetime conversions

Date Time, Timestamp and Datetime conversions

Python has a great class for storing and manipulating the date and time: datetime.

Unfortunately, it is not the only way to represent a point in time: there are also time structs and numeric timestamps. Sometimes you need to convert between these forms, and I frequently forget how to do it.

Table of Conversions

Here is a table of conversions. Pick the row corresponding to the form of time that you have (from the left), and then pick the column corresponding to the form that you would like (from the top). The intersecting cell is the code snippet to get you there.

I have some measure of confidence in the correctness since the table, indeed, this entire post, was created by a script which tested the code shown in the table and lists below. That script is available here. However, I expect that many of the conversions can be improved upon and many may be subtly, or not so subtly, incorrect. Please let me know if you have an improvement.

  • "dt" represents an input datetime.
  • "t" represents an input time struct.
  • "secs" represents an input timestamp (seconds since the Epoch GMT).
  • "localtz" represents the local timezone. This is necessary to account for functions or formats that convert to or from the local timezone.
  • "othertz" represents a timezone that you want to to have. It can be UTC or any other pytz timezone.
  • You need to import "pytz" for timezones.
  • You need to import the "datetime" class from the datetime module.
  • Note that time structs can not store time at a resolution finer than a second.

I'm sorry, but this table just isn't going to work on mobile. Below is a breakout by the type of representation that you are converting to.

Table of Conversions between various date and time formats
(each group below is a "from" row of the table; each entry is a "to" column)

Build as of now:
  To Timezone Naive Datetime Local:  datetime.now()
  To Timezone Naive Datetime UTC:    datetime.utcnow()
  To Timezone Aware Datetime:        localtz.localize(datetime.now()).astimezone(othertz)
  To Time Struct Local:              time.localtime()
  To Time Struct UTC:                time.gmtime()
  To Timestamp:                      time.time()

From Timezone Naive Datetime Local:
  To Timezone Naive Datetime UTC:    localtz.localize(dt).astimezone(pytz.utc).replace(tzinfo=None)
  To Timezone Aware Datetime:        localtz.localize(dt).astimezone(othertz)
  To Time Struct Local:              dt.timetuple()
  To Time Struct UTC:                localtz.localize(dt).utctimetuple()
  To Timestamp:                      time.mktime(dt.timetuple()) + 1e-6*dt.microsecond

From Timezone Naive Datetime UTC:
  To Timezone Naive Datetime Local:  pytz.utc.localize(dt).astimezone(localtz).replace(tzinfo=None)
  To Timezone Aware Datetime:        pytz.utc.localize(dt).astimezone(othertz)
  To Time Struct Local:              pytz.utc.localize(dt).astimezone(localtz).timetuple()
  To Time Struct UTC:                dt.utctimetuple()
  To Timestamp:                      calendar.timegm(pytz.utc.localize(dt).timetuple()) + 1e-6*dt.microsecond

From Timezone Aware Datetime:
  To Timezone Naive Datetime Local:  dt.astimezone(localtz).replace(tzinfo=None)
  To Timezone Naive Datetime UTC:    dt.astimezone(pytz.utc).replace(tzinfo=None)
  To Time Struct Local:              dt.astimezone(localtz).timetuple()
  To Time Struct UTC:                dt.utctimetuple()
  To Timestamp:                      calendar.timegm(dt.utctimetuple()) + 1e-6*dt.microsecond

From Time Struct Local:
  To Timezone Naive Datetime Local:  datetime(*t[:6])
  To Timezone Naive Datetime UTC:    datetime.utcfromtimestamp(time.mktime(t))
  To Timezone Aware Datetime:        localtz.localize(datetime(*t[:6])).astimezone(othertz)
  To Time Struct UTC:                time.gmtime(time.mktime(t))
  To Timestamp:                      time.mktime(t)

From Time Struct UTC:
  To Timezone Naive Datetime Local:  pytz.utc.localize(datetime(*t[:6])).astimezone(localtz).replace(tzinfo=None)
  To Timezone Naive Datetime UTC:    datetime.utcfromtimestamp(calendar.timegm(t))
  To Timezone Aware Datetime:        pytz.utc.localize(datetime(*t[:6])).astimezone(othertz)
  To Time Struct Local:              time.localtime(calendar.timegm(t))
  To Timestamp:                      calendar.timegm(t)

From Timestamp:
  To Timezone Naive Datetime Local:  datetime.fromtimestamp(secs)
  To Timezone Naive Datetime UTC:    datetime.utcfromtimestamp(secs)
  To Timezone Aware Datetime:        localtz.localize(datetime.fromtimestamp(secs))
  To Time Struct Local:              time.localtime(secs)
  To Time Struct UTC:                time.gmtime(secs)

List of Specific Conversions

Timezone Naive Datetime Local
  From Timezone Naive Datetime UTC:  pytz.utc.localize(dt).astimezone(localtz).replace(tzinfo=None)
  From Timezone Aware Datetime:      dt.astimezone(localtz).replace(tzinfo=None)
  From Time Struct Local:            datetime(*t[:6])
  From Time Struct UTC:              pytz.utc.localize(datetime(*t[:6])).astimezone(localtz).replace(tzinfo=None)
  From Timestamp:                    datetime.fromtimestamp(secs)

To build as of now: datetime.now()


Timezone Naive Datetime UTC
  From Timezone Naive Datetime Local:  localtz.localize(dt).astimezone(pytz.utc).replace(tzinfo=None)
  From Timezone Aware Datetime:        dt.astimezone(pytz.utc).replace(tzinfo=None)
  From Time Struct Local:              datetime.utcfromtimestamp(time.mktime(t))
  From Time Struct UTC:                datetime.utcfromtimestamp(calendar.timegm(t))
  From Timestamp:                      datetime.utcfromtimestamp(secs)

To build as of now: datetime.utcnow()


Timezone Aware Datetime
  From Timezone Naive Datetime Local:  localtz.localize(dt).astimezone(othertz)
  From Timezone Naive Datetime UTC:    pytz.utc.localize(dt).astimezone(othertz)
  From Time Struct Local:              localtz.localize(datetime(*t[:6])).astimezone(othertz)
  From Time Struct UTC:                pytz.utc.localize(datetime(*t[:6])).astimezone(othertz)
  From Timestamp:                      localtz.localize(datetime.fromtimestamp(secs))

To build as of now: localtz.localize(datetime.now()).astimezone(othertz)


Time Struct Local
  From Timezone Naive Datetime Local:  dt.timetuple()
  From Timezone Naive Datetime UTC:    pytz.utc.localize(dt).astimezone(localtz).timetuple()
  From Timezone Aware Datetime:        dt.astimezone(localtz).timetuple()
  From Time Struct UTC:                time.localtime(calendar.timegm(t))
  From Timestamp:                      time.localtime(secs)

To build as of now: time.localtime()


Time Struct UTC
  From Timezone Naive Datetime Local:  localtz.localize(dt).utctimetuple()
  From Timezone Naive Datetime UTC:    dt.utctimetuple()
  From Timezone Aware Datetime:        dt.utctimetuple()
  From Time Struct Local:              time.gmtime(time.mktime(t))
  From Timestamp:                      time.gmtime(secs)

To build as of now: time.gmtime()


Timestamp
  From Timezone Naive Datetime Local:  time.mktime(dt.timetuple())+1e-6*dt.microsecond
  From Timezone Naive Datetime UTC:    calendar.timegm(pytz.utc.localize(dt).timetuple())+1e-6*dt.microsecond
  From Timezone Aware Datetime:        calendar.timegm(dt.utctimetuple())+1e-6*dt.microsecond
  From Time Struct Local:              time.mktime(t)
  From Time Struct UTC:                calendar.timegm(t)

To build as of now: time.time()


Timezones

import pytz
from datetime import datetime

pytz.timezone('UTC').localize(datetime.utcnow())
pytz.timezone('US/Central').localize(datetime.now())
print 'Common timezones:', pytz.common_timezones
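
As a usage example of the table entries (and a rough sanity check); 'US/Central' and 'US/Eastern' are just arbitrary choices for localtz and othertz:

import calendar
import time
from datetime import datetime
import pytz

localtz = pytz.timezone('US/Central')   # arbitrary choice for these examples
othertz = pytz.timezone('US/Eastern')   # arbitrary "other" timezone

# Timezone Naive Datetime Local -> Timestamp, and back again.
dt = datetime.now()
secs = time.mktime(dt.timetuple()) + 1e-6 * dt.microsecond
print dt, '->', secs, '->', datetime.fromtimestamp(secs)

# Timezone Naive Datetime Local -> Timezone Aware Datetime in another zone.
print localtz.localize(dt).astimezone(othertz)

# Timezone Naive Datetime UTC -> Timezone Aware Datetime.
utc_dt = datetime.utcnow()
print pytz.utc.localize(utc_dt).astimezone(othertz)

# Time Struct UTC -> Timestamp -> Time Struct Local.
t = time.gmtime()
print time.localtime(calendar.timegm(t))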


2015-02-16

Performance of Membership Tests


Some people think that performance in Python is inscrutable. But surely there is one thing we can all agree on... membership tests should be done on something hashed, like dictionaries or sets, not on a list.

Metrics: M values between 0 and N-1 are in a collection. How long does it take to test for each of N possibilities in turn? These values are the total search time divided by the number of searches, so they are an average time per single lookup.
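
The script that generated the graphs is linked above; a minimal sketch of the same kind of measurement (names and sizes here are my own choices, not the actual script) looks like this:

import time

def average_lookup_time(collection, n):
    '''Average seconds per membership test over every value in range(n).'''
    start = time.time()
    for i in xrange(n):
        i in collection
    return (time.time() - start) / n

n, m = 10000, 5000
values = range(0, n, n // m)        # m values spread between 0 and n-1
for collection in (list(values), set(values), dict.fromkeys(values)):
    print type(collection).__name__, average_lookup_time(collection, n)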



Analysis

The list is, unsurprisingly, a couple of orders of magnitude worse than the set and dict. Note that the amount of time it takes to look up a value in a list is linear in the number of elements. (The value at each successive dot increases exponentially, so a constant distance between dots on a log graph implies linear growth.) This makes intuitive sense.

The time it takes to look up a value in a set or dict is suspiciously identical. In fact, they use the same algorithm and share some code. You can think of sets as dictionaries with an unused value.

Digression - Python implementation of dict

In Python a dict is a hash map. (Maps don't have to be hash based; they could be a binary tree, as in C++.) Basically, the key is turned into an integer value by way of the hash function, __hash__. The hash value modulo the size of the table indicates which slot the key/value pair is inserted into. Sometimes, multiple keys will hash to the same slot and "collide". When this happens, there are two solutions: open or closed addressing. With open addressing (as used for a Python dict) the next slot is used (see note 1 below). With closed addressing, each slot is a linked list of all entries which hash to that slot (note 2).
Clearly, an open-addressed table cannot contain more key/value pairs than there are slots, so it will automatically resize when it gets nearly full. There is a fairly severe performance degradation when the table begins to be about three quarters full; hashes typically resize at a given occupancy rate (note 3).
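
To make the open-addressing idea concrete, here is a toy hash set with linear probing. It is my own illustration only, not how CPython actually implements dict/set (whose probe sequence and resizing are more sophisticated):

class ToyHashSet(object):
    '''A toy open-addressing hash set with linear probing (illustration only).'''

    def __init__(self, nslots=8):
        self._slots = [None] * nslots

    def add(self, key):
        if self._load() > 0.75:                  # resize before getting too full
            self._resize(len(self._slots) * 2)
        i = hash(key) % len(self._slots)         # hash value modulo table size
        while self._slots[i] is not None and self._slots[i] != key:
            i = (i + 1) % len(self._slots)       # collision: try the "next" slot
        self._slots[i] = key

    def __contains__(self, key):
        i = hash(key) % len(self._slots)
        while self._slots[i] is not None:
            if self._slots[i] == key:
                return True
            i = (i + 1) % len(self._slots)
        return False

    def _load(self):
        return float(sum(1 for s in self._slots if s is not None)) / len(self._slots)

    def _resize(self, nslots):
        old_keys = [s for s in self._slots if s is not None]
        self._slots = [None] * nslots
        for key in old_keys:
            self.add(key)

s = ToyHashSet()
for word in ['spam', 'eggs', 'spam']:
    s.add(word)
print 'eggs' in s, 'bacon' in s      # True False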


Notes:

1: ... for certain values of "next". Sometimes the next slot is the previous slot index plus one and progresses linearly, other times it progresses quadratically. Other times it may be based upon the hash (before the modulo). It may take into account how many times it has tried to find a slot. The exact algorithm doesn't matter just so long as it is repeatable and gets to a blank slot quickly.

2: Using a linked list is simpler than scanning ahead, but it has problems of its own. There can be serious performance degradation if the keys hash to the same slot (not just the same hash value, which would trip up open addressing as well). This makes the closed addressing solution particularly unsuitable where the keys may be chosen maliciously. Using something like a tree rather than a list would help, but not eliminate, the issue. Open addressing could also be susceptible to malicious keys, although such implementations must support dynamic resizing, which helps a bit (both in complicating the attack and automatically resolving one).

3: Like any resize operation, the complexity is probably O(n), but amortized to O(1). Some applications may need better time guarantees. It is possible to incrementally resize the hash. For instance, any addition could be added to the new, double-sized hash, while queries could check both hash locations. The hashing function is probably the most expensive operation, and that can be reused. The modulo would be done twice. Any query could also migrate the value to the new hash. (But the slot should still be marked as occupied, or else searching for collided values may fail). Any insert should be accompanied with a migration of an old value. That way, it is guaranteed that the old hash has been completely migrated by the time the new (double-sized) hash has filled up.


Graph Analysis

On the first graph the X axis represents N, which is the maximum potential value in the collection. Darker colors represent more actual entries in the collection. The downwards slope to the right indicates that, for a given collection size, a set/dict gets more efficient the more times it is queried. Another possibility is that the collection gets more efficient the sparser the data it contains, but this seems unlikely: why would a collection containing 10, 20, 30 be more efficient than one containing 100, 200, 300? The most inefficient query is the one where the collection is fully populated.

Sets/dicts use hashing. Hashes usually work by using a function called "the hash function" to turn the contents into a numeric key. This key is used to find a position in an array. In the case of collisions, the next position is checked to determine whether the key is located there. When the collection gets full it takes longer to see if the key is present.

The next graph shows something different: The shading and the X axis have been swapped, so that the X axis represents the number of items in the collection, while the shade represents the maximum value that is present.




The time to test membership in lists seems to be entirely dependent upon the length of the list. This makes sense if the collection is generally sparsely populated, since the whole list must be searched before confirming that an item is not there. (Even when the item is present, the average scan covers half the list, so the cost is still linear in the length.)

Perhaps less obvious is that dictionaries and sets perform best, and identically, right up until they approach full population; i.e., when the dict/set has X members and the contained numbers are all between 0 and X. Presumably this is the effect of hash slot collision... the tight clustering of values may be causing more hash collisions.

Also, the time to find an object in a dictionary is fairly constant, but it does appear to be rising slowly as the number of entries in the dictionary is increased.



If the hash function is complex relative to the comparison function, and if the most likely values to be searched for are located earliest in the list, then it may make sense to use a list instead of a dict. Frankly, however, I think the graph above really demonstrates how extreme the situation must be for this to be true. It was generated with 0.9 probability, so that it was 90% likely that the first item is the one requested. Failing that, it is 90% likely to ask for the second item, and so on. That is a pretty extreme case. Hooray for efficient dicts and sets!


2015-02-09

Setting up clamav antivirus on Ubuntu

Installing Anti-virus on Ubuntu

My wife's been using a Linux box in the kitchen as her primary web browsing computer. It also hosts my version control servers that back up everything that matters in the world. I figured that it was time I installed some anti-virus on it. Clamav seemed to be the simplest/best option.

The only hitch is that I don't get to sit at the computer much. Mostly I SSH in from the bus, but I do that infrequently. I can crontab the scan, but I really need the results pushed to me. For another program I've written a module that will send an email using a secondary gmail account, so I just needed to hook clamav up to it.

cp_email.py

cp_email.py takes command line parameters to indicate how to send the email, and then runs a command and sends the results. This was easier than setting up email on the Linux box so that it could send email natively. Cron can email the results, but I didn't want to hook that up. This way I can add wrapping code to do arbitrary post-processing (filters, summarizing). It is also more easily portable.

I was rather concerned with security. It would be foolish to include a plaintext password on the command line, as it can be seen by all processes running on the machine. The --ob flag performs a trivial de-obfuscation on the password: each character is converted into the preceding ASCII value. If the password is "cat", the obfuscated password (that should be given to cp_email) is "dbu". This obviously cannot stop even a mildly determined attacker. The preferred method of specifying a password is by reference to a text file: a password preceded by an at sign ("@") is taken to be a filename, and that file is loaded. If there are multiple lines in the file then the password is taken from the last line. The same method can be used to specify the username, which is taken from the first line of a multi-line file. This allows the username and password to be specified in the same file, which should, of course, be read-protected from the world; only the user should be able to read it. It should also be ignored by version control so that it is not available to everyone who can access the source. @ and --ob can be used together for a small extra measure of security.
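
For reference, the obfuscation is just a shift-by-one over the characters (the ROT1 mentioned in the --help output below). A minimal sketch; these helper names are mine, not functions from cp_email.py:

def obfuscate(password):
    # Shift every character up by one; this is the form you pass with --ob.
    return ''.join(chr(ord(c) + 1) for c in password)

def deobfuscate(obfuscated):
    # What --ob undoes: shift every character back to the preceding value.
    return ''.join(chr(ord(c) - 1) for c in obfuscated)

print obfuscate('cat')       # 'dbu'
print deobfuscate('dbu')     # 'cat'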


crontab -e

Here is my crontab entry:
0 4 * * * (rm /tmp/scan ; ((clamscan -i -l /tmp/scan -z --exclude-dir="^/(dev|cdrom|media/cdrom|sys)" -r /)) ; chmod a+r /tmp/scan ; ( cd /home/myusr/dir_with_password ; su -c "python /home/myusr/lib/cp_email.py --ob run-and-send @password @password recipient@email.com cat /tmp/scan --subject='Antivirus'" myusr))

I could have executed clamav directly from the cp_email script. However, clamav needs to run as root to be able to see all the files to scan and I didn't want root to be running a program which is held in version control and might change. If I did want to run it that way, then this would be the appropriate crontab entry:
0 4 * * * ( cd /home/usrofsvn/markets/Code/irrigate ; python ../lib/cp_email.py --ob run-and-send @password @password recipient@email.com clamscan -r / --subject="Antivirus run")

cp_email.py --help

usage: cp_email.py run-and-send [-h]
                                [--loglevel {CRITICAL,ERROR,WARNING,INFO,DEBUG}]
                                [--ob] [--subject SUBJECT]
                                username password recipients args [args ...]

positional arguments:
  username              The username to use to log into gmail. The username
                        must be an @gmail.com address. If preceded by @, then
                        the value indicates a filename. The first line of the
                        file contents will be used for the username.
  password              The password, or, if preceded by @, the filename where
                        the password is stored. If the file contains multiple
                        lines, it will take the password from the last line.
  recipients            Comma separated list of emails to receive email.
                        (Specify "-" for the sending username.)
  args                  The remaining arguments are the command to run.

optional arguments:
  -h, --help            show this help message and exit
  --loglevel {CRITICAL,ERROR,WARNING,INFO,DEBUG}
                        (default: INFO)
  --ob                  Enable elementary password obfuscation (ROT1)
  --subject SUBJECT     The subject for the email.