Python UK business days with pandas

Here’s how to calculate UK business days (well, at least for England & Wales) using pandas’s holiday calendar support.

First you’ll need this calendar for UK holidays:

from pandas.tseries.holiday import (
    AbstractHolidayCalendar, DateOffset, EasterMonday,
    GoodFriday, Holiday, MO,
    next_monday, next_monday_or_tuesday)
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('New Year\'s Day', month=1, day=1, observance=next_monday),
        GoodFriday,
        EasterMonday,
        Holiday('Early May bank holiday',
                month=5, day=1, offset=DateOffset(weekday=MO(1))),
        Holiday('Spring bank holiday',
                month=5, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Summer bank holiday',
                month=8, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Christmas Day', month=12, day=25, observance=next_monday),
        Holiday('Boxing Day',
                month=12, day=26, observance=next_monday_or_tuesday)
    ]

It was tested against the dates published on gov.uk, so it should be fine to use, but please let me know if you find anything wrong with it.

Now you can do stuff like:

>>> from datetime import date
>>> from pandas.tseries.offsets import CDay
>>> business = CDay(calendar=EnglandAndWalesHolidayCalendar())
>>> date.today()
datetime.date(2016, 3, 2)
>>> five_business_days_later = date.today() + 5 * business
>>> five_business_days_later
Timestamp('2016-03-09 00:00:00')
>>> five_business_days_later.date()
datetime.date(2016, 3, 9)
>>> date.today() - business
Timestamp('2016-03-01 00:00:00')
>>> date(2016, 12, 25) + business
Timestamp('2016-12-28 00:00:00')

You can also retrieve the UK holidays for a specific period as a pandas DatetimeIndex, e.g.:

>>> holidays = EnglandAndWalesHolidayCalendar().holidays(
    start=date(2016, 1, 1),
    end=date(2016, 12, 31))
>>> holidays.tolist()
[Timestamp('2016-01-01 00:00:00'), Timestamp('2016-03-25 00:00:00'), Timestamp('2016-03-28 00:00:00'), Timestamp('2016-05-02 00:00:00'), Timestamp('2016-05-30 00:00:00'), Timestamp('2016-08-29 00:00:00'), Timestamp('2016-12-26 00:00:00'), Timestamp('2016-12-27 00:00:00')]
>>> holidays.to_pydatetime()
array([datetime.datetime(2016, 1, 1, 0, 0),
       datetime.datetime(2016, 3, 25, 0, 0),
       datetime.datetime(2016, 3, 28, 0, 0),
       datetime.datetime(2016, 5, 2, 0, 0),
       datetime.datetime(2016, 5, 30, 0, 0),
       datetime.datetime(2016, 8, 29, 0, 0),
       datetime.datetime(2016, 12, 26, 0, 0),
       datetime.datetime(2016, 12, 27, 0, 0)], dtype=object)
>>> holidays.to_native_types()
['2016-01-01', '2016-03-25', '2016-03-28', '2016-05-02', '2016-05-30', '2016-08-29', '2016-12-26', '2016-12-27']

Pandas has all sorts of handy functionality for series, and time series in particular. Also check pandas’s documentation on Custom Business Days, especially the warning about possible timezone issues.
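For instance, you can list the working days in a date range with pd.bdate_range and the custom frequency 'C'. A small sketch; the two holiday dates are hard-coded here to keep it self-contained, standing in for the full EnglandAndWalesHolidayCalendar above:

```python
import pandas as pd

# Working days in Christmas week 2016. 26 and 27 December are the
# observed Christmas Day and Boxing Day holidays that year.
days = pd.bdate_range('2016-12-23', '2016-12-30', freq='C',
                      holidays=['2016-12-26', '2016-12-27'])

print(len(days))  # 4 working days: 23, 28, 29 and 30 December
print([d.strftime('%Y-%m-%d') for d in days])
```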

If you don’t need the power of pandas or you just don’t like it (maybe because it pulls in a bazillion dependencies and includes a gazillion modules), workalendar looks pretty good.


The origins of the class Meta idiom in python

So I keep finding this class Meta idiom in python APIs lately. Found it in factory-boy and WTForms and I suspected they both got it from Django, but I googled and couldn’t find any explanations of the reason for it, where it came from, or why they all call it class Meta. So here it is!

TL;DR What it is

The inner Meta class has absolutely no relation to python’s metaclasses. The name is just a coincidence of history (as you can read below).

There’s nothing magical about this syntax at all, here’s an example from Django’s documentation:

class Ox(models.Model):
    horn_length = models.IntegerField()
    class Meta:
        ordering = ["horn_length"]
        verbose_name_plural = "oxen"

Having an inner Meta class makes it easier for both the users and the ORM to tell what is a field on the model and what is just other information (or metadata if you like) about the model. The ORM can simply do your_model.pop('Meta') to retrieve the information it needs. You can also do this in any library you implement just as factory-boy and WTForms have done.
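The mechanism can be sketched with a toy metaclass (hypothetical names; real ORMs do considerably more bookkeeping): the metaclass pops the inner Meta class out of the class namespace before the class is built, so fields and metadata never mix:

```python
class ModelBase(type):
    """Toy metaclass that separates the inner Meta class from the fields."""
    def __new__(mcs, name, bases, attrs):
        meta = attrs.pop('Meta', None)   # remove Meta from the class namespace
        cls = super().__new__(mcs, name, bases, attrs)
        cls._meta = meta                 # stash the metadata container elsewhere
        return cls

class Model(metaclass=ModelBase):
    pass

class Ox(Model):
    horn_length = 42          # stands in for an IntegerField

    class Meta:
        verbose_name_plural = "oxen"
```

After class creation, Ox.Meta no longer exists as an attribute, while Ox._meta.verbose_name_plural is available to the framework.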

Some early Django history

Now for the longer story. I did some software archaeology (which should totally be a thing!) and discovered the first commit that mentions class Meta (actually class META1) in Django: commit 25264c86. There is a Release Notes wiki page which includes that change.

From there we can see how Django models were declared before the introduction of the internal class Meta. A Django Model class had a few special attributes. The db_table attribute held the SQL table name. A fields attribute was a tuple (!) of instances of field types (e.g. CharField, IntegerField, ForeignKey). These mapped to SQL table columns. One other interesting attribute was admin which was mostly used to describe how that model would behave in django’s admin interface. Now all these classes were defined in the django.core.meta package i.e. meta.Model, meta.CharField, meta.ForeignKey, meta.Admin. That’s so meta! (and probably where the name came from in the end)

In a django-developers mailing list thread from July 2005 titled Cleaner approach to ORM fields description user deelan suggests bringing some of SQLObject ‘s ideas to Django’s ORM. This seems to be the first seed of the idea of having an inner class in Django to store part of a model’s attributes:

it’s desiderable to avoid name clashes between fields, so it would be
good to have a way to wrap fields into a private namespace.
deelan, Cleaner approach to ORM fields description

At the end of the thread, django ticket 122 is created which seems to contain the first mention of a separate internal Meta class.

What started off as a backwards-compatible change soon turned backwards-incompatible and was the first really big community-driven improvement to Django, as Adrian Holovaty would later describe it in the release announcement which included the change.

The first patch on ticket 122 by Matthew Marshall started by suggesting that fields should be able to be defined directly on the model class, as class attributes (Build models using fieldname=FieldClass) rather than in the fields list. So:

class Poll(meta.Model):
    question = meta.CharField(maxlength=200)
    pub_date = meta.DateTimeField('date published')

rather than:

class Poll(meta.Model):
    fields = (
        meta.CharField('question', maxlength=200),
        meta.DateTimeField('pub_date', 'date published'),
    )

But also that there should be two ways of defining a ForeignKey:

ForeignKey = Poll, {'edit_inline': True, 'num_in_admin': 3}

# the attribute name is irrelevant here:
anything = ForeignKey(Poll, edit_inline=True, num_in_admin=3)

In his first comment, mmarshall introduces the inner class Meta to hold anything that’s not a field: the table name (strangely renamed to module_name) and the admin options. The fields would be class attributes.

The decision over what goes in an inner class and what goes in the outer class seems to be left to the user. An optional class Field inner class would be supported, so the fields would live there and the metadata would live as class attributes (this seemed to offer the advantage of being backwards-compatible with the admin class attribute, while allowing tables to have a column that’s also named admin).

There are some other ideas thrown around and the syntax for ForeignKey is also discussed. At one point, Adrian Holovaty (adrian) intervenes to say (about the original class Meta/class Field suggestion):

It’s too flexible, to the point of confusion. Making it possible to do either class Meta or class Field or plain class attributes just smacks of “there’s more than one way to do it.” There should be one, clear, obvious way to do it. If we decide to change model syntax, let’s have class Meta for non-field info, and all fields are just attributes of the class.
— Adrian Holovaty, django ticket 122, comment 9

The thread goes on from there. There are some detractors to the idea (citing performance and conformance to other python APIs), there are discussions about implementation details and talking again about the ForeignKey syntax.

Then, in a dramatic turn of events, Adrian Holovaty closes the ticket as wontfix!:

Jacob [Kaplan-Moss] and I have talked this over at length, and we’ve decided the model syntax shouldn’t change. Using a fieldname=FieldClass syntax would require too much “magic” behind the scenes for minimal benefit.
— Adrian Holovaty, django ticket 122, comment 33

It’s interesting, because IMHO this was a huge differentiator in terms of making django’s models API more human and was also what other frameworks like Rails and SQLObject were doing at the time.

An IRC discussion is then referenced in the ticket.2 From that discussion, it seems that adrian’s reasons for closing were mostly concerns about the ForeignKey syntax and making a backwards-incompatible change to the model. rmunn does a great job of moderating the discussion, clarifying the situation and everyone’s opinions while strongly pushing for the new syntax.

The trac ticket is reopened as a consequence and it looks like smooth sailing from then on. Some days later the new syntax is merged and the ticket is once again closed, this time with Resolution set to fixed.

Adrian later announced the change in a django-developers mailing list post. Here are some interesting fragments from that post:


I apologize for the backwards-incompatibility, but this is still unofficial software. ;-) Once we reach 1.0 — which is much closer now that the model syntax is changed — we’ll be very dedicated to backwards-compatibility.

I can’t think of any other backwards-incompatible changes that we’re planning before 1.0 (knock on wood). If this isn’t the last one, though, it’s at least the last major one.
— Adrian Holovaty, IMPORTANT: Django model syntax is changing

Things didn’t go as planned. In May 2006 came commit f69cf70e, which was exactly another let’s-change-everything-in-one-huge-branch commit, released as part of Django 0.95. As part of this API change, class META was renamed to class Meta (because it’s easier on the eyes). You can find the details on the RemovingTheMagic wiki page. It’s funny how in ticket 122 all the comments use the Meta capitalization, except for the last person (who I guess submitted the patch) who uses META. There was some discussion, both in the ticket and on IRC, about it, and a few people had concerns that users of Django would actually want to have a field called Meta in their models and the inner class name would clash with that.

That’s it. Almost…

Anyway, so that’s the end of the story of how Django got its class Meta. Now what if I told you that all of this had already happened more than a year before in the SQLObject project? Remember that first post to django-developers which suggested Django models should hold some of their attributes in a separate inner class, like SQLObject already did?

In April 2004, Ian Bicking (creator of SQLObject) sent an email to the sqlobject-discuss mailing list:

There’s a bunch of metadata right now that is being stored in various instance variables, all ad hoc like, and with no introspective interfaces. I’d like to consolidate these into a single object/class that is separated from the SQLObject class. This way I don’t have to worry about name clashes, and I don’t feel like every added little interface will be polluting people’s classes. (Though most of the public methods that are there now will remain methods of the SQLObject subclasses, just like they are now) So I’m looking for feedback on how that should work.
— Ian Bicking, Metadata container

His code example:

class Contact(SQLObject):
     class sqlmeta(SQLObject.sqlmeta):
         table = 'contact_table'
         cacheInstances = False
     name = StringCol()

SQLObject’s community did not seem nearly as animated as Django’s. There were a couple of emails on the sqlobject-discuss mailing list from Ian Bicking which included the proposal and asked for feedback. I suspect some discussion happened through other channels, but this community was neither as big nor as good at documenting its functioning as Django’s. (And sourceforge’s interface to the mailing list archives and cvs logs does not make this easy to navigate.)

A year later, Ian Bicking took part in the django-developers mailing list discussion, where he made some small syntax suggestions, but it does not seem that he made any other contributions to the design of this part of the Django models API.

Conclusion

As far as I could tell, Ian Bicking is the originator of the idea of storing metadata in a metadata container inner class, although it was the Django project that settled on the class Meta name and popularised it outside of its own community.

Anyway, that’s the end of the story. To me, it shows just how awesome open source and the open internet can be. The fact that I was able to find all of this 11 years later, complete with the original source code, commit logs and all the discussion around the implementation on the issue tracker, mailing lists and IRC logs is just amazing community work and puts a tear in my eye.

Hope you’ve enjoyed the ride!

1 because in 2005, people were less soft-spoken

2 It’s very fun, you should read it. At some point someone’s cat catches a blue jay. And I think they meant it literally.


Run py.test test case inside any python class

I wanted a way to run the current test I was editing when I know the name of the method but am too lazy to scroll up and see which class it’s defined in, and then have to type that out as well. It turns out py.test can run any test in any class in a file given just the name of that test, or part of its name.

Instead of typing:

$ py.test path/to/my/test/test_file.py::MySuperLongClassNameTest::test_my_thing

I can just type:

$ py.test path/to/my/test/test_file.py -k test_my_thing

Or even:

$ py.test path/to/my/test/test_file.py -k test_my

Or even:

$ py.test path/to/my/test/test_file.py -k "thing or stuff"

Woo py.test!

FWIW, nosetests can also do this with the -m option, which supports regexps. But AFAIK it does not support the human-friendly " or " syntax shown in the last example above.


AST literal_eval


Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.

—From python’s ast library

I used to distrust anything that involved converting a string to an arbitrary data structure. This is a nice third option (besides eval() and parsing the string yourself) which seems like it would be useful in the majority of cases.
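A quick illustration of both sides of that quote:

```python
import ast

# Literals and containers parse fine...
value = ast.literal_eval("{'name': 'spam', 'sizes': (1, 2.5), 'ok': True}")
print(value['sizes'])   # (1, 2.5)

# ...but anything with behaviour is rejected, unlike with eval()
try:
    ast.literal_eval("__import__('os').system('echo pwned')")
except ValueError as e:
    print("refused:", e)
```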


Change the system timezone for the python interpreter

Wrapping my head around timezones is hard. And testing the implications of working with different timezones is especially difficult since I now live in GMT (which is mostly the same as UTC).

I found a way to change the timezone used by most of python’s stdlib: the TZ environment variable:

$ python -c 'import time; print(time.tzname)'
('GMT', 'BST')
$ TZ='Europe/Stockholm' python -c 'import time; print(time.tzname)'
('CET', 'CEST')
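The same trick works from inside the interpreter too, on POSIX systems, by setting os.environ['TZ'] and then calling time.tzset():

```python
import os
import time

os.environ['TZ'] = 'Europe/Stockholm'
time.tzset()            # POSIX only; not available on Windows
print(time.tzname)      # ('CET', 'CEST')

os.environ['TZ'] = 'UTC'
time.tzset()
print(time.tzname)      # ('UTC', 'UTC')
```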

Mocking python's file open() builtin

I was working on a method to read some proxy information from several files today and then I wanted to test it.

A very simplified version (the original has all the different files being processed in different functions on different rules and it actually has error handling) of this function is this:

import os

SYS_PROXY = '/etc/sysconfig/proxy'
CURL_PROXY = '/root/.curlrc'

def get_proxy():
    with open(SYS_PROXY) as f:
        contents = f.read()
        if 'http_proxy' in contents:
            proxy = contents.split('http_proxy = ')[-1]
            if proxy:
                return proxy
    with open(CURL_PROXY) as f:
        contents = f.read()
        if '--proxy' in contents:
            proxy = contents.split('--proxy ')[-1]
            if proxy:
                return proxy
    return os.getenv('http_proxy')

As unit tests should be self-contained, they shouldn’t read any files on disk. So we need to mock them. I generally use Michael Foord’s mock.

In order to intercept calls to python’s open(), we need to mock the builtins.open function:

TEST_PROXY = 'http://example.com:1111'
def test_proxy_url_in_sysproxy(self):
    with mock.patch("builtins.open",
                    return_value=io.StringIO("http_proxy = " + TEST_PROXY)):
         self.assertEqual(TEST_PROXY, get_proxy())

We’re good so far. Now we add the next natural test: we didn’t find anything in sysconfig, but we find the right proxy URL on our second try in CURL_PROXY:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock.patch("builtins.open", return_value=io.StringIO()):
        with mock.patch("builtins.open",
                        return_value=io.StringIO(' --proxy ' + TEST_PROXY)):
            self.assertEqual(TEST_PROXY, get_proxy())

Urgh. That’s starting to look a bit clunky. It’s also wrong since the inner with statement ends up overriding the outer one and all we get for our second open() call is a closed file object:

ValueError: I/O operation on closed file.

Not to worry though. mock’s side_effect has got us covered!

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock.patch("builtins.open",
                    side_effect=[io.StringIO(),
                                 io.StringIO(' --proxy ' + TEST_PROXY)]):
        self.assertEqual(TEST_PROXY, get_proxy())

The code looks cleaner now. A bit. And at least it works. But the list we pass in to side_effect makes another issue pop up: we now depend on the order in which the files are opened and read. That seems brittle. If we had to refactor get_proxy() to change the order in which it reads files, we would also have to change all our tests. Also, it’s not quite obvious why we’re setting our return values as side effects.

Ideally we’d have a way to assign each result to a filename and then not have to care about the order in which the files are open. In real life we would have two files with different contents anyway.

So let’s implement that method. We, of course, want to make it a context manager.

@contextmanager
def mock_open(filename, contents=None):
    def mock_file(*args):
        if args[0] == filename:
            return io.StringIO(contents)
        else:
            return open(*args)
    with mock.patch('builtins.open', mock_file):
        yield

So we only intercept the filename that we want to mock and let everything else pass through to builtins.open(). The yield is there because a contextmanager should be a generator function. Everything before the yield gets executed when entering the with mock_open ... statement, then the content of the with block is executed and then everything after the yield in our mock_open function (there’s nothing there in our case).
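The execution order is easier to see with a toy context manager that records events (nothing proxy-specific here, just the mechanics of @contextmanager):

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracing(label):
    events.append('enter ' + label)   # before the yield: runs on entering `with`
    yield                             # the body of the `with` block runs here
    events.append('exit ' + label)    # after the yield: runs on leaving `with`

with tracing('mock'):
    events.append('body')

print(events)   # ['enter mock', 'body', 'exit mock']
```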

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())

Looks good.

RuntimeError: maximum recursion depth exceeded in comparison

Oops. It seems that we got into infinite recursion because we’re calling the mocked open() from the mocking function. We have to make sure that once we’ve mocked a call to open(), there’s no way we’re going to go through that mock again. Thankfully, the mock library provides methods to turn mocking on and off without using the with mock.patch context manager. Take a look at mock.patch’s start and stop methods.
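As a standalone illustration of start()/stop(), here’s a sketch patching os.getcwd (an arbitrary example target), using the stdlib unittest.mock, which absorbed Foord’s mock library:

```python
import os
from unittest import mock

real_cwd = os.getcwd()

patcher = mock.patch('os.getcwd', return_value='/pretend')
patcher.start()                      # the patch is active from here...
assert os.getcwd() == '/pretend'
patcher.stop()                       # ...and undone here
assert os.getcwd() == real_cwd
```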

@contextmanager
def mock_open(filename, contents=None):
    def mock_file(*args):
        if args[0] == filename:
            return io.StringIO(contents)
        else:
            mocked_file.stop()
            open_file = open(*args)
            mocked_file.start()
            return open_file
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    yield
    mocked_file.stop()

So we had to replace the with mock.patch statement with manually start()-ing and stop()-ing the mocking functionality before and after the yield. That’s basically what the with statement was doing, we just needed the identifier so we can use it in the else branch.

In the else branch we turn off the mocking before calling open() (that’s what was causing us to go in the infinite loop). After we’ve called open(), we go back to mocking open(), in case there will be a future call that we actually do want to mock.

Test code now looks the same as before:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())

But this time it works. So we could all go home now.

But say we wanted to ensure that no files were opened inside the with mock_open block other than the ones we mocked. It seems like a pretty sensible thing to do. Unit tests should be completely self-contained, so you want to ensure they won’t be opening any files on the system. This would also catch some bugs that might only later pop up in your CI server’s test runs, because of a custom development-machine configuration.

The problem is pretty simple if you use only one with mock_open block, but once you start nesting context managers you have a problem. You need a way to communicate between the different context managers. Ideally you’d have a way for each context manager to say to the others (after it’s finished processing): hey, I finished my work here, but some dude opened a file which I didn’t mock. Did you mock it?

So how do we solve that? We’ll use global variables! No. Just kidding.

We’ll use exceptions. Simply make the inner statement raise a custom NotMocked exception and let the enclosing context managers catch it. If none of the enclosing context managers mock the file that was opened in the inner block, they just let the user deal with the exception.

So the exception can be a normal Exception subclass, but we need an extra bit of information, the filename that wasn’t mocked. I’ll also hardcode an error message in there:

class NotMocked(Exception):
    def __init__(self, filename):
        super(NotMocked, self).__init__(
            "The file %s was opened, but not mocked." % filename)
        self.filename = filename

The updated mock_open code looks like this:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    open_files = []
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
            open_files.append(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    for open_file in open_files:
        if complain:
            raise NotMocked(open_file)

So we’re recording all the files that were opened in the open_files list. Then after all the code inside the with block was executed, we go through the open_files list and raise a NotMocked exception for each of those file names. We also added a new complain parameter just in case someone would like to turn this functionality off (maybe they want to use file fixtures after all).

The StringIO objects now also have a name attribute. It’s a bit tricky to see why this is needed since at first sight those objects never get into the open_files list. But when we have nested with mock_open blocks the file returned by the open() function in mock_file might actually have been mocked by an enclosing context manager and its type would then be StringIO.
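Setting .name works because io.StringIO instances accept arbitrary attribute assignment, which lets them mimic the .name attribute that real file objects carry:

```python
import io

f = io.StringIO('contents')
f.name = '/etc/sysconfig/proxy'   # plain attribute assignment on the instance
print(f.name)                     # /etc/sysconfig/proxy

real = open('/dev/null')          # real file objects get .name for free
print(real.name)                  # /dev/null
real.close()
```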

The try: except: block around yield is for the enclosing context managers. When they get a NotMocked exception by running the code inside them, they check if it’s the file they’re mocking, in which case they ignore it. (Basically telling the nested context manager: I’ve got you covered.). If the NotMocked exception was raised on a file that’s different than the one they’re mocking, they simply re-raise it for someone else to deal with (either an enclosing context-manager) or the user.

If we now added another open() call in our initial get_proxy() function, or inside the with statement in the test case,

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())
            open('/dev/null')

we’d get this error:

NotMocked: The file /dev/null was opened, but not mocked.

Cool. Now how about the opposite? I had to refactor a lot of these test cases and at some point I wasn’t sure that all those assertions made sense. Was I really hitting all the files I had mocked? Well we could just add another check in our mock_open() code to see if all the files that were mocked, were actually accessed by the test code:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    open_files = []
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
        open_files.append(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    try:
        open_files.remove(filename)
    except ValueError:
        raise AssertionError("The file %s was not opened." % filename)
    for f_name in open_files:
        if complain:
            raise NotMocked(f_name)

We now track mocked files as open_files, too. Then at the end, we simply check if the file that we were supposed to be mocking (passed in as the filename argument) was indeed opened.

The gotcha here is that we need to raise this exception before NotMocked, otherwise we risk the code never getting to the file-not-opened check. I guess this is where the difference between using exceptions when something exceptional occurred vs. when you want to communicate with the enclosing function becomes obvious.

If we now added another mock_open that we weren’t using to the test code:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            with mock_open('/dev/null'):
                get_proxy()
                self.assertEqual(TEST_PROXY, get_proxy())

We’d get:

AssertionError: The file /dev/null was not opened.

EDIT: Eric Moyer found a bug (and suggested a fix) in this implementation. When the same file is opened multiple times, the open_files list will contain the filename multiple times, but it will only get remove-ed once. This can be easily solved by making the open_files list a set instead.

So that’s about it, we now have a rock-solid mock_open function for mocking the builtin open().

Before we set it free, we need to add a nice docstring to it:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    """Mock the open() builtin function on a specific filename.

    Let execution pass through to open() on files different than
    :filename:. Return a StringIO with :contents: if the file was
    matched. If the :contents: parameter is not given or if it is None,
    a StringIO instance simulating an empty file is returned.

    If :complain: is True (default), will raise an AssertionError if
    :filename: was not opened in the enclosed block. A NotMocked
    exception will be raised if open() was called with a file that was
    not mocked by mock_open.
    """
    open_files = set()
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
        open_files.add(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    try:
        open_files.remove(filename)
    except KeyError:
        if complain:
            raise AssertionError("The file %s was not opened." % filename)
    for f_name in open_files:
        if complain:
            raise NotMocked(f_name)

testing a django blog's models

This post is a continuation of a previous post, and I’ll be using that schema to write tests on top of. Here it is, for easy reference:

from django.db import models
class Category(models.Model):
    name = models.CharField(max_length=20)
class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
    modified_time = models.DateTimeField(auto_now=True)
class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True)
class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)

Ok, so here’s what we’re testing: our model — the emphasis is on our, because we’re only testing our code. And mostly, we’re actually testing that the code in models.py corresponds with what’s in the database. All test methods must begin with the word test.

One of the most annoying things, which took me a while to figure out, is that the setUp method is run before every single test method. That means that if you want to test for uniqueness and still keep your tests independent, you have to write a tearDown method. This is why snippet A won’t work, but snippet B will.
Here’s the model:

class Category(models.Model):
    name = models.CharField(max_length=20, unique=True)

snippet A

class CategoryTest(unittest.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")

    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")

    def testunique(self):
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

snippet B

class CategoryTest(unittest.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")

    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

The second snippet only calls the setUp method once because there is only one other method. But that’s not very nice. Ideally we’d like to be able to run each test individually, so maybe we can write a tearDown method, to be run after each test method, to restore the database.

However, there is an easier way to avoid writing a tearDown method, and that is using the django.test module, which is an extension of unittest. All you have to do is import django.test instead of unittest and make every test class a subclass of django.test.TestCase instead of unittest.TestCase.
Here is what it looks like now:

class CategoryTest(django.test.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")
        self.cat2 = Category.objects.create(name="cat2")
    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")
        self.assertEquals(self.cat2.name, "cat2")
    def testunique(self):
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

Now, let’s test the Post class:

class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
    modified_time = models.DateTimeField(auto_now=True)

There’s a bunch more stuff to test here, like the fact that everything gets to the database (title, body, category) and that everything has its correct type/class.
In setUp we create a post, but also a category, since the tests need to be independent and a Post needs a Category.

class PostTest(django.test.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")
        self.post1 = Post.objects.create(title="name", body="trala lala",
                category=Category.objects.all()[0])

Next, we need a somewhat trivial test to check that the title, the body and the right category make it to the database:

def testtrivial(self):
    self.assertEquals(self.post1.title, "name")
    self.assertEquals(self.post1.body, "trala lala")
    self.assertEquals(self.post1.category, Category.objects.all()[0])

I think this is a good way to test that the creation_time and modified_time are newly generated datetime.datetime objects:

def testtime(self):
    self.assertEquals(self.post1.creation_time.hour, datetime.now().hour)

No, wait. I think this looks a bit more professional:

def testtime(self):
    delta = datetime.now() - self.post1.creation_time
    self.assert_(delta.seconds < 10)
    delta_modified = datetime.now() - self.post1.modified_time
    self.assert_(delta_modified.seconds < 10)

So now we’re checking for datetime objects that were generated less than 10 seconds ago. That’s very generous, since the time between setUp running and the test running is in the range of microseconds.
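The arithmetic behind that check, in plain Python: subtracting two datetime objects yields a timedelta. One caveat worth knowing: timedelta.seconds is only the seconds component of the delta (0 to 86399), so total_seconds() is the safer comparison if the difference could ever exceed a day.

```python
from datetime import datetime, timedelta

# simulate a row created a moment ago (the 5 ms offset is just for the demo)
creation_time = datetime.now() - timedelta(milliseconds=5)

delta = datetime.now() - creation_time
print(delta.seconds < 10)          # True: the seconds component is 0
print(delta.total_seconds() < 10)  # True, and robust for deltas over a day
```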
This test doesn’t show the true difference between modified_time and creation_time, though. The modification time changes every time the object is saved to the database, while the creation time does not. So let’s write a new test based on that knowledge:

def testModifiedVsCreation(self):
    modified = self.post1.modified_time
    created = self.post1.creation_time
    self.post1.save()
    self.assertNotEqual(modified, self.post1.modified_time)
    self.assertEqual(created, self.post1.creation_time)

Testing for a boolean value is really easy:

def testpublished(self):
    self.assertEquals(self.post1.published, False)

And then there’s more than one way I can think of to test the Category ForeignKey:

def testcategory(self):
    self.assertEquals(self.cat1.__class__, self.post1.category.__class__)
    self.assertRaises(ValueError, Post.objects.create, title="name",
            body="tralaalal", category="ooopsie!")

In the end, I’ll go for the more general one (the class comparison), even though the second is more eccentric. So:

def testcategory(self):
    self.assertEquals(self.cat1.__class__, self.post1.category.__class__)
Btw, if you don’t know which exception to expect (like ValueError — I didn’t), you can always drop into a manage.py shell and try Post.objects.create(title="name", body="tralaalal", category="ooopsie!") to see what gets raised.
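The same experiment can be scripted. A small helper (hypothetical, just for poking around) that reports which exception class a call raises:

```python
def find_exception(func, *args, **kwargs):
    """Call func and return the name of the exception it raises, if any."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None

# stand-in example: int() raises ValueError on a non-numeric string,
# much like assigning a bogus value to a model field would raise something
print(find_exception(int, "ooopsie!"))  # ValueError
print(find_exception(int, "42"))        # None
```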

Ok, passing on to the Commentator class:

class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True, blank=True)

We’re only going to test that the data gets to the database and that the name and email fields are unique. At this stage we can’t test the validation of the email and website fields. We’ll be doing that later, when we write the forms.
This should seem trivial by now:

class CommentatorTest(django.test.TestCase):
    def setUp(self):
        self.comtor = Commentator.objects.create(name="hacketyhack",
                email="hackety@example.com", website="example.com")
    def testExist(self):
        self.assertEquals(self.comtor.name, "hacketyhack")
        self.assertEquals(self.comtor.email, "hackety@example.com")
        self.assertEquals(self.comtor.website, "example.com")
    def testUnique(self):
        self.assertRaises(IntegrityError, Commentator.objects.create,
                name="hacketyhack", email="new@example.com",
                website="example.com")
        self.assertRaises(IntegrityError, Commentator.objects.create,
                name="nothackety", email="hackety@example.com",
                website="example.com")

Now, let’s get to testing the Comment class:

class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)

There won’t be anything new here. And this is when and why testing gets boring. But hey! A man’s gotta do what a man’s gotta do.

class CommentTest(django.test.TestCase):
    def setUp(self):
        self.cat = Category.objects.create(name="cat1")
        self.post = Post.objects.create(title="name", body="trala lala",
                category=Category.objects.all()[0])
        self.comtor = Commentator.objects.create(name="hacketyhack",
                email="hackety@example.com", website="example.com")
        self.com = Comment.objects.create(
                body="If the implementation is easy to explain, it may be a good idea.",
                post=Post.objects.all()[0], author=Commentator.objects.all()[0])
    def testExist(self):
        self.assertEquals(self.com.body,
                "If the implementation is easy to explain, it may be a good idea.")
        self.assertEquals(self.com.post, Post.objects.all()[0])
        self.assertEquals(self.com.author, Commentator.objects.all()[0])
        self.assertEquals(self.com.approved, False)
    def testTime(self):
        delta_creation = datetime.now() - self.com.creation_time
        self.assert_(delta_creation.seconds < 7)
    def testCreationTime(self):
        # what if it's a modification_time instead?
        created = self.com.creation_time
        self.com.save()
        self.assertEqual(created, self.com.creation_time)

Now that we’ve written all the tests, we have to make sure that they’re run against the actual database (or better yet, a backup copy of it). Otherwise the tests are useless, since Django creates a fresh database from the schema defined in models.py every time manage.py test is run.

First, you’ll need to make a copy of django.test.simple (put it in your project’s directory, for example). Then comment out these lines:

# old_name = settings.DATABASE_NAME
# from django.db import connection
# connection.creation.create_test_db(verbosity, autoclobber=not interactive)
result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
# connection.creation.destroy_test_db(old_name, verbosity)

And now, add this to your settings.py file:

TEST_RUNNER = 'myproject.simple.run_tests'

Be careful now: all the data in your database will be lost the next time you run manage.py test. So back it up! First create a new database, say backup, and then:

mysqldump -u DB_USER --password=DB_PASS DB_NAME | mysql -u DB_USER --password=DB_PASS -h localhost backup

You can reverse that when you’re done.

Here’s proof that it works (after I made a small modification to the model, but not to the database):

$ python manage.py test
..EEE..EEEEEE................
--> lots of tracebacks <--
----------------------------------------------------------------------
Ran 29 tests in 10.149s
FAILED (errors=9)

Ok, so that should provide a pretty good test coverage for now. Let’s go get breakfast!

Comments

blog database schema with strawberries - Part 2

I finally managed (found the time — stole the time) to write the schema from the previous post in django. Simplicity is divine:

from django.db import models
class Category(models.Model):
    nume = models.CharField(max_length=20)
class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True)
class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    modified_time = models.DateTimeField(auto_now=True)

Besides the fact that all the data types have names and descriptions anyone can understand, django will use this information when it builds the admin interface.
It’s interesting that you have to declare all the tables in order. At first I had put Category last and it couldn’t be found when Post wanted to create its ForeignKey. OOP has spoiled me a bit.
Another nice thing is that interesting features are already taking shape, such as URLField.verify_exists, which checks every URL that gets entered and rejects it if it receives a 404. So from now on nobody will be able to put metasyntactic variables like caca, mumu and so on in that field!

And now a mysql describe3 of the resulting tables:

mysql> describe revolution.blahg_category;
 +-------+-------------+------+-----+---------+
 | Field | Type        | Null | Key | Default |
 +-------+-------------+------+-----+---------+
 | id    | int(11)     | NO   | PRI | NULL    |
 | nume  | varchar(20) | NO   |     | NULL    |
 +-------+-------------+------+-----+---------+
 mysql> describe revolution.blahg_post;
 +---------------+-------------+------+-----+---------+
 | Field         | Type        | Null | Key | Default |
 +---------------+-------------+------+-----+---------+
 | id            | int(11)     | NO   | PRI | NULL    |
 | title         | varchar(50) | NO   |     | NULL    |
 | body          | longtext    | NO   |     | NULL    |
 | category_id   | int(11)     | NO   | MUL | NULL    |
 | published     | tinyint(1)  | NO   |     | NULL    |
 | creation_time | datetime    | NO   |     | NULL    |
 | modified_time | datetime    | NO   |     | NULL    |
 +---------------+-------------+------+-----+---------+
 mysql> describe revolution.blahg_commentator;
 +---------+--------------+------+-----+---------+
 | Field   | Type         | Null | Key | Default | 
 +---------+--------------+------+-----+---------+
 | id      | int(11)      | NO   | PRI | NULL    |
 | name    | varchar(50)  | NO   | UNI | NULL    |
 | email   | varchar(50)  | NO   | UNI | NULL    |
 | website | varchar(200) | NO   |     | NULL    |
 +---------+--------------+------+-----+---------+
 mysql> describe revolution.blahg_comment;
 +---------------+------------+------+-----+---------+
 | Field         | Type       | Null | Key | Default |
 +---------------+------------+------+-----+---------+
 | id            | int(11)    | NO   | PRI | NULL    | 
 | body          | longtext   | NO   |     | NULL    |
 | post_id       | int(11)    | NO   | MUL | NULL    | 
 | author_id     | int(11)    | NO   | MUL | NULL    |
 | approved      | tinyint(1) | NO   |     | NULL    |
 | modified_time | datetime   | NO   |     | NULL    |
 +---------------+------------+------+-----+---------+

TADA!
I should look into why the formatting breaks (actually I know why: I need to take textile out of that block), but the basic idea is clear.

3 thanks gheorghe!

Comments

blog database schema with strawberries

I’ve managed to write the database schema for my future blog. Oh yes, I’ve started working on it. I think I’ll use django (and I’ve already been told that was predictable), although I still have time to change my mind. I couldn’t find, in 2 minutes, a script to draw schemas for me, so I’m putting the relations here in English. Come bash me!

Post

  • belongs_to Category
  • has_many Comments

Comment

  • has_one Commentator
  • belongs_to Post

Commentator

  • has_many Comments

Category

  • has_many Posts

Now let me try to build some tables from what’s written above.

Post
id || title || body || category_id || created_at || published

Comment
id || post_id || commentator_id || body || approved || created_at

Commentator
id || name || email || website || gravatar_url

Category
id || name

I gave up on real tables and improvised a layout. I hope it’s readable.

Hopefully it’s obvious what I did not do: I didn’t leave the commentators inside their comments, which would have led to a relation with 4 redundant columns (name, email, website, gravatar):

Comment
id || autor || email || website || gravatar_url || post_id || commentator_id || body || approved || created_at

Along the way I’ll add ratings to posts and other things that come to mind. The rating will try to be something complex with up/down votes, but that’s for later.
So? What do you think? What else should I add? Did I get anything wrong? Do I qualify for fifth normal form? :-)

Comments
Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.