Python UK business days with pandas

Here’s how to calculate UK business days (well, at least for England & Wales) using pandas’s holiday calendar.

First you’ll need this calendar for UK holidays:

from pandas.tseries.holiday import (
    AbstractHolidayCalendar, DateOffset, EasterMonday,
    GoodFriday, Holiday, MO,
    next_monday, next_monday_or_tuesday)
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('New Years Day', month=1, day=1, observance=next_monday),
        GoodFriday,
        EasterMonday,
        Holiday('Early May bank holiday',
                month=5, day=1, offset=DateOffset(weekday=MO(1))),
        Holiday('Spring bank holiday',
                month=5, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Summer bank holiday',
                month=8, day=31, offset=DateOffset(weekday=MO(-1))),
        Holiday('Christmas Day', month=12, day=25, observance=next_monday),
        Holiday('Boxing Day',
                month=12, day=26, observance=next_monday_or_tuesday)
    ]

It was tested against the dates from gov.uk, so it should be fine to use, but please let me know if you find anything wrong with it.

Now you can do stuff like:

from datetime import date
from pandas.tseries.offsets import CDay
business = CDay(calendar=EnglandAndWalesHolidayCalendar())
>>> date.today()
datetime.date(2016, 3, 2)
>>> five_business_days_later = date.today() + 5 * business
>>> five_business_days_later
Timestamp('2016-03-09 00:00:00')
>>> five_business_days_later.date()
datetime.date(2016, 3, 9)
>>> date.today() - business
>>> date(2016, 12, 25) + business
Timestamp('2016-12-28 00:00:00')

You can also just retrieve the UK holidays for a specific year as a list of datetime objects using e.g.:

>>> holidays = EnglandAndWalesHolidayCalendar().holidays(
    start=date(2016, 1, 1),
    end=date(2016, 12, 31))
>>> holidays.tolist()
[Timestamp('2016-01-01 00:00:00'), Timestamp('2016-03-25 00:00:00'), Timestamp('2016-03-28 00:00:00'), Timestamp('2016-05-02 00:00:00'), Timestamp('2016-05-30 00:00:00'), Timestamp('2016-08-29 00:00:00'), Timestamp('2016-12-26 00:00:00'), Timestamp('2016-12-27 00:00:00')]
>>> holidays.to_pydatetime()
array([datetime.datetime(2016, 1, 1, 0, 0),
       datetime.datetime(2016, 3, 25, 0, 0),
       datetime.datetime(2016, 3, 28, 0, 0),
       datetime.datetime(2016, 5, 2, 0, 0),
       datetime.datetime(2016, 5, 30, 0, 0),
       datetime.datetime(2016, 8, 29, 0, 0),
       datetime.datetime(2016, 12, 26, 0, 0),
       datetime.datetime(2016, 12, 27, 0, 0)], dtype=object)
>>> holidays.to_native_types()
['2016-01-01', '2016-03-25', '2016-03-28', '2016-05-02', '2016-05-30', '2016-08-29', '2016-12-26', '2016-12-27']

Pandas has all sorts of funny stuff you can do with series, and time series in particular. Also check pandas’s docs about Custom Business Days, especially the warning about possible timezone issues.
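
For example, here is a minimal sketch (the date range endpoints are just made-up examples) of counting England & Wales working days between two dates, building on the calendar and CDay offset defined above:

from datetime import date
import pandas as pd
from pandas.tseries.offsets import CDay

business = CDay(calendar=EnglandAndWalesHolidayCalendar())

# All business days between the two endpoints (inclusive, where the
# endpoints themselves are business days); len() gives the count.
working_days = pd.date_range(date(2016, 12, 23), date(2017, 1, 6), freq=business)
print(len(working_days))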

If you don’t need the power of pandas or you just don’t like it (maybe because it pulls in a bazillion dependencies and includes a gazillion modules), workalendar looks pretty good.


AST literal_eval


Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.

—From python’s ast library

I used to dismiss anything that involved converting a string to an arbitrary data structure. This is a nice third option which seems like it would be useful in the majority of cases.
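
For instance, a minimal illustration (the strings below are just made-up examples):

import ast

# Literals and container displays are parsed into the corresponding Python objects...
data = ast.literal_eval("{'ids': [1, 2, 3], 'active': True, 'note': None}")
print(data['ids'])  # [1, 2, 3]

# ...while anything that is not a plain literal raises ValueError instead of executing.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print('not a literal')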


Change the system timezone for the python interpreter

Wrapping my head around timezones is hard. And testing the implications of working with different timezones is especially difficult since I now live in GMT (which is mostly the same as UTC).

I found a way to change the timezone that is used by most of Python’s stdlib by changing the TZ environment variable:

$ python -c 'import time; print(time.tzname)'
('GMT', 'BST')
$ TZ='Europe/Stockholm' python -c 'import time; print(time.tzname)'
('CET', 'CEST')
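
The same trick also works from inside the interpreter, at least on Unix; here is a small sketch using time.tzset() (the timezone name is just an example):

import os
import time

os.environ['TZ'] = 'Europe/Stockholm'
time.tzset()        # Unix only: re-read TZ and update the local timezone
print(time.tzname)  # ('CET', 'CEST')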

The XY Problem

So there’s a website devoted to The XY Problem:

The XY problem is asking about your attempted solution rather than
your actual problem. This leads to enormous amounts of wasted time and
energy, both on the part of people asking for help, and on the part of
those providing help.


Mocking python's file open() builtin

I was working on a method to read some proxy information from several files today and then I wanted to test it.

A very simplified version of this function (the original processes the different files in different functions with different rules, and it actually has error handling) looks like this:

import os

SYS_PROXY = '/etc/sysconfig/proxy'
CURL_PROXY = '/root/.curlrc'

def get_proxy():
    with open(SYS_PROXY) as f:
        contents = f.read()
        if 'http_proxy' in contents:
            proxy = contents.split('http_proxy = ')[-1] 
            if proxy:
                return proxy
    with open(CURL_PROXY) as f:
        contents = f.read()
        if '--proxy' in contents:
            proxy = contents.split('--proxy ')[-1] 
            if proxy:
                return proxy
    return os.getenv('http_proxy')

As unit tests should be self-contained, they shouldn’t read any files on disk. So we need to mock them. I generally use Michael Foord’s mock.

In order to intercept calls to python’s open(), we need to mock the builtins.open function:

import io
import mock

TEST_PROXY = 'http://example.com:1111'
def test_proxy_url_in_sysproxy(self):
    with mock.patch("builtins.open",
                    return_value=io.StringIO("http_proxy = " + TEST_PROXY)):
         self.assertEqual(TEST_PROXY, get_proxy())

We’re good so far. Now we add the next natural test: we didn’t find anything in sysconfig, but we find the right proxy URL on our second try in CURL_PROXY:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock.patch("builtins.open", return_value=io.StringIO()):
        with mock.patch("builtins.open",
                        return_value=io.StringIO(' --proxy ' + TEST_PROXY)):
            self.assertEqual(TEST_PROXY, get_proxy())

Urgh. That’s starting to look a bit clunky. It’s also wrong: the inner with statement overrides the outer one, so both open() calls return the same StringIO object, and by the second call the first with block has already closed it:

ValueError: I/O operation on closed file.

Not to worry though. mock’s side_effect has got us covered!

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock.patch("builtins.open",
                    side_effect=[io.StringIO(),
                                 io.StringIO(' --proxy ' + TEST_PROXY)]):
        self.assertEqual(TEST_PROXY, get_proxy())

The code looks cleaner now. A bit. And at least it works. But the list we pass in to side_effect makes another issue pop up: we now seem to be dependent on the order in which the files are opened and read. That seems brittle. If we had to refactor our code to change the order in which files are read in get_proxy(), we would also have to change all our tests. It’s also not quite obvious why we’re setting our return values as side effects.

Ideally we’d have a way to assign each result to a filename and then not have to care about the order in which the files are opened. In real life we would have two files with different contents anyway.

So let’s implement that method. We, of course, want to make it a context manager.

from contextlib import contextmanager

@contextmanager
def mock_open(filename, contents=None):
    def mock_file(*args):
        if args[0] == filename:
            return io.StringIO(contents)
        else:
            return open(*args)
    with mock.patch('builtins.open', mock_file):
        yield

So we only intercept the filename that we want to mock and let everything else pass through to builtins.open(). The yield is there because a function decorated with @contextmanager must be a generator: everything before the yield is executed when entering the with mock_open ... statement, then the body of the with block runs, and then everything after the yield in our mock_open function (nothing, in our case).
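
If the before-yield / after-yield flow of @contextmanager is new to you, here is a tiny standalone sketch of it (the prints are only there to show the order of execution):

from contextlib import contextmanager

@contextmanager
def managed():
    print("before yield - runs when the with block is entered")
    yield
    print("after yield - runs when the with block exits")

with managed():
    print("inside the with block")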

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())

Looks good.

RuntimeError: maximum recursion depth exceeded in comparison

Oops. It seems that we got into infinite recursion because we’re calling the mocked open() from the mocking function. We have to make sure that once we’ve mocked a call to open(), there’s no way we’re going to go through that mock again. Thankfully, the mock library provides methods to turn mocking on and off without using the with mock.patch context manager. Take a look at mock.patch’s start and stop methods.

@contextmanager
def mock_open(filename, contents=None):
    def mock_file(*args):
        if args[0] == filename:
            return io.StringIO(contents)
        else:
            mocked_file.stop()
            open_file = open(*args)
            mocked_file.start()
            return open_file
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    yield
    mocked_file.stop()

So we had to replace the with mock.patch statement with manually start()-ing and stop()-ing the mocking functionality before and after the yield. That’s basically what the with statement was doing; we just needed the identifier so we can use it in the else branch.

In the else branch we turn off the mocking before calling open() (that’s what was causing the infinite recursion). After we’ve called open(), we go back to mocking open(), in case there is a future call that we actually do want to mock.

Test code now looks the same as before:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())

But this time it works. So we could all go home now.

But say we wanted to ensure that no files were opened inside the with mock_open block other than the ones we mocked. It seems like a pretty sensible thing to do. Unit tests should be completely self-contained, so you want to ensure they won’t be opening any files on the system. This would also catch some bugs that might only pop up later in your CI server’s test runs because of a custom development machine configuration.

The problem is pretty simple if you use only one with mock_open block, but once you start using more than one nested context manager you have a problem. You need a way to communicate between the different context managers. Ideally you’d have a way for each context manager to say to the others (after it’s finished processing): hey, I finished my work here, but some dude opened a file which I didn’t mock. Did you mock it?

So how do we solve that? We’ll use global variables! No. Just kidding.

We’ll use exceptions. Simply make the inner statement raise a custom NotMocked exception and let the enclosing context managers catch it. If none of the enclosing context managers mock the file that was opened in the inner block, they just let the user deal with the exception.

So the exception can be a normal Exception subclass, but we need an extra bit of information, the filename that wasn’t mocked. I’ll also hardcode an error message in there:

class NotMocked(Exception):
    def __init__(self, filename):
        super(NotMocked, self).__init__(
            "The file %s was opened, but not mocked." % filename)
        self.filename = filename

The updated mock_open code looks like this:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    open_files = []
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
            open_files.append(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    for open_file in open_files:
        if complain:
            raise NotMocked(open_file)

So we’re recording, in the open_files list, all the files that were opened without being mocked by this context manager. Then, after all the code inside the with block has executed, we go through the open_files list and raise a NotMocked exception for each of those file names. We also added a new complain parameter just in case someone would like to turn this functionality off (maybe they want to use file fixtures after all).

The StringIO objects now also have a name attribute. It’s a bit tricky to see why this is needed since at first sight those objects never get into the open_files list. But when we have nested with mock_open blocks the file returned by the open() function in mock_file might actually have been mocked by an enclosing context manager and its type would then be StringIO.

The try: except: block around yield is for the enclosing context managers. When they get a NotMocked exception from running the code inside them, they check if it’s the file they’re mocking, in which case they ignore it (basically telling the nested context manager: I’ve got you covered). If the NotMocked exception was raised for a file that’s different from the one they’re mocking, they simply re-raise it for someone else to deal with, either an enclosing context manager or the user.

If we now added another open() call in our initial get_proxy() function, or inside the with statement in the test case,

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            self.assertEqual(TEST_PROXY, get_proxy())
            open('/dev/null')

we’d get this error:

NotMocked: The file /dev/null was opened, but not mocked.

Cool. Now how about the opposite? I had to refactor a lot of these test cases and at some point I wasn’t sure that all those assertions made sense. Was I really hitting all the files I had mocked? Well, we could just add another check in our mock_open() code to see if all the files that were mocked were actually accessed by the test code:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    open_files = []
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
        open_files.append(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    try:
        open_files.remove(filename)
    except ValueError:
        raise AssertionError("The file %s was not opened." % filename)
    for f_name in open_files:
        if complain:
            raise NotMocked(f_name)

We now track mocked files as open_files, too. Then at the end, we simply check if the file that we were supposed to be mocking (passed in as the filename argument) was indeed opened.

The gotcha here is that we need to raise this exception before NotMocked, otherwise we risk the code never getting to the file-not-opened check. I guess this is where the difference between using exceptions when something exceptional occurred vs. when you want to communicate with the enclosing function becomes obvious.

If we now added another mock_open that we weren’t using to the test code:

def test_proxy_url_not_in_sysproxy_but_in_yastproxy(self):
    with mock_open(SYS_PROXY):
        with mock_open(CURL_PROXY, ' --proxy ' + TEST_PROXY):
            with mock_open('/dev/null'):
                get_proxy()
                self.assertEqual(TEST_PROXY, get_proxy())

We’d get:

AssertionError: The file /dev/null was not opened.

EDIT: Eric Moyer found a bug (and suggested a fix) in this implementation. When the same file is opened multiple times, the open_files list will contain the filename multiple times, but it will only get removed once. This can easily be solved by making open_files a set instead of a list.

So that’s about it, we now have a rock-solid mock_open function for mocking the builtin open().

Before we set it free, we need to add a nice docstring to it:

@contextmanager
def mock_open(filename, contents=None, complain=True):
    """Mock the open() builtin function on a specific filename
.
    Let execution pass through to open() on files different than
    :filename:. Return a StringIO with :contents: if the file was
    matched. If the :contents: parameter is not given or if it is None,
    a StringIO instance simulating an empty file is returned.
.
    If :complain: is True (default), will raise an AssertionError if
    :filename: was not opened in the enclosed block. A NotMocked
    exception will be raised if open() was called with a file that was
    not mocked by mock_open.
.
    """
    open_files = set()
    def mock_file(*args):
        if args[0] == filename:
            f = io.StringIO(contents)
            f.name = filename
        else:
            mocked_file.stop()
            f = open(*args)
            mocked_file.start()
        open_files.add(f.name)
        return f
    mocked_file = mock.patch('builtins.open', mock_file)
    mocked_file.start()
    try:
        yield
    except NotMocked as e:
        if e.filename != filename:
            raise
    mocked_file.stop()
    try:
        open_files.remove(filename)
    except KeyError:
        if complain:
            raise AssertionError("The file %s was not opened." % filename)
    for f_name in open_files:
        if complain:
            raise NotMocked(f_name)

FOSDEM 2012 review

I went to FOSDEM this year. Thanks SUSE for sponsoring my trip! Here is a short review for the projects that I found interesting at this year’s FOSDEM.

SATURDAY

The Aeolus Project

Francesco Vollero – Red Hat

This is a very interesting project if you can get past how meta it is. It wants to be an abstraction over all the existing private and public cloud solutions. The aim of the project is to be able to create and control a virtual system throughout its life cycle. It can be converted from one VM image format to another and be deployed/moved from one cloud provider to another. Groups of images can be set up and controlled together. The way resources are managed and billed would also be cloud-independent.

It relies heavily on the DeltaCloud project.

Open Clouds with DeltaCloud

Michal Fojtik – Red Hat

DeltaCloud aims to be a RESTful API that is able to abstract all of the other public or private cloud APIs, allowing for the development of cloud-independent software. The project says it wants to be truly independent (esp. from Red Hat). It was accepted as a top-level Apache project.

DMTF CIMI and Apache DeltaCloud

Marios Andreou – Red Hat

The CIMI API is a specification for interacting with various cloud-resources. A lot of very big companies are part of the DMTF Cloud Management Working Group: Red Hat, VMware Inc., Oracle, IBM, Microsoft Corporation, Huawei, Fujitsu, Dell. It is currently being implemented as part of the DeltaCloud API. The presenter also showed some implementation details: a lot of the code is shared between the DeltaCloud and the CIMI API.

Infrastructure as an opensource project

Ryan Lane – Wikimedia Foundation

The talk went into some detail about the whole Wikimedia setup. It is built on top of open source projects and aims to be entirely free and available to anyone who wants to know more about it. The speaker presented some of the issues that the Wikimedia organization faced when they decided to give full root access to their machines to volunteers and how to allow for different levels of trust.

Orchestration for the cloud – Juju

Dave Walker – Canonical

Juju is a system for building recipes of configurations and packages that can then be deployed on openstack/EC2 systems. The project aims to integrate with tools like chef and puppet to be able to manage deploying, connecting, configuring and running suites of applications in the cloud.

OpenStack developers meeting

This was a rather informal discussion. Four major distros were present (Fedora, Ubuntu, SUSE and Debian), along with some other contributors. Upstream asked about the problems that distributions face, and some minor one-time occurrences were discussed briefly. Stefano Maffulli, the OpenStack community manager, was also present and there were some heated discussions about the way the project is governed. There are still a lot of things being discussed behind closed doors. Negotiations about the future of the project and fund-gathering are done with only a few big companies at a very high level. The community, on the other hand, was very vocal about wanting to rule itself with no enterprise interference.

Rethinking system and distro development

Lars Wirzenius

He advanced the idea of maintaining groups of packages, all locked at a specific version. Having the maintainers always know which combination of versions a bug comes from would make the whole environment easier to replicate and the bug easier to reproduce. This would also, supposedly, reduce some of the complexities of dealing with dependencies.

These groups of packages would be built directly from the upstream’s sources, following rules laid out in a git repository. The speaker also said he wants to get rid of binary packages completely.

If this were to be implemented, distributions could write functional tests against whole systems (continuously built images), rather than individual binary packages and ensure that a full configuration works.

Someone from the audience mentioned that a lot of the ideas in the talk are already implemented in NixOS (nixos.org), which looks like a very interesting project in itself.

SUNDAY

Continuous Integration / Continuous Delivery

Karanbir Singh – CentOS

The speaker discussed the system which CentOS uses for continuous integration. I liked their laissez-faire approach to which type of functional test language they should be using. They basically allow any type of language/environment to be used when running tests. The only requirement is that the test returns 0 on success and something else on failure. Anyone can write functional tests in any language they want (they just specify the packages as requirements for their test environment). People can choose to maintain different groups of packages along with the tests associated to them.

The Apache Cassandra Storage Engine

Sylvain Lebresne

The talk covered a lot of interesting concepts about the optimizations made in the Cassandra project in order to speed up writes and make reads twice as fast (almost as fast as writes): different levels of caching, queuing writes, merge sorting the read cache with the physical data on reads, etc.

Freedom, Out of the Box!

Bdale Garbee

An interesting project about building a truly free and easily available software and hardware system. Some interesting concepts are used in this project, like GPG keys for authentication, but also for the trust required to provide a truly decentralized, peer-based network, free from DNS.


I’ve been to a few other talks that I can’t remember anything from either because of the bad quality of the presentation or because I didn’t have the prerequisite knowledge to understand what they were talking about. Next time I should also take notes.

A lot of the talks were recorded and are available over here (with more coming): FOSDEM 2012 videos. The quality of the recordings (esp. in the main room) is sometimes even better than being there live. The voice is clearer and there is no ambient noise. Also, it was really cold in most of the rooms, so I had to keep my jacket and hat on.


SQL and Relation Theory Master Class

This video course is perhaps the best way to meet the famous C. J. Date and his astonishingly comprehensive style. The lectures are a great introduction to database theory while at the same time they lay a very solid foundation for any database practitioners or theorists. The author introduces some very useful theoretical notions that are essential to grasping the more subtle concepts of database design and he does so in a high-class fashion.

C. J. Date’s style of explaining and teaching, which can also be seen in his books, is didactic and very thorough while at the same time astonishingly clear. Many times while reading the book that these videos are based on, and even afterward while watching the videos, I had to stop in order to reflect on the great volume of information that I had absorbed in a surprisingly simple manner. These videos are full of very deep notions about databases and really benefit from being reviewed at a later time, just to cement the knowledge or reflect on certain topics which come up during everyday practice.

C. J. Date sets out to demolish SQL as a language fit for relational theory and databases in general. While going through all the database theory concepts he presents the ideal case and an ideal query language (actually not ideal, but as he demonstrates, the correct ones) contrasting them to generic SQL. He also posits and sets out to prove, in a very interesting argument, that relational databases are the only way to store data and all other data models will not endure.

These are the days of NOSQL databases, but I think that the information contained in these lectures will be useful for a lot more time and in a lot more settings than just conventional SQL databases that are used in the majority of current systems. I oftentimes find myself thinking in relational terms even while designing the redis data model that I’m currently working on.

The only problem I have is that I sometimes felt that the lectures were a bit dull. It is also possible that I got this impression because I was watching too many without interruption :). While the content of the lectures is excellent, the presentation could be improved. Oftentimes I felt that the audience present in the classroom could have done more to improve the dynamism of the lectures. It seemed that the only reason they were there was so that the presenter wouldn’t feel alone. I would have enjoyed more challenging questions and especially some skeptical comments, from industry veterans perhaps. I’m sure those would have led to very interesting debates, considering the high class of the lecturer and, presumably, the attendants.


The Productive Programmer

Today I read The Productive Programmer.

I’ve already got a bunch of books piled up and waiting to be read, some of which I’ve reached the middle of and some of which I’ve read only the introduction to. I was bored today and this looked like an easy read that I could drop at any time. It is an easy read, but the fact is, it’s very catchy. It draws you in and doesn’t let go. It’s a great way to spend half a day.

This could have been another one of those “94 things you need to know” books, but I think this title is way better. The tips are divided into 2 big sections and a few separate chapters within each one, giving the book some structure. One of the strange things about it is that its advice spans 3 different platforms (*nix, OS X and Windows) and they’re all mingled together in the same chapter. I was put off by this at first, but after I’d gotten into the book a bit I realized that it is, in fact, a good idea. The whole theme of the book is programmer productivity, and the reason it has this title and not a cheesy title with numbers like “94.3 productivity tips for programmers” is that the tips aren’t what’s important. The book is there to hit people over the head and open their minds. You aren’t supposed to just use the tips provided; that’s unimportant. What’s important is that you get a shock and realize that you’ve been the wrong kind of lazy as a programmer. You’ve stopped automating, you’ve become a machine; in the author’s own words, computers have started “getting together late at night to make fun of their users”. As soon as I realized what this book was really about, I started reading the Windows tips as well. I also stopped a few times to look in my distro’s repository for tools that I had known of, but never used before, only now understanding their true purpose. There are many applications that seem just trendy at first, until you realize that even a small productivity boost is a big productivity boost. (Go check out gnome-do; I’d heard about it years ago, but never tried it until now.)
The book contains tips ranging from application launchers, Firefox’s magic address bar and bash scripting commands to office productivity tips for killing distractions. Once again, the mindset is important, not the tips themselves. The big take-away from this book is beginning to constantly judge everything you do as a programmer. This isn’t new advice (at least for those who have read “The Pragmatic Programmer”, which by the way is mentioned several times in this book), but I find it’s better emphasized in this book. At one point in the book, the author explains how it took his team one hour to devise a Ruby script to automate some simple task that would’ve required 10 minutes if done manually and finally only needed doing 5 times. One could say there was a loss in productivity, but as the author points out, one would be wrong. Those 50 minutes would’ve been spent with the brain turned off, whereas the hour writing the script was spent learning, focusing, practicing, gaining knowledge that can later be used on a different project. Some of us would probably have gotten bored in those 50 minutes and fallen into procrastination. That doesn’t happen when you’ve got a complex problem to solve.
That was the first part of the book, Mechanics. The second part, Practice, is a bit harder to read, as it’s not just disparate tips on very different applications. They’re really two separate books, actually. The majority of the examples in Part II are Java (they’re mostly readable even for someone who doesn’t speak the exact dialect of OOP that Java uses) and this part is mostly about software construction, as Steve McConnell would say, but it’s also about Java. I learned a new acronym: YAGNI (You Aren’t Gonna Need It), which basically means that thinking ahead is bad. This is probably one of the pieces of advice that I feel the least guilty about, but which I sometimes observe in people around me. Never program a feature you don’t urgently have a need for.
One of the good points of this book is the originality with which the ideas are expressed. Most of these ideas aren’t new, especially to anyone who’s read other software engineering books. The text is spiced up with little narratives of different adventures from the author’s experience as a consultant and there is also an Ancient Philosophers chapter and an explanation of the PacMan game’s way of functioning (although I didn’t understand how that’s supposed to make the game less enjoyable).


GSOC - it begins...

My Fedora proposal got accepted to this year’s Google Summer of Code Program. You can look at a short abstract here. Now I’m going to try to explain what this project is about and what I did to prepare for being accepted, hopefully without going mad about how happy I am about it.

I started work on the Fedora Project almost a year ago. One day I popped on the mailing list and then on the irc channel of the infrastructure team and asked for something to do. Luckily, Toshio Kuratomi was on the watch and after giving me a short tour of the various projects he could help me get familiar with, I picked the package database. Most of the work I’ve done so far is in the pkgdb (the search capability is the most obvious thing I worked on). The overview on the front page describes it quite well; it’s got package information and it’s aimed at package developers. It’s not a very famous part of the fedora project websites, certainly not as famous as something like packages.ubuntu.com is for ubuntu. But that’s not what it was intended for, even if that’s what attracted me to the project at first. I liked the exposure of such a website, but also the fact that, at the time, it was easier for me to understand what it did and how it worked :).

The idea of making the package database more user-friendly as opposed to developer-centric wasn’t a new one. Toshio, the main developer had been thinking about it for a long time, but I guess it never really became a priority. The idea had also been proposed for last year’s GSOC, but it hadn’t been accepted (this scared me a bit when I found out). I picked this idea on a whim when I told Toshio I wanted to participate in this year’s GSOC on pkgdb and he asked me what exactly I wanted to do. I wasn’t expecting the question, so I answered with the first thing that came to mind. Looking back, I think it was a good choice.

All my involvement with the Fedora Project owes a lot to the best possible person who could have become my mentor for GSOC. The Infrastructure Team is a great one to work with, and the Fedora contributor community is made up of a lot of smart, fun and selfless people. I say this after having spent a lot of time lurking the IRC channels, the various mailing lists, the planet etc. and to a somewhat lesser extent interacting with other contributors. However, I wouldn’t have continued contributing if it weren’t for the continuous support and guidance of Toshio. I probably wouldn’t have been able to participate in the GSOC without the many discussions (starting in February) with Toshio about the proposal and the support when explaining the idea to other community members. Having said that, I think that being familiar with the pkgdb also helped a lot with writing the proposal. I didn’t have to waste time on getting to know the code, the community, the devs as I would have if I had written a proposal for a different project. I also had a fair idea of what would constitute a good proposal and a rough idea about how it could be implemented. I think this helped with my credibility in the eyes of the mentors who ranked my proposal.

I was never convinced I would get a spot on Fedora & JBoss’s list of accepted proposals, but it was a great thing to dream of. The butterflies in my stomach were killing me at the end of the waiting period, especially since it had lasted for more than 2 months. I now have a summer to work full time on my hobby :).

At the end of the summer, the fedora community will hopefully have a package database with package versions, size, dependencies, rss feeds, tagging, package reviews etc. There’s even a detailed schedule from my proposal you can drool on if you’re so inclined.

And hello, fedora planet! Sorry for being late.


testing a django blog's models

This post is a continuation of the previous post, and I’ll be writing tests on top of that schema. Here it is, for easy reference:

from django.db import models
class Category(models.Model):
    nume = models.CharField(max_length=20)
class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
    modified_time = models.DateTimeField(auto_now=True)
class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True)
class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)

Ok, so here’s what we’re testing: our model — the emphasis is on our, because we’re only testing our code. And mostly, we’re actually testing that the code in models.py corresponds with what’s in the database. All test methods must begin with the word test.

One of the most annoying things, which took me a while to figure out, is that the setUp method is run once before each of the other test methods. That means that if you want to test for uniqueness, you have to write a tearDown method if you want to run any other independent tests. This is why snippet A won’t work, but snippet B will.
Here’s the model:

class Category(models.Model):
    name = models.CharField(max_length=20, unique=True)

snippet A

class CategoryTest(unittest.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")

    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")

    def testunique(self):
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

snippet B

class CategoryTest(unittest.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")

    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

The second snippet only calls the setUp method once because there is only one other method. But that’s not very nice. Ideally we’d like to be able to run each test individually, so maybe we can write a tearDown method to be run after each test method, to restore the database.
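
For completeness, such a tearDown could be as simple as this sketch, which just deletes whatever setUp created so the next test starts from a clean database:

class CategoryTest(unittest.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")

    def tearDown(self):
        # undo setUp so each test method starts with an empty Category table
        Category.objects.all().delete()

    def testunique(self):
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")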

However, there is an easier way to avoid writing a tearDown method, and that is using the django.test module, which is an extension of unittest. All you have to do is import django.test instead of unittest and make every test class a subclass of django.test.TestCase instead of unittest.TestCase.
Here is what it looks like now:

class CategoryTest(django.test.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")
        self.cat2 = Category.objects.create(name="cat2")
    def testexist(self):
        # make sure they get to the database
        self.assertEquals(self.cat1.name, "cat1")
        self.assertEquals(self.cat2.name, "cat2")
    def testunique(self):
        self.assertRaises(IntegrityError, Category.objects.create, name="cat1")

Now, let’s test the Post class:

class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
    modified_time = models.DateTimeField(auto_now=True)

There’s a bunch more stuff to test here, like the fact that everything gets to the database (title, body, category) and that everything has its right type/class.
In setUp we create a post, but also a category, since the test has to be independent but needs a Category in order to create a Post.

class PostTest(django.test.TestCase):
    def setUp(self):
        self.cat1 = Category.objects.create(name="cat1")
        self.post1 = Post.objects.create(title="name",body="trala lala",
                category=Category.objects.all()[0])

Next, we need to do a bit of a trivial test to check that the title, the body and the right category get to the db

def testtrivial(self):
        self.assertEquals(self.post1.title, "name")
        self.assertEquals(self.post1.body, "trala lala")
        self.assertEquals(self.post1.category, Category.objects.all()[0])

I think this is a good way to test that the creation_time and modified_time are newly generated datetime.datetime objects:

def testtime(self):
    self.assertEquals(self.post1.creation_time.hour, datetime.now().hour)

No, wait. I think this looks a bit more professional:

def testtime(self):
        delta = datetime.now() - self.post1.creation_time
        self.assert_(delta.seconds < 10)
        delta_modified = datetime.now() - self.post1.modified_time
        self.assert_(delta_modified.seconds < 10)

So now, we’re looking for datetime objects that were generated less than 10 seconds ago. That’s really very generous since the time it takes to run the test from the time the setUp method is run is in the range of microseconds.
This test doesn’t show the true difference between modified and creation time. Modification time is changed every time the object is saved to the database while creation time is not. So let’s write a new test based on that knowledge:

def testModifiedVsCreation(self):
        modified = self.post1.modified_time
        created = self.post1.creation_time
        self.post1.save()
        self.assertNotEqual(modified, self.post1.modified_time)
        self.assertEqual(created, self.post1.creation_time)

Testing for a boolean value is really easy:

def testpublished(self):
        self.assertEquals(self.post1.published, False)

And then there’s more than one way I can think of to test the Category ForeignKey:

def testcategory(self):
        self.assertEquals(self.cat1.__class__, self.post1.category.__class__)
        self.assertRaises(ValueError, Post.objects.create, name="name",
                body="tralaalal", category="ooopsie!")

In the end, I’ll go for the more general one (the __class__ check), even though the second one is more eccentric. So:

def testcategory(self):
        self.assertEquals(self.cat1.__class__, self.post1.category.__class__)

Btw, if you don’t know the errors (like ValueError, which I didn’t know), you can always drop to a manage.py shell and try Post.objects.create(name="name", body="tralaalal", category="ooopsie!") and see if you get lucky.

Ok, passing on to the Commentator class:

class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True, blank=True)

We’re only going to test that the data gets to the database and that the name and email fields are unique. At this stage we can’t test the validation of the email and website fields. We’ll be doing that later, when we write the forms.
This should seem trivial by now:

class CommentatorTest(django.test.TestCase):
    def setUp(self):
        self.comtor = Commentator.objects.create(name="hacketyhack",
                email="hackety@example.com", website="example.com")
    def testExist(self):
        self.assertEquals(self.comtor.name, "hacketyhack")
        self.assertEquals(self.comtor.email, "hackety@example.com")
        self.assertEquals(self.comtor.website, "example.com")
    def testUnique(self):
        self.assertRaises(IntegrityError, Commentator.objects.create,
                name="hacketyhack", email="new@example.com",
                website="example.com")
        self.assertRaises(IntegrityError, Commentator.objects.create,
                name="nothackety", email="hackety@example.com",
                website="example.com")

Now, let’s get to testing the Comment class:

class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)

There won’t be anything new here. And this is when and why testing is boring. But, hey! A man’s gotta do what a man’s gotta do.

class CommentTest(django.test.TestCase):
    def setUp(self):
        self.cat = Category.objects.create(name="cat1")
        self.post = Post.objects.create(title="name",body="trala lala",
                category=Category.objects.all()[0])
        self.comtor = Commentator.objects.create(name="hacketyhack",
                email="hackety@example.com", website="example.com")
        self.com = Comment.objects.create(
                body="If the implementation is easy to explain, it may be a good idea.",
                post=Post.objects.all()[0], author=Commentator.objects.all()[0])
    def testExist(self):
        self.assertEquals(self.com.body,
                "If the implementation is easy to explain, it may be a good idea.")
        self.assertEquals(self.com.post, Post.objects.all()[0])
        self.assertEquals(self.com.author, Commentator.objects.all()[0])
        self.assertEquals(self.com.approved, False)
    def testTime(self):
        delta_creation = datetime.now() - self.com.creation_time
        self.assert_(delta_creation.seconds < 7)
    def testCreationTime(self):
        # what if it's a modification_time instead?
        created = self.com.creation_time
        self.com.save()
        self.assertEqual(created, self.com.creation_time)

Now that we’ve written all the tests we have to make sure that they’re run against the actual database. Or better yet, a backup copy of it. Otherwise, the tests are useless, since django creates a new database based on the schema defined in models.py every time manage.py test is run.

First, you’ll need to make a copy of django.test.simple (put it in your project’s directory for example). Then comment out these lines:

# old_name = settings.DATABASE_NAME
# from django.db import connection
# connection.creation.create_test_db(verbosity, autoclobber=not interactive)
result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
# connection.creation.destroy_test_db(old_name, verbosity)

And now, add this to your settings.py file:

TEST_RUNNER = 'myproject.simple.run_tests'

Be careful now. All the data in your database will be lost when you run manage.py test the next time. So back it up! First create a new database, say backup and then:

mysqldump -u DB_USER --password=DB_PASS DB_NAME | mysql -u DB_USER --password=DB_PASS -h localhost backup

You can reverse that when you’re done.

Here’s to show that it works (after I’ve made a little modification to the model, but not the database):

$ python manage.py test
..EEE..EEEEEE................
--> lots of tracebacks <--
----------------------------------------------------------------------
Ran 29 tests in 10.149s
FAILED (errors=9)

Ok, so that should provide pretty good test coverage for now. Let’s go get breakfast!


blog database schema with strawberries - Part 2

I’ve just managed (I found the time, or rather stole it) to write the schema from the previous post in Django. Simplicity is divine:

from django.db import models
class Category(models.Model):
    nume = models.CharField(max_length=20)
class Post(models.Model):
    title = models.CharField(max_length=50)
    body = models.TextField()
    category = models.ForeignKey(Category)
    published = models.BooleanField()
    creation_time = models.DateTimeField(auto_now_add=True)
class Commentator(models.Model):
    name = models.CharField(max_length=50, unique=True)
    email = models.EmailField(max_length=50, unique=True)
    website = models.URLField(verify_exists=True)
class Comment(models.Model):
    body = models.TextField()
    post = models.ForeignKey(Post)
    author = models.ForeignKey(Commentator)
    approved = models.BooleanField()
    modified_time = models.DateTimeField(auto_now=True)

Besides the fact that all the data types have names and explanations anyone can understand, Django will use this information when building the admin interface.
It’s interesting that you have to declare all the tables in order. At first I had put Category last and it couldn’t be found when the ForeignKey from Post was being set up. OOP has spoiled me a bit.
Another nice thing is that interesting features are already in the works, such as URLField.verify_exists, which checks every submitted URL and rejects it if it gets a 404. So from now on nobody will be able to put metasyntactic variables like caca, mumu and friends in that field!

And now a MySQL describe³ of the resulting tables:

mysql> describe revolution.blahg_category;
 +-------+-------------+------+-----+---------+
 | Field | Type        | Null | Key | Default |
 +-------+-------------+------+-----+---------+
 | id    | int(11)     | NO   | PRI | NULL    |
 | nume  | varchar(20) | NO   |     | NULL    |
 +-------+-------------+------+-----+---------+
 mysql> describe revolution.blahg_post;
 +---------------+-------------+------+-----+---------+
 | Field         | Type        | Null | Key | Default |
 +---------------+-------------+------+-----+---------+
 | id            | int(11)     | NO   | PRI | NULL    |
 | title         | varchar(50) | NO   |     | NULL    |
 | body          | longtext    | NO   |     | NULL    |
 | category_id   | int(11)     | NO   | MUL | NULL    |
 | published     | tinyint(1)  | NO   |     | NULL    |
 | creation_time | datetime    | NO   |     | NULL    |
 | modified_time | datetime    | NO   |     | NULL    |
 +---------------+-------------+------+-----+---------+
 mysql> describe revolution.blahg_commentator;
 +---------+--------------+------+-----+---------+
 | Field   | Type         | Null | Key | Default | 
 +---------+--------------+------+-----+---------+
 | id      | int(11)      | NO   | PRI | NULL    |
 | name    | varchar(50)  | NO   | UNI | NULL    |
 | email   | varchar(50)  | NO   | UNI | NULL    |
 | website | varchar(200) | NO   |     | NULL    |
 +---------+--------------+------+-----+---------+
 mysql> describe revolution.blahg_comment;
 +---------------+------------+------+-----+---------+
 | Field         | Type       | Null | Key | Default |
 +---------------+------------+------+-----+---------+
 | id            | int(11)    | NO   | PRI | NULL    | 
 | body          | longtext   | NO   |     | NULL    |
 | post_id       | int(11)    | NO   | MUL | NULL    | 
 | author_id     | int(11)    | NO   | MUL | NULL    |
 | approved      | tinyint(1) | NO   |     | NULL    |
 | modified_time | datetime   | NO   |     | NULL    |
 +---------------+------------+------+-----+---------+

TADA!
I need to figure out why the formatting breaks (actually I know why: I have to take textile out of that block), but the basic idea is clear.

³ thanks, gheorghe!


blog database schema with strawberries

I’ve managed to write the database schema for my future blog. Oh yes, I’ve started working on it. I think I’ll use Django (and I’ve already been told that was predictable), though I still have time to change my mind. I couldn’t find a schema-drawing script within 2 minutes, so I’m putting the relationships down in English here. Come bash me!

Post

  • belongs_to Category
  • has_many Comments

Comment

  • has_one Commentator
  • belongs_to Post

Commentator

  • has_many Comments

Category

  • has_many Posts

Now let me try to turn what’s written above into some tables.

Post
id || title || body || category_id || created_at || published

Comment
id || post_id || commentator_id || body || approved || created_at

Commentator
id || name || email || website || gravatar_url

Category
id || name

I gave up on actual tables and improvised a bit of formatting. I hope it’s readable.

What I did not do should be obvious, I hope: I didn’t keep the commentators with their comments, which would have led to a relation with 4 redundant columns (name, email, website, gravatar):

Comment
id || autor || email || website || gravatar_url || post_id || commentator_id || body || approved || created_at

Along the way I’ll add ratings for posts and other things that come to mind. The rating will try to be something complex with up/down votes, but that’s for later.
So? What do you think? What else should I add? Did I get anything wrong? Do I fit into fifth normal form? :-)

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.