Python is a great language. A dict is one of the data structures available in Python; it stores data as key/value pairs. We often need to retrieve a value using its dictionary key. When the key is not found in the dictionary, the lookup raises a KeyError exception.

[Image: KeyError]
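As a minimal illustration of the failure described above, the sketch below indexes a plain dict with a key it does not contain. The dictionary contents and variable names are made up for the example.

```python
# Hypothetical data, used only for illustration.
prices = {'apple': 3}

try:
    value = prices['banana']      # 'banana' is not a key, so this raises KeyError
except KeyError as exc:
    value = None
    missing_key = exc.args[0]     # the key that could not be found

print(value, missing_key)
```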

One solution is to use defaultdict or dict.get(), both of which return a default value when the key is not found in the dictionary.

Let's dig into both of these.

defaultdict lives in Python's collections module, so we have to import it before using it:

from collections import defaultdict
mydict = defaultdict(int)

The defaultdict constructor takes default_factory as its first argument, a callable that is invoked with no arguments to produce the value for a missing key. This can be

int : default will be integer value of 0

str : default will be empty string ''

list : default will be empty list []

and so on.
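The built-in factories listed above can be tried out with a short sketch like the following. The variable names and keys are illustrative only.

```python
from collections import defaultdict

counts = defaultdict(int)     # missing keys start at 0
names = defaultdict(str)      # missing keys start at ''
groups = defaultdict(list)    # missing keys start at []

counts['a'] += 1              # 0 + 1
groups['even'].append(2)      # [] becomes [2]

print(counts['a'], repr(names['y']), groups['even'], groups['odd'])
```

Because the factory supplies the starting value, updates such as += 1 or .append() work on keys that have never been set.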

If we want our own default value, we can pass our own function (any callable that takes no arguments).

Let's say we want the default value to be the string 'default'. We can achieve this with defaultdict as follows:

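from collections import defaultdict

def mydefault():
    # Called with no arguments whenever a missing key is looked up
    return 'default'

mydict = defaultdict(mydefault)
print(mydict['test'])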

This will output 'default'.
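For a constant default like this, a lambda is a common shorthand for the named function. This is a sketch of an alternative spelling, not something the approach above requires.

```python
from collections import defaultdict

# A lambda taking no arguments works as default_factory, just like a named function.
mydict = defaultdict(lambda: 'default')
result = mydict['test']
print(result)
```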

The same result can also be achieved using the dict.get method:

mydict = {}
print(mydict.get('test', 'default'))

This will also output 'default'.

Thus the same result can be achieved with both approaches. With dict.get, we have to provide the default value every time it is called, whereas with defaultdict we set up the default only once.
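One behavioural difference is worth keeping in mind when choosing between the two: indexing a defaultdict with a missing key stores the generated default in the dictionary, while dict.get leaves the dictionary unchanged. The sketch below shows this; the variable names are illustrative.

```python
from collections import defaultdict

dd = defaultdict(lambda: 'default')
plain = {}

dd_value = dd['test']                        # calls the factory and stores the result
plain_value = plain.get('test', 'default')   # returns the default, stores nothing

# .get() on a defaultdict does not call the factory either.
other = dd.get('other')

print(dd_value, plain_value, other)
print('test' in dd, 'test' in plain, 'other' in dd)
```

So a defaultdict that is only ever read with square brackets will grow by one entry for each distinct missing key that is looked up.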

Now let's compare the efficiency of the two in terms of execution time.

For this we will use the IPython notebook, since it has the %timeit command, which measures the execution time of a Python statement over any desired number of loops.

In the IPython notebook environment, let us create two functions: one that uses defaultdict and another that uses the dict.get method.

[Image: defining_two_defs]

Here a() retrieves a value through defaultdict and b() uses the dict.get method. Now let's measure the execution time of both functions over 100 loops:

[Image: defaultdict_timeit]

[Image: dict.get_timeit_2]

First, executing a() over 100 loops, we can see that it took 8.88 microseconds per loop (best of three).

[Image: dict.get_timeit]

When we run b() over 100 loops, we can see that it took 19.3 microseconds per loop (best of three).

Hence defaultdict seems more efficient than the dict.get method.

Let's rerun the test, this time using the notebook's default %timeit settings, which ran the statement for 1000 loops.

[Image: defaultdict_timeit_2]

Here too, defaultdict appears more efficient than the dict.get method; in this experiment defaultdict was more than two times faster than dict.get.
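For readers without a notebook, a comparison of this kind can be roughly reproduced with the standard timeit module. The original definitions of a() and b() were shown only in a screenshot, so the bodies below are hypothetical reconstructions and may not match what was actually measured.

```python
import timeit
from collections import defaultdict

def mydefault():
    return 'default'

def a():
    # Hypothetical reconstruction: look up a missing key in a defaultdict.
    mydict = defaultdict(mydefault)
    return mydict['test']

def b():
    # Hypothetical reconstruction: look up a missing key with dict.get.
    mydict = {}
    return mydict.get('test', 'default')

# repeat=3 and number=100 mirror "100 loops, best of 3".
time_a = min(timeit.repeat(a, repeat=3, number=100)) / 100
time_b = min(timeit.repeat(b, repeat=3, number=100)) / 100

print('a(): %.3g microseconds per loop' % (time_a * 1e6))
print('b(): %.3g microseconds per loop' % (time_b * 1e6))
```

The absolute numbers, and possibly the ranking, depend on the machine, the Python version and on exactly what a() and b() do, so the output of this sketch should not be expected to match the figures quoted above.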
