Wednesday, August 1, 2012

Coding (more) Efficiently in Python! Part 1: prun

Since it’s been a while, I think I will write about how to make your Python code more efficient! (Also, writing this up means I can stop being stuck on my project for a few minutes… thus I reveal my ulterior motive…)

Having never had formal training in coding (barring a high school course, in which we spent several months on ‘Hello World’… so, no formal training), I am what I term a ‘functional coder.’ I can write code. But that’s about all. It might run and get the job done, but when you look at the actual raw code you will see deeply nested for loops, a stark lack of commenting, variable names like ‘bunny’ and ‘widget’ (hey, it seemed cute at the time! Who knew I would need to come back to this code later?) and almost every other mark of a rookie you can think of. This summer, this has begun to haunt me: I need to run little chunks of code thousands of times as I approach my correct parameters in a Markov Chain Monte Carlo (a process I am also trying to make more efficient, but more on that later).

I started out knowing very little about making code efficient. Well, I did realize in Ay117 (the astrostats course I took this past spring) that my five-deep for loops could cause laughter when shown to others. I don’t want to be a comedian, though: I want to be a competent coder!

Tim (a graduate student in Professor Johnson’s group) showed me a couple of cool tricks earlier this summer for making code more efficient, all of which I have been using ever since! I will show here my personal favorite diagnostic tool, prun.

prun is one of IPython’s built-in magic functions, part of the ‘core.magic’ module (so, no need to import anything! It’s super easy!). You call it by simply writing, either at the IPython prompt on your command line or in a development environment that runs IPython:

%prun functionname()

Here, functionname is the function you want to test. There are also some options that you can insert, where 'o' is the identifier of your option; they are listed in the documentation of prun.
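If you are working in a plain Python script rather than IPython, roughly the same table can be produced with the standard library's cProfile and pstats modules, which is the profiler that prun runs for you. The following is only a minimal sketch: slow_sum, run_model and profile_report are made-up names for illustration, not functions from my project.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical stand-in for an expensive helper function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_model():
    """Hypothetical stand-in for the function you want to profile."""
    return [slow_sum(20000) for _ in range(11)]


def profile_report(func, top=5):
    """Profile a single call of func and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by the time spent inside each function itself ('tottime'),
    # then print only the `top` most expensive entries.
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


report = profile_report(run_model)
print(report)
```

The printed table has the same columns that prun shows: the number of calls (ncalls), the total time spent inside each function (tottime), the time per call (percall) and the cumulative time including sub-calls (cumtime).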

This is what you might see when running prun from the command line.


For me, since I will be calling this function many, many times, I want to try to eliminate whatever is taking the most time. I see that the total time for this function is 1.584 seconds, which seems pretty good, right? Under two seconds? You can wait two seconds for your function to run, right?

That is what I thought at first, too. 1.5 seconds seems fine at first glance, but the nature of my project is such that I will need to run this particular method hundreds or even thousands of times. Suddenly, a couple of seconds isn’t sounding so good anymore… luckily, prun separates out for me which functions are taking the most time to run, and how many times they are being called. By looking at the chart generated by prun, I can see that the functions taking the most time are the interpolation and shift_lhood, a function I wrote. There are two easy ways to fix that! Firstly, you might notice that the interpolation is being called 11 times. By examining the code, I notice that the same function is being interpolated all 11 times. Silly me, I hadn’t realized that I did that… well, that’s an easy fix!
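The fix for the repeated interpolation is simply to build the interpolator once and reuse it. Here is a minimal sketch of that idea in plain Python; make_interpolator, the sample points and the list of shifts are all invented for illustration and are not my actual code (which uses a library interpolation routine that is far more expensive to set up than this toy one).

```python
from bisect import bisect_right

build_count = 0  # counts how often the interpolator is (re)built


def make_interpolator(xs, ys):
    """Return a linear interpolator through the points (xs, ys); xs must be sorted."""
    global build_count
    build_count += 1

    def interpolate(x):
        # Find the segment containing x, clamping to the first or last segment.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])

    return interpolate


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 10.0, 20.0, 30.0]
shifts = [0.1 * k for k in range(11)]

# Wasteful: the same interpolator is rebuilt on every pass through the loop.
slow = [make_interpolator(xs, ys)(1.0 + s) for s in shifts]
builds_in_loop = build_count

# Better: build it once, then reuse it for every shift.
build_count = 0
interp = make_interpolator(xs, ys)
fast = [interp(1.0 + s) for s in shifts]
builds_hoisted = build_count
```

Both versions give identical values, but the first builds the interpolator 11 times and the second builds it once. The ncalls column in the prun output is what gives this kind of mistake away.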

The second thing I can do to improve the efficiency of this function is to improve my shift_lhood function. The old version took, according to the profile shown above, 0.121 seconds PER CALL to run (see the fourth row, fifth column). Here is the improved version of that function, shift_lhood, called on its own. To improve it, I tried using matrices instead of for loops to do element-by-element manipulations. Now it only takes 0.093 seconds! What an improvement!
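To give a flavor of what swapping a for loop for whole-array operations looks like, here is a minimal sketch using NumPy. It is not my shift_lhood function: chi2_loop and chi2_vectorized are hypothetical examples that compute a simple sum of squared, scaled residuals, and the data are random numbers made up for the demonstration.

```python
import numpy as np


def chi2_loop(data, model, sigma):
    """Sum of squared, scaled residuals using an explicit for loop."""
    total = 0.0
    for i in range(len(data)):
        total += ((data[i] - model[i]) / sigma[i]) ** 2
    return total


def chi2_vectorized(data, model, sigma):
    """The same quantity, computed on whole arrays at once."""
    return float(np.sum(((data - model) / sigma) ** 2))


# Made-up inputs, assumed only for this illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=10000)
model = np.zeros(10000)
sigma = np.ones(10000)

loop_result = chi2_loop(data, model, sigma)
vector_result = chi2_vectorized(data, model, sigma)
```

The two functions agree to within floating-point rounding. Running each one through prun (or timing them some other way) on your own machine is the honest way to find out how much the array version actually saves.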




prun is a useful function, and it’s my first step on the long journey to becoming a better coder (and not spending days upon unneeded days running code)!

2 comments:

  1. This is a great blog post! I can definitely see other students learning a lot from what you've written here. Nice work.

    Keep these posts coming!

    -Prof.

  2. i need to learn python! :/ still running the bloated matlab GUI. :( i should switch to numpy, but the activation barrier is just so large. i will keep prun in mind though. :) AHH I MISS RUNNING WITH YOU. :( i get to run AND learn things. win win.
