Wednesday, November 16, 2011

don't throw away old code: github-it!

My thesis experiments on Exemplar-SVMs (see my PhD thesis; note, it is 33MB) would have taken approximately 20 CPU years to finish on a single core.  But not on a fat CMU cluster!  Here is some simple code that helped make things possible in ~1 month of crunching on 200+ cores.  That is not quite Google-scale computing, but it was an unforgettable experience as a CMU PhD student.  I've recently had to go back to the SSH / GNU Screen method of starting scripts at MIT, since we do not have Torque/PBS there, but I definitely use these scripts.  Fork it, use it, change it, hack it, improve it, break it, learn from it, etc.

I used these scripts to drive the experiments in my Exemplar-SVM framework (also on GitHub).

The basic take-home message is "do not throw away old code" that you found useful at some point.  C'mon ex-PhD students, I know you wrote a lot of code, you graduated, and now you feel embarrassed to share it.  Who cares if you never had a chance to clean it up?  If the world never gets to see it, it will die a silent death from lack of use.  Just put it on GitHub, and let others take a look.  Git is the world's best source control/versioning system.  Its distributed nature makes it perfect for large-scale collaboration.  Now, with GitHub, sharing is super easy!  Sharing is caring.  Let's make the world a better place for hackerdom, one repository at a time.  I've met some great hackers at MIT, such as the great cvondrick, who is still teaching me how to branch like a champ.

Mathematicians share proofs.  Hackers share code.  Embrace technology, embrace GitHub.  If you ever want to hack with me, knowing the basics of Git is probably as important as being a master of linear algebra.

Additional Reading:
"Distributed Version Control: The Future of History", an article about Git by some Kitware software engineers


  1. Hi Tomasz, I was wondering: do you get permission from your adviser/university/funding agency before you make code public?

  2. Just be a bad-ass and do what you want.

  3. Serious answer now:

    Making your code public is like making pre-prints of your papers available on arXiv before publication -- a lot of people are afraid of doing it and "getting into trouble." I think it is best to let your advisor know that you plan on making all of your research code/data available to the public. If you are writing code and not allowed to make it public, then try to get paid at least 100K/year for doing it.

    If you are going to be a poor academic, then give all your stuff away.