Wednesday, November 16, 2011

don't throw away old code: github-it!

My thesis experiments on Exemplar-SVMs (see my PhD thesis; note: 33MB) would have taken approximately 20 CPU years to finish.  But not on a fat CMU cluster!  Here is some simple code which helped make things possible in ~1 month of crunching on 200+ cores.  That is not quite Google-scale computing, but it was an unforgettable experience as a CMU PhD student.  I've recently had to go back to the SSH / GNU Screen method of starting scripts at MIT, since we do not have Torque/PBS there, but I definitely still use these scripts.  Fork it, use it, change it, hack it, improve it, break it, learn from it, etc.
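
For anyone who has not seen the SSH / GNU Screen approach, here is a minimal Python sketch of the idea: split the jobs across machines and start each machine's share inside a detached screen session, so it keeps running after the SSH login ends. This is not the code from my repository, just an illustration. The host names, the run_job.sh script, and the function names are all hypothetical, and it assumes passwordless SSH and the script being present on every host. It only builds and prints the commands; to actually launch them you would hand each one to subprocess.run.

```python
import shlex


def chunk_round_robin(items, n_chunks):
    """Deal items into n_chunks lists so that chunk sizes differ by at most one."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def build_launch_commands(hosts, job_ids, script="run_job.sh"):
    """Return one ssh command (as an argv list) for every host that has work.

    `script` is a hypothetical per-job script that takes a job id as its argument.
    """
    commands = []
    chunks = chunk_round_robin(job_ids, len(hosts))
    for index, (host, chunk) in enumerate(zip(hosts, chunks)):
        if not chunk:
            continue  # more hosts than jobs: nothing to start here
        # Run this host's jobs one after another inside the session.
        remote_jobs = "; ".join(f"./{script} {job_id}" for job_id in chunk)
        # `screen -dmS NAME CMD` starts CMD in a detached, named session.
        remote = f"screen -dmS jobs_{index} bash -c {shlex.quote(remote_jobs)}"
        commands.append(["ssh", host, remote])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in build_launch_commands(["node1", "node2"], list(range(5))):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

With Torque/PBS the scheduler does this bookkeeping for you; without it, a loop like the one above plus screen -ls on each machine is about as simple as it gets.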

I used these scripts to drive the experiments in my Exemplar-SVM framework (also on GitHub).

The basic take-home message is "do not throw away old code" that you found useful at some time.  C'mon ex-PhD students, I know you wrote a lot of code, you graduated, and now you feel embarrassed to share it.  Who cares if you never had a chance to clean it up?  If the world never gets to see it, it will die a silent death from lack of use.  Just put it on GitHub and let others take a look.  Git is the world's best source control/versioning system.  Its distributed nature makes it perfect for large-scale collaboration.  Now with GitHub, sharing is super easy!  Sharing is caring.  Let's make the world a better place for hackerdom, one repository at a time.  I've met some great hackers at MIT, such as the great cvondrick, who is still teaching me how to branch like a champ.

Mathematicians share proofs.  Hackers share code.  Embrace technology, embrace GitHub.  If you ever want to hack with me, knowing the basics of Git is probably as important as being a master of linear algebra.

Additional Reading:
Distributed Version Control: The Future of History, an article about Git by some Kitware software engineers