Wednesday, October 19, 2005

Latent Dirichlet Allocation

I have begun work on a MATLAB implementation of Latent Dirichlet Allocation (LDA) à la Blei (who is currently at CMU). Jon and I will be looking at classification, clustering, and search over a data set of 20,000 newsgroup articles.

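LDA does not assign each document to a single topic. Instead, it models each document in the corpus as a finite mixture of topics. LDA is a generative probabilistic model, and the two main algorithms we will implement are inference (computing the posterior distribution over a document's topic mixture given its words) and parameter estimation (learning the model parameters from the whole corpus).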
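To make "a finite mixture of topics" concrete, here is a minimal sketch of the generative process as described in Blei et al.'s LDA paper. It is written in Python rather than MATLAB and is not the implementation described in this post; the function names (`sample_dirichlet`, `generate_document`) are hypothetical. Each document draws its own topic proportions theta from a Dirichlet(alpha) prior, then for every word picks a topic z from theta and a word w from that topic's word distribution beta[z]. The paper draws the document length from a Poisson distribution; this sketch simply takes it as a fixed argument. Inference and parameter estimation run this process in reverse: given only the words w, recover theta and z for each document and alpha and beta for the corpus.

```python
import random
from typing import List, Tuple


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws.

    Assumes every alpha_i > 0.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    n_words: int,
    alpha: List[float],
    beta: List[List[float]],
    rng: random.Random,
) -> Tuple[List[float], List[int], List[int]]:
    """Generate one document under the LDA generative process (hypothetical helper).

    alpha: Dirichlet prior over the K topics (length K).
    beta:  K x V matrix; beta[k][v] is p(word v | topic k).
    Returns (theta, z, w): the document's topic mixture, the topic
    assigned to each word position, and the word ids themselves.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topics = list(range(len(alpha)))
    vocab = list(range(len(beta[0])))
    z: List[int] = []
    w: List[int] = []
    for _ in range(n_words):
        k = rng.choices(topics, weights=theta)[0]    # pick a topic for this word
        v = rng.choices(vocab, weights=beta[k])[0]   # pick a word from that topic
        z.append(k)
        w.append(v)
    return theta, z, w


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy setup: 2 topics over a 4-word vocabulary; topic 0 only emits
    # words 0-1 and topic 1 only emits words 2-3.
    alpha = [0.5, 0.5]
    beta = [[0.5, 0.5, 0.0, 0.0],
            [0.0, 0.0, 0.5, 0.5]]
    theta, z, w = generate_document(20, alpha, beta, rng)
    print("theta:", [round(t, 3) for t in theta])
    print("z:", z)
    print("w:", w)
```

Because theta is drawn separately for every document, two documents generated with the same alpha and beta can mix the same topics in very different proportions. This is exactly what distinguishes LDA from a model that assigns each document to one topic.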

I am interested in LDA for several reasons. Primarily, I want experience implementing machine learning algorithms, but LDA is also of interest to vision researchers. I think this project will teach me useful things about machine learning, and I look forward to having a MATLAB implementation of LDA that I can later adapt for vision tasks.