## Wednesday, October 19, 2005

### Latent Dirichlet Allocation

I have begun work on a MATLAB implementation of Latent Dirichlet Allocation à la Blei (who is currently at CMU). Jon and I will be looking at classification, clustering, and searching of a data set made up of 20,000 newsgroup articles.

LDA does not model a document as belonging to one topic. Instead, each document from the corpus is modeled as a finite mixture of topics. In this generative probabilistic model, parameter estimation and inference are the two main algorithms that we will implement.
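To make the "mixture of topics" idea concrete, here is a minimal sketch of the generative process for a single document. It is written in Python rather than MATLAB purely for brevity, and it is not the implementation described in this post. The function names, the toy vocabulary, the two hand-picked topics, and the value of `alpha` are all assumptions made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, topic_word_probs, vocabulary, num_words, rng):
    """Generate one document under the LDA generative model."""
    # Per-document topic proportions: this is what makes the document
    # a mixture of topics rather than a member of a single topic.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(num_words):
        z = sample_categorical(theta, rng)                # topic for this word
        w = sample_categorical(topic_word_probs[z], rng)  # word from that topic
        topics.append(z)
        words.append(vocabulary[w])
    return theta, topics, words

# Hypothetical toy setup: 6 words, 2 topics (sports-like and space-like).
rng = random.Random(0)
vocabulary = ["goal", "team", "score", "orbit", "rocket", "launch"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],
]
alpha = [0.5, 0.5]

theta, topics, words = generate_document(alpha, topic_word_probs, vocabulary, 10, rng)
print("topic proportions:", theta)
print("words:", words)
```

Parameter estimation and inference run this process in reverse: given only the observed words, they recover the topic-word distributions and the per-document proportions `theta`.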

I am interested in LDA for several reasons. Primarily, I want experience implementing machine learning algorithms; however, LDA is also of interest to vision researchers. I think this project will teach me useful things about machine learning, and I look forward to having a MATLAB implementation of LDA that I can later adapt for vision tasks.

#### 7 comments:

1. Jonathan has posted the LDA code we wrote for our Machine Learning course.

The code can be found under the "Code" section of Jon's Projects Page.

2. This MATLAB version of Latent Dirichlet Allocation requires Minka's Lightspeed toolbox.

3. Anonymous 3:41 AM

Have you tried comparing LDA and pLSA?

4. In the domain of text, we did compare LDA and pLSA and found the results fairly similar. The LDA results looked slightly better; however, we didn't do a quantitative evaluation.

An advantage of being Bayesian and using LDA rather than pLSA is that the prior on the topic mixture proportions is explicit, so you aren't restricted to a Dirichlet prior. In the domain of object detection, Jon and I used the Logistic-Normal distribution to model the context between objects (as in the Correlated Topic Model paper from Blei and Lafferty). You can find more information about the Logistic-Normal here:

Detecting Objects via Multiple Segmentations and Latent Topic Models

and
Correlated Topic Model Details

5. Anonymous 9:32 PM

Hey, have you read Blei's work on GM-LDA? Do you know how to use Gibbs sampling to do inference for GM-LDA? Please let me know at sueyan@gmail.com. Thanks!

6. Anonymous 11:44 AM

What license is the code released under?

7. I am new to LDA, and I am confused about its input and output. How can I use LDA? How can I convert a text file into the LDA input format? I am also not clear on the feature id. Does it mean that this word is fifth in the vocabulary and its feature id is 15?