Relational Topic Models

From CSWiki
Revision as of 18:36, 7 April 2008 by Jcone (talk | contribs) (Choosing the sparsity parameter)

Choosing the sparsity parameter

On the senate dataset, running spectral clustering for various values of K gives the following:

K    False positives   False negatives
5    .606              .058
10   .354              .078
15   .126              .078
20   .193              .094
25   .157              .107
30   .135              .114

Even with 30 topics, this would imply that we are not observing at least roughly 15% of the true links. Since spectral clustering is likely to be overfitting in this case, a reasonable compromise across all the values of K might be a sparsity parameter of 25%. However, since we would expect the true K for this dataset to be small, 50% might be a better estimate.
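
For reference, here is a minimal sketch of how rates like those in the table could be computed from a clustering. The definitions are an assumption, since they are not spelled out above: a "false positive" is taken to be a pair of nodes placed in the same cluster with no observed link, as a fraction of all same-cluster pairs, and a "false negative" is a pair in different clusters with an observed link, as a fraction of all cross-cluster pairs. The function name link_error_rates and its arguments are hypothetical, and the clustering step itself is left out.

```python
from itertools import combinations


def link_error_rates(labels, edges):
    """Return (false_positive_rate, false_negative_rate) for a clustering.

    labels: dict mapping each node to its cluster id.
    edges:  iterable of (u, v) pairs, the observed undirected links.

    Assumed definitions (not taken from the note above):
      false positive = same-cluster pair with no observed link
      false negative = cross-cluster pair with an observed link
    """
    observed = {frozenset(edge) for edge in edges}
    same = same_missing = cross = cross_linked = 0

    # Visit every unordered pair of nodes exactly once.
    for u, v in combinations(sorted(labels), 2):
        linked = frozenset((u, v)) in observed
        if labels[u] == labels[v]:
            same += 1
            same_missing += not linked
        else:
            cross += 1
            cross_linked += linked

    # Guard against clusterings with no same-cluster or no cross-cluster pairs.
    fp_rate = same_missing / same if same else 0.0
    fn_rate = cross_linked / cross if cross else 0.0
    return fp_rate, fn_rate
```

Under these assumed definitions, the false positive rate is the share of links predicted by the clustering that are absent from the data. This matches the reading above that a rate near 15% means roughly 15% of true links go unobserved.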

--Jcone 18:27, 7 April 2008 (EDT)