Relational Topic Models
Choosing the sparsity parameter
On the Senate dataset, running spectral clustering with various numbers of clusters K gives the following error rates:
| K | False positives | False negatives |
|---|---|---|
| 5 | 0.606 | 0.058 |
| 10 | 0.354 | 0.078 |
| 15 | 0.126 | 0.078 |
| 20 | 0.193 | 0.094 |
| 25 | 0.157 | 0.107 |
| 30 | 0.135 | 0.114 |
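The page does not say exactly how the two rates were computed. A minimal sketch, assuming a hypothetical definition: a false positive is a pair of nodes placed in the same cluster with no observed link between them (a candidate unobserved true link), and a false negative is an observed link whose endpoints fall in different clusters. The function name `link_error_rates` and both denominators are assumptions, and the cluster labels are taken as already produced by whatever spectral clustering run was used.

```python
from itertools import combinations


def link_error_rates(labels, edges):
    """Hypothetical FP/FN rates for a clustering against observed links.

    labels: labels[i] is the cluster assigned to node i.
    edges:  iterable of (i, j) observed undirected links.
    Returns (fp_rate, fn_rate), where (assumed definitions):
      fp_rate = fraction of same-cluster pairs with no observed link,
      fn_rate = fraction of observed links split across clusters.
    """
    # Normalize links to unordered pairs and drop self-loops.
    linked = {frozenset((i, j)) for i, j in edges if i != j}

    same_cluster = 0
    same_cluster_unlinked = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            same_cluster += 1
            if frozenset((i, j)) not in linked:
                same_cluster_unlinked += 1

    # Count observed links whose endpoints land in different clusters.
    split_links = 0
    for pair in linked:
        i, j = tuple(pair)
        if labels[i] != labels[j]:
            split_links += 1

    fp_rate = same_cluster_unlinked / same_cluster if same_cluster else 0.0
    fn_rate = split_links / len(linked) if linked else 0.0
    return fp_rate, fn_rate


fp, fn = link_error_rates([0, 0, 0, 1, 1], [(0, 1), (1, 2), (3, 4), (2, 3), (0, 3)])
print(fp, fn)  # 0.25 0.4
```

In the toy example, cluster 0 has three internal pairs and cluster 1 has one; only (0, 2) is unlinked, giving 1/4. Two of the five links, (2, 3) and (0, 3), cross clusters, giving 2/5.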
Even with K = 30, the false-positive rate of 0.135 implies that we are not seeing at least around 15% of true links. Since spectral clustering is likely overfitting at large K, a reasonable compromise across all values of K might be 25%. However, since we expect the true K for this dataset to be small, 50% might be a better estimate.
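One possible reading of the 25% and 50% figures (an interpretation, not stated on the page) is that they roughly match averages of the false-positive column: the mean over all K, and the mean over the small values K = 5 and K = 10. The sketch below only does that arithmetic on the table's numbers.

```python
# False-positive rates from the table, keyed by K.
fp_by_k = {5: 0.606, 10: 0.354, 15: 0.126, 20: 0.193, 25: 0.157, 30: 0.135}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Assumed reading: "compromise between all the K" ~ mean over every K.
all_k = mean(fp_by_k.values())
# Assumed reading: "true K small" ~ mean over the two smallest K.
small_k = mean(fp_by_k[k] for k in (5, 10))

print(f"mean over all K: {all_k:.3f}")         # mean over all K: 0.262
print(f"mean over K in (5, 10): {small_k:.3f}")  # mean over K in (5, 10): 0.480
```

Under this reading, 0.262 rounds to the 25% compromise and 0.480 to the 50% estimate.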
--Jcone 18:27, 7 April 2008 (EDT)