Relational Topic Models
Choosing the sparsity parameter
On the senate dataset, running spectral clustering for various values of K gives the following:
|K||False positives||False negatives|
Even with K = 30 topics, this implies that we are missing at least around 15% of the true links. Since spectral clustering is likely overfitting at that K, a reasonable compromise across all the values of K might be 25%. However, since we would expect the true K for this dataset to be small, 50% might be a better estimate.
--Jcone 18:27, 7 April 2008 (EDT)
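Below is a minimal sketch of how error counts like those in the table might be computed for one value of K. It rests on several assumptions that the notes above do not state. The Senate data is treated as a symmetric 0/1 adjacency matrix. A "false positive" is taken to mean a pair of nodes in the same cluster with no observed link, and a "false negative" is taken to mean an observed link between two different clusters. Both are reported as rates rather than counts. The clustering uses the normalized-affinity spectral embedding with row normalization (in the style of Ng, Jordan and Weiss), followed by k-means with farthest-point initialization. The function names `spectral_clusters` and `link_error_rates` are hypothetical.

```python
import numpy as np

def spectral_clusters(A, K, n_iter=100):
    """Hypothetical helper: split the nodes of a symmetric 0/1 adjacency
    matrix A into K groups via a spectral embedding plus k-means."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Normalized affinity D^{-1/2} A D^{-1/2}; its top-K eigenvectors span the
    # same space as the bottom-K eigenvectors of the normalized Laplacian.
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)              # eigenvalues come back ascending
    X = vecs[:, -K:]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)  # project each row onto the unit sphere
    # Farthest-point initialization keeps the k-means step deterministic.
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    # Plain Lloyd iterations; an empty cluster keeps its previous center.
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def link_error_rates(A, labels):
    """Hypothetical helper. Assumes same-cluster pairs are 'predicted links'.
    Returns (false-positive rate among same-cluster pairs,
             false-negative rate among observed links)."""
    n = len(labels)
    iu = np.triu_indices(n, k=1)              # each unordered pair once
    linked = A[iu] > 0
    same = (labels[:, None] == labels[None, :])[iu]
    fp = np.sum(same & ~linked)               # same cluster, no observed link
    fn = np.sum(~same & linked)               # observed link across clusters
    return fp / max(same.sum(), 1), fn / max(linked.sum(), 1)
```

Sweeping K over a range of values and recording both rates for each would produce a table of the kind shown above. Because the definitions of the two error types are assumptions, the numbers it produces may not match the ones used in these notes.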