Difference between revisions of "Relational Topic Models"
Revision as of 18:51, 7 April 2008
Modeling Sparsity
In the undirected setting, suppose we have chosen z_ij and z_ji and then draw the response as r_ij ~ Bernoulli(\eta_{z_ij, z_ji}). To model sparsity, we instead draw an additional hidden variable y_ij ~ Bernoulli(\eta_{z_ij, z_ji}), and then draw r_ij ~ Bernoulli(\rho) if y_ij = 1 and r_ij ~ \delta(0) otherwise (that is, r_ij = 0 with probability 1). Marginalizing over y_ij, a link is observed with probability \rho \eta_{z_ij, z_ji}, so \rho acts as the fraction of true links that are actually seen.
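The two-stage draw can be illustrated with a minimal Python sketch. The function names and the values of `eta` and `rho` are hypothetical, chosen only for illustration; `eta` stands in for \eta_{z_ij, z_ji} for a single pair whose topic assignments are already fixed.

```python
import random

def sample_sparse_link(eta, rho, rng):
    """Draw (y, r) for one pair: y ~ Bernoulli(eta); r ~ Bernoulli(rho) if y == 1, else r = 0."""
    y = 1 if rng.random() < eta else 0
    r = (1 if rng.random() < rho else 0) if y == 1 else 0
    return y, r

def marginal_link_prob(eta, rho):
    """P(r = 1) after marginalizing out the hidden variable y."""
    return eta * rho

rng = random.Random(0)          # fixed seed so the run is repeatable
eta, rho = 0.8, 0.5             # illustrative values, not taken from the page
n_draws = 20000
observed = [sample_sparse_link(eta, rho, rng)[1] for _ in range(n_draws)]
empirical = sum(observed) / n_draws
print(f"empirical P(r=1) = {empirical:.3f}, analytic = {marginal_link_prob(eta, rho):.3f}")
```

With enough draws the empirical frequency of observed links should sit close to `rho * eta`, which is why a smaller \rho corresponds to a sparser observed graph for the same underlying \eta.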
Choosing the sparsity parameter
On the senate dataset, running spectral clustering for various values of K gives the following:
K | False positives | False negatives |
---|---|---|
5 | .606 | .058 |
10 | .354 | .078 |
15 | .126 | .078 |
20 | .193 | .094 |
25 | .157 | .107 |
30 | .135 | .114 |
Even with 30 topics, this implies that we are missing at least roughly 15% of the true links. Since spectral clustering is likely overfitting here, a reasonable compromise across all the values of K might be 25%. However, since we expect the true K to be small for this dataset, 50% might be a better estimate.
--Jcone 18:27, 7 April 2008 (EDT)
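One possible reading of the 25% and 50% figures, which is an assumption and not something the comment above states, is that they are rough averages of the false-positive column: over all K for the compromise, and over the smallest K for the higher estimate. The sketch below only redoes that arithmetic on the numbers from the table; the helper name `mean_false_positive` is hypothetical.

```python
# (false positive rate, false negative rate) copied from the table above, keyed by K.
rates = {
    5:  (0.606, 0.058),
    10: (0.354, 0.078),
    15: (0.126, 0.078),
    20: (0.193, 0.094),
    25: (0.157, 0.107),
    30: (0.135, 0.114),
}

def mean_false_positive(ks):
    """Average false-positive rate over the given values of K."""
    ks = list(ks)
    return sum(rates[k][0] for k in ks) / len(ks)

overall = mean_false_positive(rates)      # average over every K in the table
small_k = mean_false_positive([5, 10])    # average over the two smallest K only
print(f"all K: {overall:.3f}, small K: {small_k:.3f}")
```

Under this reading, the average over all K is about 0.26 and the average over K = 5 and K = 10 is 0.48, which is at least consistent with the 25% and 50% suggestions.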