Relational Topic Models

From CSWiki

Revision as of 18:51, 7 April 2008

Modeling Sparsity

In an undirected setting, suppose z_ij and z_ji have already been chosen and the response is drawn as r_ij ~ Bernoulli(\eta_{z_ij, z_ji}).

To model sparsity, we instead draw an additional hidden variable y_ij ~ Bernoulli(\eta_{z_ij, z_ji}), and then draw r_ij ~ Bernoulli(\rho) if y_ij = 1 and r_ij ~ \delta(0) (that is, r_ij = 0) otherwise. Marginalizing over y_ij gives P(r_ij = 1 | z_ij, z_ji) = \rho \eta_{z_ij, z_ji}, so a fraction 1 - \rho of the links with y_ij = 1 is never observed.
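The sparse generative step can be sketched as follows. This is a minimal illustration, not code from this project: the function names `sample_link` and `empirical_link_rate` are hypothetical, and the values of eta and rho are arbitrary. It draws y_ij and then r_ij as described, and checks by simulation that P(r_ij = 1) comes out close to \rho \eta_{z_ij, z_ji}.

```python
import random


def sample_link(eta: float, rho: float, rng: random.Random) -> int:
    """Draw one response r_ij under the sparse model (hypothetical helper).

    eta is the entry eta_{z_ij, z_ji} for the already-chosen topic pair,
    and rho is the sparsity parameter.
    """
    # Hidden variable: y_ij ~ Bernoulli(eta)
    y = 1 if rng.random() < eta else 0
    if y == 1:
        # r_ij ~ Bernoulli(rho): the link is observed with probability rho
        return 1 if rng.random() < rho else 0
    # r_ij ~ delta(0): no link is observed
    return 0


def empirical_link_rate(eta: float, rho: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(r_ij = 1), which should be close to rho * eta."""
    rng = random.Random(seed)
    return sum(sample_link(eta, rho, rng) for _ in range(n)) / n


if __name__ == "__main__":
    eta, rho = 0.4, 0.75  # illustrative values, not fitted to any data
    print(f"empirical P(r=1) = {empirical_link_rate(eta, rho):.3f}")
    print(f"rho * eta        = {rho * eta:.3f}")
```

Because y_ij and the Bernoulli(\rho) draw are independent given the topics, the simulated rate should agree with \rho \eta up to sampling noise.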

Choosing the sparsity parameter

On the Senate dataset, running spectral clustering for various values of K gives the following false positive and false negative rates:

K    False positive rate    False negative rate
5    0.606                  0.058
10   0.354                  0.078
15   0.126                  0.078
20   0.193                  0.094
25   0.157                  0.107
30   0.135                  0.114
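The page does not say exactly how these rates were computed. One plausible reading, assumed here and not necessarily the author's definition, is that a clustering "predicts" a link between two senators whenever they fall in the same cluster. Under that assumption, the hypothetical helper `link_error_rates` below scores a cluster assignment (for example, labels produced by spectral clustering) against the observed undirected links.

```python
from itertools import combinations


def link_error_rates(labels, edges):
    """Hypothetical false positive / false negative rates for a clustering.

    labels: list mapping node index -> cluster id.
    edges:  iterable of observed undirected links (i, j).
    Assumes a link is predicted exactly when i and j share a cluster.
    """
    observed = {frozenset(e) for e in edges}
    predicted = {
        frozenset((i, j))
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }
    # False positives: within-cluster pairs with no observed link.
    fp_rate = len(predicted - observed) / len(predicted) if predicted else 0.0
    # False negatives: observed links whose endpoints are in different clusters.
    fn_rate = len(observed - predicted) / len(observed) if observed else 0.0
    return fp_rate, fn_rate


# Toy example: two clusters, one missing within-cluster link and one cross link.
print(link_error_rates([0, 0, 0, 1, 1], [(0, 1), (1, 2), (3, 4), (2, 3)]))
```

With this definition, larger K yields smaller clusters and fewer predicted pairs, which is consistent with the general trend in the table of false positives falling and false negatives rising as K grows.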

Even with K = 30, these false positive rates would imply that we fail to observe at least around 15% of the true links. Since spectral clustering is likely overfitting here, 25% might be a reasonable compromise across all values of K. However, since we expect the true K for this dataset to be small, 50% might be a better estimate.
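Under the sparse model, a true link (y_ij = 1) goes unobserved with probability 1 - \rho, so if the percentages above are read as estimates of the fraction of unobserved true links, each one translates to \rho = 1 - (that fraction). The sketch below makes that conversion explicit and also takes a plain mean of the false positive column, which comes out near 0.26, close to the 25% compromise. Both the conversion and the averaging are an illustration under that assumption, not necessarily how the estimate above was made; `rho_from_missing_fraction` is a hypothetical name.

```python
from statistics import mean

# False positive rates from the table above, keyed by K (number of clusters).
FALSE_POSITIVES = {5: 0.606, 10: 0.354, 15: 0.126, 20: 0.193, 25: 0.157, 30: 0.135}


def rho_from_missing_fraction(missing: float) -> float:
    """Assumes a true link is observed with probability rho, so a
    missing-link fraction m suggests rho = 1 - m."""
    return 1.0 - missing


avg_fp = mean(FALSE_POSITIVES.values())
print(f"mean false positive rate over K: {avg_fp:.3f}")
for label, missing in [("K = 30", FALSE_POSITIVES[30]), ("compromise", 0.25), ("small K", 0.50)]:
    print(f"{label:>10}: missing ~ {missing:.2f} -> rho ~ {rho_from_missing_fraction(missing):.2f}")
```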

--Jcone 18:27, 7 April 2008 (EDT)