Difference between revisions of "MLReadingGroup"
m (Reverted edit of Monicag, changed back to last version by Zbarutcu) |
(124 intermediate revisions by 21 users not shown) | |||
Line 9: | Line 9: | ||
We maintain an announcement/discussion list for the reading group. You may sign up for the list [https://lists.cs.princeton.edu/mailman/listinfo/ml-reading/ here]. | We maintain an announcement/discussion list for the reading group. You may sign up for the list [https://lists.cs.princeton.edu/mailman/listinfo/ml-reading/ here]. | ||
− | == Schedule == | + | ==Schedule (Fall 2008) == |
− | Our weekly meetings are | + | Our weekly meetings are '''Mo 1:00-5:00pm''' on the 3rd floor of the CS building (CS 302). |
+ | |||
+ | * Graphical Models, exponential families, and variational inference | ||
+ | |||
+ | |||
+ | ==Schedule (Spring 2008) == | ||
+ | Our weekly meetings are '''Tue 1:00-2:30pm''' in the AI lab on the 4th floor of the CS building (CS 431). | ||
+ | |||
+ | |||
+ | Schedule of topics: | ||
+ | |||
+ | * 13 May 2008 | ||
+ | ** Topic: Maximum Entropy Discrimination | ||
+ | ** Leader: Chong Wang | ||
+ | ** Main Paper: [http://people.csail.mit.edu/tommi/papers/JaaMeiJeb-nips99.ps Tommi Jaakkola, Marina Meila, and Tony Jebara, Maximum Entropy Discrimination, In ''NIPS'' 1999.] | ||
+ | ** Long Version: [http://people.csail.mit.edu/tommi/papers/maxent.ps Tommi Jaakkola, Marina Meila, and Tony Jebara, Maximum Entropy Discrimination, Technical Report AITR-1668, MIT, 1999] | ||
+ | |||
+ | * 6 May 2008 | ||
+ | ** Topic: Feature selection for relational data | ||
+ | ** Leader: Jonathan Chang | ||
+ | ** Main Paper: [http://citeseer.ist.psu.edu/635777.html Jensen, Neville, and Hay (2003), Avoiding Bias when Aggregating Relational Data with Degree Disparity] | ||
+ | ** Background: [http://citeseer.ist.psu.edu/jensen02linkage.html Jensen and Neville (2002), Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning] | ||
+ | |||
+ | * 22 April 2008 | ||
+ | ** Topic: Game theory | ||
+ | ** Leader: Indraneel Mukherjee | ||
+ | ** Main Paper: [http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.pjm/1103044235 An Analog of the Minimax Theorem for Vector Payoffs] | ||
+ | |||
+ | * 15 April 2008 | ||
+ | ** Topic: Conditional Random Fields | ||
+ | ** Leader: Berk Kapicioglu | ||
+ | ** Main Paper: [http://www.seas.upenn.edu/~strctlrn/bib/PDF/crf.pdf J. Lafferty, A. McCallum, and F. Pereira (2001), Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data] | ||
+ | |||
+ | * 8 April 2008 | ||
+ | ** Topic: Online Feature Selection | ||
+ | ** Leader: Melissa Carroll | ||
+ | ** Main Paper: [http://jmlr.csail.mit.edu/papers/volume7/zhou06a/zhou06a.pdf Jing Zhou, Dean P. Foster, Robert A. Stine, and Lyle H. Ungar (2006), Streamwise Feature Selection] | ||
+ | ** Background Paper: [http://www-stat.wharton.upenn.edu/~stine/research/smr.pdf Robert A. Stine (2003), Model Selection using Information Theory and the MDL Principle] | ||
+ | |||
+ | * 1 April 2008 | ||
+ | ** Topic: Reinforcement learning and online learning | ||
+ | ** Leader: Umar Syed | ||
+ | ** Main Paper: [http://books.nips.cc/papers/files/nips20/NIPS2007_0631.pdf Alexander Strehl and Michael Littman (2008), Online Linear Regression and Its Application to Reinforcement Learning] | ||
+ | ** Background Paper: [http://citeseer.ist.psu.edu/638941.html Peter Auer (2002), Using Confidence Bounds for Exploitation-Exploration Trade-offs] | ||
+ | ** Background Paper: [http://citeseer.ist.psu.edu/443693.html Ronen I. Brafman and Moshe Tennenholtz (2002), R-max – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning] | ||
+ | |||
+ | * 25 March 2008 | ||
+ | ** Topic: Network/Relational Learning | ||
+ | ** Leader: Jonathan Chang | ||
+ | ** Main Paper: [http://arxiv.org/abs/0803.1628v1 Janne Sinkkonen, Janne Aukia, Samuel Kaski. Component models for large networks] | ||
+ | ** Background Paper: [http://citeseer.ist.psu.edu/cohn01missing.html D Cohn, T Hofmann. The Missing Link-A Probabilistic Model of Document Content and Hypertext Connectivity] | ||
+ | ** Background Paper: [http://arxiv.org/abs/0705.4485 Edoardo M Airoldi, David M Blei, Stephen E Fienberg, Eric P Xing. Mixed membership stochastic blockmodels] | ||
+ | |||
+ | * 11 March 2008 | ||
+ | ** Topic: Online learning with experts | ||
+ | ** Leader: Indraneel Mukherjee | ||
+ | ** Paper: [http://nagoya.uchicago.edu/~jabernethy/Binning.pdf Jacob Abernethy, John Langford, Manfred Warmuth. The Binning Algorithm ] | ||
+ | |||
+ | * 04 March 2008 | ||
+ | ** Topic: Incorporating domain knowledge into POS tagging | ||
+ | ** Leader: Jordan Boyd-Graber | ||
+ | ** Paper: [http://books.nips.cc/papers/files/nips20/NIPS2007_0964.pdf Toutanova, Kristina and Johnson, Mark. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. (2007)] | ||
+ | ** Paper: [http://portal.acm.org/citation.cfm?id=1219884 Smith, Noah and Eisner, Jason. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data. (2005)] | ||
+ | |||
+ | * 26 February 2008 | ||
+ | ** Leader: Berk Kapicioglu | ||
+ | ** Paper: [http://citeseer.ist.psu.edu/celeux95stochastic.html Gilles Celeux, Didier Chauveau, Jean Diebolt. "On Stochastic Versions of the EM Algorithm.", 1995] | ||
+ | |||
+ | ==Schedule (Fall 2007) == | ||
+ | |||
+ | Our weekly meetings are '''Wed 4:00-5:30pm''' in the AI lab on the 4th floor of the CS building (CS 431). | ||
+ | |||
+ | Schedule of topics: | ||
+ | * 12 December 2007 | ||
+ | ** Leader: Zafer Barutcuoglu | ||
+ | ** Paper: [http://www.cs.utoronto.ca/~hinton/absps/fastnc.pdf G.E. Hinton, S. Osindero, Y.W. Teh. "A Fast Learning Algorithm for Deep Belief Nets." Neural Computation, 2006.] | ||
+ | ** More empirical results: [http://www-etud.iro.umontreal.ca/~larocheh/publications/greedy-deep-nets-nips-06.pdf Bengio et al. "Greedy Layer-wise Training of Deep Networks." NIPS, 2006.] | ||
+ | * 28 November 2007 | ||
+ | ** Leader: Umar Syed | ||
+ | ** Paper: [http://www.cs.princeton.edu/~usyed/SyedSchapireNIPS2007.pdf Umar Syed and Robert E. Schapire. "A game-theoretic approach to apprenticeship learning", NIPS (2008)]. | ||
+ | ** Background reading: The work is based on [http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/FreundSc96b.ps Yoav Freund and Robert E. Schapire, "Game theory, on-line prediction, and boosting", COLT (1996)] (see Section 2 and the Appendix). | ||
+ | * 14 November 2007 | ||
+ | ** Leader: Melissa Carroll | ||
+ | ** Paper: [http://www-stat.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf Hui Zou and Trevor Hastie. "Regularization and variable selection via the elastic net." (2005) J. R. Statist. Soc. B, 67(2), pp. 301–320.] | ||
+ | ** Background reading: It may be helpful to read up on or review LASSO and LARS. See [http://www-stat.stanford.edu/~tibs/lasso.html the LASSO Page] | ||
+ | |||
+ | * 24 October 2007 | ||
+ | ** Boss: Berk Kapicioglu | ||
+ | ** Paper: [http://www.research.microsoft.com/~joshuago/exponentialprior-final.pdf Joshua Goodman. "Exponential Priors for Maximum Entropy Models." North American ACL 2004.] | ||
+ | |||
+ | Additional topics: | ||
+ | * relational network models | ||
+ | * DP + parse trees | ||
+ | * online learning | ||
+ | * semi-supervised learning | ||
+ | * stochastic gradient | ||
+ | * convex optimization | ||
+ | * parallel learning | ||
+ | * game theory | ||
+ | |||
+ | ==Schedule (Spring 2007) == | ||
+ | |||
+ | Our weekly meetings are '''Thu 1:30-3:00pm''' in the AI lab on the 4th floor of the CS building (CS 431). | ||
+ | |||
+ | Schedule of topics: | ||
+ | * 29 March 2007 | ||
+ | ** Leader: Indraneel Mukherjee | ||
+ | ** Paper: [http://www.cs.berkeley.edu/~feisha/pubs/nips2006.pdf Large Margin Hidden Markov Models for Automatic Speech Recognition] | ||
+ | * 15 March 2007 | ||
+ | ** Leader: Jordan Boyd-Graber | ||
+ | ** Paper 1: [http://www.cs.cornell.edu/home/llee/papers/textstruct.pdf Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization (2004)] | ||
+ | ** Paper 2: [http://www.stanford.edu/~mpurver/papers/purver-et-al06acl.pdf Unsupervised topic modelling for multi-party spoken discourse (2006)] | ||
+ | * 8 March 2007 | ||
+ | ** Leader: Umar Syed | ||
+ | ** Paper: [http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf Pieter Abbeel and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." ICML 2004.] | ||
+ | * 1 March 2007 | ||
+ | ** Leader: Miro Dudik | ||
+ | ** Paper: [http://www.cs.princeton.edu/~mdudik/ShalevShwartzSi06.pdf Shai Shalev-Shwartz and Yoram Singer. "Convex Repeated Games and Fenchel Duality."] | ||
+ | ** See also [http://www.cs.huji.ac.il/~shais/papers/ShalevSi06_fench_tech.pdf a more recent version from NIPS 2006]. It contains more references and the math is slightly different; e.g., it introduces strong convexity relative to a norm. I find it a little bit too condensed and more difficult to read. | ||
+ | * 22 February 2007 | ||
+ | ** Leader: Melissa Carroll | ||
+ | ** Paper: [http://www.icml2006.org/icml_documents/camera-ready/055_Hidden_Process_Model.pdf R.A. Hutchinson, T. Mitchell, I. Rustandi. "Hidden Process Models." ICML 2006.] | ||
+ | ** Background on fMRI classification application: [http://www.cs.cmu.edu/afs/cs/project/theo-73/www/papers/mlj04-final-published.pdf T. Mitchell, R. Hutchinson, R. Niculescu, F. Pereira, X. Wang. "Learning to Decode Cognitive States from Brain Images." Machine Learning, 57, 145–175, 2004.] | ||
+ | * 15 February 2007 | ||
+ | ** Leader: Edo Airoldi | ||
+ | ** Paper: [http://research.microsoft.com/~cmbishop/downloads/Bishop-CVPR-06.pdf J. Lasserre, C. M. Bishop, and T. Minka. "Principled hybrids of generative and discriminative models." CVPR 2006.] | ||
+ | ** Notes: [ftp://ftp.research.microsoft.com/pub/tr/TR-2005-144.pdf T. Minka. "Discriminative models, not discriminative training." MSR-TR-144 2005.] | ||
+ | * 8 February 2007 | ||
+ | ** Leader: Joe Calandrino | ||
+ | ** Paper: [http://www.cs.bgu.ac.il/~kobbi/papers/psd.pdf I. Dinur, K. Nissim. "Revealing Information while Preserving Privacy." PODS 2003.] | ||
+ | * inverse RL (Umar) | ||
+ | * disc/gen approaches (Bishop...) | ||
+ | * active learning (BK) | ||
+ | * hidden process models (MC) | ||
+ | * Gaussian processes (Z) | ||
+ | * Dirichlet processes | ||
+ | * Semisupervised learning (Florian) | ||
+ | ** [http://www.e-publications.org/ims/submission/index.php/STS/user/submissionFile/45?confirm=ab8bceff <b>The use of unlabelled data in predictive modelling</b>, Liang F, Mukherjee S and West M, Statistical Science, to appear.] | ||
+ | ** some paper by Lafferty and Wasserman ? | ||
+ | * Quantum neural networks (Vaneet) | ||
+ | * Manifold learning (Z) | ||
+ | * On-line learning (Berk) | ||
+ | * Music stuff/transcription (R) | ||
+ | * Variational methods (JC) | ||
+ | * Random projections (Charikar) | ||
+ | |||
+ | == Schedule (Fall 2006) == | ||
+ | Our weekly meetings are <b>Fridays, 3pm to 5pm, in CS 402</b>. | ||
Scheduled readings: | Scheduled readings: | ||
− | * | + | * <font color="gray"><b>5</b> October 2006 <b>THURSDAY 4:30PM this week</b></font> |
** Leader: Zafer | ** Leader: Zafer | ||
− | ** Paper: [http://www. | + | ** Paper: [http://www.cs.princeton.edu/~zbarutcu/temp/DC_Overview.pdf Horst, R., Thoai, N. V. 1999. <b>DC Programming: Overview</b>. <i>J. Optim. Theory Appl.</i>] |
+ | *: (If you need a review: [http://www.cs.princeton.edu/~zbarutcu/temp/convex_opt_tutorial.pdf Hindi, H. 2004. <b>A Tutorial on Convex Optimization</b>. <i>American Control Conference</i>.]) | ||
+ | *: (Application for the interested: [http://www.icml2006.org/icml_documents/camera-ready/006_A_DC_Programming_Alg.pdf Argyriou A. <i>et al.</i> 2006. <b>A DC-Programming Algorithm for Kernel Selection</b>. <i>ICML</i>].) | ||
+ | *: And another application [http://www.optimization-online.org/DB_FILE/2005/06/1149.pdf Ellis, S. and Nayakkankuppam, V. <b> Phylogenetic Analysis Via DC Programming </b>.] | ||
+ | |||
+ | * 13 October 2006 | ||
+ | ** Leader: Miro | ||
+ | ** Papers: | ||
+ | *** [http://www.cs.princeton.edu/~mdudik/lasso/lasso.pdf Robert Tibshirani. <b>Regression shrinkage and selection via the Lasso</b>. <i>J. R. Statist. Soc. B</i>, 1996.] | ||
+ | *** [http://www.cs.princeton.edu/~mdudik/lasso/ng_l1logisticreg.pdf Su-In Lee, Honglak Lee, Pieter Abbeel and Andrew Y. Ng. <b>Efficient L1 regularized logistic regression</b>. <i>In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06)</i>, 2006.] | ||
+ | ** Paper: | ||
+ | * 20 October 2006 | ||
+ | ** Leader: Florian | ||
+ | ** Paper: [http://www.stat.berkeley.edu/~nicolai/consistent.pdf N Meinshausen and P Bühlmann, <b>High Dimensional Graphs and Variable Selection With the Lasso</b>, Annals of Statistics 34(3), 1436-1462] | ||
+ | ** Background reading: [http://www.jmlr.org/papers/volume1/heckerman00a/heckerman00a.pdf D Heckerman, DM Chickering, C Meek, R Rounthwaite, C Kadie, <b>Dependency Networks for Inference, Collaborative Filtering, and Data Visualization</b>, JMLR, 1(Oct):49-75, 2000] <!--and [http:// S.-I. Lee, V. Ganapathi, and D. Koller, <b>Efficient Structure Learning of Markov Networks using L1-Regularization</b> Advances in Neural Information Processing Systems (NIPS 2006)] --> | ||
+ | * 27 October 2006 | ||
+ | ** Leader: | ||
+ | ** Paper: | ||
+ | * 3 November 2006 | ||
+ | ** Leader: | ||
+ | ** Paper: | ||
+ | * 10 November 2006 | ||
+ | ** "Leader": Jordan | ||
+ | ** Paper: [http://www.vinartus.net/spa/03c-v7.pdf Understanding the Yarowsky Algorithm] Abney, S. 2004. Understanding the Yarowsky Algorithm. Comput. Linguist. 30, 3 (Sep. 2004), 365-395. | ||
+ | ** Background: [http://acl.ldc.upenn.edu/P/P95/P95-1026.pdf The original paper] Yarowsky, D. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association For Computational Linguistics (Cambridge, Massachusetts, June 26 - 30, 1995). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 189-196. | ||
+ | ** Background: [http://www.cs.princeton.edu/~jbg/p2-ide.pdf A good overview of the problem area] Ide, N. and Véronis, J. 1998. Introduction to the special issue on word sense disambiguation: the state of the art. Comput. Linguist. 24, 1 (Mar. 1998), 2-40. | ||
+ | * 17 November 2006 | ||
+ | ** Boss: Berk Kapicioglu | ||
+ | ** Paper: [http://hunch.net/~jl/projects/reductions/woa/woa.pdf Weighted One-Against-All] | ||
+ | ** Nostalgia: Rifkin and Klautau. [http://five-percent-nation.mit.edu/PersonalPages/rif/Pubs/ovadefense.ps "In Defense of One-Vs-All Classification."] Journal of Machine Learning Research, Volume 5, pp. 101-141, 2004. | ||
+ | * 1 December 2006 | ||
+ | ** Leader: Jonathan Chang | ||
+ | ** Paper: [http://acl.ldc.upenn.edu/P/P06/P06-1101.pdf Snow, Jurafsky, and Ng. "Semantic Taxonomy Induction from Heterogenous Evidence." Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 801-808, July 2006.] | ||
+ | ** Some useful (perhaps) background on the technique: [http://ai.stanford.edu/~ang/papers/nips04-hypernym.pdf Snow, Jurafsky, and Ng. "Learning Syntactic Patterns for Automatic Hypernym Discovery."] | ||
+ | ** Background on what they're trying to do: [http://portal.acm.org/citation.cfm?doid=219717.219748 Miller. "WordNet: A Lexical Database for English." Communications of the ACM, Volume 38, Issue 11 (November 1995), Pages 39-41.] | ||
+ | * 15 December 2006 | ||
+ | ** Leader: Rebecca Fiebrink | ||
+ | ** Paper: [http://citeseer.ist.psu.edu/cache/papers/cs/2737/http:zSzzSzwizpak.iaf.uiowa.eduzSz%7EsgiserviceszSzVarsityzSzsilicon_campzSzMinesetzSztechzSzaccEst.pdf/kohavi95study.pdf Kohavi, R. 1995. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).] | ||
== Proposed Topics and Papers == | == Proposed Topics and Papers == | ||
Line 21: | Line 205: | ||
* Lasso | * Lasso | ||
+ | ** The original paper: [http://www.cs.princeton.edu/~mdudik/lasso/lasso.pdf Robert Tibshirani. <b>Regression shrinkage and selection via the Lasso</b>. <i>J. R. Statist. Soc. B 58</i>, 1996.] | ||
+ | ** [http://www-stat.stanford.edu/~tibs/lasso.html The Lasso Page] | ||
+ | ** Generalization properties: | ||
+ | *** [http://stat.ethz.ch/research/research_reports/2006/133 Sara A. van de Geer. <b>High-dimensional generalized linear models and the Lasso</b>. <i>Tech. Report</i>, June 2006.] | ||
+ | *** [http://www.cs.princeton.edu/~mdudik/lasso/knightfu_lasso.pdf Keith Knight and Wenjiang Fu. <b>Asymptotics for lasso-type estimators</b>. <i>Ann. Statist. 28</i>, 2000.] | ||
+ | ** Sparse approximation: | ||
+ | *** [http://www-stat.stanford.edu/~donoho/Reports/2004/l1l0approx.pdf David Donoho. <b>For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution</b>, <i>Tech. Report</i>, August 2004.] | ||
+ | *** [http://www-stat.stanford.edu/~donoho/Reports/2004/l1l0EquivCorrected.pdf David Donoho. <b>For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution</b>, <i>Tech. Report</i>, September 2004.] | ||
+ | ** Model selection: [http://www.cs.princeton.edu/~mdudik/lasso/msboost.ps P. Buhlmann and B. Yu. <b>Boosting, Model Selection, Lasso and Nonnegative Garotte</b>. <i>Tech. Report</i>, 2005.] | ||
+ | ** Relatives: | ||
+ | *** [http://www.cs.cmu.edu/~lafferty/pub/rodeo.pdf <b>RODEO</b>] | ||
+ | *** [http://www-stat.stanford.edu/~tibs/ftp/LeastAngle_2002.pdf <b>Least angle regression</b>] | ||
+ | *** [http://www.cs.princeton.edu/~mdudik/lasso/elastic_net.pdf Hui Zou and Trevor Hastie. <b>Regularization and Variable Selection via the Elastic Net</b>. <i>J. R. Statist. Soc. B</i>, 2005] + [http://www.cs.princeton.edu/~mdudik/lasso/elastic_net_correction.pdf <b>Addendum</b>] | ||
+ | *** [http://www.cs.princeton.edu/~mdudik/lasso/ng_l1logisticreg.pdf Su-In Lee, Honglak Lee, Pieter Abbeel and Andrew Y. Ng. <b>Efficient L1 regularized logistic regression</b>. <i>In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06)</i>, 2006.] | ||
* Optimization | * Optimization | ||
+ | * Language Applications | ||
* Music | * Music | ||
+ | ** [http://scholar.google.com/scholar?q=%22a+generative+model+for+music+transcription%22&hl=en&lr=&btnG=Search Cemgil, A. T., H. J. Kappen, and D. Barber. 2005. A generative model for music transcription. <i>IEEE Transactions on Speech and Audio Processing</i>.] | ||
+ | * Markov Decision Processes (MDPs) and Reinforcement Learning | ||
+ | ** State-space abstraction/aggregation in MDPs | ||
+ | ** The E^3 algorithm (Kearns and Singh) | ||
+ | ** Inverse reinforcement learning | ||
+ | * Active Learning | ||
+ | * Deep Neural Networks | ||
+ | * Factor Graphs<br><i>Algorithms that must deal with complicated global functions of many variables often exploit the manner in which the given functions factor as a product of "local" functions, each of which depends on a subset of the variables.</i> | ||
+ | ** Tutorial: [http://cba.mit.edu/events/03.11.ASE/docs/Loeliger.pdf Factor Graphs and the Sum-Product Algorithm], F.R. Kschischang, B. Frey, H-A Loeliger, <i>IEEE Transactions on Information Theory, Vol 47, No 2, Feb 2001.</i> | ||
+ | ** Application: [http://people.csail.mit.edu/tommi/papers/YeaIdeJaa-jcb04.pdf Physical Network Models], C-H Yeang, T Ideker, T Jaakkola, <i>Journal of Computational Biology, Vol 11, No 2-3, 2004</i> | ||
+ | * Cross-validation | ||
+ | ** [http://citeseer.ist.psu.edu/cache/papers/cs/2737/http:zSzzSzwizpak.iaf.uiowa.eduzSz%7EsgiserviceszSzVarsityzSzsilicon_campzSzMinesetzSztechzSzaccEst.pdf/kohavi95study.pdf Kohavi, R. 1995. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).] | ||
+ | *** "Do 10-fold CV instead of leave-one-out," at least when doing model selection | ||
+ | ** [http://www.esa.espci.fr/ARTICLES/1999vcnc.pdf Rivals, I., and L. Personnaz. 1999. "On cross-validation for model selection." Neural Computation 11 (4).] | ||
+ | *** "Do statistical tests instead of leave-one-out," at least when doing model selection | ||
+ | ** [http://www.cs.ucl.ac.uk/staff/M.Pontil/reading/natoasi.pdf Elisseeff, A., and M. Pontil. 2003. "Leave-one-out error and stability of learning algorithms with applications." Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, J. Suykens et al. Eds.] | ||
+ | *** Reasons about sufficient conditions for LOO-CV error to approach generalization error for fixed algorithms (including kernel methods). A more theoretical (and maybe less practical) paper. | ||
+ | *** see also [http://www.cs.ucl.ac.uk/staff/M.Pontil/reading/EvgPonEli02.pdf Evgeniou, T., M. Pontil, and A. Elisseeff. 2004. "Leave-one-out error, stability, and generalization of voting combinations of classifiers." Machine Learning 55:(1): 71-97.] (I haven't read this but it's very related.) | ||
== Participants == | == Participants == | ||
Line 28: | Line 245: | ||
=== Faculty === | === Faculty === | ||
+ | * David Blei | ||
* Rob Schapire | * Rob Schapire | ||
− | * | + | |
+ | === PostDocs === | ||
+ | * Edo Airoldi, LSI & CS | ||
+ | * Florian Markowetz, LSI | ||
=== Students === | === Students === | ||
− | * | + | * Indraneel Mukherjee, CS |
+ | * Jordan Boyd-Graber, CS | ||
+ | * Joseph Calandrino, CS | ||
+ | * Melissa Carroll, CS | ||
+ | * Jonathan Chang, EE | ||
* Miroslav Dudik, CS | * Miroslav Dudik, CS | ||
* Rebecca Fiebrink, CS | * Rebecca Fiebrink, CS | ||
+ | * Berk Kapicioglu, CS | ||
+ | * Umar Syed, CS | ||
+ | * Chong Wang, CS | ||
+ | * Sina Jafarpour, CS | ||
+ | * Sean Gerrish, CS | ||
+ | * Richard Socher, CS |
Latest revision as of 09:18, 3 February 2009
Machine Learning Reading Group
Welcome to the wiki of the machine learning reading group.
Mailing list
We maintain an announcement/discussion list for the reading group. You may sign up for the list here.
Schedule (Fall 2008)
Our weekly meetings are Mo 1:00-5:00pm on the 3rd floor of the CS building (CS 302).
- Graphical Models, exponential families, and variational inference
Schedule (Spring 2008)
Our weekly meetings are Tue 1:00-2:30pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 13 May 2008
- Topic: Maximum Entropy Discrimination
- Leader: Chong Wang
- Main Paper: Tommi Jaakkola, Marina Meila, and Tony Jebara, Maximum Entropy Discrimination, In NIPS 1999.
- Long Version: Tommi Jaakkola, Marina Meila, and Tony Jebara, Maximum Entropy Discrimination, Technical Report AITR-1668, MIT, 1999
- 6 May 2008
- Topic: Feature selection for relational data
- Leader: Jonathan Chang
- Main Paper: Jensen, Neville, and Hay (2003), Avoiding Bias when Aggregating Relational Data with Degree Disparity
- Background: Jensen and Neville (2002), Linkage and Autocorrelation Cause Feature Selection Bias in Relational Learning
- 22 April 2008
- Topic: Game theory
- Leader: Indraneel Mukherjee
- Main Paper: An Analog of the Minimax Theorem for Vector Payoffs
- 15 April 2008
- Topic: Conditional Random Fields
- Leader: Berk Kapicioglu
- Main Paper: J. Lafferty, A. McCallum, and F. Pereira (2001), Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
- 8 April 2008
- Topic: Online Feature Selection
- Leader: Melissa Carroll
- Main Paper: Jing Zhou, Dean P. Foster, Robert A. Stine, and Lyle H. Ungar (2006), Streamwise Feature Selection
- Background Paper: Robert A. Stine (2003), Model Selection using Information Theory and the MDL Principle
- 1 April 2008
- Topic: Reinforcement learning and online learning
- Leader: Umar Syed
- Main Paper: Alexander Strehl and Michael Littman (2008), Online Linear Regression and Its Application to Reinforcement Learning
- Background Paper: Peter Auer (2002), Using Confidence Bounds for Exploitation-Exploration Trade-offs
- Background Paper: Ronen I. Brafman and Moshe Tennenholtz (2002), R-max – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
- 25 March 2008
- Topic: Network/Relational Learning
- Leader: Jonathan Chang
- Main Paper: Janne Sinkkonen, Janne Aukia, Samuel Kaski. Component models for large networks
- Background Paper: D Cohn, T Hofmann. The Missing Link-A Probabilistic Model of Document Content and Hypertext Connectivity
- Background Paper: Edoardo M Airoldi, David M Blei, Stephen E Fienberg, Eric P Xing. Mixed membership stochastic blockmodels
- 11 March 2008
- Topic: Online learning with experts
- Leader: Indraneel Mukherjee
- Paper: Jacob Abernethy, John Langford, Manfred Warmuth. The Binning Algorithm
- 04 March 2008
- Topic: Incorporating domain knowledge into POS tagging
- Leader: Jordan Boyd-Graber
- Paper: Toutanova, Kristina and Johnson, Mark. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. (2007)
- Paper: Smith, Noah and Eisner, Jason. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data. (2005)
- 26 February 2008
- Leader: Berk Kapicioglu
- Paper: Gilles Celeux, Didier Chauveau, Jean Diebolt. "On Stochastic Versions of the EM Algorithm.", 1995
Schedule (Fall 2007)
Our weekly meetings are Wed 4:00-5:30pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 12 December 2007
- Leader: Zafer Barutcuoglu
- Paper: G.E. Hinton, S. Osindero, Y.W. Teh. "A Fast Learning Algorithm for Deep Belief Nets." Neural Computation, 2006.
- More empirical results: Bengio et al. "Greedy Layer-wise Training of Deep Networks." NIPS, 2006.
- 28 November 2007
- Leader: Umar Syed
- Paper: Umar Syed and Robert E. Schapire. "A game-theoretic approach to apprenticeship learning", NIPS (2008).
- Background reading: The work is based on Yoav Freund and Robert E. Schapire, "Game theory, on-line prediction, and boosting", COLT (1996) (see Section 2 and the Appendix).
- 14 November 2007
- Leader: Melissa Carroll
- Paper: Hui Zou and Trevor Hastie. "Regularization and variable selection via the elastic net." (2005) J. R. Statist. Soc. B, 67(2), pp. 301–320.
- Background reading: It may be helpful to read up on or review LASSO and LARS. See the LASSO Page
- 24 October 2007
- Boss: Berk Kapicioglu
- Paper: Joshua Goodman. "Exponential Priors for Maximum Entropy Models." North American ACL 2004.
Additional topics:
- relational network models
- DP + parse trees
- online learning
- semi-supervised learning
- stochastic gradient
- convex optimization
- parallel learning
- game theory
Schedule (Spring 2007)
Our weekly meetings are Thu 1:30-3:00pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 29 March 2007
- Leader: Indraneel Mukherjee
- Paper: Large Margin Hidden Markov Models for Automatic Speech Recognition
- 15 March 2007
- Leader: Jordan Boyd-Graber
- Paper 1: Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization (2004)
- Paper 2: Unsupervised topic modelling for multi-party spoken discourse (2006)
- 8 March 2007
- 1 March 2007
- Leader: Miro Dudik
- Paper: Shai Shalev-Shwartz and Yoram Singer. "Convex Repeated Games and Fenchel Duality."
- See also a more recent version from NIPS 2006. It contains more references and the math is slightly different; e.g., it introduces strong convexity relative to a norm. I find it a little bit too condensed and more difficult to read.
- 22 February 2007
- Leader: Melissa Carroll
- Paper: R.A. Hutchinson, T. Mitchell, I. Rustandi. "Hidden Process Models." ICML 2006.
- Background on fMRI classification application: T. Mitchell, R. Hutchinson, R. Niculescu, F. Pereira, X. Wang. "Learning to Decode Cognitive States from Brain Images." Machine Learning, 57, 145–175, 2004.
- 15 February 2007
- 8 February 2007
- Leader: Joe Calandrino
- Paper: I. Dinur, K. Nissim. "Revealing Information while Preserving Privacy." PODS 2003.
- inverse RL (Umar)
- disc/gen approaches (Bishop...)
- active learning (BK)
- hidden process models (MC)
- Gaussian processes (Z)
- Dirichlet processes
- Semisupervised learning (Florian)
- The use of unlabelled data in predictive modelling, Liang F, Mukherjee S and West M, Statistical Science, to appear.
- some paper by Lafferty and Wasserman ?
- Quantum neural networks (Vaneet)
- Manifold learning (Z)
- On-line learning (Berk)
- Music stuff/transcription (R)
- Variational methods (JC)
- Random projections (Charikar)
Schedule (Fall 2006)
Our weekly meetings are Fridays, 3pm to 5pm, in CS 402.
Scheduled readings:
- 5 October 2006 THURSDAY 4:30PM this week
- Leader: Zafer
- Paper: Horst, R., Thoai, N. V. 1999. DC Programming: Overview. J. Optim. Theory Appl.
- (If you need a review: Hindi, H. 2004. A Tutorial on Convex Optimization. American Control Conference.)
- (Application for the interested: Argyriou A. et al. 2006. A DC-Programming Algorithm for Kernel Selection. ICML.)
- And another application: Ellis, S. and Nayakkankuppam, V. Phylogenetic Analysis Via DC Programming.
- 13 October 2006
- Leader: Miro
- Papers:
- Paper:
- 20 October 2006
- Leader: Florian
- Paper: N Meinshausen and P Bühlmann, High Dimensional Graphs and Variable Selection With the Lasso, Annals of Statistics 34(3), 1436-1462
- Background reading: D Heckerman, DM Chickering, C Meek, R Rounthwaite, C Kadie, Dependency Networks for Inference, Collaborative Filtering, and Data Visualization, JMLR, 1(Oct):49-75, 2000
- 27 October 2006
- Leader:
- Paper:
- 3 November 2006
- Leader:
- Paper:
- 10 November 2006
- "Leader": Jordan
- Paper: Understanding the Yarowsky Algorithm Abney, S. 2004. Understanding the Yarowsky Algorithm. Comput. Linguist. 30, 3 (Sep. 2004), 365-395.
- Background: The original paper Yarowsky, D. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association For Computational Linguistics (Cambridge, Massachusetts, June 26 - 30, 1995). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 189-196.
- Background: A good overview of the problem area Ide, N. and Véronis, J. 1998. Introduction to the special issue on word sense disambiguation: the state of the art. Comput. Linguist. 24, 1 (Mar. 1998), 2-40.
- 17 November 2006
- Boss: Berk Kapicioglu
- Paper: Weighted One-Against-All
- Nostalgia: Rifkin and Klautau. "In Defense of One-Vs-All Classification." Journal of Machine Learning Research, Volume 5, pp. 101-141, 2004.
- 1 December 2006
- Leader: Jonathan Chang
- Paper: Snow, Jurafsky, and Ng. "Semantic Taxonomy Induction from Heterogenous Evidence." Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 801-808, July 2006.
- Some useful (perhaps) background on the technique: Snow, Jurafsky, and Ng. "Learning Syntactic Patterns for Automatic Hypernym Discovery."
- Background on what they're trying to do: Miller. "WordNet: A Lexical Database for English." Communications of the ACM, Volume 38, Issue 11 (November 1995), Pages 39-41.
- 15 December 2006
Proposed Topics and Papers
Please add further topics, suggest papers for particular topics, etc. here.
- Lasso
- The original paper: Robert Tibshirani. Regression shrinkage and selection via the Lasso. J. R. Statist. Soc. B 58, 1996.
- The Lasso Page
- Generalization properties:
- Sparse approximation:
- David Donoho. For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution, Tech. Report, August 2004.
- David Donoho. For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution, Tech. Report, September 2004.
- Model selection: P. Buhlmann and B. Yu. Boosting, Model Selection, Lasso and Nonnegative Garotte. Tech. Report, 2005.
- Relatives:
- RODEO
- Least angle regression
- Hui Zou and Trevor Hastie. Regularization and Variable Selection via the Elastic Net. J. R. Statist. Soc. B, 2005 + Addendum
- Su-In Lee, Honglak Lee, Pieter Abbeel and Andrew Y. Ng. Efficient L1 regularized logistic regression. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), 2006.
- Optimization
- Language Applications
- Music
- Markov Decision Processes (MDPs) and Reinforcement Learning
- State-space abstraction/aggregation in MDPs
- The E^3 algorithm (Kearns and Singh)
- Inverse reinforcement learning
- Active Learning
- Deep Neural Networks
- Factor Graphs
Algorithms that must deal with complicated global functions of many variables often exploit the manner in which the given functions factor as a product of "local" functions, each of which depends on a subset of the variables.
- Tutorial: Factor Graphs and the Sum-Product Algorithm, F.R. Kschischang, B. Frey, H-A Loeliger, IEEE Transactions on Information Theory, Vol 47, No 2, Feb 2001.
- Application: Physical Network Models, C-H Yeang, T Ideker, T Jaakkola, Journal of Computational Biology, Vol 11, No 2-3, 2004
- Cross-validation
- Kohavi, R. 1995. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).
- "Do 10-fold CV instead of leave-one-out," at least when doing model selection
- Rivals, I., and L. Personnaz. 1999. "On cross-validation for model selection." Neural Computation 11 (4).
- "Do statistical tests instead of leave-one-out," at least when doing model selection
- Elisseeff, A., and M. Pontil. 2003. "Leave-one-out error and stability of learning algorithms with applications." Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, J. Suykens et al. Eds.
- Reasons about sufficient conditions for LOO-CV error to approach generalization error for fixed algorithms (including kernel methods). A more theoretical (and maybe less practical) paper.
- see also Evgeniou, T., M. Pontil, and A. Elisseeff. 2004. "Leave-one-out error, stability, and generalization of voting combinations of classifiers." Machine Learning 55:(1): 71-97. (I haven't read this but it's very related.)
Participants
(Participants, please add your name to the list below.)
Faculty
- David Blei
- Rob Schapire
PostDocs
- Edo Airoldi, LSI & CS
- Florian Markowetz, LSI
Students
- Indraneel Mukherjee, CS
- Jordan Boyd-Graber, CS
- Joseph Calandrino, CS
- Melissa Carroll, CS
- Jonathan Chang, EE
- Miroslav Dudik, CS
- Rebecca Fiebrink, CS
- Berk Kapicioglu, CS
- Umar Syed, CS
- Chong Wang, CS
- Sina Jafarpour, CS
- Sean Gerrish, CS
- Richard Socher, CS