Revision as of 13:46, 3 March 2008
Machine Learning Reading Group
Welcome to the wiki of the machine learning reading group.
Mailing list
We maintain an announcement/discussion list for the reading group. You may sign up for the list here.
Schedule (Spring 2008)
Our weekly meetings are Tue 1:00-2:30pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 04 March 2008
- Leader: Jordan Boyd-Graber
- Paper: Toutanova, Kristina and Johnson, Mark. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. (2007)
- Paper: (If there's time, an additional paper)
- 26 February 2008
- Leader: Berk Kapicioglu
- Paper: Gilles Celeux, Didier Chauveau, Jean Diebolt. "On Stochastic Versions of the EM Algorithm.", 1995
Schedule (Fall 2007)
Our weekly meetings are Wed 4:00-5:30pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 12 December 2007
- Leader: Zafer Barutcuoglu
- Paper: G.E. Hinton, S. Osindero, Y.W. Teh. "A Fast Learning Algorithm for Deep Belief Nets." Neural Computation, 2006.
- More empirical results: Bengio et al. "Greedy Layer-wise Training of Deep Networks." NIPS, 2006.
- 28 November 2007
- Leader: Umar Syed
- Paper: Umar Syed and Robert E. Schapire. "A game-theoretic approach to apprenticeship learning", NIPS (2008).
- Background reading: The work is based on Yoav Freund and Robert E. Schapire, "Game theory, on-line prediction, and boosting", COLT (1996) (see Section 2 and the Appendix).
- 14 November 2007
- Leader: Melissa Carroll
- Paper: Hui Zou and Trevor Hastie. "Regularization and variable selection via the elastic net." (2005) J. R. Statist. Soc. B, 67(2), pp. 301–320.
- Background reading: It may be helpful to read up on or review LASSO and LARS. See the LASSO Page
- 24 October 2007
- Leader: Berk Kapicioglu
- Paper: Joshua Goodman. "Exponential Priors for Maximum Entropy Models." North American ACL 2004.
Additional topics:
- relational network models
- DP + parse trees
- online learning
- semi-supervised learning
- stochastic gradient
- convex optimization
- parallel learning
- game theory
Schedule (Spring 2007)
Our weekly meetings are Thu 1:30-3:00pm in the AI lab on the 4th floor of the CS building (CS 431).
Schedule of topics:
- 29 March 2007
- Leader: Indraneel Mukherjee
- Paper: Large Margin Hidden Markov Models for Automatic Speech Recognition
- 15 March 2007
- Leader: Jordan Boyd-Graber
- Paper 1: Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization (2004)
- Paper 2: Unsupervised topic modelling for multi-party spoken discourse (2006)
- 8 March 2007
- 1 March 2007
- Leader: Miro Dudik
- Paper: Shai Shalev-Shwartz and Yoram Singer. "Convex Repeated Games and Fenchel Duality."
- See also a more recent version from NIPS 2006. It contains more references and the math is slightly different; e.g., it introduces strong convexity relative to a norm. I find it a little bit too condensed and more difficult to read.
- 22 February 2007
- Leader: Melissa Carroll
- Paper: R.A. Hutchinson, T. Mitchell, I. Rustandi. "Hidden Process Models." ICML 2006.
- Background on fMRI classification application: T. Mitchell, R. Hutchinson, R. Niculescu, F. Pereira, X. Wang. "Learning to Decode Cognitive States from Brain Images." Machine Learning, 57, 145–175, 2004.
- 15 February 2007
- 8 February 2007
- Leader: Joe Calandrino
- Paper: I. Dinur, K. Nissim. "Revealing Information while Preserving Privacy." PODS 2003.
- inverse RL (Umar)
- disc/gen approaches (Bishop...)
- active learning (BK)
- hidden process models (MC)
- Gaussian processes (Z)
- Dirichlet processes
- Semisupervised learning (Florian)
- The use of unlabelled data in predictive modelling, Liang F, Mukherjee S and West M, Statistical Science, to appear.
- some paper by Lafferty and Wasserman ?
- Quantum neural networks (Vaneet)
- Manifold learning (Z)
- On-line learning (Berk)
- Music stuff/transcription (R)
- Variational methods (JC)
- Random projections (Charikar)
Schedule (Fall 2006)
Our weekly meetings are Fridays, 3pm to 5pm, in CS 402.
Scheduled readings:
- 5 October 2006 THURSDAY 4:30PM this week
- Leader: Zafer
- Paper: Horst, R., Thoai, N. V. 1999. DC Programming: Overview. J. Optim. Theory Appl.
- (If you need a review: Hindi, H. 2004. A Tutorial on Convex Optimization. American Control Conference.)
- (Application for the interested: Argyriou A. et al. 2006. A DC-Programming Algorithm for Kernel Selection. ICML.)
- (Another application: Ellis, S. and Nayakkankuppam, V. Phylogenetic Analysis Via DC Programming.)
- 13 October 2006
- Leader: Miro
- Papers:
- Paper:
- 20 October 2006
- Leader: Florian
- Paper: N Meinshausen and P Bühlmann, High Dimensional Graphs and Variable Selection With the Lasso, Annals of Statistics 34(3), 1436-1462
- Background reading: D Heckerman, DM Chickering, C Meek, R Rounthwaite, C Kadie, Dependency Networks for Inference, Collaborative Filtering, and Data Visualization, JMLR, 1(Oct):49-75, 2000
- 27 October 2006
- Leader:
- Paper:
- 3 November 2006
- Leader:
- Paper:
- 10 November 2006
- "Leader": Jordan
- Paper: Abney, S. 2004. Understanding the Yarowsky Algorithm. Comput. Linguist. 30, 3 (Sep. 2004), 365-395.
- Background (the original paper): Yarowsky, D. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association For Computational Linguistics (Cambridge, Massachusetts, June 26 - 30, 1995). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 189-196.
- Background (a good overview of the problem area): Ide, N. and Véronis, J. 1998. Introduction to the special issue on word sense disambiguation: the state of the art. Comput. Linguist. 24, 1 (Mar. 1998), 2-40.
- 17 November 2006
- Boss: Berk Kapicioglu
- Paper: Weighted One-Against-All
- Nostalgia: Rifkin and Klautau. "In Defense of One-Vs-All Classification." Journal of Machine Learning Research, Volume 5, pp. 101-141, 2004.
- 1 December 2006
- Leader: Jonathan Chang
- Paper: Snow, Jurafsky, and Ng. "Semantic Taxonomy Induction from Heterogenous Evidence." Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 801-808, July 2006.
- Some useful (perhaps) background on the technique: Snow, Jurafsky, and Ng. "Learning Syntactic Patterns for Automatic Hypernym Discovery."
- Background on what they're trying to do: Miller. "WordNet: A Lexical Database for English." Communications of the ACM, Volume 38, Issue 11 (November 1995), Pages 39-41.
- 15 December 2006
Proposed Topics and Papers
Please add further topics, suggest papers for particular topics, etc. here.
- Lasso
- The original paper: Robert Tibshirani. Regression shrinkage and selection via the Lasso. J. R. Statist. Soc. B 58, 1996.
- The Lasso Page
- Generalization properties:
- Sparse approximation:
- David Donoho. For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution, Tech. Report, August 2004.
- David Donoho. For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution, Tech. Report, September 2004.
- Model selection: P. Bühlmann and B. Yu. Boosting, Model Selection, Lasso and Nonnegative Garrote. Tech. Report, 2005.
- Relatives:
- RODEO
- Least angle regression
- Hui Zou and Trevor Hastie. Regularization and Variable Selection via the Elastic Net. J. R. Statist. Soc. B, 2005 + Addendum
- Su-In Lee, Honglak Lee, Pieter Abbeel and Andrew Y. Ng. Efficient L1 regularized logistic regression. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), 2006.
- Optimization
- Language Applications
- Music
- Markov Decision Processes (MDPs) and Reinforcement Learning
- State-space abstraction/aggregation in MDPs
- The E^3 algorithm (Kearns and Singh)
- Inverse reinforcement learning
- Active Learning
- Deep Neural Networks
- Factor Graphs
Algorithms that must deal with complicated global functions of many variables often exploit the manner in which the given functions factor as a product of "local" functions, each of which depends on a subset of the variables.
- Tutorial: Factor Graphs and the Sum-Product Algorithm, F.R. Kschischang, B. Frey, H.-A. Loeliger, IEEE Transactions on Information Theory, Vol 47, No 2, Feb 2001.
- Application: Physical Network Models, C-H Yeang, T Ideker, T Jaakkola, Journal of Computational Biology, Vol 11, No 2-3, 2004
- Cross-validation
- Kohavi, R. 1995. "A study of cross-validation and bootstrap for accuracy estimation and model selection." Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI).
- "Do 10-fold CV instead of leave-one-out," at least when doing model selection
- Rivals, I., and L. Personnaz. 1999. "On cross-validation for model selection." Neural Computation 11 (4).
- "Do statistical tests instead of leave-one-out," at least when doing model selection
- Elisseeff, A., and M. Pontil. 2003. "Leave-one-out error and stability of learning algorithms with applications." Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, Vol. 190, J. Suykens et al. Eds.
- Reasons about sufficient conditions for LOO-CV error to approach generalization error for fixed algorithms (including kernel methods). A more theoretical (and maybe less practical) paper.
- see also Evgeniou, T., M. Pontil, and A. Elisseeff. 2004. "Leave-one-out error, stability, and generalization of voting combinations of classifiers." Machine Learning 55:(1): 71-97. (I haven't read this but it's very related.)
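As a small illustration of the idea in the Factor Graphs entry above (a global function that factors into "local" functions), here is a minimal Python sketch. The factors `f_a` and `f_b`, the binary variables, and the chain structure x1 - x2 - x3 are all made up for illustration and are not taken from the tutorial. The sketch computes the marginal of x1 twice: once by summing the full product over every assignment, and once by first summing x3 out of the local factor that contains it, which is the kind of saving the sum-product algorithm exploits.

```python
import itertools

VALUES = (0, 1)  # assumption: every variable is binary


def f_a(x1, x2):
    """Hypothetical local factor depending only on x1 and x2."""
    return 1.0 + 2.0 * x1 + x2


def f_b(x2, x3):
    """Hypothetical local factor depending only on x2 and x3."""
    return 1.0 + x2 * x3


def brute_force_marginal(x1):
    """Sum the full product f_a * f_b over every assignment of (x2, x3)."""
    return sum(
        f_a(x1, x2) * f_b(x2, x3)
        for x2, x3 in itertools.product(VALUES, repeat=2)
    )


def factored_marginal(x1):
    """Use the factorization: sum x3 out of f_b first, then combine with f_a."""
    # "Message" from f_b to x2: for each x2, the sum of f_b over x3.
    message = {x2: sum(f_b(x2, x3) for x3 in VALUES) for x2 in VALUES}
    return sum(f_a(x1, x2) * message[x2] for x2 in VALUES)


if __name__ == "__main__":
    for x1 in VALUES:
        print(x1, brute_force_marginal(x1), factored_marginal(x1))
```

Both functions return the same values. On a chain this short the saving is negligible, but on longer chains the brute-force sum grows exponentially with the number of variables while the factored version grows linearly.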
Participants
(Participants, please add your name to the list below.)
Faculty
- David Blei
- Rob Schapire
PostDocs
- Edo Airoldi, LSI & CS
- Florian Markowetz, LSI
Students
- Indraneel Mukherjee, CS
- Zafer Barutcuoglu, CS
- Jordan Boyd-Graber, CS
- Joseph Calandrino, CS
- Melissa Carroll, CS
- Jonathan Chang, EE
- Miroslav Dudik, CS
- Rebecca Fiebrink, CS
- Berk Kapicioglu, CS
- Umar Syed, CS