The MaD Seminar
The MaD seminar features leading specialists at the interface of Applied Mathematics, Statistics and Machine Learning.
Room: Auditorium Hall 150, Center for Data Science, NYU, 60 5th Ave.
Time: 2:00pm-3:00pm. A reception will follow.
Subscribe to the seminar mailing list here
Note special time and location of the first talk by Sebastien Bubeck
Schedule with Confirmed Speakers
Date | Speaker | Title |
---|---|---|
Jan 30, 3:45pm-5:00pm, WWH 1302 | Sebastien Bubeck (Microsoft Research) | k-server and metrical task systems on trees |
Feb 8 | David Rothschild (Microsoft Research) | Public Opinion during the 2020 election |
Feb 15, 3-4pm, CDS 150 | Rachel Ward (UT Austin) | Autotuning the learning rate in stochastic gradient methods |
Feb 22 | Ivan Selesnick (NYU) | |
Mar 1 | Thomas Pock (TU Graz) | |
Mar 8 | Amir Ali Ahmadi (Princeton) | |
Mar 15 | SPRING BREAK | |
Mar 22 | Mahdi Soltanolkotabi (USC) | |
Mar 29 | Alejandro Ribeiro (UPenn) | |
Apr 5 | Justin Romberg (Georgia Tech) | |
Apr 12 | Wotao Yin (UCLA) | |
Apr 19 | Rene Vidal (Johns Hopkins) | |
Apr 26 | Ankur Moitra (MIT) | |
Abstracts
Rachel Ward: Autotuning the learning rate in stochastic gradient methods
Choosing a proper learning rate in stochastic gradient methods can be difficult. If certain parameters of the problem, e.g. Lipschitz smoothness or strong convexity constants, are known a priori, optimal theoretical rates are known. In practice, however, these parameters are unknown, and the loss function of interest is often non-convex and only locally smooth. Thus, adjusting the learning rate is an important problem: a learning rate that is too small leads to painfully slow convergence, while a learning rate that is too large can cause the loss function to fluctuate around the minimum or even to diverge. Several methods have been proposed in the last few years to adjust the learning rate according to gradient data received along the way. We review these methods and propose a simple method inspired by reparametrization of the loss function in polar coordinates. We prove that the proposed method achieves optimal convergence rates in batch and stochastic settings, without having to know parameters of the loss function in advance.
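The trade-off described above, and the idea of letting accumulated gradient information set the step size instead of a hand-tuned constant, can be illustrated with an AdaGrad-norm-style update. This is a generic sketch of that family of methods, not the specific algorithm presented in the talk:

```python
import numpy as np

def sgd_autotuned(grad, x0, b0=0.1, eta=1.0, steps=500):
    """SGD with an AdaGrad-norm-style step size: the effective learning
    rate eta / b_t shrinks automatically as squared gradient norms
    accumulate, so no smoothness constant must be known in advance."""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b2 += np.dot(g, g)                 # accumulate squared gradient norms
        x = x - (eta / np.sqrt(b2)) * g    # self-adjusting step size
    return x

# Minimize f(x) = ||x||^2 / 2 (whose gradient is x) without tuning a rate.
x_final = sgd_autotuned(lambda x: x, x0=[5.0, -3.0])
```

Because large early gradients inflate the denominator, the method survives an overly aggressive initial rate, while the step size stops shrinking once gradients become small.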
David Rothschild: Public Opinion during the 2020 election
Traditional data collection in the multi-billion dollar survey research field relies on representative samples. It is expensive, slow, inflexible, and its accuracy is unproven; the 2016 election was a crushing blow to its reputation (although it did not do that badly!). Intelligence drawn from surveys of non-representative samples, both self-selected respondents and random but non-representative respondents, is now cheaper, quicker, more flexible, and adequately accurate. Cutting-edge data collection and analytics built around non-representative samples, along with large-scale behavioral data, will transform our understanding of public opinion.
Sebastien Bubeck: k-server and metrical task systems on trees
In the last decade the mirror descent strategy of Nemirovski and Yudin has emerged as a powerful tool for online learning. I will argue that in fact the reach of this technique is far broader than expected, and that it can be useful for online computation problems in general. I will illustrate this on two classical problems in online decision making, the k-server problem and its generalization to metrical task systems. Both problems have long-standing conjectures about the optimal competitive ratio in arbitrary metric spaces, namely O(log(k)) for k-server and O(log(n)) for MTS. We will show that mirror descent, with a certain multiscale entropy regularization, yields respectively O(log^2(k)) and O(log(n)) for a very general class of metric spaces (namely hierarchically separated trees, which in particular implies the same bounds up to an additional log(n) factor for arbitrary metric spaces).
Joint work with Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry.
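The multiscale entropy regularization in the abstract is beyond a short snippet, but the basic entropic mirror descent it builds on can be sketched on the probability simplex, where the mirror step becomes a multiplicative-weights update. This is a generic illustration of mirror descent, not the paper's k-server algorithm:

```python
import numpy as np

def mirror_descent_simplex(grads, x0, eta=0.2):
    """Online mirror descent on the probability simplex with entropic
    regularization (exponentiated gradient). `grads` is the sequence of
    gradient (cost) vectors revealed online."""
    x = np.asarray(x0, dtype=float)
    for g in grads:
        x = x * np.exp(-eta * g)   # gradient step in the dual (entropy) geometry
        x = x / x.sum()            # Bregman projection back onto the simplex
    return x

# Repeatedly penalize coordinate 0: its probability mass decays toward 0
# while the other coordinates stay balanced.
grads = [np.array([1.0, 0.0, 0.0])] * 50
x = mirror_descent_simplex(grads, x0=[1/3, 1/3, 1/3])
```

The choice of entropy as the mirror map is what adapts the updates to the simplex geometry; the talk's contribution is a multiscale version of this regularizer suited to tree metrics.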