Artificial Intelligence and Information Systems Seminar


Nathan Srebro

More Data Less Work: SVM Training in Time Decreasing with Larger Data Sets


Traditional runtime analysis of training Support Vector Machines, and indeed of most learning methods, shows how the training runtime increases as more training examples become available. Considering the true objective of training, which is to obtain a good predictor, I will argue that training time should instead be studied as a decreasing function of training set size. I will then present both theoretical and empirical results demonstrating how a simple stochastic subgradient descent approach for training SVMs indeed displays such monotonically decreasing behavior.
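To make the training approach concrete, below is a minimal Pegasos-style sketch of stochastic subgradient descent for a linear SVM in Python/NumPy. The regularized hinge-loss objective, the 1/(lambda*t) step size, and all names and parameters are illustrative assumptions, not necessarily the exact algorithm presented in the talk.

    import numpy as np

    def svm_sgd(X, y, lam=0.1, n_iters=1000, seed=0):
        """Stochastic subgradient descent for a linear SVM (Pegasos-style sketch).

        Approximately minimizes
            (lam/2) * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)
        using one randomly drawn example per step.
        """
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for t in range(1, n_iters + 1):
            i = rng.integers(n)            # draw one training example at random
            eta = 1.0 / (lam * t)          # decaying step size (assumed schedule)
            margin = y[i] * X[i].dot(w)
            w *= 1.0 - eta * lam           # step on the regularizer's gradient
            if margin < 1:                 # hinge loss is active: add its subgradient
                w += eta * y[i] * X[i]
        return w

Note that each step touches a single example, so the cost of a run is governed by the number of iterations rather than by the training set size; this is what makes it possible, in principle, for the time needed to reach a fixed level of predictive accuracy to shrink as more data becomes available.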

I will also discuss a similar phenomenon in the context of Gaussian mixture clustering, where it appears that excess data turns the problem from computationally intractable to computationally tractable.

Joint work with Shai Shalev-Shwartz, Yoram Singer, Greg Shakhnarovich and Sam Roweis.