Coordinate Sampling for Sublinear Optimization and Nearest Neighbor Search
Clarkson, Kenneth L.
I will describe randomized approximation algorithms for some classical problems of machine learning, where the algorithms have provable bounds that hold with high probability. Some of our algorithms are sublinear, that is, they do not need to touch all the data. Specifically, for a set of points $a_1, \dots, a_n$ in $d$ dimensions, we show that finding a $d$-vector $x$ that approximately maximizes the margin $\min_i a_i \cdot x$ can be done in $O((n+d)/\epsilon^2)$ time, up to logarithmic factors, where $\epsilon > 0$ is an additive approximation parameter. This was joint work with Elad Hazan and David Woodruff. A key step in these algorithms is the use of coordinate sampling to estimate dot products. This simple technique can be an effective alternative to random projection sketching in some settings. I will discuss the potential of coordinate sampling for speeding up some data structures for nearest neighbor searching in the Euclidean setting, via fast approximate distance evaluations.
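As a rough illustration of estimating a dot product by coordinate sampling, the sketch below draws coordinate j with probability x_j^2 / ||x||^2 and averages the values a_j ||x||^2 / x_j, whose expectation is a · x. This is one common form of the idea and is assumed here for illustration; the abstract does not specify the sampling distribution used in the talk. The function name `sample_dot` and its parameters are hypothetical.

```python
import random


def sample_dot(a, x, num_samples, rng):
    """Estimate the dot product of a and x from sampled coordinates.

    Assumption (not stated in the abstract): coordinate j is drawn with
    probability x[j]**2 / ||x||**2, and each draw contributes
    a[j] * ||x||**2 / x[j], which is an unbiased estimate of a . x.
    """
    # Only coordinates where x is nonzero can be sampled.
    support = [j for j, xj in enumerate(x) if xj != 0.0]
    if not support:
        return 0.0
    weights = [x[j] * x[j] for j in support]
    norm_sq = sum(weights)
    # Draw coordinates with replacement, proportional to x[j]**2.
    draws = rng.choices(support, weights=weights, k=num_samples)
    # Average the single-coordinate estimates.
    return sum(a[j] * norm_sq / x[j] for j in draws) / num_samples
```

Calling `sample_dot(a, x, 100, random.Random(0))` reads only the sampled coordinates of `a`, which is the source of the savings when `a` is long. The variance of a single draw is ||a||^2 ||x||^2 - (a · x)^2, so more samples are needed when the two vectors are far from parallel.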