Graphical Models for the Internet
Abstract
In this talk I will present algorithms for performing large scale inference using Latent Dirichlet Allocation and a novel Cluster-Topic model to estimate user preferences and to group stories into coherent, topically consistent storylines. I will discuss both the statistical modeling challenges involved and the very large scale implementation of such models which allows us to perform estimation on over 50 million users on a Hadoop cluster.