Modifying sparse coding to model imbalanced datasets
Whitaker, Bradley M.
The objective of this research is to explore the use of sparse coding as a tool for unsupervised feature learning to more effectively model imbalanced datasets. Traditional sparse coding dictionaries are learned by minimizing the average approximation error between a vector and its sparse decomposition. As such, these dictionaries may overlook important features that occur infrequently in the data. Without these features, it may be difficult to distinguish between classes when one or more classes are underrepresented in the training data. To overcome this problem, this work explores novel modifications to the sparse coding dictionary learning framework that encourage dictionaries to learn anomalous features. Sparse coding also inherently assumes that a vector can be represented as a sparse linear combination of a feature set. This work addresses the ability of sparse coding to learn a representative dictionary when the underlying data has a nonlinear sparse structure. Finally, this work illustrates one benefit of improved signal modeling by utilizing sparse coding in three imbalanced classification tasks.
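To make the baseline concrete, the following is a minimal sketch of the traditional dictionary learning setup the abstract describes: alternating between greedy sparse coding and a least-squares dictionary update that minimizes the average reconstruction error over all training vectors. This is an illustrative simplification, not the author's method; the function names (`sparse_code`, `learn_dictionary`), the matching-pursuit coding step, and all parameter values are assumptions chosen for brevity.

```python
import numpy as np

def sparse_code(D, X, k):
    """Greedy matching pursuit: approximate each column of X with
    k atoms of D (assumes D has unit-norm columns). Hypothetical helper."""
    n_atoms, n_samples = D.shape[1], X.shape[1]
    A = np.zeros((n_atoms, n_samples))
    R = X.copy()                            # residuals, one column per sample
    for _ in range(k):
        corr = D.T @ R                      # atom/residual correlations
        best = np.argmax(np.abs(corr), axis=0)
        for j in range(n_samples):
            a = best[j]
            A[a, j] += corr[a, j]
            R[:, j] -= corr[a, j] * D[:, a]
    return A

def learn_dictionary(X, n_atoms=8, k=2, n_iter=30, seed=0):
    """Alternate sparse coding and a dictionary update that minimizes the
    *average* error ||X - D A||_F^2 -- the objective the abstract notes
    can overlook rare but discriminative features."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        A = sparse_code(D, X, k)
        # least-squares update of D (small ridge term for stability)
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D
```

Because the update above weights every training vector equally, atoms gravitate toward features common in the majority class; the modifications studied in this work reweight or constrain this objective so that rare, anomalous features are also captured.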