A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data

Please use this identifier to cite or link to this item: http://hdl.handle.net/1853/35004

Title: A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data
Author: Nguyen, Minh Quoc ; Omiecinski, Edward ; Mark, Leo
Abstract: Local density-based outlier detection (LOF) is a useful method for detecting outliers because it is model-free and locally based. However, the method is very slow on high-dimensional datasets. In this paper, we introduce a randomization method that can compute LOF very efficiently for high-dimensional datasets. Based on a consistency property of outliers, random points are selected to partition a dataset so that outlier candidates can be computed locally. Since the probability that a point is isolated from its neighbors is small, we apply multiple iterations with random partitions to prune false outliers. Experiments on a variety of real and synthetic datasets show that the randomization is effective in computing LOF, and that our method can compute LOF very efficiently on very high-dimensional data.
Description: Research area: Databases ; Research topic: Data Mining
Type: Technical Report
URI: http://hdl.handle.net/1853/35004
Date: 2010
Contributor: Georgia Institute of Technology. College of Computing
Relation: CC Technical Report; GT-CS-10-09
Publisher: Georgia Institute of Technology
Subject: Data mining ; Randomization ; Outliers
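
The abstract describes the approach only at a high level: partition the data around randomly chosen points, compute LOF inside each partition, and repeat over several random partitions to prune false outliers. The following is a minimal Python sketch of that idea, not the report's actual algorithm. Every concrete choice in it is an assumption made for illustration: the two-pivot split, the names `lof_scores`, `random_partition` and `randomized_lof_outliers`, the parameters `leaf_size`, `iterations` and `threshold`, the rule that partitions with fewer than 2k points are skipped, and the use of the minimum score across iterations as the pruning rule.

```python
import math
import random


def lof_scores(points, k):
    """Brute-force LOF for a small list of points; needs len(points) > k."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j)
                        for j in range(n) if j != i)
        neighbors.append(ranked[:k])     # simplification: exactly k neighbors, ties ignored
        k_dist.append(ranked[k - 1][0])  # distance to the k-th nearest neighbor
    lrd = []                             # local reachability density per point
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in neighbors[i]) / k
        lrd.append(1.0 / (reach + 1e-12))  # epsilon guards against duplicate points
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def random_partition(indices, points, leaf_size, rng):
    """Assumed scheme: recursively split around two randomly chosen pivot points."""
    if len(indices) <= leaf_size:
        return [indices]
    a, b = rng.sample(indices, 2)
    left, right = [], []
    for i in indices:
        nearer_a = math.dist(points[i], points[a]) <= math.dist(points[i], points[b])
        (left if nearer_a else right).append(i)
    if not right:                        # degenerate split, e.g. duplicate pivots
        return [indices]
    return (random_partition(left, points, leaf_size, rng)
            + random_partition(right, points, leaf_size, rng))


def randomized_lof_outliers(points, k=5, leaf_size=50, iterations=5,
                            threshold=1.5, seed=0):
    """Return sorted indices whose partition-local LOF stays above threshold."""
    rng = random.Random(seed)
    best = [math.inf] * len(points)      # lowest local LOF seen so far per point
    for _ in range(iterations):
        everyone = list(range(len(points)))
        for part in random_partition(everyone, points, leaf_size, rng):
            if len(part) < 2 * k:        # assumed rule: too small for a local density
                continue
            scores = lof_scores([points[i] for i in part], k)
            for i, s in zip(part, scores):
                # A point that merely lost its neighbors in one split should
                # score low in another, so keeping the minimum prunes it.
                best[i] = min(best[i], s)
    # Points never scored in any iteration keep an infinite score and stay flagged.
    return [i for i, s in enumerate(best) if s > threshold]
```

For example, calling `randomized_lof_outliers(data)` on a list of equal-length numeric tuples returns the indices of the points still flagged after all iterations. The brute-force LOF inside each partition costs quadratic time in the partition size, so the bound on `leaf_size` is what keeps the overall cost low in this sketch.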

All materials in SMARTech are protected under U.S. Copyright Law and all rights are reserved, unless otherwise specifically indicated on or in the materials.

Files in this item

File: GT-CS-10-09.pdf (278.8Kb, PDF)
