Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

Zhang, Ji (2008) Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy. PhD thesis, Dalhousie University, Canada.


Download (1463Kb)


[Abstract]: Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data with relatively low dimensionality.
Recently, outlier detection for high-dimensional stream data became a new emerging research problem. A key observation that motivates this research is that outliers
in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream
data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the out-lying subspaces where projected outliers are embedded is a NP problem. Second, the algorithms for handling data streams are constrained to take only one pass to process the streaming data with the conditions of space limitation and time criticality. The currently existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams.

In this thesis, we present a new technique, called the Stream Project Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional
data streams. SPOT employs an innovative window-based time model in capturing dynamic statistics from stream data, and a novel data structure containing a set of
top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the
outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers
for high-dimensional data streams. The main contribution of this thesis is that it provides a backbone in tackling the challenging problem of outlier detection for high-
dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can be potentially applied to a variety of high demand applications, such as for sensor network data monitoring, online transaction protection, etc.

Statistics for USQ ePrint 5645
Statistics for this ePrint Item
Item Type: Thesis (Non-Research) (PhD)
Item Status: Live Archive
Additional Information (displayed to public): Doctor of Philosopyh (PhD) thesis, Dalhousie University, Canada.
Depositing User: Dr Ji Zhang
Faculty / Department / School: Historic - Faculty of Sciences - Department of Maths and Computing
Date Deposited: 08 Sep 2009 02:36
Last Modified: 02 Jul 2013 23:23
Uncontrolled Keywords: Stream Projected Outlier deTector; SPOT; outlier detection
Fields of Research (FoR): 08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080109 Pattern Recognition and Data Mining

Actions (login required)

View Item Archive Repository Staff Only