SVDD Variants for Anomaly Detection with Implementations using Hadoop & Spark

Rekha, A G

dc.contributor.author	Rekha, A G
dc.date.accessioned	2022-12-16T07:47:11Z
dc.date.available	2022-12-16T07:47:11Z
dc.date.issued	2016
dc.identifier.uri	http://dspace.iimk.ac.in:80/xmlui/handle/2259/1082
dc.description	Research Advisory Committee: Prof. Mohammed Shahid Abdulla (Chairperson), Prof. Asharaf S (Member), Prof. Saji Gopinath (Member). A hard copy of the thesis is available in the library. Please contact the help desk for reference.	en_US
dc.description.abstract	Big data analytics facilitates better-informed business decisions through the analysis of large data sets that remain unexploited by traditional business intelligence systems. ‘Big Data’ as input enhances the inferential power of established algorithms, but it challenges even state-of-the-art computation and analysis methods. Although machine learning offers a way to overcome these problems, its current techniques must be improved to deal with Big Data. Another drawback of big data analytics is its greater focus on aggregates than on outliers. However, in many situations the insights gathered from outliers could be of more significance. In light of this, the focus of this work is on developing machine learning techniques to make outlier detection practical on large business data sets. For over a decade, the Support Vector Data Description (SVDD) technique has shown good predictive accuracy on a wide range of outlier detection tasks. It has also been adapted to numerous business problems. Inspired by this trend, this thesis explores the scalability problems associated with SVDD and tries to address them. Three approaches, namely LT-SVDD, ELT-SVDD, and PELT-SVDD, have been proposed. The feasibility of these methods was assessed using a set of experiments on synthetic as well as benchmark data sets; in many of these experiments the methods showed an order-of-magnitude advantage in running time. The application of these methods to three real-world business problems is also demonstrated. This work contributes to the support vector literature by establishing these methods as efficient for outlier detection on large data sets.	en_US
dc.language.iso	en	en_US
dc.publisher	Indian Institute of Management Kozhikode	en_US
dc.subject	Anomaly Detection	en_US
dc.subject	Big Data	en_US
dc.subject	Support Vector Data Description	en_US
dc.title	SVDD Variants for Anomaly Detection with Implementations using Hadoop & Spark	en_US
dc.type	Thesis	en_US