Anomaly Detection for Critical Infrastructure Protection: Second Generation
Several factors make anomaly detection in high-dimensional big data (HDBD) a challenging task: the distributions of HDBD are hard to learn; the boundary between normal and abnormal behavior is sometimes vague; many scenarios exhibit data that evolve in time, so that behavior currently considered normal might be abnormal in the future and vice versa; and many different domain experts need to be employed. These factors may cause a high false-alarm rate. In this proposal, we focus on automatic, unsupervised anomaly detection in unstructured HDBD that does not require domain expertise, signatures, rules, patterns or semantic understanding of the features, and we propose several new methodologies for anomaly detection to protect critical infrastructures. Anomalies can originate from a cyber-attack/threat, an operational malfunction, or both. The proposal shows that these can be detected simultaneously even though the data sources leveraged for each case can be entirely distinct. We also show that cyber threats and operational malfunctions are converging into a single detection paradigm.
Why there is a problem: The basic approach to securing critical infrastructures over the past 45 years, characterized as "walls and gates", has failed.
The primary goal of this proposal is to develop methodologies (theories, algorithms, software and systems) to detect anomalies in unstructured HDBD, which can be the underlying signs of malware, zero-day attacks or operational malfunctions (or both) that can impact critical infrastructure. This will be accomplished by making sense of massive amounts of data through unsupervised learning algorithms that "understand/quantify" and model complex topics/contexts, and that extract critical intelligence from the data to uncover unprecedented unknown unknowns (anomalies = threats, operational malfunctions, trends). This proposal can be considered part of the Industrial Internet initiative, which is a subset of the Internet of Things. HDBD can be described by hundreds or even thousands of parameters (features). Anomaly detection identifies patterns in a given HDBD that do not conform to an established/expected baseline of normal behavior. The detected patterns, which deviate from normality, are called "anomalies".
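To make the notion of deviation from a normal baseline concrete, the following is a minimal Python sketch, not the methodology proposed here. It assumes that normal behavior lies near a low-dimensional linear subspace, learns that subspace from normal samples with a truncated SVD, and scores each new sample by its residual distance from the subspace. The function names, the synthetic data and the 99% threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_baseline(X_normal, rank):
    """Learn a low-rank baseline (mean plus top principal directions)."""
    mean = X_normal.mean(axis=0)
    # Rows of Vt are the principal directions of the centered training data.
    _, _, Vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    return mean, Vt[:rank]

def anomaly_scores(X, mean, basis):
    """Score each row by its distance from the baseline subspace."""
    centered = X - mean
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=1)

def make_demo(seed=0):
    """Synthetic stand-in for HDBD (an assumption, not real data)."""
    rng = np.random.default_rng(seed)
    mixing = rng.normal(size=(3, 200))  # 3 latent factors, 200 features
    X_normal = rng.normal(size=(500, 3)) @ mixing
    X_normal += 0.05 * rng.normal(size=(500, 200))
    X_test = rng.normal(size=(20, 3)) @ mixing
    X_test += 0.05 * rng.normal(size=(20, 200))
    # Inject five anomalies that leave the baseline subspace.
    X_test[:5] += 3.0 * rng.normal(size=(5, 200))
    return X_normal, X_test

if __name__ == "__main__":
    X_normal, X_test = make_demo()
    mean, basis = fit_baseline(X_normal, rank=3)
    # Hypothetical threshold: 99th percentile of scores on normal data.
    threshold = np.quantile(anomaly_scores(X_normal, mean, basis), 0.99)
    flags = anomaly_scores(X_test, mean, basis) > threshold
    print("flagged rows:", np.flatnonzero(flags))
```

The sketch is unsupervised in the sense used above: no labels, signatures or feature semantics are needed, only samples assumed to be mostly normal. A linear subspace is the simplest possible baseline; data that lie on a curved manifold or that evolve in time would need a richer model.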
We propose a methodology that blends tools from multiple disciplines, such as applied and computational harmonic analysis, stochastic processes (random walks, Brownian motion), randomized algorithms, differential geometry, classical analysis, geometric measure theory, manifold learning, low-rank matrix decomposition, spectral graphs, kernel methods and dictionary constructions, which are versatile enough to process HDBD efficiently. The goal is to turn data into quantitative knowledge. The availability of massive data is a huge opportunity for us, since we can understand, process and manipulate it and extract actionable intelligence from it.
The proposed algorithms are generic, and the same core underlying infrastructure can be used to perform general anomaly detection for various tasks such as performance monitoring and analysis, a unified threat manager for network health, smartphone protection, risk management in diverse financial transactions, fraud detection, prediction and tracking of emerging problems, and problem avoidance.
The research builds upon our First Generation anomaly detection methodologies, which were developed over the last 7 years, used the diffusion geometry of HDBD for manifold learning to detect cyber-based anomalies in structured HDBD, and were published in 26 papers.
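For readers unfamiliar with diffusion geometry, the following is a minimal Python sketch of a standard diffusion-map embedding. It is a textbook construction given for orientation only and is not the First Generation implementation. The function name `diffusion_map`, the Gaussian kernel and the parameters `epsilon`, `n_components` and `t` are assumptions made for this illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed the rows of X using a basic diffusion-map construction."""
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinity kernel with scale epsilon.
    K = np.exp(-D2 / epsilon)
    d = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the random-walk
    # (Markov) matrix P = D^{-1} K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of P.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial first eigenpair (eigenvalue 1, constant vector).
    idx = slice(1, n_components + 1)
    embedding = psi[:, idx] * vals[idx] ** t
    return embedding, vals

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 10))  # synthetic placeholder data
    embedding, vals = diffusion_map(X, epsilon=10.0)
    print(embedding.shape, vals[:3])
```

Euclidean distance in the embedded coordinates approximates the diffusion distance of a random walk on the data after `t` steps, which is why the construction is used for manifold learning. This dense version costs on the order of n² memory and n³ time, so it would need sparsification or sampling before being applied at HDBD scale.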