Attack Resilient Resource Placement in Cloud Computing System and Power Grid

Hanoch Levy; Eli Brosh (Canary Connect); Gil Zussman (Columbia University)


Distributed data centers (which provide cloud-based services) and power grids are key infrastructure systems whose resilience to cyber and physical attacks is of utmost importance. In particular, failures in these systems can have devastating impacts on various interdependent military and civilian systems (e.g., communications, gas and water supply, and transportation). Hence, in this project, we will focus on resource allocation in cloud computing and power grids that accounts both for failures resulting from attacks and for highly variable demands.

Resource allocation schemes for these geographically distributed systems should mitigate the impact of potential cyber attacks while maintaining the required level of service during regular operation. However, designing such schemes poses major challenges due to the high dimensionality of the problems and the special characteristics of the flows in power grids. Addressing these challenges requires an interdisciplinary approach that employs methods and techniques from various areas, including stochastic control, power flow optimization, and algorithm design. Specifically, we will consider the general problem of resource allocation in a geographically distributed system, where resources have to be allocated for m types of services in n geographical locations. The allocation is based on a known stochastic demand for the services (m × n dimensional) and on the costs of providing the services from different locations. Under this general setting, we will address the following two problems: cloud services under attack and power grids under attack.

We will extend the methodology we previously developed, which provided very efficient solutions to a wide variety of these problems in non-hostile environments, and devise algorithmic solutions that yield efficient or optimal resource placement strategies in malicious environments. We will tailor this methodology to the special challenges posed by hostile environments and power grids. In the context of cloud computing, we will capture the volatility of the resources due to attacks by modeling the resource variables as random variables, whose values depend on the number of resources the designer placed in the i-th site as well as on the probability that they fail (due to attacks). Since these variables were deterministic in our previous work, this will require a significant generalization of the model and of the analysis approach, using tools from stochastic analysis, optimization, and graph algorithms. We expect the analysis to reveal the number of resources, the types of resources, and their locations such that resilient service is provided, while taking into account the cost and performance of services in regular operation. This analysis will provide insight into the tradeoffs between resilience to attacks, level of service in regular operation, and cost.
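
To make the idea of random resource variables concrete, the following is a minimal sketch and not the project's actual model. It assumes that each resource placed at a site fails independently with the same probability, so the number of surviving resources is binomial; the function names (survival_pmf, expected_unmet, min_placement) and the single-site, single-service setting are hypothetical simplifications introduced only for illustration.

```python
from math import comb


def survival_pmf(placed, p_fail):
    """PMF of surviving resources at one site.

    Assumption (not from the source): each of the `placed` resources fails
    independently with probability p_fail, so survivors ~ Binomial.
    """
    q = 1.0 - p_fail
    return [comb(placed, k) * q**k * p_fail ** (placed - k)
            for k in range(placed + 1)]


def expected_unmet(placed, p_fail, demand_pmf):
    """Expected demand that cannot be served locally.

    demand_pmf maps a demand level to its probability (the known
    stochastic demand). Demand and failures are assumed independent.
    """
    surv = survival_pmf(placed, p_fail)
    return sum(p_d * p_s * max(d - k, 0)
               for d, p_d in demand_pmf.items()
               for k, p_s in enumerate(surv))


def min_placement(p_fail, demand_pmf, max_unmet, limit=1000):
    """Smallest placement whose expected unmet demand is within max_unmet."""
    for placed in range(limit + 1):
        if expected_unmet(placed, p_fail, demand_pmf) <= max_unmet:
            return placed
    return None  # no placement up to `limit` meets the target


if __name__ == "__main__":
    demand = {1: 0.3, 2: 0.5, 3: 0.2}  # hypothetical demand distribution
    for p in (0.0, 0.1, 0.3):
        print(p, min_placement(p, demand, max_unmet=0.1))
```

Comparing the placements returned for different failure probabilities gives a rough feel for the tradeoff between resilience and cost described above: a higher attack probability calls for more resources to keep the same service level.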

An important variant of this problem arises when there is a need to accommodate mutually hostile resources. This need arises when security-aware clients require that their resources be (physically or logically) isolated from other resources (e.g., commercial or government entities concerned with data leakage between cloud tenants and espionage on their data, and defense or public safety organizations that need to separate confidential and non-confidential services). The service provider can, for example, grant secure service using geographic isolation (i.e., place the services of mutually hostile organizations in separate data centers). Such separation, however, will incur operational costs. These costs can be incorporated in our framework, where remote service costs more than local service (see toy problem). We plan to use our methodology to develop optimal and approximate attack-resilient placement algorithms that satisfy the separation requirements.
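
The separation requirement can be illustrated with a small sketch. This is a simple greedy heuristic chosen for illustration, not the optimal or approximate algorithms the project plans to develop; the function name place_with_separation, the tenant and site names, and the cost values are all hypothetical.

```python
def place_with_separation(tenants, conflicts, sites, cost):
    """Greedy placement under geographic isolation (illustrative heuristic).

    tenants:   list of tenant names, processed in order
    conflicts: set of frozenset({a, b}) pairs that must not share a site
    sites:     list of data-center names
    cost:      dict (tenant, site) -> service cost; remote sites cost more
    """
    placement = {}
    for t in tenants:
        # A site is feasible if no tenant already placed there is hostile to t.
        feasible = [
            s for s in sites
            if all(frozenset((t, u)) not in conflicts
                   for u, site_u in placement.items() if site_u == s)
        ]
        if not feasible:
            raise ValueError(f"no isolated site available for {t}")
        # Pick the cheapest feasible site (ties go to the first in `sites`).
        placement[t] = min(feasible, key=lambda s: cost[(t, s)])
    return placement


if __name__ == "__main__":
    tenants = ["A", "B", "C"]
    conflicts = {frozenset(("A", "B"))}  # A and B are mutually hostile
    sites = ["east", "west"]
    # All tenants are local to "east"; "west" is the costlier remote site.
    cost = {(t, "east"): 1 for t in tenants}
    cost.update({(t, "west"): 3 for t in tenants})
    print(place_with_separation(tenants, conflicts, sites, cost))
```

In this toy instance, the second hostile tenant is pushed to the remote site and pays the higher cost, which is the operational cost of separation mentioned above. A greedy order-dependent rule like this one can be far from optimal in general.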

In the context of the power grid, we will focus on cyber attacks that have a physical impact (e.g., shutting down a generator or faulting a power line). We will study the design problem of placing resources (e.g., generators and additional power lines) in a manner that provides attack resiliency. This will require combining the methodology described above, which takes into account stochastic supply (due to failures), with the DC approximation of the power flow [1], which allows evaluating the effects of changes in supply and demand. To better understand the design problem, we will also study the cascade control problem, in which a cascade initiated by an attack on some of the allocated resources must be halted. Finally, we plan to develop resource allocation algorithms that take into account the dependency between the grid and the cloud, where an attack on the grid and the resulting loss of power make cloud resources unavailable.
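
As a rough illustration of how the DC approximation lets one evaluate the effect of a failed line, the sketch below computes line flows on a toy 3-bus grid before and after one line is removed. It uses the standard DC model, in which the flow on a line equals the phase-angle difference of its end buses divided by the line reactance, and each bus's net injection equals the sum of its outgoing flows. The grid, the reactances, the injections, and the function names (solve_linear, dc_power_flow) are hypothetical and chosen only for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (is the grid connected?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dc_power_flow(n_buses, lines, injections):
    """DC power flow with bus 0 as the slack bus (angle fixed at 0).

    lines:      list of (i, j, reactance)
    injections: net power injected at each bus (supply minus demand),
                assumed to sum to zero
    Returns (angles, flows), with flows in the same order as `lines`,
    positive from i to j.
    """
    # Build the susceptance matrix B so that B * angles = injections.
    b = [[0.0] * n_buses for _ in range(n_buses)]
    for i, j, x in lines:
        y = 1.0 / x
        b[i][i] += y
        b[j][j] += y
        b[i][j] -= y
        b[j][i] -= y
    # Drop the slack bus row and column, then solve for the other angles.
    reduced = [row[1:] for row in b[1:]]
    angles = [0.0] + solve_linear(reduced, injections[1:])
    flows = [(angles[i] - angles[j]) / x for i, j, x in lines]
    return angles, flows


if __name__ == "__main__":
    injections = [1.0, 0.0, -1.0]  # bus 0 generates, bus 2 consumes
    triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
    print(dc_power_flow(3, triangle, injections)[1])
    # Simulate an attack that faults line (0, 2); flows are recomputed.
    print(dc_power_flow(3, triangle[:2], injections)[1])
```

In this toy case, removing one line forces all power onto the remaining path, so the surviving lines carry more flow than before. Load shifts of this kind are what can overload further lines and start a cascade, which is why the same flow computation is a natural building block for the cascade control problem.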

Tel Aviv University, P.O. Box 39040, Tel Aviv 6997801, Israel