ORCHESTRATING BULK DATA TRANSFER ACROSS GEO-DISTRIBUTED DATA CENTRES

ABSTRACT

A key challenge is how to schedule bulk data transfers of different urgency
levels so as to fully utilize the available inter-datacenter bandwidth. The
recently emerged Software Defined Networking (SDN) paradigm decouples the
control plane from the data paths, enabling potential global optimization of
data routing in a network. This paper designs a dynamic, highly efficient bulk
data transfer service for a geo-distributed datacenter system. Instead of
treating each transfer as an infinite flow, we route the distinct chunks within
each bulk data transfer, which can be temporarily stored at intermediate
datacenters to mitigate bandwidth contention with more urgent transfers. In
this model we use three algorithms, namely a bandwidth-reserving algorithm, a
dynamically-adjusting algorithm, and a future-demand-friendly algorithm. We
build an SDN system based on the Beacon platform and OpenFlow APIs, and
carefully engineer our bulk data transfer algorithms in the system. Extensive
real-world experiments are carried out to compare the three algorithms as well
as those from the existing literature, in terms of routing optimality,
computational delay and overhead.

INTRODUCTION

This
paper proposes a novel optimization model for dynamic, highly efficient
scheduling of bulk data transfers in a geo-distributed datacenter system, and
engineers its design and solution algorithms practically within an
OpenFlow-based SDN architecture. We model data transfer requests as
delay-tolerant data
migration tasks with different finishing deadlines. Thanks to the flexibility
of transmission scheduling provided by SDN, we enable dynamic, optimal routing
of distinct chunks within each bulk data transfer, which can be temporarily
stored at intermediate datacenters and transmitted only at carefully scheduled
times, to mitigate bandwidth contention among tasks of different urgency
levels. Our contributions are summarized as follows. First, we formulate the
bulk data transfer problem into a novel, optimal chunk routing problem, which
maximizes the aggregate utility gain due to timely transfer completions before
the specified deadlines. Such an optimization model enables flexible, dynamic
adjustment of chunk transfer schedules in a system with dynamically-arriving
data transfer requests, which is impossible with a popularly-modeled flow-based
optimal routing model. Second, we discuss three dynamic algorithms to solve the
optimal chunk routing problem, namely a bandwidth reserving algorithm, a
dynamically-adjusting algorithm, and a future-demand-friendly algorithm. These
solutions target different levels of optimality and computational complexity.
Third, we build an SDN system based on the Beacon platform and OpenFlow APIs,
and carefully engineer our bulk data transfer algorithms in the system.
Extensive real-world experiments with real network traffic are carried out to
compare the three algorithms as well as those in the existing literature, in
terms of routing optimality, computational delay and overhead.
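
To make the chunk-level, store-and-forward idea concrete, the following is a minimal sketch in Python. It is not one of the three algorithms of this paper: it is a simple greedy, earliest-deadline-first heuristic over a time-slotted network, written only to illustrate how a chunk may wait at an intermediate datacenter and how urgent transfers can claim bandwidth first. The function names (route_chunk, schedule), the slot-based capacity table and the three-node topology are all assumptions made for illustration.

```python
from collections import deque

def route_chunk(links, capacity, src, dst, release, deadline):
    """Earliest-arrival store-and-forward route for ONE chunk (illustrative).

    links:    dict node -> list of directly reachable neighbour nodes.
    capacity: dict (u, v, slot) -> chunks link (u, v) can still carry in slot.
    A hop started in slot t arrives at the start of slot t + 1; the chunk
    must be at dst no later than the start of slot `deadline`.
    Returns a list of (slot, u, v) hops, or None if the deadline is missed.
    """
    start = (src, release)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, t = queue.popleft()
        if node == dst:
            hops, state = [], (node, t)
            while parent[state] is not None:
                prev = parent[state]
                if prev[0] != state[0]:      # a real hop, not a wait
                    hops.append((prev[1], prev[0], state[0]))
                state = prev
            return hops[::-1]
        if t >= deadline:
            continue
        # Either stay stored at this node for one slot, or use any
        # outgoing link that still has residual capacity in slot t.
        moves = [node] + [v for v in links.get(node, [])
                          if capacity.get((node, v, t), 0) > 0]
        for nxt in moves:
            state = (nxt, t + 1)
            if state not in parent:
                parent[state] = (node, t)
                queue.append(state)
    return None

def schedule(links, capacity, requests):
    """Greedy earliest-deadline-first scheduling (a heuristic, not optimal).

    requests: list of (name, src, dst, chunks, release, deadline).
    Mutates `capacity` to reserve bandwidth; chunks that cannot meet the
    deadline are simply left unscheduled.
    """
    plan = {}
    for name, src, dst, chunks, release, deadline in sorted(
            requests, key=lambda r: r[5]):
        routes = []
        for _ in range(chunks):
            hops = route_chunk(links, capacity, src, dst, release, deadline)
            if hops is None:
                break
            for slot, u, v in hops:
                capacity[(u, v, slot)] -= 1
            routes.append(hops)
        plan[name] = routes
    return plan

# Hypothetical topology: every link carries one chunk per slot, slots 0..3.
links = {"A": ["B", "C"], "B": ["C"]}
capacity = {(u, v, t): 1 for u in links for v in links[u] for t in range(4)}
requests = [("bulk", "A", "C", 2, 0, 4), ("urgent", "A", "C", 2, 0, 2)]
plan = schedule(links, capacity, requests)
for name, routes in plan.items():
    print(name, routes)
```

In this toy run the urgent transfer takes the direct link A to C in the first two slots, so the less urgent transfer is relayed through datacenter B or delayed to a later slot. An optimization-based formulation such as the one proposed in this paper would instead choose all chunk routes jointly to maximize aggregate utility.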

 

LITERATURE REVIEW

1. Wahlroos, Mikko, et al. “Future views on waste heat utilization–Case of
data centers in Northern Europe.” Renewable and Sustainable Energy, 2018.

In this
study the potential for data center waste heat utilization was analyzed in the
Nordic countries. An overview of upcoming data center projects where waste heat
is utilized is presented. Especially in Finland data center operators are
planning to reuse waste heat in district heating. However, business models
between the district heating network operator and data center operator are
often not transparent. The implications of economics and emissions on waste
heat utilization in district heating were analyzed through life cycle
assessment. Currently the biggest barriers to utilizing waste heat are the low
quality of waste heat (e.g. low temperature or unstable source of heat) and
high investment costs. A systematic 8-step change process was suggested to
ensure success in changing the priority of waste heat utilization in the data
center and district heating market. Relevant energy efficiency metrics were
introduced to support rational decision-making in the reuse of waste heat.
Economic calculations showed that the investment payback time is shorter than
the estimated lifetime of the heat pump equipment when waste heat is utilized
in district heating. However, the environmental impact of waste heat
utilization depends on the fuel that the waste heat replaces.

 

2. He, Long, Zhiwei Tony Qin, and Jagtej Bewli. “Low-Rank Tensor Recovery for
Geo-Demand Estimation in Online Retailing.” Procedia Computer Science, 2015.

National retailers often rely on past sales data in their inventory allocation
decisions, where understanding the item-location-time specific demand
(geo-demand) distributions is crucial. However, in many cases, errors and
sparsity in the geo-demand data undermine the quality of data-driven decisions.
It is thus important to recover the missing entries and identify errors. The
authors organize the geo-demand data as a tensor in item, zone and time
dimensions with a significant amount of missing entries. The problem is
formulated as a robust low-rank tensor recovery problem in a convex
optimization framework. They further propose a tailored optimization algorithm
based on the alternating direction augmented Lagrangian method. Tests on
synthetic data verify the recovery performance and algorithm convergence.
Lastly, they demonstrate the framework on a real set of sales data from a major
online retailer and investigate the effectiveness of the optimization framework
both quantitatively and qualitatively.

 

3. Subbiah, Sankari, et al. “Energy efficient big data infrastructure
management in geo-federated cloud data centers.” Procedia Computer Science,
2015.

The rapid growth in demand for big data processing places a heavy load on
computation, storage and networking in data centres. The authors suggest an
approach of data centre node clustering for efficient data placement and data
retrieval, in contrast to the usual practice in a centralised architecture. The
main objective of the proposed system is to address the shortcomings of the
conventional centralised server, chiefly the assumption that a single head is
within the connectivity range of all other nodes. They propose the Hit Rate
Geographical Locations Analysis Algorithm (HIRGLAA) for the dynamic election of
the cluster head based on periodic hit rate analysis. They also suggest
candidate cluster heads that hold redundant routing information to ensure data
storage backup. The proposed system thus aims to assure Quality of Service
(QoS) properties such as increased reliability, robustness and energy-efficient
remote access, and its efficiency can be validated by extensive
simulation-based studies.

 

4. Teli, Prasad, Manoj V. Thomas, and K. Chandrasekaran. “Big data migration
between data centers in online cloud environment.” Procedia Technology, 2016.

 

Big data has become one of the major areas of research for cloud service
providers. Big data, with characteristics such as size and complexity, requires
efficient methods for migration from one location to another, geographically
distant location. Also, processing big data located at different geographically
distributed data centers using MapReduce-like frameworks consumes a lot of
bandwidth. One of the solutions for reducing the cost of processing such
geographically distributed big data is data aggregation. In this paper, the
authors propose an online algorithm to find the optimal-cost data aggregation
site among the geographically distributed data centers. The proposed approach
gives an optimal-cost solution for aggregating data from different
geographically distributed data centers, so that the data can be efficiently
processed at a single site using distributed frameworks. The authors also
propose a graph model of geo-distributed data centers. Results obtained in an
online cloud environment show that the proposed approach gives better results.
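
As a rough illustration of what choosing a minimum-cost aggregation site on a graph model can mean, the sketch below picks the site that minimizes the total volume-weighted transfer cost. It is an offline simplification written for this review, not the online algorithm of Teli et al., whose details are not reproduced here. The graph, the per-GB link costs and the data volumes are hypothetical.

```python
import heapq

def cheapest_costs(graph, source):
    """Dijkstra: cheapest per-GB transfer cost from source to each site."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def best_aggregation_site(graph, data_gb):
    """Return (site, total_cost) minimizing sum of volume * cheapest path cost."""
    costs_from = {origin: cheapest_costs(graph, origin) for origin in data_gb}
    best_site, best_total = None, float("inf")
    for site in graph:
        total = sum(size * costs_from[origin].get(site, float("inf"))
                    for origin, size in data_gb.items())
        if total < best_total:
            best_site, best_total = site, total
    return best_site, best_total

# Hypothetical datacenters with symmetric per-GB link costs.
graph = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 3.0, "B": 1.0},
}
data_gb = {"A": 10, "B": 1, "C": 10}  # volume held at each site
site, cost = best_aggregation_site(graph, data_gb)
print(site, cost)
```

Here the centrally placed site B is selected even though it holds the least data, because moving the two large data sets one hop each is cheaper than moving either of them across two hops.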

 

5. Hara, Yusuke. “Behaviour analysis using tweet data and geo-tag data in a
natural disaster.” Transportation Research Procedia, 2015.

 

This paper uses Twitter data to clarify the factors that left commuters unable
to return home at the time of the Great East Japan Earthquake, as well as the
commuters’ returning-home decision-making process. First, to extract
behavioural data from the tweet data, the author identifies each user’s
returning-home behaviour using support vector machines. Second, nonverbal
explanatory factors are created from geo-tag data and verbal explanatory
factors from tweet data. Following this, users’ returning-home decision-making
is modelled with a discrete choice model and the factors are clarified
quantitatively. Finally, the paper shows the usefulness and the challenges of
social media data for travel behaviour analysis.