1 Architecture of Resource Manager for Software-Defined IT Infrastructure
Yuki Matsui*, Yasuhiro Watashiba**, Yoshiyuki Kido**, Susumu Date** and Shinji Shimojo** * Graduate School of Information Science and Technology, Osaka University, Japan ** Cybermedia Center, Osaka University, Japan Thank you, chairperson. I am Yuki Matsui from Osaka University, Japan. Today I would like to talk about the "Architecture of Resource Manager for Software-Defined IT Infrastructure".

2 Natural Disaster
Disaster management is important. Natural disasters such as the Tohoku earthquake and Hurricane Harvey have caused serious human and material damage around the world. To cope with such damage, rescue activities and relief supplies are needed. (c) AFP/SOURCE/NASA/RANDY BRESNIK/HANDOUT

3 Disaster Management
In disaster management, multiple groups attend the decision-making process from different sites. Expectations for global collaboration are rising: multiple organizations participate in disaster management from different sites or countries, and within each organization many members discuss their own topics. At a site near the disaster area, decisions are made on evacuating residents; in neighboring countries, the dispatch of support units is considered.

4 Disaster Management
Because multiple groups attend the decision-making process from different sites, collaboration between sites has become important: disaster management proceeds by sharing the policies decided at one site with the other sites.

5 Disaster Management
To promote cooperation between different sites, an application that supports collaboration between sites is required.

6 Information-as-a-Service
Information-as-a-Service (InfaaS) is a critical concept for a disaster management application: any information is provided in the proper format to its receiver. An InfaaS-compliant application must satisfy three key factors: clarity, synchronization, and continuation. The application platform requires a mechanism that satisfies all three.

7 Key factors (1/3) Clarity
Clarity means understanding information easily and quickly. In the disaster management system, various types of data are exchanged, for example video from IoT sensors, weather forecasts, and simulation results of the damage range caused by a tsunami.

8 Key factors (1/3) Clarity
To make decisions smoothly, such information must be understood as soon as possible. Visualization is generally said to let us understand information easily and quickly, so to gain this benefit the disaster management application should visualize any data.

9 Key factors (2/3) Synchronization
Synchronization means sharing the same information among all sites, which leads to collaboration between them. For example, suppose each site receives certain information about the disaster: two sites may receive information X, while other sites receive information Y or Z delivered at a different time from X.

10 Key factors (2/3) Synchronization
When the time of receiving information differs, the policies discussed at one site may not be shared smoothly with the others, because the decision-making process at each site is then based on different information. To promote collaboration between different sites, all sites need to receive the same information.

11 Key factors (3/3) Continuation
Continuation means keeping every functionality the system provides, even during a disaster; it increases the availability of the system. When a disaster damages a site, the application is required to keep functioning. For example, suppose Site A transfers information to Site D via Site C. If the resources at Site C go down in a disaster, the information does not reach Site D; by using a substitute route, the data can still be transferred and the application continues to run. Such a continuable system increases the availability of the application.

12 Software-Defined IT infrastructure for distributed visualization system
A distributed visualization system, in which multiple sites are connected through visualization devices, is a suitable platform for satisfying these key factors. Clarity and synchronization have already been achieved on the distributed visualization system; toward achieving continuation, we have been studying and developing the Software-Defined IT infrastructure (SDIT). It consists of four main components: (1) tiled display wall (TDW), (2) SAGE2, (3) Docker, and (4) Software-Defined Networking (SDN).

13 1: Tiled Display Wall
A TDW is a visualization system that provides a large virtual screen composed of multiple display monitors, giving a high-resolution, scalable visualization environment for large data. It also provides a collaborative, sharable research environment, because the visualized contents on a TDW can be seen by multiple people at once.

14 2: SAGE2
SAGE2 is middleware that supports synchronization and visualization on a tiled display wall. First, the SAGE2 application server fetches and prepares the visualization image from a data storage. Next, the SAGE2 display nodes receive the visualization image by querying the SAGE2 application server. Finally, the display nodes apply rendering processing, and the visual information is shown on the TDW. Since information can be provided simultaneously from one site to multiple TDW sites, SAGE2 supports not only clarity but also synchronization.
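As a toy illustration of this one-server-to-many-displays synchronization (not the real SAGE2 API; every name below is invented), the broadcast step can be sketched as:

```python
class Sage2ServerSketch:
    """Toy model: one application server broadcasts identical frames."""
    def __init__(self):
        self.display_sites = {}   # site name -> list of received frames

    def register(self, site):
        self.display_sites[site] = []

    def broadcast(self, frame):
        # Every registered site receives the very same frame
        # (synchronization), rather than fetching data at its own time.
        for frames in self.display_sites.values():
            frames.append(frame)

server = Sage2ServerSketch()
for site in ["Osaka", "SanDiego", "Bangkok"]:
    server.register(site)
server.broadcast("tsunami_simulation_frame_001")
```

After the broadcast, every site's frame list is identical, which is the property the transcript attributes to SAGE2.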

15 3: Docker
Docker is a container technology that packages up an application. In the distributed visualization system, SAGE2 is packaged with Docker, which becomes useful when the computing nodes running SAGE2 become unavailable. For example, suppose SAGE2 is executed with Docker on one node, and that server stops due to a disaster.

16 3: Docker
After the node stops, SAGE2 is redeployed on an alternative node by Docker. By switching nodes with Docker, SAGE2 can continue to provide its service.

17 4: Software-Defined Networking
Software-Defined Networking dynamically controls the network with software. Before SDN was introduced, each network switch had to control its data flows individually. With SDN, all data flows are controlled collectively by dedicated software called the SDN controller. Each network switch holds a flow table provided by the SDN controller; the flow table indicates the destination of each data flow.

18 4: Software-Defined Networking
By controlling all flow entries from one SDN controller, network management becomes easy. The controller can also instruct the disconnection of an affected network and the construction of a detour route. These four components are deployed on PRAGMA-ENT, an international SDN testbed.
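The match/action role of a flow table can be illustrated with a minimal lookup; the field names below are simplified stand-ins for real OpenFlow match fields, not an actual switch implementation:

```python
# Minimal flow-table sketch: a switch matches a packet against its
# entries and applies the action of the first matching entry.
flow_table = [
    {"match": {"dst": "tdw-siteB"}, "action": ("output", 2)},
    {"match": {"dst": "tdw-siteC"}, "action": ("output", 3)},
]

def forward(packet, table, default=("drop", None)):
    """Return the action of the first entry whose match fields all agree."""
    for entry in table:
        if all(packet.get(k) == v for k, v in entry["match"].items()):
            return entry["action"]
    return default  # table miss: a real switch would ask the controller

assert forward({"dst": "tdw-siteB"}, flow_table) == ("output", 2)
```

Rerouting after a failure then amounts to the controller rewriting these entries on every switch at once, rather than reconfiguring each switch by hand.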

19 PRAGMA-ENT
The distributed visualization system is deployed onto PRAGMA-ENT, which connects heterogeneous resources in different countries. PRAGMA-ENT is "breakable" in the sense that it offers researchers complete freedom to access network resources to develop, experiment with, and evaluate new ideas without concern about interfering with a production network. PRAGMA-ENT provides the networking support needed by the PRAGMA multi-cloud and user-defined trust envelopes.

20 Dynamic Control
In our research to date, we have achieved basic continuation of SDIT: in the prototype on PRAGMA-ENT, service recovery can be realized statically by operating each component individually. However, there is a lack of a mechanism to control each component dynamically, so the current SDIT cannot maintain its functions autonomously in response to a disaster.

21 Proposal
Because there is still no mechanism to control each component dynamically when a disaster occurs, we propose a resource manager (RM) that handles the behavior of these components comprehensively. The RM plays a role like the brain of the distributed visualization system: by aggregating the management of each component, flexible management corresponding to the disaster situation is realized, and SDIT can eliminate the effects of the disaster autonomously.

22 Scenario
We considered one scenario for designing the RM. In this scenario, the computing node used by Docker goes down in a disaster. After such a disaster, first, Docker needs to be redeployed to another safe computing node; by directing it to access the same storage, SAGE2 keeps its functionality. Second, the data flow toward the tiled display wall must be kept.

23 Scenario
Furthermore, the network needs to be rerouted to maintain the data flow. By reconfiguring the network in accordance with the Docker migration, the TDW can keep receiving information.

24 Requirements
Based on the scenario, we extracted five functionalities necessary for the resource manager by analyzing the behavior of the system: (1) collection of each component's status, (2) detection of service down, (3) selection of an alternative resource, (4) reroute of the network, and (5) redeployment of the Docker container. I will explain them along the scenario.

25 Functionalities (1/5) Collection
The first important thing is to collect the status of all components, because the resource information needed for detecting and recovering from outages must be stored constantly. By collecting every resource's status, the RM can cope with a service stop quickly. The statuses are various: the utilization ratio of computing resources, network routes, and the execution status of Docker containers, to name a few.
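A minimal sketch of such periodic collection, with hypothetical poller callbacks standing in for the real SDN controller, Docker Manager, and Node Manager interfaces (none of these names come from the actual system):

```python
import time

def poll_components(pollers):
    """Collect one timestamped status snapshot from every component."""
    snapshot = {"time": time.time()}
    for name, poller in pollers.items():
        snapshot[name] = poller()
    return snapshot

# Append-only history, so temporal status changes remain visible later.
status_db = []

pollers = {
    "network": lambda: {"siteC-link": "up"},          # via SDN controller
    "docker":  lambda: {"sage2": "running"},          # via Docker Manager
    "nodes":   lambda: {"siteC": {"cpu": 0.35}},      # via Node Manager
}
status_db.append(poll_components(pollers))
```

In the real design this loop would run continuously, feeding the Status Database that the Detector and Selector read.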

26 Functionalities (2/5) Detection
The RM also needs to pay careful attention to service stops. In some cases the Docker server may stop or the network may be cut off, which can force a stop of service, in other words a break in data distribution from SAGE2. By searching for anomalies in the collected information, the resource manager detects the stop of service.
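Anomaly search over a collected snapshot might look like the following sketch; the status keys and the "up"/"running" states are illustrative assumptions, not the system's actual schema:

```python
def detect_outages(snapshot):
    """Return (component, name) pairs whose reported state is abnormal."""
    anomalies = []
    for link, state in snapshot.get("network", {}).items():
        if state != "up":
            anomalies.append(("network", link))
    for container, state in snapshot.get("docker", {}).items():
        if state != "running":
            anomalies.append(("docker", container))
    return anomalies

snapshot = {"network": {"siteC-link": "down"}, "docker": {"sage2": "running"}}
assert detect_outages(snapshot) == [("network", "siteC-link")]
```

An empty result means no service stop was found; a non-empty one is what the Detector would report to the Master Orchestrator.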

27 Functionalities (3/5) Selection
After detecting a service stop, the resource manager should recover the SDIT functionality. To recover the service, the function has to be transferred to a safe component that has not been damaged, so the resource manager selects alternative resources: a safe site for deploying Docker, and network routes for maintaining the data flow. Using the collected information, the RM can select appropriate alternatives.
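One possible selection policy, shown purely as an illustration (the pluggable algorithms described later could use any criterion), is to pick the least-loaded node that is still alive:

```python
def select_alternative(nodes, failed):
    """Pick the healthy node with the lowest CPU utilization."""
    candidates = {name: status for name, status in nodes.items()
                  if name != failed and status["alive"]}
    if not candidates:
        return None  # nothing safe to migrate to
    return min(candidates, key=lambda name: candidates[name]["cpu"])

nodes = {
    "siteA": {"alive": True,  "cpu": 0.70},
    "siteB": {"alive": True,  "cpu": 0.20},
    "siteC": {"alive": False, "cpu": 0.00},  # damaged by the disaster
}
assert select_alternative(nodes, failed="siteC") == "siteB"
```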

28 Functionalities (4/5) Reroute
Based on the selection results, the RM reroutes the network using SDN. By reconfiguring the route with SDN, the TDW becomes ready to receive visualization information.
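Detour computation can be sketched as a shortest-path search that avoids the damaged node; the topology and site names here are invented for illustration and mirror the earlier Site A to Site D example:

```python
from collections import deque

def detour(topology, src, dst, avoid):
    """Breadth-first search for a route that skips the damaged node."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in topology.get(path[-1], []):
            if nxt not in seen and nxt != avoid:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving route

topology = {
    "siteA": ["siteC", "siteB"],
    "siteB": ["siteD"],
    "siteC": ["siteD"],
}
# siteC is down, so the flow to the TDW at siteD goes via siteB instead.
assert detour(topology, "siteA", "siteD", avoid="siteC") == ["siteA", "siteB", "siteD"]
```

The resulting path is what the RM would translate into new flow-table entries pushed through the SDN controller.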

29 Functionalities (5/5) Redeployment
Finally, the RM redeploys the Docker container to the alternative node. After deployment, Docker is configured to access the same storage. With Docker moved to a safe location and accessing the same storage, the service starts running again. As just described, the five functions work in line with the scenario we assume.
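A toy model of the redeployment step, with the container state kept as a plain dict standing in for the Docker Manager's real operations (no actual Docker commands are issued here):

```python
def redeploy(containers, name, new_node, storage_url):
    """Move a container record to a new node, keeping the same storage.

    Stand-in for the stop/run sequence the Docker Manager would perform.
    """
    spec = containers.pop(name)
    spec.update(node=new_node, storage=storage_url, state="running")
    containers[name] = spec
    return containers[name]

containers = {
    "sage2": {"node": "siteC", "storage": "nfs://store", "state": "exited"},
}
after = redeploy(containers, "sage2", "siteB", "nfs://store")
assert after["node"] == "siteB" and after["storage"] == "nfs://store"
```

Keeping the storage URL unchanged is what lets the redeployed SAGE2 serve the same visualization data as before the failure.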

30 Design of Resource Manager
To realize the five functionalities, we designed a resource manager. The RM (the blue part of the diagram) consists of seven modules: the Master Orchestrator, Status Database, Detector, Selector, Node Manager, SDN Controller, and Docker Manager.

31 Design of Resource Manager
The Master Orchestrator is the core module of the RM. Achieving the five functionalities requires not only mounting the other modules in the RM but also controlling them properly, so the Master Orchestrator is incorporated into the design.

32 Design of Resource Manager
The SDN controller is responsible for confirming the network status, and the Docker Manager is responsible for Docker operations. Both are already sophisticated, existing tools, so the RM takes them in as modules: this design makes it easy for the Master Orchestrator to control SDN and Docker depending on the situation.

33 Design of Resource Manager
The Node Manager is responsible for monitoring the status of computing resources; it is an original module we designed. Based on instructions from the Master Orchestrator, the Node Manager operates computing nodes directly, for example checking their status and entering commands. These three modules enable the RM to put all components under its control.

34 Design of Resource Manager
The Status Database holds data about all components. It is included to make temporal status changes easy to follow, since these changes are an important decision factor for the RM. The Status Database is accessed by the Detector and the Selector as needed.

35 Design of Resource Manager
The Detector is used to detect service stops, and the Selector is in charge of determining the backup sites for Docker and the detour routes of the network. Both make decisions based on algorithms, and these algorithms are pluggable, which makes them easy to change or update: an algorithm may need to be adapted to the environment where the RM is introduced, and since new algorithms are being researched daily, new mechanisms may emerge in the future. Therefore, a pluggable method is adopted.
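The pluggable design can be sketched with a simple strategy registry, so a Detector or Selector algorithm is swapped without touching the core; the strategy names and criteria below are invented examples, not the algorithms the system actually uses:

```python
# Registry of pluggable algorithms: the core code calls whichever
# strategy is registered under a name, so swapping algorithms needs
# no change to the Detector/Selector themselves.
REGISTRY = {}

def algorithm(name):
    """Decorator that registers a strategy function under a name."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@algorithm("least-loaded")
def least_loaded(nodes):
    # nodes: {name: cpu utilization}
    return min(nodes, key=nodes.get)

@algorithm("first-alphabetical")
def first_alphabetical(nodes):
    return sorted(nodes)[0]

def select(strategy, nodes):
    """Core entry point: delegate to the plugged-in strategy."""
    return REGISTRY[strategy](nodes)

assert select("least-loaded", {"siteA": 0.7, "siteB": 0.2}) == "siteB"
```

Adding a new algorithm is just defining one more decorated function, which matches the motivation given above for keeping the algorithms pluggable.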

36 1: Collection
Here is the behavior of the RM. First, the RM constantly collects the status of all components via the SDN controller, Docker Manager, and Node Manager: the SDN controller checks network reachability, the Docker Manager monitors whether Docker is running normally, and the Node Manager supervises node information for all sites. The obtained information is stored in the Status Database.

37 2: Detection
The Detector works to find service stops. It periodically accesses the Status Database to check the latest information on each component, analyzes the stored data based on its algorithm, and notifies the Master Orchestrator if it detects that any function has stopped.

38 3: Selection
After receiving a report from the Detector, the Master Orchestrator requests the Selector to select alternative resources. The Selector searches for a new network topology and an alternative computing node on which to deploy Docker, based on the information in the Status Database and its algorithm, and returns the results to the Master Orchestrator.

39 4: Reroute
Next, the Master Orchestrator requests reconfiguration from the SDN controller. The SDN controller reroutes the network to follow the decisions of the Selector: this reconfiguration establishes a route from the new SAGE2 application server to the TDW nodes, and unnecessary routes connected to the affected sites are removed.

40 5: Redeployment
Finally, the Master Orchestrator requests the redeployment of Docker. It first requests the assignment of an alternative node through the Node Manager, then tells the Docker Manager which node was chosen and commands it to migrate the container. The Docker Manager redeploys Docker to the alternative node.

41 5: Redeployment
This concludes the explanation of the RM's behavior. By introducing the RM, SDIT will be able to control each function dynamically according to the status of the service, and its continuation will improve.

42 Current Status
We are now working on implementing the RM based on this design, and are confirming its behavior in a small laboratory environment in which six sites are connected by an SDN network. We will deploy the prototype of the RM in that environment and carry out measurements such as delay.

43 Summary
We introduced a design of a resource manager for the Software-Defined IT Infrastructure. As future work, we will construct the proposed resource manager in the local environment, and then deploy it to the prototype system on PRAGMA-ENT (a global environment). This is all for my presentation. Thank you.

