By Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis Open University of Catalonia Barcelona - Spain Second International Workshop on.


Presentation transcript:

A Grid-aware Implementation for Providing Effective Feedback to On-line Learning Groups
by Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis
Open University of Catalonia, Barcelona, Spain
Second International Workshop on Grid Computing and its Application to Data Analysis (GADA'05), Agia Napa, Cyprus, November

Slide 2: Index
• Introduction: the process of embedding information and knowledge into CSCL applications.
• Approach: need for structuring and processing of large amounts of group activity information.
• Problem: lack of computational resources.
• Solution: a Grid-aware approach based on the Master-Worker paradigm.
• An application: a Grid-based prototype to process group activity log files.
• Processing results: empirical analysis.
• Conclusions and future work.

Computer-Supported Collaborative Learning is a paradigm for research in educational technology that focuses on the use of Information and Communications Technology (ICT) as a mediation tool within collaborative methods of learning. B. Wasson (1998)

In CSCL environments, the analysis of the information related to the collaborative group activity is crucial for understanding collaboration and group processes. P. Dillenbourg (1999)

Slide 3: Introduction (I): The process of embedding information and knowledge into CSCL applications
The whole picture
Four stages in event management: classification, processing, analysis and presentation.

Slide 4: Introduction (II): The process of embedding information and knowledge into CSCL applications
Stage I: Classification
• Collection of information.
• Extraction of actions.
• Identification of events.
• Categorization according to:
  - Task performance
  - Group functioning
  - Scaffolding
• Store as system log files.
Classification in synchronous environments is very similar.

Slide 5: Introduction (III): The process of embedding information and knowledge into CSCL applications
Stage II: Processing
• Obtain event information from large log files.
• Process log files according to desired criteria, e.g.:
  - time
  - workspace
• Store processing results in a suitable database.
Processing of events needs great computational power.

Slide 6: Introduction (IV): The process of embedding information and knowledge into CSCL applications
Stage III: Analysis
• Need for extracting complex knowledge from the database.
• Define query criteria.
• Send criteria and data to an external statistics package.
• Obtain useful statistical results from the analysis.
Delegating the analysis to an external tool gives access to the best existing statistical package.

Slide 7: Introduction (V): The process of embedding information and knowledge into CSCL applications
Stage IV: Presentation
• Predefine an XML coding to represent ad hoc statistical measurements.
• Structure statistical results into XML output.
• Convert XML into the desired presentation format.
• Present results to users.
Users continuously receive knowledge in the form of appropriate feedback to influence their motivation and emotional state.
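The slides do not show the XML coding itself, so the following is only a minimal sketch of the "structure statistical results into XML output" step. The element and attribute names (`statistics`, `measure`, `group`, `name`, `value`) and the sample measure are assumptions made for illustration, not the authors' format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: element and attribute names are assumptions, not the authors' XML coding.
final class StatsXmlSketch {

    // Escape the five XML special characters so text is safe inside attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Structure named statistical results for one group as a small XML document.
    static String toXml(String group, Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("<statistics group=\"").append(escape(group)).append("\">\n");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            sb.append("  <measure name=\"").append(escape(e.getKey()))
              .append("\" value=\"").append(e.getValue()).append("\"/>\n");
        }
        sb.append("</statistics>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> measures = new LinkedHashMap<>();
        measures.put("messagesPerMember", 12.5); // made-up value for illustration
        System.out.print(toXml("group-A", measures));
    }
}
```

A later conversion step (for example a stylesheet transformation) could then turn this intermediate XML into whatever presentation format the users need.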

Slide 8: Approach (I) Motivation
• Support for real on-line environments with a large number of students and tutors that are geographically distributed.
• High degree of user-user and user-system interaction generates lots of event information.
• Constant provision of complex knowledge to group participants.
• Need to supply efficient and useful feedback for improving the motivation, emotional state, and problem-solving abilities of groups in on-line collaborative learning.

Slide 9: Approach (II) Context at Open University of Catalonia
• Group activity at Open University of Catalonia involves hundreds of students and dozens of tutors in several on-line courses.
• The complexity of the learning practices entails intensive collaboration activity.
• BSCW is used as a groupware system to capture group activity interaction in log files.
• BSCW provides neither log file processing nor statistical analysis capabilities.
• BSCW generates a single huge log file every day and neither classifies nor structures the data in any way.

Slide 10: Statement of the problem
Lack of computational resources
• Need for processing of a huge amount of event information gathered in single log files.
• It is essential to have the processing results of group activity constantly available in real time.
• Event information in log files should be partitioned into multiple log files according to particular needs.
• Event information must be constantly processed in an efficient manner during the processing stage.
• Lack of sufficient computational resources is the main obstacle to the constant processing of multiple data log files in real time.

Slide 11: Solution (I) Redefining the processing stage
• Obtain event information from large log files.
• Structure the information according to particular needs.
• Create log files of different degrees of granularity.
• Process all log files at the same time.
• Store results in the database.
Need for the processing of all log files to be parallelized.
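As a rough illustration of creating log files of different granularity, the sketch below splits one large event log into smaller logs keyed by a chosen criterion (workspace or day). The semicolon-separated event line format and all names are assumptions; the slides do not describe the BSCW log layout.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LogPartitionSketch {

    // Hypothetical event line format: "2005-03-01T10:15;workspaceId;userId;action".
    static String workspaceOf(String line) {
        return line.split(";")[1];
    }

    static String dayOf(String line) {
        return line.split(";")[0].substring(0, 10);
    }

    // Split one big log into smaller logs, one per value of the chosen criterion.
    static Map<String, List<String>> partition(List<String> lines, Function<String, String> key) {
        return lines.stream()
                .collect(Collectors.groupingBy(key, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "2005-03-01T10:15;ws1;ana;create",
                "2005-03-01T11:00;ws2;joan;read",
                "2005-03-02T09:30;ws1;ana;edit");
        // Coarser or finer partitions come from swapping the key function.
        System.out.println(partition(log, LogPartitionSketch::workspaceOf).keySet());
        System.out.println(partition(log, LogPartitionSketch::dayOf).keySet());
    }
}
```

Each resulting partition is independent of the others, which is what makes processing them at the same time possible.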

Slide 12: Solution (II) A Grid-based solution
• Grid technology provides broad access to massive information and computational resources.
• In this context, the Grid computing paradigm:
  - overcomes the lack of computational resources to process a large amount of event information.
  - allows processing of the log files, taking advantage of the parallelism inherent in the distributed nature of the Grid.
  - provides load balancing in the processing of log files of different granularity.
• Master-Worker paradigm on the PlanetLab platform: a Grid-based approach for processing log files.

Slide 13: Solution (III) Master-Worker paradigm
• Distinguishes two types of processors:
  - master: performs the control and coordination tasks.
  - workers: perform most of the computational work.
• Advantages:
  - flexibility: workers can be implemented in different ways.
  - scalability: workers can be easily added.
  - separation of concerns: master does coordination and workers do specific tasks.
• Target: parallel applications with weak synchronization and reasonably large grain size.
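The paradigm can be sketched on a single machine, with threads from a `java.util.concurrent` pool standing in for Grid nodes (an assumption made purely for illustration). The master splits the input into tasks and combines results; the workers do the computation, which here is just a toy count of non-empty lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MasterWorkerSketch {

    // Worker side: the computational work (toy example: count non-empty lines).
    static int work(List<String> chunk) {
        int n = 0;
        for (String line : chunk) {
            if (!line.isEmpty()) n++;
        }
        return n;
    }

    // Master side: split the input into tasks, hand them to workers, combine the results.
    static int master(List<String> lines, int taskSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < lines.size(); i += taskSize) {
                List<String> chunk = lines.subList(i, Math.min(i + taskSize, lines.size()));
                pending.add(pool.submit(() -> work(chunk)));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // the only point where the master waits for workers
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(master(List.of("a", "", "b", "c", "d"), 2, 3));
    }
}
```

Workers never talk to each other, which reflects the weak synchronization the paradigm targets; adding workers only changes the pool size.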

Slide 14: Solution (IV) Architecture
[Figure: the architecture of an application for processing log files.]

Slide 15: Solution (V) Implementation (I)
• The workers receive and carry out the following task (MWTask):
  - address of the location of the log file;
  - name of the log file;
  - size of the log file;
  - address of the location where the processing routine is found;
  - URL of the database where the processed information will be stored.
• The master processor (MWDriver) is programmed as follows:

    while (true) do
        check for new log files generated from the Collaborative Learning Application Server;
        update the list with the new incoming log files;
        for each new log file, generate a task;
        submit the newly generated tasks;
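A minimal sketch of the task description and of one iteration of the driver loop follows. Only the list of task fields comes from the slide; the field names and types, the `poll` method, and the example.org addresses are hypothetical, and submitting the tasks to workers is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MWDriverSketch {

    // Mirrors the task description on the slide; field names and types are assumptions.
    static final class MWTask {
        final String logFileLocation;
        final String logFileName;
        final long logFileSize;
        final String routineLocation;
        final String databaseUrl;

        MWTask(String logFileLocation, String logFileName, long logFileSize,
               String routineLocation, String databaseUrl) {
            this.logFileLocation = logFileLocation;
            this.logFileName = logFileName;
            this.logFileSize = logFileSize;
            this.routineLocation = routineLocation;
            this.databaseUrl = databaseUrl;
        }
    }

    private final Set<String> knownLogFiles = new LinkedHashSet<>();
    private final String routineLocation;
    private final String databaseUrl;

    MWDriverSketch(String routineLocation, String databaseUrl) {
        this.routineLocation = routineLocation;
        this.databaseUrl = databaseUrl;
    }

    // One iteration of the driver loop: detect new log files and generate one task for each.
    // `listing` maps log file names to sizes, as a stand-in for querying the application server.
    List<MWTask> poll(String location, Map<String, Long> listing) {
        List<MWTask> tasks = new ArrayList<>();
        for (Map.Entry<String, Long> f : listing.entrySet()) {
            if (knownLogFiles.add(f.getKey())) { // true only for files not seen before
                tasks.add(new MWTask(location, f.getKey(), f.getValue(),
                        routineLocation, databaseUrl));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Placeholder addresses, not real endpoints.
        MWDriverSketch driver = new MWDriverSketch(
                "http://example.org/routines/EventExtractor", "jdbc:example://stats-db");
        Map<String, Long> listing = new LinkedHashMap<>();
        listing.put("group1.log", 2048L);
        listing.put("group2.log", 512L);
        System.out.println(driver.poll("http://example.org/logs/", listing).size());
        listing.put("group3.log", 4096L);
        System.out.println(driver.poll("http://example.org/logs/", listing).size());
    }
}
```

In the real driver this step would run inside the endless loop, with the returned tasks handed to whatever submission mechanism the Grid middleware provides.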

Slide 16: Solution (VI) Implementation (II)
• The worker processor (MWWorker) is programmed as follows:

    receive the task;
    receive the specified log file from the location specified in the task description;
    run the processing routine on the log file;
    send the master the task's report (execution time, …) on completion;
    send the database the processing results;

• Efficiency issues:
  - weak synchronization between master and worker ensures the application runs without loss of performance.
  - log files with different granularity allow efficient load balancing among workers and minimize data transmission.
  - the number of workers can be adapted dynamically when a new resource appears.

Slide 17: A Grid prototype (I) An application for processing log files
• EventExtractor: an ad hoc application for extracting event information from BSCW
  - converts event information into well-formatted data.
  - stores the extraction results in a database.
  - needs a lot of time to process sequentially.
• MW model: appropriate in this context given that
  - log files of different granularity are processed.
  - workers are not synchronized with each other.
  - the communication load between master and workers is low.
• PlanetLab platform: using a real Grid environment
  - by installing the Globus Toolkit 3 Grid service container,
  - and deploying the prototype on PlanetLab.

Slide 18: A Grid prototype (II) Master-Worker algorithm (I): overview
• A minimal Grid implementation made up of:
  - the worker, a Grid service that does the main work in the following steps:
    - wraps the EventExtractor routine,
    - publishes an interface that the master calls in order to dispatch a task,
    - receives a string representation of the events to be processed, and
    - returns a data structure containing performance information.
    After completing the task, the worker is put back into a queue of idle workers.
  - the master, which first obtains the event log file to be processed, the available workers, the task size to be dispatched to workers, and the number of workers to use, which are put in an idle queue. It then enters the following loop:
    - reads a specific number of events from an event log file,
    - calls an idle worker and sends it the events to be processed.
    The master exits the loop when all events in the current log file have been read and all dispatched tasks have finished.
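The loop described above can be sketched locally as follows. The `Worker` interface is a hypothetical stand-in for the remote Grid service, threads replace remote calls, and returning the number of processed events is an assumption made so the sketch has something to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class IdleQueueMasterSketch {

    // Stand-in for the remote worker Grid service (hypothetical local interface).
    interface Worker {
        long processEvents(String events); // returns the number of events processed
    }

    // Dispatch the log in tasks of taskSize events; returns the total events processed.
    static long run(List<String> eventLog, int taskSize, List<Worker> workers) {
        BlockingQueue<Worker> idle = new LinkedBlockingQueue<>(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        AtomicLong processed = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < eventLog.size(); i += taskSize) {
                // Read a fixed number of events and pass them as one string.
                String events = String.join("\n",
                        eventLog.subList(i, Math.min(i + taskSize, eventLog.size())));
                Worker w = idle.take(); // blocks while no worker is idle
                pending.add(pool.submit(() -> {
                    try {
                        processed.addAndGet(w.processEvents(events));
                    } finally {
                        idle.add(w); // worker goes back into the idle queue
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // leave only when every dispatched task has finished
            }
            return processed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Worker counting = events -> events.split("\n").length;
        System.out.println(run(List.of("e1", "e2", "e3", "e4", "e5"), 2,
                List.of(counting, counting)));
    }
}
```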

Slide 19: A Grid prototype (III) Master-Worker algorithm (II): the Master
• The Master implements the EventExtractorMaster interface, which has a single operation that
  - calls the worker's processEvents operation.
  - returns performance statistics about the execution.
• The EventExtractorMasterImp class aggregates an instance of EventExtractorMasterDispatcher to dispatch all tasks to available workers.

Slide 20: A Grid prototype (IV) Master-Worker algorithm (III): the Task

    private void _dispatchEventsToWorker(String events, long nEvents,
            double workerDBInsertTime,
            EventExtractorMasterStatsBean masterStats) throws Exception {
        // Take the next worker from the worker queue.
        EventExtractorWorker worker = m_queue.getNextWorker();
        this.beforeDispatch(worker);
        // Synchronous call: returns once the worker has processed the events.
        EventExtractorWorkerStatsBean workerStats =
                worker.processEvents(events, workerDBInsertTime);
        this.afterDispatch(worker);
        this.decrementPendingDispatchs();
    }

This operation synchronously sends a sequence of events (a single task) to an available worker.

Slide 21: A Grid prototype (V) Master-Worker algorithm (IV): the Task
• Two strategies to dispatch tasks to workers:
  - by blocking while the queue of idle workers is empty.
  - by implementing the queue of idle workers with the round-robin scheme.
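The blocking strategy corresponds to something like `BlockingQueue.take()`, which waits until a worker is available. The round-robin alternative might look like the sketch below, which hands out workers in cyclic order without waiting. The class is hypothetical; only the method name `getNextWorker` echoes the prototype's code, and its behaviour here is an assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin worker queue; not the prototype's actual class.
final class RoundRobinQueueSketch<W> {

    private final List<W> workers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinQueueSketch(List<W> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        this.workers = List.copyOf(workers);
    }

    // Never blocks: returns workers in cyclic order, whether or not they are busy.
    W getNextWorker() {
        int i = Math.floorMod(next.getAndIncrement(), workers.size());
        return workers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinQueueSketch<String> queue =
                new RoundRobinQueueSketch<>(List.of("w1", "w2", "w3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(queue.getNextWorker());
        }
    }
}
```

The trade-off is that round-robin spreads tasks evenly without coordination, but may send a task to a worker that is still busy, whereas blocking only ever dispatches to a worker known to be idle.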

Slide 22: A Grid prototype (VI) Master-Worker algorithm (V): the Worker
• The worker Grid service implements the EventExtractorWorker interface, which has only a single operation:
  processEvents(String events, double dbInsertTimeInMs);
• The implementation parses the events passed in to extract the required information.
• processEvents returns a data structure with performance information about the task executed (elapsed time, number of events and bytes processed).
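A local sketch of such an interface is given below. The signature of `processEvents` and the three reported statistics follow the slide; the `WorkerStats` class is a simplified stand-in for the prototype's stats bean, the one-event-per-line format is assumed, and treating `dbInsertTimeInMs` as a simulated per-event database insertion cost is a guess, since the slides do not define it.

```java
import java.nio.charset.StandardCharsets;

final class WorkerSketch {

    // Simplified stand-in for the prototype's worker stats bean.
    static final class WorkerStats {
        final double elapsedMs;
        final long events;
        final long bytes;

        WorkerStats(double elapsedMs, long events, long bytes) {
            this.elapsedMs = elapsedMs;
            this.events = events;
            this.bytes = bytes;
        }
    }

    interface EventExtractorWorker {
        WorkerStats processEvents(String events, double dbInsertTimeInMs);
    }

    // Toy implementation: one event per line; "processing" is just counting.
    static final class CountingWorker implements EventExtractorWorker {
        @Override
        public WorkerStats processEvents(String events, double dbInsertTimeInMs) {
            long start = System.nanoTime();
            long n = events.isEmpty() ? 0 : events.split("\n").length;
            long bytes = events.getBytes(StandardCharsets.UTF_8).length;
            double measuredMs = (System.nanoTime() - start) / 1_000_000.0;
            // Assumption: dbInsertTimeInMs models a database insertion cost per event.
            return new WorkerStats(measuredMs + n * dbInsertTimeInMs, n, bytes);
        }
    }

    public static void main(String[] args) {
        WorkerStats stats = new CountingWorker().processEvents("a\nb\nc", 2.0);
        System.out.println(stats.events + " events, " + stats.bytes + " bytes");
    }
}
```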

Slide 23: A Grid prototype (VII) Test battery
• An ad hoc test battery was designed, made up of:
  - an exhaustive collection of log files from the spring term of a course with 140 students arranged in 5-member groups and 2 tutors.
  - a selected sample of a few log files as a representative stratum of file size and event complexity.
• The whole test battery was processed by the EventExtractor on single-processor nodes of PlanetLab:
  - involving usual configurations.
  - with different workloads.
  - repeating the execution several times.

Slide 24: Experimental results (I) Sequential approach
[Figures: comparison scale for 8 representative log files; results of over 100 log files processed]
• Sequential processing shows that the processing time is linear in the size of the log file processed.

Slide 25: Experimental results (II) Parallel approach (I)
• The parallel processing results were obtained by
  - running tests for different task sizes and numbers of workers.
  - observing efficiency and speed-up for each set of workers.
[Figure: observed speed-up and efficiency for 5-event tasks and different numbers of workers]
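The slides do not state how speed-up and efficiency were computed. The standard definitions, which are presumably the ones intended, are given here with notation introduced only for this note: $T_1$ is the sequential processing time, $T_p$ the time with $p$ workers.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p}
```

Under these definitions the ideal case is $S_p = p$ and $E_p = 1$; any coordination or transmission overhead pushes $E_p$ below 1.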

Slide 26: Experimental results (III) Parallel approach (II)
• Reasonable speed-up is achieved in every test.
  - however, parallel efficiency tends to decrease with the number of workers.
[Figure: observed speed-up with increasing number of workers]

Slide 27: Experimental results (IV) Analysis of the results
• Apart from very small task sizes, the observed speed-up showed the feasibility of the parallelization.
  - small task sizes were affected by the transmission time.
• The more workers used in our tests, the further the achieved speed-up was from the maximum.
  - there is a trade-off between number of workers and task size.
• Results were a little biased due to the homogeneous behaviour observed in PlanetLab.
  - they should be adjusted to the dynamic workload of a real Grid.
• Results depend on the low complexity of BSCW's log files.
  - event complexity is the key to taking advantage of the Grid.

Slide 28: Conclusions and future work
• Efficient embedding of information and knowledge into group activity is a crucial factor for the success of the online collaborative learning activity.
• Strong need for computational resources to process large amounts of group activity log data.
• Grid-aware application based on the Master-Worker paradigm for processing log files of group activity in an efficient yet simple manner.
• According to the results, the benefits of the Grid increase with the volume and complexity of the event log files to be processed.
• We plan to improve our prototype in terms of master-worker communication, fault tolerance and dynamic discovery of idle workers.

Slide 29: Thank you! Questions?