By Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis, Open University of Catalonia, Barcelona, Spain. Second International Workshop on Grid Computing and its Application to Data Analysis.


1 A Grid-aware Implementation for Providing Effective Feedback to On-line Learning Groups
by Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis, Open University of Catalonia, Barcelona, Spain
Second International Workshop on Grid Computing and its Application to Data Analysis (GADA'05), Agia Napa, Cyprus, November 1-2, 2005

2 Index
- Introduction: the process of embedding information and knowledge into CSCL applications.
- Approach: the need to structure and process large amounts of group activity information.
- Problem: lack of computational resources.
- Solution: a Grid-aware approach based on the Master-Worker paradigm.
- An application: a Grid-based prototype to process group activity log files.
- Processing results: empirical analysis.
- Conclusions and future work.
Computer-Supported Collaborative Learning (CSCL) is a paradigm for research in educational technology that focuses on the use of Information and Communications Technology (ICT) as a mediation tool within collaborative methods of learning. B. Wasson (1998)
In CSCL environments, the analysis of the information related to collaborative group activity is crucial for understanding collaboration and group processes. P. Dillenbourg (1999)

3 Introduction (I): The process of embedding information and knowledge into CSCL applications
The whole picture. Four stages in event management: classification, processing, analysis and presentation.

4 Introduction (II): The process of embedding information and knowledge into CSCL applications
Stage I: Classification
- Collection of information.
- Extraction of actions.
- Identification of events.
- Categorization according to task performance, group functioning and scaffolding.
- Storage as system log files.
Classification in synchronous environments is very similar.
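The categorization step can be illustrated with a minimal Java sketch, assuming each extracted action is identified by a name. The three categories come from the slide; the action names, the rule table and the tab-separated log line are hypothetical and are not taken from BSCW or from the authors' system.

```java
import java.util.Map;

// Categories from the slide, plus a fallback for actions the rules do not cover.
enum EventCategory { TASK_PERFORMANCE, GROUP_FUNCTIONING, SCAFFOLDING, UNCLASSIFIED }

final class EventClassifier {
    // Hypothetical action-to-category rules, for illustration only.
    private static final Map<String, EventCategory> RULES = Map.of(
            "create_document", EventCategory.TASK_PERFORMANCE,
            "revise_document", EventCategory.TASK_PERFORMANCE,
            "post_message", EventCategory.GROUP_FUNCTIONING,
            "read_message", EventCategory.GROUP_FUNCTIONING,
            "tutor_hint", EventCategory.SCAFFOLDING);

    // Identifies the category of an extracted action.
    static EventCategory classify(String action) {
        return RULES.getOrDefault(action, EventCategory.UNCLASSIFIED);
    }

    // Builds one line of the system log: timestamp, user, action and category, tab-separated.
    static String toLogLine(long timestamp, String user, String action) {
        return timestamp + "\t" + user + "\t" + action + "\t" + classify(action);
    }
}
```

A table-driven classifier keeps the rules separate from the code, so new actions can be categorized without touching the extraction logic.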

5 Introduction (III): The process of embedding information and knowledge into CSCL applications
Stage II: Processing
- Obtain event information from large log files.
- Process log files according to the desired criteria, e.g. time or workspace.
- Store the processing results in a suitable database.
Processing of events requires great computational power.
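Processing "according to desired criteria" such as time or workspace can be sketched as simple aggregations. The comma-separated line format (epoch seconds, workspace, user, action) and the use of UTC calendar days are assumptions made for illustration; the slides do not give the real log format or criteria.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LogProcessor {
    // Assumed (hypothetical) line format: epochSeconds,workspace,user,action

    // Criterion "workspace": number of events per workspace.
    static Map<String, Integer> countByWorkspace(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String workspace = line.split(",")[1];
            counts.merge(workspace, 1, Integer::sum);
        }
        return counts;
    }

    // Criterion "time": number of events per UTC calendar day.
    static Map<LocalDate, Integer> countByDay(List<String> lines) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            long seconds = Long.parseLong(line.split(",")[0]);
            LocalDate day = Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).toLocalDate();
            counts.merge(day, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each aggregate is what would then be written to the results database in the last step of the stage.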

6 Introduction (IV): The process of embedding information and knowledge into CSCL applications
Stage III: Analysis
- Need to extract complex knowledge from the database.
- Define the query criteria.
- Send the criteria and data to an external statistics package.
- Obtain useful statistical results from the analysis.
Delegating the analysis to an external package makes the best existing statistical tools available.

7 Introduction (V): The process of embedding information and knowledge into CSCL applications
Stage IV: Presentation
- Predefine an XML coding to represent ad hoc statistical measurements.
- Structure the statistical results into XML output.
- Convert the XML into the desired presentation format.
- Present the results to users.
Users constantly receive knowledge in the form of appropriate feedback intended to influence their motivation and emotional state.
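Structuring statistical results into XML could look like the following sketch. The element and attribute names (statistics, measure, group, name) are hypothetical, since the slides do not show the authors' XML coding; plain string building with manual escaping is used only to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StatsXml {
    // Escapes the characters that are unsafe in XML attribute values and text ("&" first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical coding: one <statistics> element per group, one <measure> per statistic.
    static String toXml(String group, Map<String, Double> measures) {
        StringBuilder sb = new StringBuilder();
        sb.append("<statistics group=\"").append(escape(group)).append("\">");
        for (Map.Entry<String, Double> e : measures.entrySet()) {
            sb.append("<measure name=\"").append(escape(e.getKey())).append("\">")
              .append(e.getValue())
              .append("</measure>");
        }
        return sb.append("</statistics>").toString();
    }
}
```

Passing a LinkedHashMap keeps the measures in insertion order, which makes the output predictable for a later XSLT or similar conversion into the presentation format.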

8 Approach (I): Motivation
- Support for real on-line environments with a large number of geographically distributed students and tutors.
- A high degree of user-user and user-system interaction generates a large amount of event information.
- Constant provision of complex knowledge to group participants.
- Need to supply efficient and useful feedback for improving the motivation, emotional state, and problem-solving abilities of groups in on-line collaborative learning.

9 Approach (II): Context at the Open University of Catalonia
- Group activity at the Open University of Catalonia involves hundreds of students and dozens of tutors in several on-line courses.
- The complexity of the learning practices entails intensive collaboration activity.
- BSCW is used as a groupware system to capture group activity interaction in log files.
- BSCW provides neither log file processing nor statistical analysis capabilities.
- BSCW generates a single, huge log file every day and neither classifies nor structures the data in any way.

10 Statement of the problem: lack of computational resources
- A huge amount of event information gathered in single log files needs to be processed.
- It is essential to have the processing results of group activity constantly available in real time.
- Event information in log files should be partitioned into multiple log files according to particular needs.
- Event information must be processed constantly and efficiently during the processing stage.
- The lack of sufficient computational resources is the main obstacle to constantly processing multiple log files in real time.

11 Solution (I): Redefining the processing stage
- Obtain event information from large log files.
- Structure the information according to particular needs.
- Create log files of different degrees of granularity.
- Process all log files at the same time.
- Store the results in the database.
All log files need to be processed in parallel.
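Creating log files "of different degrees of granularity" from one large log can be sketched in two ways: coarse partitions keyed by a field (for example, the workspace) and fine-grained fixed-size chunks. The comma-separated format and both partitioning rules are assumptions for illustration; the slides do not specify how the authors split their logs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LogPartitioner {
    // Coarse granularity: one partial log per value of the given field.
    // Assumes comma-separated lines (hypothetical format).
    static Map<String, List<String>> byField(List<String> lines, int field) {
        Map<String, List<String>> parts = new LinkedHashMap<>();
        for (String line : lines) {
            String key = line.split(",")[field];
            parts.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return parts;
    }

    // Fine granularity: consecutive chunks of at most `size` events each.
    static List<List<String>> chunks(List<String> lines, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += size) {
            out.add(new ArrayList<>(lines.subList(i, Math.min(i + size, lines.size()))));
        }
        return out;
    }
}
```

Each resulting partition can then be handed to a different processor, which is what makes processing all log files at the same time possible.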

12 Solution (II): A Grid-based solution
- Grid technology provides broad access to massive information and computational resources.
- In this context, the Grid computing paradigm:
  - overcomes the lack of computational resources to process a large amount of event information;
  - allows the log files to be processed by taking advantage of the parallelism inherent in the distributed nature of the Grid;
  - provides load balancing in the processing of log files of different granularity.
- A Grid-based approach for processing log files: the Master-Worker paradigm on the PlanetLab platform.

13 Solution (III): Master-Worker paradigm
- Distinguishes two types of processors:
  - master: performs the control and coordination tasks;
  - workers: perform most of the computational work.
- Advantages:
  - flexibility: workers can be implemented in different ways;
  - scalability: workers can easily be added;
  - separation of concerns: the master coordinates and the workers perform specific tasks.
- Target: parallel applications with weak synchronization and reasonably large grain size.

14 Solution (IV): Architecture
The architecture of an application for processing log files.

15 Solution (V): Implementation (I)
- Each worker receives a task (MWTask) consisting of:
  - the address of the location of the log file;
  - the name of the log file;
  - the size of the log file;
  - the address of the location where the processing routine is found;
  - the URL of the database where the processed information will be stored.
- The master processor (MWDriver) is programmed as follows:
    while (true) do
      check for new log files generated by the Collaborative Learning Application Server;
      update the list of log files with the new incoming log files;
      for each new log file generate a task;
      submit the newly generated tasks;
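One iteration of the MWDriver loop can be sketched in Java as follows. The MWTask fields follow the slide, but their types, the MWDriverSketch class, its method names and the use of a map from file name to size as the "view" of the application server are assumptions; the real submission to remote workers is replaced by a local list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Task description sent to a worker; field list from the slide, types assumed.
final class MWTask {
    final String logLocation;     // address of the location of the log file
    final String logName;         // name of the log file
    final long logSize;           // size of the log file (bytes, assumed)
    final String routineLocation; // address of the processing routine
    final String databaseUrl;     // where the processed information is stored

    MWTask(String logLocation, String logName, long logSize,
           String routineLocation, String databaseUrl) {
        this.logLocation = logLocation;
        this.logName = logName;
        this.logSize = logSize;
        this.routineLocation = routineLocation;
        this.databaseUrl = databaseUrl;
    }
}

// Hypothetical driver: each call is one pass of the "while (true)" loop.
final class MWDriverSketch {
    private final Set<String> knownLogs = new HashSet<>();
    private final List<MWTask> submitted = new ArrayList<>(); // stand-in for real submission

    List<MWTask> checkForNewLogs(Map<String, Long> logsOnServer, String location,
                                 String routineLocation, String databaseUrl) {
        List<MWTask> created = new ArrayList<>();
        for (Map.Entry<String, Long> log : logsOnServer.entrySet()) {
            // Set.add returns false for a log file already in the list.
            if (knownLogs.add(log.getKey())) {
                created.add(new MWTask(location, log.getKey(), log.getValue(),
                                       routineLocation, databaseUrl));
            }
        }
        submitted.addAll(created); // "submit the newly generated tasks"
        return created;
    }

    int submittedCount() {
        return submitted.size();
    }
}
```

In a real deployment the loop would call checkForNewLogs periodically and pass each new task to the Grid middleware instead of storing it locally.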

16 Solution (VI): Implementation (II)
- The worker processor (MWWorker) is programmed as follows:
    receive the task;
    receive the specified log file from the location given in the task description;
    run the processing routine on the log file;
    on completion, send the master the task's report (execution time,…);
    send the processing results to the database;
- Efficiency issues:
  - weak synchronization between master and workers ensures that the application runs without loss of performance;
  - log files of different granularity allow efficient load balancing among workers and minimize data transmission;
  - the number of workers can be adapted dynamically when a new resource appears.
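The MWWorker steps can be sketched with the transfer and storage steps abstracted behind two small interfaces. MWWorkerSketch, LogFetcher, ResultSink and Report are hypothetical names; the slide does not say how log files are transferred or how results reach the database, and the per-line routine stands in for the real processing routine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class MWWorkerSketch {
    // Hypothetical hooks for the transfer and storage steps.
    interface LogFetcher { List<String> fetch(String location, String logName); }
    interface ResultSink { void store(String databaseUrl, List<String> rows); }

    // Report for the master on completion (execution time and events processed).
    static final class Report {
        final String logName;
        final int eventsProcessed;
        final long elapsedNanos;

        Report(String logName, int eventsProcessed, long elapsedNanos) {
            this.logName = logName;
            this.eventsProcessed = eventsProcessed;
            this.elapsedNanos = elapsedNanos;
        }
    }

    static Report execute(String location, String logName, String databaseUrl,
                          Function<String, String> routine, LogFetcher fetcher, ResultSink sink) {
        long start = System.nanoTime();
        List<String> lines = fetcher.fetch(location, logName); // receive the log file
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(routine.apply(line));                     // run the processing routine
        }
        Report report = new Report(logName, lines.size(), System.nanoTime() - start);
        sink.store(databaseUrl, rows);                         // send results to the database
        return report;                                         // report goes back to the master
    }
}
```

Keeping fetching and storage behind interfaces reflects the flexibility advantage of the paradigm: the same worker logic can run against local files in a test and remote resources on the Grid.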

17 A Grid prototype (I): An application for processing log files
- EventExtractor: an ad hoc application for extracting event information from BSCW. It:
  - converts event information into well-formatted data;
  - stores the extraction results in a database;
  - takes a long time to run sequentially.
- The MW model is appropriate in this context given that:
  - log files of different granularity are processed;
  - workers are not synchronized with each other;
  - the communication load between master and workers is low.
- PlanetLab platform: a real Grid environment is used by installing the Globus Toolkit 3 Grid service container and deploying the prototype on PlanetLab.

18 A Grid prototype (II): Master-Worker algorithm (I): overview
- A minimal Grid implementation made up of:
  - the worker, a Grid service that does the main work through the following steps:
    - wraps the EventExtractor routine,
    - publishes an interface that the master calls in order to dispatch a task,
    - takes a string representation of the events to be processed, and
    - returns a data structure containing performance information.
    After completing the task, the worker is put back into a queue of idle workers.
  - the master, which first obtains the event log file to be processed, the available workers, the task size to be dispatched to workers and the number of workers to use, and puts those workers in an idle queue. It then enters the following loop:
    - reads a specific number of events from the event log file,
    - calls an idle worker and sends it the events to be processed.
    The master exits the loop when all events in the current log file have been read and all dispatched tasks have finished.
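The loop above can be sketched locally in Java, with threads standing in for remote Grid services. MasterLoopSketch and its Worker interface are hypothetical; the sketch uses the blocking strategy (the master waits whenever no worker is idle), joins each batch of events with newlines as the "string representation", and does not model Globus, PlanetLab or network transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class MasterLoopSketch {
    // Hypothetical local stand-in for the worker service; returns the number of events processed.
    interface Worker { int processEvents(String events); }

    static int run(List<String> log, int taskSize, List<Worker> workers) throws InterruptedException {
        // All workers start in the idle queue.
        BlockingQueue<Worker> idle = new ArrayBlockingQueue<>(workers.size(), false, workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        AtomicInteger processed = new AtomicInteger();
        try {
            for (int from = 0; from < log.size(); from += taskSize) {
                // Read the next taskSize events (or fewer at the end of the log).
                String batch = String.join("\n", log.subList(from, Math.min(from + taskSize, log.size())));
                Worker w = idle.take(); // blocks while every worker is busy
                pool.execute(() -> {
                    try {
                        processed.addAndGet(w.processEvents(batch));
                    } finally {
                        idle.add(w); // the worker goes back into the idle queue
                    }
                });
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the dispatched tasks to finish
        }
        return processed.get();
    }
}
```

The task size is the grain of parallelism: larger batches mean fewer dispatches and less communication per event, which matches the slide's note that communication between master and workers is kept low.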

19 A Grid prototype (III): Master-Worker algorithm (II): the Master
- The Master implements the EventExtractorMaster interface, whose single operation calls the worker's processEvents operation and returns performance statistics about the execution.
- The EventExtractorMasterImp class aggregates an instance of EventExtractorMasterDispatcher to dispatch all tasks to the available workers.

20 A Grid prototype (IV): Master-Worker algorithm (III): the Task

    private void _dispatchEventsToWorker(String events, long nEvents,
            double workerDBInsertTime, EventExtractorMasterStatsBean masterStats) throws Exception {
        // Obtain the next worker from the queue of idle workers
        EventExtractorWorker worker = m_queue.getNextWorker();
        this.beforeDispatch(worker);
        // Synchronous call: returns only when the worker has processed this task
        EventExtractorWorkerStatsBean workerStats =
                worker.processEvents(events, workerDBInsertTime);
        this.afterDispatch(worker);
        // Record that this dispatch has completed
        this.decrementPendingDispatchs();
    }

This operation synchronously sends a sequence of events (a single task) to an available worker.

21 A Grid prototype (V): Master-Worker algorithm (IV): the Task
- Two strategies for dispatching tasks to workers:
  - blocking while the queue of idle workers is empty;
  - implementing the queue of idle workers with a round-robin scheme.
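The round-robin strategy can be sketched as a cyclic selector. The assumption here is that, unlike the blocking strategy, it does not wait for a worker to become idle, so a busy worker may be handed another task; the slide does not spell this out. RoundRobinQueue is a hypothetical name, and its method name only mirrors the m_queue.getNextWorker() call in the dispatch code of slide 20.

```java
import java.util.List;

// Hypothetical round-robin worker queue: hands out workers in a fixed cyclic order.
final class RoundRobinQueue<W> {
    private final List<W> workers;
    private int next = 0;

    RoundRobinQueue(List<W> workers) {
        if (workers.isEmpty()) throw new IllegalArgumentException("need at least one worker");
        this.workers = List.copyOf(workers);
    }

    // Synchronized so that concurrent dispatch threads never receive the same slot twice in a row.
    synchronized W getNextWorker() {
        W w = workers.get(next);
        next = (next + 1) % workers.size();
        return w;
    }
}
```

Round-robin never blocks the master, which keeps dispatch simple, but it balances load well only when tasks take roughly the same time; the blocking idle queue adapts automatically to slower workers.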

22 A Grid prototype (VI): Master-Worker algorithm (V): the Worker
- The worker Grid service implements the EventExtractorWorker interface, which has a single operation:
    processEvents(String events, double dbInsertTimeInMs);
- The implementation parses the events passed in order to extract the required information.
- processEvents returns a data structure with performance information about the executed task (elapsed time, number of events and bytes processed).
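A local mock of this interface helps show the shape of the returned performance data. Only the processEvents signature comes from the slide; the return type WorkerStats, the CountingWorker class and the one-event-per-line encoding are assumptions. The real parsing and database insertion are not modeled, so dbInsertTimeInMs is accepted but unused here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical performance bean: elapsed time, number of events and bytes processed.
final class WorkerStats {
    final long elapsedNanos;
    final int events;
    final int bytes;

    WorkerStats(long elapsedNanos, int events, int bytes) {
        this.elapsedNanos = elapsedNanos;
        this.events = events;
        this.bytes = bytes;
    }
}

// Mirrors the operation named on the slide; the return type is an assumption.
interface EventExtractorWorker {
    WorkerStats processEvents(String events, double dbInsertTimeInMs);
}

// Local mock assuming one event per line; it only counts events instead of extracting them.
final class CountingWorker implements EventExtractorWorker {
    @Override
    public WorkerStats processEvents(String events, double dbInsertTimeInMs) {
        long start = System.nanoTime();
        int n = 0;
        for (String line : events.split("\n")) {
            if (!line.isBlank()) n++; // placeholder for the real per-event extraction
        }
        int bytes = events.getBytes(StandardCharsets.UTF_8).length;
        return new WorkerStats(System.nanoTime() - start, n, bytes);
    }
}
```

Returning statistics with every call is what lets the master compute speed-up and efficiency without a separate monitoring channel.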

23 A Grid prototype (VII): Test battery
- An ad hoc test battery was designed, made up of:
  - an exhaustive collection of log files from the spring term of a course with 140 students arranged in 5-member groups and 2 tutors;
  - a selected sample of a few log files as a representative stratum of file size and event complexity.
- The whole test battery was processed by the EventExtractor on single-processor PlanetLab nodes:
  - involving usual configurations;
  - with different workloads;
  - repeating the execution several times.

24 Experimental results (I): Sequential approach
Comparison scale for 8 representative log files. Results of over 100 log files processed.
- Sequential processing shows that processing time is linear in the size of the log file processed.

25 Experimental results (II): Parallel approach (I)
- The parallel processing results were obtained by:
  - running tests for different task sizes and numbers of workers;
  - observing efficiency and speed-up for each set of workers.
Observed speed-up and efficiency for a 5-event task and different numbers of workers.

26 Experimental results (III): Parallel approach (II)
- Reasonable speed-up is achieved in every test; however, parallel efficiency tends to decrease as the number of workers grows.
Observed speed-up with an increasing number of workers.
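The two metrics follow the standard definitions: speed-up S_p = T_1 / T_p, where T_1 is the sequential time and T_p the time with p workers, and efficiency E_p = S_p / p. The helper below is a minimal sketch of those formulas; the class name is hypothetical and the slides do not show how the authors computed them.

```java
// Standard parallel performance metrics (hypothetical helper class).
final class ParallelMetrics {
    // Speed-up: S_p = T_1 / T_p
    static double speedUp(double sequentialTime, double parallelTime) {
        return sequentialTime / parallelTime;
    }

    // Efficiency: E_p = S_p / p, i.e. the fraction of ideal linear speed-up achieved
    static double efficiency(double sequentialTime, double parallelTime, int workers) {
        return speedUp(sequentialTime, parallelTime) / workers;
    }
}
```

With made-up timings (not results from the slides) T_1 = 100 s and T_4 = 40 s, S_4 = 2.5 and E_4 = 0.625. Efficiency drops whenever speed-up grows more slowly than the number of workers, which is the pattern reported above.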

27 Experimental results (IV): Analysis of the results
- Apart from very small task sizes, the observed speed-up showed that the parallelization is feasible; small task sizes were affected by transmission time.
- The more workers used in our tests, the further the achieved speed-up was from the maximum: there is a trade-off between the number of workers and the task size.
- Results were slightly biased by the homogeneous behaviour observed in PlanetLab; they should be adjusted to the dynamic workload of a real Grid.
- Results depend on the low complexity of BSCW's log files: event complexity is the key to taking advantage of the Grid.

28 Conclusions and future work
- Efficient embedding of information and knowledge into group activity is a crucial factor for the success of online collaborative learning.
- There is a strong need for computational resources to process large amounts of group activity log data.
- A Grid-aware application based on the Master-Worker paradigm processes group activity log files in an efficient yet simple manner.
- According to the results, the benefits of the Grid grow with the volume and complexity of the event log files to be processed.
- We plan to improve our prototype in terms of master-worker communication, fault tolerance and dynamic discovery of idle workers.

29 Thank you! Questions?

