Web Log Data Analytics with Hadoop

1 Web Log Data Analytics with Hadoop
Presented by Yang-Syuan Chen,

2 Outline
Analyzing Web Application Log Files with Hadoop
    Introduction to Cloud Computing
    Hadoop: An Overview
    System Architecture and Implementation
    Result of Analyzing Web Application Log
LASyM: A Learning Analytics System for MOOCs
    Learning Analytics for MOOCs
    LASyM: Architecture, Implementation and Evaluation
Big Log Analysis for E-Learning Ecosystem
    Characteristics of Big Log Analysis
    Logging Architecture for E-learning Ecosystem
    Applications of Logging Architecture
Conclusion

3 Analyzing Web Application Log Files to Find Hit Count through the Utilization of Hadoop MapReduce in Cloud Computing Environment
Sayalee Narkhede, Trupti Baraskar, Debajyoti Mukhopadhyay
Conference on IT in Business, Industry and Government (CSIBIG), March 8-9, 2014, pp. 1-7

4 Introduction to Cloud Computing
Cloud computing is a kind of Internet-based computing in which shared resources and information are provided to computers and other devices on demand.
Characteristics: low-cost hardware, storage capacity, increasing computing power, and huge data sizes.
The main challenge in the cloud is how to effectively store, query, analyze, and utilize immense datasets.
Solution: the MapReduce model and Hadoop.
Log files contain a great deal of information that is useful for making business decisions and for future assessment.

5 Hadoop: An Overview
Hadoop: an open-source framework for distributed processing of massive data sets on clusters.
HDFS: splits each file into blocks, which are allocated across the nodes. Its replication mechanism provides reliability and availability regardless of node failures.
MapReduce: provides a mechanism for programmers to process data sets on a distributed system.
Fig. 1 MapReduce Framework
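
The MapReduce model described on this slide can be illustrated with a minimal single-process sketch. It is not Hadoop code: the functions `map_phase`, `shuffle`, `reduce_phase`, and `run_job`, as well as the record layout `"<client> <page>"`, are hypothetical names and formats chosen only to show the three steps of the model.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, 1) pair per log record; the key is the requested page."""
    page = record.split()[1]
    yield page, 1


def shuffle(pairs):
    """Shuffle: group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical records in the assumed form "<client> <page>".
records = ["10.0.0.1 /home", "10.0.0.2 /home", "10.0.0.1 /about"]
hit_counts = run_job(records)
print(hit_counts)
```

On a real cluster the map and reduce functions run on different nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.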

6 System Architecture and Implementation
The system consists of two phases: log preprocessing and log analysis.
Fig. 2 System Workflow (Preprocessing, Analysis)
Fig. 3 Preprocessed Log File
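
The two phases might look roughly like the sketch below. The slides do not state the log format or the cleaning rules, so everything specific here is an assumption: the input is taken to be Apache Common Log Format, and the preprocessing step is assumed to drop malformed lines, non-2xx responses, and static-resource requests. The analysis step counts hits per quarter of the year, one of the results reported later in the deck.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: Apache Common Log Format (the source does not specify the format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed cleaning rule: requests for these file types are not counted as hits.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def preprocess(lines):
    """Phase 1: keep only well-formed, successful, non-static requests."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        if not match["status"].startswith("2"):
            continue  # failed or redirected request
        if match["path"].lower().endswith(STATIC_SUFFIXES):
            continue  # static resource
        timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield {"ip": match["ip"], "path": match["path"], "time": timestamp}


def hits_per_quarter(entries):
    """Phase 2: count hits for each quarter of the year."""
    counts = Counter()
    for entry in entries:
        quarter = (entry["time"].month - 1) // 3 + 1
        counts[f"Q{quarter}"] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [15/Jan/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [03/May/2014:11:30:00 +0000] "GET /style.css HTTP/1.1" 200 512',
    '10.0.0.3 - - [20/May/2014:12:00:00 +0000] "GET /courses HTTP/1.1" 200 2048',
    '10.0.0.4 - - [21/May/2014:12:00:00 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]
quarter_hits = hits_per_quarter(preprocess(sample))
print(dict(quarter_hits))
```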

7 System Architecture and Implementation
Fig. 4 System Architecture
Fig. 5 Results of Analysis

8 Result of Analyzing Web Application Log
Fig. 6 Hits for Each City (Bar Chart)
Hits for Each Quarter of the Year (Pie Chart)
Fig. 7 Performance of Different Clusters

9 LASyM: A Learning Analytics System for MOOCs
Yassine Tabaa, Abdellatif Medouri
International Journal of Advanced Computer Science and Applications, Vol. 4, No. 5, 2013, pp

10 Learning Analytics for MOOCs
MOOC (Massive Open Online Course)
An online course aimed at unlimited participation and open access via the web.
Two features: open accessibility and scalability.
Learning Analytics
The measurement, collection, analysis, and reporting of data about learners, for the purpose of understanding and optimizing learning and the environments in which it occurs.

11 Learning Analytics for MOOCs
MOOCs' Big Data
Coursera in 2012: 3.1 million students, 332 courses
EdX in 2014: 2.5 million students, over 200 courses
Proposed solution: LASyM, a Learning Analytics System for MOOCs.
Fig. 8 Lifecycle of MOOCs' big data

12 Learning Analytics for MOOCs
MOOC Student Patterns
Based on the Phil classification of student types in Coursera-style MOOCs, selected groups are redefined in the following modified classification list:
Ghosts
Observers
Non-completers
Passive Participants
Active Participants
"at-risk student"

13 Learning Analytics for MOOCs
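A method to identify "at-risk" students in MOOC environments.
Two principal characteristics: Interaction and Persistence.
Persistence indicates how steadily a learner stays focused on a course over time.
Interaction is measured by a student's interactions with a specific course.
Engagement Degree (ED): used to identify whether a learner is potentially "at risk"; the resulting value EDc(s) lies in the interval [0, 1].
Fig. 9 Method of identifying "at-risk" learners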
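
The slide says only that the Engagement Degree is derived from interaction and persistence and that it falls in [0, 1]; it does not give the formula. The sketch below is therefore an assumption, not the LASyM method: it takes both indicators as already normalized to [0, 1], combines them with a weighted mean, and applies a hypothetical cut-off of 0.5. The names `engagement_degree`, `is_at_risk`, and `AT_RISK_THRESHOLD` are illustrative.

```python
AT_RISK_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the source


def engagement_degree(interaction, persistence, weight=0.5):
    """Combine two indicators in [0, 1] into one score in [0, 1].

    Assumption: a weighted mean; the source does not specify the real formula.
    """
    for name, value in (("interaction", interaction),
                        ("persistence", persistence),
                        ("weight", weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return weight * interaction + (1.0 - weight) * persistence


def is_at_risk(interaction, persistence):
    """Flag a learner whose engagement degree falls below the assumed threshold."""
    return engagement_degree(interaction, persistence) < AT_RISK_THRESHOLD


# Hypothetical learners: (interaction, persistence).
learners = {"s1": (0.2, 0.3), "s2": (0.9, 0.8)}
flags = {name: is_at_risk(*scores) for name, scores in learners.items()}
print(flags)
```

A weighted mean of two values in [0, 1] always stays in [0, 1], which is the only property of EDc(s) that the slide states.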

14 LASyM: Architecture, Implementation and Evaluation
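Experimental setup
Based on Hadoop: 1 resource manager, 12 nodes
Data Integrator: based on a MapReduce application
This section begins by describing the experimental environment and infrastructure, that is, the deployed components of LASyM. Then, to demonstrate the effectiveness of the proposed system, a small-scale scenario is implemented as a MapReduce-based application that identifies the designated "at-risk" learners.
Fig. 10 LASyM Architecture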

15 LASyM: Architecture, Implementation and Evaluation
The developed MapReduce-based application was executed on LASyM with different numbers of parallel nodes.
Fig. 11 Learning analytics speedup using LASyM
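
Speedup across node counts is conventionally computed as the single-node run time divided by the run time on n nodes, with efficiency being the speedup divided by n. The sketch below applies those standard definitions. The timings are hypothetical placeholders, not measurements from the LASyM evaluation.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Speedup S(n) = T(1) / T(n)."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds, parallel_seconds, nodes):
    """Parallel efficiency E(n) = S(n) / n; 1.0 would be ideal linear scaling."""
    return speedup(baseline_seconds, parallel_seconds) / nodes


# Hypothetical run times in seconds, keyed by number of nodes (not from the source).
timings = {1: 1200.0, 4: 400.0, 12: 200.0}
baseline = timings[1]
for nodes, seconds in timings.items():
    print(nodes, speedup(baseline, seconds), efficiency(baseline, seconds, nodes))
```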

16 Big Log Analysis for E-Learning Ecosystem
Qinghua Zheng, Huan He, Tian Ma, Ni Xue, Bing Li, Bo Dong
IEEE 11th International Conference on e-Business Engineering (ICEBE), Nov. 5-7, 2014, pp

17 Characteristics of Big Log Analysis
There are three challenges in taking full advantage of e-Learning log data:
Multi-dimensional data.
Massive log data from the various sources, formats, and applications of the ecosystem.
Complexity and variety of log analysis.
Given these characteristics, log data clearly exhibits the properties of big data, so applying big data analysis methods and tools to log data is both reasonable and feasible.

18 Characteristics of Big Log Analysis

19 Logging Architecture for E-learning Ecosystem
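The logging architecture consists of five modules.
Collection Module: consists of collectors distributed across the layers, which record the logs generated by different objects and normalize the collected data.
Transport Module: logs are usually buffered separately where they are collected. Processing them immediately is costly and hinders effective analysis of big log data, so they are transported to the storage module or the computation module for centralized processing.
Storage Module
    Raw Data Storage system: stores historical data in order and keeps it permanently for future data mining and analysis. It must be scalable to cope with massive log data and must support data-intensive computing.
    Result Data Storage system: must offer fast data access in order to give stream computing [4] and the service module high I/O performance.
Computation Module
    Data-intensive Computing: analyzes large volumes of raw data to mine valuable information.
    Stream Computing: processes each piece of data as it arrives in real time, aiming to obtain a response quickly or to detect exceptions.
Service Module: provides services to different users. Functionally, the services fall into three categories: statistical analysis of historical data, real-time monitoring, and prediction.
Fig. 12 Logging architecture for e-Learning ecosystem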
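
The way the five modules hand data to one another can be sketched in a few lines. This is a toy in-memory illustration under stated assumptions, not the architecture from the paper: the class and function names (`Collector`, `transport`, `Storage`, `batch_compute`, `stream_compute`, `service_report`), the record fields, and the rule that an `"error"` action counts as an exception are all hypothetical.

```python
from collections import Counter, deque


class Collector:
    """Collection module: record events and normalize them to one schema."""

    def __init__(self, source):
        self.source = source
        self.buffer = []  # logs are buffered locally until they are transported

    def collect(self, user, action):
        self.buffer.append({"source": self.source, "user": user, "action": action})


def transport(collectors, queue):
    """Transport module: move buffered logs to a central queue and clear the buffers."""
    for collector in collectors:
        queue.extend(collector.buffer)
        collector.buffer.clear()


class Storage:
    """Storage module: ordered raw data plus a fast-lookup store for results."""

    def __init__(self):
        self.raw = []      # raw data storage (append-only, ordered)
        self.results = {}  # result data storage (keyed for quick access)


def stream_compute(record, alerts):
    """Stream computing: inspect each record on arrival and flag exceptions."""
    if record["action"] == "error":  # assumed definition of an exception
        alerts.append(record)


def batch_compute(storage):
    """Data-intensive computing: scan all raw data to derive a statistic."""
    storage.results["actions_per_user"] = Counter(r["user"] for r in storage.raw)


def service_report(storage):
    """Service module: expose a historical statistic to users."""
    return dict(storage.results["actions_per_user"])


web, video = Collector("web"), Collector("video")
web.collect("alice", "login")
web.collect("bob", "error")
video.collect("alice", "play")

queue, storage, alerts = deque(), Storage(), []
transport([web, video], queue)
while queue:
    record = queue.popleft()
    storage.raw.append(record)
    stream_compute(record, alerts)
batch_compute(storage)
report = service_report(storage)
print(report, len(alerts))
```

The split between `stream_compute` (one record at a time) and `batch_compute` (a full scan of the raw store) mirrors the two computation styles named on the slide.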

20 Applications of Logging Architecture
Computing statistics on students' admission and attendance.
Fig. 13 An implemented logging architecture based on BlueSky ecosystem

21 Applications of Logging Architecture
Results of log analytics
Raw data size: 17 GB (15,489,655 rows)
Size of the log analytics results: 1.2 GB
Fig. 14 Statistics for number of students attending class
Fig. 15 Statistics for total number of all online students
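
The reported sizes imply a reduction factor and an average row size, which the short calculation below works out. It assumes 1 GB = 10^9 bytes, which the slide does not state; with binary gigabytes the per-row figure would be slightly larger.

```python
RAW_GB = 17.0          # raw data size from the slide
RESULT_GB = 1.2        # result data size from the slide
RAW_ROWS = 15_489_655  # row count from the slide
BYTES_PER_GB = 10**9   # assumption: decimal gigabytes

reduction_factor = RAW_GB / RESULT_GB           # how many times smaller the results are
fraction_removed = 1.0 - RESULT_GB / RAW_GB     # share of the raw volume not kept
bytes_per_row = RAW_GB * BYTES_PER_GB / RAW_ROWS

print(f"reduction factor: {reduction_factor:.1f}x")
print(f"volume removed:   {fraction_removed:.1%}")
print(f"average row size: {bytes_per_row:.0f} bytes")
```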

22 Conclusion
The Correlation (diagram)
Papers: Analyzing Web Application Log Files with Hadoop; Big Log Analysis for E-Learning Ecosystem; LASyM: A Learning Analytics System for MOOCs
Topics: Web app analytics, Log analysis, Hadoop, E-Learning

23 Conclusion
Hadoop reduces the time needed to analyze huge amounts of data.
The data analytics life cycle derived from the architectures: Collection, Transport, Storage, Computation, and Service.
The architecture that covers the e-learning ecosystem is more complex but also provides more analysis services.

