Automatization of the Stream Mining Process Lovro Šubelj, Zoran Bosnić, Matjaž Kukar, Marko Bajec CAiSE 2014, Thessaloniki, Greece Laboratory for Data Technologies
Motivation: Occapi TM. RR1 – Intelligent Infrastructure; Telecom Operators; RR2 – Services and Things Management; RR3 – Open Intelligent Communication Platform; Industry-specific adoption layer: Smart House/Building/City, Smart Energy/Grid, Smart Traffic/Lights/Transport, eTolling, eHealth, SME Asset & Time Management
Big Data; Real-Time Processing; CEP; Prediction; Open connectors; BAM, Dashboards; IoT Platforms
[Figure: chart with a percentage axis from 0.000% to 0.008%, with regions labeled "Real time" and "Future". Copyright (c) 2013 FRI-LPT, FE-LTFE]
Objective: To capture expert knowledge; to computerize the stream mining process.
Approach: Observe experts at work; identify the main activities in the stream mining process, focusing on the activities where the experts' knowledge is crucial; acquire expert knowledge; prototype an expert system; evaluate on different datasets.
Process
Prototype
Prototype
Evaluation. Experimental framework: standard statistics (classification: CA, Kappa, F, Rand index; regression: MAE, MAPE, RMSE, Pearson); performance comparison: Q-statistics. Datasets: flight delay prediction (USA, ); electricity market price (New South Wales, Australia); electric energy consumption (Portugal); solar energy forecast (USA, Oklahoma).
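The regression statistics named on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the evaluation code used by the authors: the function names are hypothetical, and the slide does not define its Q-statistic. The version below assumes the log ratio of two learners' accumulated losses that is often used to compare stream learners, so a negative value favors the first learner.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is 0 (division by zero).
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def q_statistic(losses_a, losses_b):
    """ASSUMPTION: Q = log(accumulated loss of A / accumulated loss of B).

    Negative values mean learner A accumulated less loss than learner B.
    The slide does not say which Q-statistic was used.
    """
    return math.log(sum(losses_a) / sum(losses_b))


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the talk.
    y_true = [2.0, 4.0, 6.0, 8.0]
    pred_a = [2.5, 3.5, 6.5, 7.5]
    pred_b = [3.0, 3.0, 7.0, 7.0]

    print("MAE  A:", mae(y_true, pred_a))
    print("MAPE A:", mape(y_true, pred_a))
    print("RMSE A:", rmse(y_true, pred_a))
    print("Pearson A:", pearson(y_true, pred_a))

    # Per-example squared losses of each learner feed the comparison.
    sq_a = [(t - p) ** 2 for t, p in zip(y_true, pred_a)]
    sq_b = [(t - p) ** 2 for t, p in zip(y_true, pred_b)]
    print("Q(A, B):", q_statistic(sq_a, sq_b))
```

In a streaming setting these sums would normally be updated incrementally as each example arrives, rather than recomputed over stored lists as in this batch-style sketch.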
Flight delay prediction
Electricity marketplace
Electric energy consumption
Solar energy forecast
Conclusions: Stream mining requires expert knowledge; this expert knowledge is sufficiently routinized that it can be captured as explicit knowledge and computerized; this is an important finding for the development of information systems in the fields of big data, IoT and similar. Further work: full deployment of the meta learner (different learning techniques possible); evaluation on more datasets; testing in real settings (time complexity, required resources, problem scalability…).