ENHANCEMENT OF BIG DATA INTEGRATION METHOD MAISARAH BINTI ZORKEFLEE 814594.

STRUCTURE OF PRESENTATION
Introduction
Problem Tree
Problem Statement
Significance of Study
Research Questions
Research Objectives
References

INTRODUCTION What is data integration? Data integration can be defined as the combination of data from different sources, presented to users in a unified form (Calvanese & De Giacomo, 2005).
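As a rough illustration of this definition, the sketch below combines two sources that describe the same entities under different field names and presents them as one unified view. The source names, field names, records and the `integrate` helper are all hypothetical, chosen only for the example; this is not the method proposed in this study.

```python
# Illustrative only: sources, field names and records are hypothetical.
REGISTRAR = [
    {"student_id": "S1", "name": "Aina", "programme": "MSc IT"},
    {"student_id": "S2", "name": "Farid", "programme": "MSc CS"},
]
LIBRARY = [
    {"sid": "S1", "full_name": "Aina", "loans": 3},
    {"sid": "S3", "full_name": "Mei", "loans": 1},
]

# Per-source mapping from local field names to the unified schema.
MAPPINGS = {
    "registrar": {"student_id": "id", "name": "name", "programme": "programme"},
    "library": {"sid": "id", "full_name": "name", "loans": "loans"},
}
UNIFIED_FIELDS = ("id", "name", "programme", "loans")


def to_unified(record, mapping):
    """Rename a source record's fields to the unified schema."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def integrate(sources):
    """Merge records from all sources into one row per entity id."""
    merged = {}
    for source_name, records in sources.items():
        mapping = MAPPINGS[source_name]
        for record in records:
            row = to_unified(record, mapping)
            # Start each entity with every unified field set to None.
            target = merged.setdefault(row["id"], dict.fromkeys(UNIFIED_FIELDS))
            for field, value in row.items():
                # Keep the first value seen; later sources only fill gaps.
                if target[field] is None:
                    target[field] = value
    return [merged[key] for key in sorted(merged)]


unified_view = integrate({"registrar": REGISTRAR, "library": LIBRARY})
for row in unified_view:
    print(row)
```

Fields that no source supplies stay as `None` in the unified view, which is a small-scale picture of the incomplete-data problem that arises when heterogeneous sources are combined.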

INTRODUCTION Where has data integration been used? It has been used in several domains, such as websites, education, social networks and astronomy (Dong & Srivastava, 2013).

INTRODUCTION Why is data integration used? Data integration provides convenience to users who need fast, current and clean data (Louie, Mork, Martin-Sanchez, Halevy & Tarczy-Hornoch, 2007).

INTRODUCTION Why is research in data integration significant? An emerging issue in the data integration research community is big data integration, which differs from traditional data integration (Dong & Srivastava, 2013).

PROBLEM TREE
Big Data
Incomplete Data
Heterogeneous data sources
Non-uniform quality requirements
Inconsistency Data
Temporal inconsistency
Spatial inconsistency
Text inconsistency
High-Dimensional Data Set
Entity-name clustering
Entity-name matching
Overlapping data
Contain closely related
Missing data
Losing value

PROBLEM STATEMENT
Incomplete data
Heterogeneous data sources
Quality of data analysis

SIGNIFICANCE OF STUDY Contribution to the development of big data integration in the domain of education.

RESEARCH QUESTIONS
How to integrate heterogeneous data sources?
How to increase the quality of data analysis?
How to evaluate the method's performance?

RESEARCH OBJECTIVES
To identify a method for integrating heterogeneous data sources.
To enhance the method in order to increase the quality of data analysis.
To evaluate the performance of the enhanced method by comparing it against the algorithm of the previous method.

REFERENCES
Calvanese, D., & De Giacomo, G. (2005). Data integration: A logic-based perspective. AI Magazine, 26(1), 59.
Dong, X. L., & Srivastava, D. (2013). Big data integration. ICDE Conference 2013, pp. 1245–1248.
Louie, B., Mork, P., Martin-Sanchez, F., Halevy, A., & Tarczy-Hornoch, P. (2007). Data integration and genomic medicine. Journal of Biomedical Informatics, 40(1), 5–16.

THANK YOU