Tomáš Jurníček, Jakub Jůza, Lenka Kmeťová


Big text data mining

Introduction
- Text data analysis
- Sophisticated analytic methods
- Information extraction from data

Big data and data mining
- Big data: datasets of large size and complexity
- Companies have large amounts of data
- The data needs to be analyzed
- Problem: the data is written in natural language
- Data mining steps:
  - Data cleaning
  - Data integration
  - Data selection
  - Mining methods
  - Evaluating results
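The data-mining steps listed above can be illustrated with a minimal Python sketch on a toy text corpus. The sample documents, the tiny stopword list and the function names are hypothetical; a real pipeline would use a proper tokenizer and far larger data. Here, term-frequency counting stands in for the "mining method".

```python
import re
from collections import Counter

# Hypothetical toy inputs: two "sources" of raw text documents.
SOURCE_A = ["Patient slept well.  ", "PATIENT complains of pain!!"]
SOURCE_B = ["", "Pain decreased after medication."]

STOPWORDS = {"of", "after", "the", "a"}  # assumed, deliberately tiny stopword list


def clean(doc: str) -> str:
    """Data cleaning: lowercase, replace punctuation with spaces, collapse whitespace."""
    doc = re.sub(r"[^a-z0-9\s]", " ", doc.lower())
    return " ".join(doc.split())


def integrate(*sources: list[str]) -> list[str]:
    """Data integration: merge several sources into one corpus."""
    return [doc for source in sources for doc in source]


def select(corpus: list[str]) -> list[str]:
    """Data selection: keep only non-empty documents."""
    return [doc for doc in corpus if doc]


def mine(corpus: list[str]) -> Counter:
    """Mining method: count term frequencies, skipping stopwords."""
    return Counter(w for doc in corpus for w in doc.split() if w not in STOPWORDS)


def evaluate(counts: Counter, k: int = 3) -> list[tuple[str, int]]:
    """Evaluating results: inspect the k most frequent terms."""
    return counts.most_common(k)


# Integrate the sources, clean each document, then select the usable ones.
corpus = select([clean(d) for d in integrate(SOURCE_A, SOURCE_B)])
print(evaluate(mine(corpus)))
```

Each step is a separate function so that any one of them (for example, the cleaning rules) can be replaced without touching the others.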

Methods
- Information extraction: finding key phrases and relations in unstructured text
- Categorization: assigning categories to documents
- Clustering: grouping similar documents into clusters
- Visualization: presenting data in a form understandable to humans
- Summarization: condensing long documents so that only the core information is expressed
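Two of these methods, categorization and summarization, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the category keyword lists are hypothetical, categorization is plain keyword matching (not a trained classifier), and the summary is extractive, picking the sentences whose words are most frequent in the document. Without stopword removal, common words such as "the" strongly influence the sentence scores.

```python
import re
from collections import Counter

# Hypothetical category keyword lists, chosen only for illustration.
CATEGORIES = {
    "sport": {"match", "goal", "team", "player"},
    "finance": {"stock", "market", "bank", "price"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(doc: str) -> str:
    """Categorization: pick the category whose keywords occur most often."""
    words = tokenize(doc)
    scores = {cat: sum(w in kws for w in words) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def summarize(doc: str, n: int = 1) -> list[str]:
    """Extractive summarization: keep the n sentences whose words have the
    highest average frequency across the whole document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    freq = Counter(tokenize(doc))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Preserve the original sentence order in the summary.
    return [s for s in sentences if s in top]


text = "The team won the match. The stock market fell. The team scored a late goal."
print(categorize(text), summarize(text))
```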

Tools
Large companies such as Facebook and LinkedIn work on open-source projects, for example:
- Apache Hadoop: for data-heavy distributed applications
- Apache S4: for continuous processing of data streams
- Storm (Twitter): for distributed processing of streaming data
Open-source tools for big data mining: Apache Mahout, R, MOA, …
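Hadoop's classic programming model is MapReduce: mappers emit key/value pairs, the framework groups the pairs by key (the shuffle), and reducers aggregate each group. The sketch below only simulates these three phases in a single Python process with a word count. It does not use Hadoop's API; on a real cluster the framework would distribute the mappers and reducers across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc: str):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(docs):
    """Run map over every document, shuffle, then reduce each group."""
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big text data"]))
```

Because each mapper sees only its own document and each reducer sees only one key, both phases can run in parallel, which is what makes the model suitable for data-heavy distributed applications.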

Nursing records
- A specific area of use for big data mining
- Electronic Medical Record (EMR): information about patients
- This data is not used to its full potential:
  - the information is written in an unstructured style
  - the expressions are highly subjective
  - as a result, data mining is more complicated

Nursing records
- Results analyzed with KeyGraph: associations and frequent terms that represent the basic concepts in the data
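The idea of "associations and frequent terms" can be illustrated with a simplified sketch inspired by the first stage of KeyGraph: select the most frequent terms, then link pairs of those terms that co-occur in the same sentence. This is not the full KeyGraph algorithm, which uses its own association measures and also identifies key terms that bridge clusters of frequent terms. The tokenized sentences are hypothetical toy data, not real nursing records.

```python
from collections import Counter
from itertools import combinations


def frequent_terms(sentences: list[list[str]], top_n: int) -> list[str]:
    """Return the top_n most frequent terms across all sentences."""
    counts = Counter(w for s in sentences for w in s)
    return [w for w, _ in counts.most_common(top_n)]


def cooccurrence_links(sentences: list[list[str]], terms: list[str], min_count: int = 1):
    """Count how often each pair of frequent terms appears in the same sentence
    and keep the pairs seen at least min_count times."""
    keep = set(terms)
    pairs = Counter()
    for s in sentences:
        present = sorted(set(s) & keep)  # sorted so each pair has one canonical order
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Hypothetical, already tokenized sentences.
SENTENCES = [["pain", "night", "sleep"], ["pain", "medication"], ["sleep", "night"], ["pain", "night"]]
terms = frequent_terms(SENTENCES, 3)
print(terms, cooccurrence_links(SENTENCES, terms, min_count=2))
```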

Future
There are many challenges:
- Statistical significance: the quality of statistical results for large datasets
- Distributed mining: more parallelized methods
- Time-evolving data: the data changes over time
- Hidden big data: much of the data is unlabeled and unstructured. Currently, only 3% of data is usable for data mining!
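One common way to handle time-evolving data is to compute statistics only over a sliding window of the most recent documents, so that old documents stop influencing the results. The following is a minimal sketch of that idea for term counts; the class name and the window size are hypothetical choices for illustration, not a method from the slides.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Keep term counts only for the most recent `window` documents."""

    def __init__(self, window: int):
        self.window = window
        self.docs = deque()  # tokenized documents currently inside the window
        self.counts = Counter()

    def add(self, words: list[str]) -> None:
        """Add a new document and evict the oldest one if the window is full."""
        self.docs.append(words)
        self.counts.update(words)
        if len(self.docs) > self.window:
            old = self.docs.popleft()
            self.counts.subtract(old)
            # Unary plus drops terms whose count fell to zero.
            self.counts = +self.counts


stream = SlidingWindowCounts(window=2)
for doc in [["a", "b"], ["b"], ["c"]]:
    stream.add(doc)
print(stream.counts)
```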

Conclusion
We are at the beginning of a new era in which big text data mining will allow us to discover new, currently unknown knowledge.