1
Big text data mining
Tomáš Jurníček, Jakub Jůza, Lenka Kmeťová
2
Introduction
Text data analysis
Sophisticated analytic methods
Information extraction from data
3
Big data and data mining
Big data: datasets of large size and complexity
Companies have large amounts of data
Data needs to be analyzed
Problem: natural language
Data mining steps:
Data cleaning
Data integration
Data selection
Mining methods
Evaluating results
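The data mining steps listed above can be sketched as a small pipeline. This is only a minimal illustration, not the presenters' method: the sample records, the function names (`clean`, `integrate`, `select`, `mine`, `evaluate`), the stopword list and the choice of term-frequency counting as the "mining method" are all assumptions made for brevity.

```python
import re
from collections import Counter

# Hypothetical input: two small "sources" of raw text records.
SOURCE_A = ["  Patient reports PAIN in left knee. ", "Patient sleeps well"]
SOURCE_B = ["patient reports pain after walking!!", ""]

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"in", "after", "the", "a"}


def clean(record):
    """Data cleaning: lowercase, drop non-letters, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", record.lower())
    return " ".join(text.split())


def integrate(*sources):
    """Data integration: merge several sources into one list of records."""
    return [record for source in sources for record in source]


def select(records):
    """Data selection: keep only non-empty records."""
    return [r for r in records if r]


def mine(records):
    """Mining method (assumed here): count term frequencies."""
    counts = Counter()
    for record in records:
        counts.update(w for w in record.split() if w not in STOPWORDS)
    return counts


def evaluate(counts, min_count=2):
    """Evaluating results: keep terms seen at least min_count times."""
    return {term: n for term, n in counts.items() if n >= min_count}


cleaned = [[clean(r) for r in source] for source in (SOURCE_A, SOURCE_B)]
records = select(integrate(*cleaned))
frequent_terms = evaluate(mine(records))
```

With the sample records, `frequent_terms` keeps only the terms that occur at least twice across both sources. Each stage is a plain function, so any step can be swapped for a more realistic one without touching the others.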
4
Methods
Information extraction: key phrases and relations in unstructured text
Categorization: assign categories to documents
Clustering: group similar documents into clusters
Visualization: present data in a form understandable for humans
Summarization: for long documents, express only the core information
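Two of the methods above, categorization and clustering, can be illustrated with the same building block: a similarity measure between documents. The sketch below assumes a bag-of-words representation with cosine similarity, a single prototype text per category, and a greedy threshold-based clustering. These are simplifications chosen for illustration; the category names, prototype texts, documents and the threshold value are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(text, prototypes):
    """Categorization: pick the category whose prototype is most similar."""
    vec = vectorize(text)
    return max(prototypes, key=lambda c: cosine(vec, vectorize(prototypes[c])))


def cluster(texts, threshold=0.3):
    """Clustering: greedily add each text to the first similar-enough cluster."""
    clusters = []
    for text in texts:
        vec = vectorize(text)
        for members in clusters:
            # Compare against the first member of each existing cluster.
            if cosine(vec, vectorize(members[0])) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])
    return clusters


# Hypothetical categories and documents, for illustration only.
PROTOTYPES = {
    "sports": "match team goal player score",
    "finance": "bank market stock price money",
}
DOCS = [
    "the team scored a late goal",
    "stock price fell as the market closed",
    "the player left the team",
]

labels = [categorize(doc, PROTOTYPES) for doc in DOCS]
groups = cluster(DOCS)
```

Categorization needs predefined categories, while clustering finds groups without them; here both sports-related documents end up in one cluster and the finance document in another.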
5
Tools
Large companies like Facebook or LinkedIn work on open-source projects. For example:
Apache Hadoop - for data-heavy distributed applications
Apache S4 - for continuous processing of data streams
Storm (Twitter) - for streaming distributed data
Open-source tools for Big Data mining: Apache Mahout, R, MOA,…
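Hadoop is best known for the MapReduce programming model, and a word count is the usual first example of it. The sketch below imitates the three phases (map, shuffle, reduce) in a single Python process. It does not use the Hadoop API; the function names and the sample documents are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce one after another in one process."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# Hypothetical documents; on a real cluster the map calls would run
# in parallel on different nodes.
counts = word_count(["big data mining", "text data mining", "big text"])
```

Because each `map_phase` call looks at one document only, and each key is reduced independently, both phases can be spread over many machines.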
6
Nursing records
A specific area of use for Big data mining
Electronic Medical Record (EMR) = information about patients
This data is not used to its full potential:
Information is written in an unstructured style
Expressions are highly subjective
-> Data mining is more complicated
7
Nursing records
Result analyzed by KeyGraph: associations and frequent terms that represent basic concepts in the data
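The two ingredients named above, frequent terms and associations between them, can be approximated with simple counting. The sketch below is not the KeyGraph algorithm itself. It assumes that a term is "frequent" when it appears in at least `min_count` records and that two terms are "associated" when they co-occur in at least `min_count` records. The notes and the stopword list are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Assumed stopword list, kept tiny for the example.
STOPWORDS = {"of", "at", "after"}


def tokens(record):
    """Set of distinct non-stopword terms in one record."""
    return {w for w in record.lower().split() if w not in STOPWORDS}


def frequent_terms(records, min_count=2):
    """Terms that appear in at least min_count records."""
    counts = Counter(t for r in records for t in tokens(r))
    return {t for t, n in counts.items() if n >= min_count}


def associations(records, terms, min_count=2):
    """Pairs of frequent terms that co-occur in at least min_count records."""
    pairs = Counter()
    for r in records:
        present = sorted(terms & tokens(r))
        pairs.update(combinations(present, 2))
    return {p: n for p, n in pairs.items() if n >= min_count}


# Hypothetical nursing-record notes, invented for illustration.
NOTES = [
    "patient complains of pain at night",
    "pain medication given at night",
    "patient slept well after medication",
    "patient pain reduced",
]

terms = frequent_terms(NOTES)
edges = associations(NOTES, terms)
```

The frequent terms can be read as graph nodes and the associations as weighted edges between them, which is the kind of structure a co-occurrence graph visualizes.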
8
Future
There are a lot of challenges:
Statistical significance - quality of statistical results for large sets of data
Distributed mining - more methods need to be parallelized
Time-evolving data - data changes over time
Hidden big data - a lot of data is unlabeled and unstructured. Currently, only 3% of data is usable for data mining!
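One common way to handle time-evolving data is to compute statistics over a sliding window, so that old data stops influencing the result. The sketch below is one possible illustration of that idea, not something taken from the presentation: the class name `SlidingWindowCounter`, the window size and the sample stream are all hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Keep term counts for only the most recent `size` documents."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, document):
        """Add one document and forget the oldest one if the window is full."""
        terms = document.lower().split()
        self.window.append(terms)
        self.counts.update(terms)
        if len(self.window) > self.size:
            self.counts.subtract(self.window.popleft())
            # Unary plus drops terms whose count fell to zero.
            self.counts = +self.counts


# Hypothetical stream of short documents arriving over time.
counter = SlidingWindowCounter(size=2)
for doc in ["flu flu cold", "cold cough", "cough cough fever"]:
    counter.add(doc)
```

After the third document arrives, the first one has left the window, so its terms no longer contribute to `counter.counts`.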
9
Conclusion
We are at the beginning of a new era in which big text data mining will allow us to discover new, currently unknown knowledge.