Data and text mining: the search for unknown knowns Geoffrey Bilder UKSG, 2007

Presentation transcript:

1

2 Data and text mining: the search for unknown knowns Geoffrey Bilder UKSG, 2007 gbilder@crossref.org

3 "Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know." (Donald Rumsfeld, Department of Defense news briefing, 12 February 2002)

4 The Mining Metaphor

5

6 Gold Mining

7 Diamond Mining

8 Data Mining

9 Data Mining: What it isn’t

10 ≠ Information Retrieval

11 ≠ Information Extraction

12 ≠ Information Analysis

13 Information Retrieval + Information Extraction + Information Analysis

14 Data Mining: new, previously unknown information

15 And so what is text data mining?

16 Text Mining

17

18 Information Retrieval + Information Extraction + Information Analysis

19

20 The crucial question for publishers: “If ‘hiding’ information in unstructured text is a problem, then shouldn’t we be exploring new ways to ‘publish’?”

21 So how did we get here?

22 The word “tobacco” originates from the Taino Indians. There is no “I” in the word “team”. The book captured the zeitgeist of the time. I am sure that I turned the gas off.

23 The book captured the zeitgeist of the time. I am sure that I turned the gas off.

24

25

26 Semantic Web “Light”

27

28

29

30

31

32 But we can do more...

33 The web as a database

34 The Relational Model

Title | Author | ISBN-13 | Publisher
Labyrinths | Jorge Luis Borges | 978-0811200127 | New Directions
Hopscotch | Julio Cortazar | 978-0394752846 | Pantheon
The Aleph | Jorge Luis Borges | 978-0140286809 | Penguin
...

35 Rows represent things

36 Columns are properties

37 The thing’s property as a statement: The book (subject) has an author (predicate) “Jorge Luis Borges” (object)
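The move from a table to statements can be sketched in a few lines of Python, using the rows from the slide. This is a minimal illustration rather than a standard mapping: it assumes the ISBN-13 identifies each book (the row, or thing) and treats every other column name as a predicate (a property). The names `Triple`, `BOOKS` and `rows_to_triples` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing the row describes
    predicate: str  # the column (property) name
    object: str     # the cell value


# The rows from the slide's table; the "..." continuation is omitted.
BOOKS = [
    {"Title": "Labyrinths", "Author": "Jorge Luis Borges",
     "ISBN-13": "978-0811200127", "Publisher": "New Directions"},
    {"Title": "Hopscotch", "Author": "Julio Cortazar",
     "ISBN-13": "978-0394752846", "Publisher": "Pantheon"},
    {"Title": "The Aleph", "Author": "Jorge Luis Borges",
     "ISBN-13": "978-0140286809", "Publisher": "Penguin"},
]


def rows_to_triples(rows, key="ISBN-13"):
    """Turn each row (a thing) into one triple per remaining column (a property)."""
    triples = []
    for row in rows:
        subject = row[key]  # assumption: the ISBN-13 identifies the book
        for column, value in row.items():
            if column != key:
                triples.append(Triple(subject, column, value))
    return triples


for t in rows_to_triples(BOOKS):
    print(t.subject, t.predicate, t.object)
```

Each of the three rows yields three statements (Title, Author, Publisher), so the table becomes nine subject, predicate, object triples that no longer depend on the table's layout.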

38 Subject, Predicate, Object: The book has an author “Jorge Luis Borges”, with things identified by URIs

39 RDF: Resource Description Framework. http://www.amazon.com/isbn/978-0140286809 has an author http://www.wikipedia.com/borges
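In an RDF serialization such as N-Triples, the predicate has to be a URI as well. The Python sketch below prints the slide's statement as a single N-Triples line. Two assumptions: “has an author” is mapped to the Dublin Core `creator` property (`http://purl.org/dc/elements/1.1/creator`), which is one reasonable choice and not necessarily the talk's, and the helper `to_ntriple` is hypothetical and performs no escaping or validation of the URIs.

```python
def to_ntriple(subject_uri: str, predicate_uri: str, object_uri: str) -> str:
    """Format one statement whose three parts are all URIs as an N-Triples line."""
    # N-Triples wraps each IRI in angle brackets and ends the statement with " ."
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


BOOK = "http://www.amazon.com/isbn/978-0140286809"
# Assumption: Dublin Core "creator" stands in for the slide's "has an author".
HAS_AUTHOR = "http://purl.org/dc/elements/1.1/creator"
BORGES = "http://www.wikipedia.com/borges"

print(to_ntriple(BOOK, HAS_AUTHOR, BORGES))
```

Because all three parts are URIs, any other site that uses the same URI for Borges can make further statements about him, and the two sets of statements can be merged without a shared database schema.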

40 Journal A, Journal B, Wiki, Blog, Personal Website, OPAC

41 Journal A, Journal B, Wiki, Blog, Personal Website, OPAC

42

43 SPARQL

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?name
    WHERE {
      ?x rdf:type foaf:Person .
      ?x foaf:name ?name
    }
    ORDER BY ?name

http://api.ingentaconnect.com/content/cabi/nrr/latest?format=rss
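What the slide's SPARQL query does can be imitated over a small in-memory graph: the two triple patterns join on `?x`, `DISTINCT` removes duplicate names and `ORDER BY` sorts them. This Python sketch is a toy, not a SPARQL engine; the sample triples and the `example.org` URIs are invented for illustration, and Python's string sort only approximates SPARQL's ordering rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample data: a set of (subject, predicate, object) triples.
GRAPH = {
    ("http://example.org/a", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/a", FOAF + "name", "Jorge Luis Borges"),
    ("http://example.org/b", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/b", FOAF + "name", "Julio Cortazar"),
    ("http://example.org/c", RDF_TYPE, FOAF + "Document"),
    ("http://example.org/c", FOAF + "name", "Not a person"),
}


def person_names(graph):
    """Mimic: SELECT DISTINCT ?name WHERE { ?x rdf:type foaf:Person . ?x foaf:name ?name } ORDER BY ?name"""
    # First triple pattern: bind ?x to every subject typed as foaf:Person.
    people = {s for (s, p, o) in graph if p == RDF_TYPE and o == FOAF + "Person"}
    # Second pattern, joined on ?x: bind ?name. Collecting into a set gives DISTINCT.
    names = {o for (s, p, o) in graph if s in people and p == FOAF + "name"}
    return sorted(names)  # ORDER BY ?name


print(person_names(GRAPH))  # → ['Jorge Luis Borges', 'Julio Cortazar']
```

The `foaf:Document` resource also has a `foaf:name`, but it is dropped because it fails the first pattern; that join is what lets a query pull only the relevant statements out of data gathered from many sources.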

44

45

46 RSS 1.0, FRBR, Creative Commons, FOAF, Geo, SKOS

47 The Early Modern Internet

48 Data Mining = Information retrieval + Information extraction + Information analysis..., with the goal of discovering new, previously unknown information

49 Data Mining = Information retrieval + Information extraction + Information analysis..., with the goal of discovering new, previously unknown information. Text Data Mining = Complex data extraction layer + data mining
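The slide's equation, text data mining as a complex extraction layer in front of ordinary data mining, can be illustrated with a toy Python pipeline. It is a sketch under loud assumptions: the corpus, the entity list and the functions `retrieve`, `extract` and `analyse` are all invented for illustration, extraction is a naive substring match rather than real natural-language processing, and "analysis" is reduced to counting which entities are mentioned together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus and entity list, invented for illustration only.
DOCUMENTS = [
    "Borges published Labyrinths with New Directions.",
    "Cortazar admired Borges and wrote Hopscotch.",
    "Penguin reissued The Aleph by Borges.",
    "Borges and Cortazar both wrote short stories.",
    "I am sure that I turned the gas off.",
]
ENTITIES = {"Borges", "Cortazar", "Labyrinths", "Hopscotch",
            "The Aleph", "Penguin", "New Directions"}


def retrieve(docs, query_terms):
    """Information retrieval: keep documents that mention any query term."""
    return [d for d in docs if any(term in d for term in query_terms)]


def extract(doc):
    """Information extraction (the 'complex' layer, here naive substring lookup)."""
    return {name for name in ENTITIES if name in doc}


def analyse(entity_sets):
    """Information analysis: count how often each pair of entities co-occurs."""
    pairs = Counter()
    for entities in entity_sets:
        for pair in combinations(sorted(entities), 2):
            pairs[pair] += 1
    return pairs


hits = retrieve(DOCUMENTS, {"Borges"})
cooccurrence = analyse(extract(d) for d in hits)
print(cooccurrence.most_common(1))  # → [(('Borges', 'Cortazar'), 2)]
```

Only once the free text has been turned into structured entity sets can the analysis step notice that one pair recurs across documents; the extraction layer is the part that text, unlike a database, forces you to build first.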

50

51

52

53

54 Why do we publish text?

55 Thank You gbilder@crossref.org

