Some questions -What is metadata? -Data about data
Some questions -How do we know it is metadata? -Intuition or marked as metadata
Some questions -How does a machine know that it reads metadata? -Marked as metadata, formalized in e.g. RDF(S) or OWL
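To make "marked as metadata" concrete, here is a minimal sketch that writes two metadata statements as RDF triples in N-Triples syntax. The document URI, the author name, and the helper `ntriple` are made up for illustration. The predicates are the Dublin Core `title` and `creator` elements. Literal escaping is simplified to backslashes and double quotes.

```python
# Dublin Core element namespace (a real vocabulary commonly used for metadata).
DC = "http://purl.org/dc/elements/1.1/"


def ntriple(subject, predicate, literal):
    """Format one subject-predicate-literal statement as an N-Triples line.

    Simplification: only backslashes and double quotes are escaped.
    """
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return '<%s> <%s> "%s" .' % (subject, predicate, escaped)


# Hypothetical document URI and values, invented for this example.
doc = "http://example.org/slides/metadata"
lines = [
    ntriple(doc, DC + "title", "Some questions about metadata"),
    ntriple(doc, DC + "creator", "A. Student"),
]
print("\n".join(lines))
```

Because every statement names its predicate with a URI from a shared vocabulary, a program can recognize it as metadata without relying on intuition.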
Some questions -How can we extract metadata? -Manually -Known places in structured documents
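As an illustration of "known places in structured documents", the following sketch reads the `<title>` element and the `<meta name=... content=...>` tags of an HTML page using Python's standard `html.parser` module. The class `MetaExtractor`, the function `extract_metadata`, and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> may arrive in several chunks, so concatenate.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical sample page, made up for this example.
SAMPLE = """<html><head>
<title>Semantic Web notes</title>
<meta name="author" content="A. Student">
<meta name="keywords" content="metadata, ontology">
</head><body><p>Body text is data, not metadata.</p></body></html>"""

print(extract_metadata(SAMPLE))
```

The body paragraph is ignored: only the positions that the HTML format reserves for metadata are read.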
Some questions -How can we use metadata? -Annotate data -Finding relationships (later)
Some questions -How do we annotate data with metadata? -Manually (e.g. write XML tags) -Identify instances automatically, then machine annotates
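The second route (identify instances automatically, then let the machine annotate) could look roughly like the sketch below. It assumes a small lookup table of known names, wraps every match in an XML tag, and escapes the remaining text. The table `KNOWN_ENTITIES`, the type labels, the `entity` tag name, and the sample sentence are all hypothetical. No disambiguation is attempted.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lookup table: surface form -> entity type.
KNOWN_ENTITIES = {"Christopher Thomas": "person", "LSDIS": "organization"}


def annotate(text, entities=KNOWN_ENTITIES):
    """Wrap each known entity name found in `text` in an <entity> tag."""
    # Try longer names first so a short name cannot shadow a longer one.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        name = match.group(0)
        parts.append('<entity type="%s">%s</entity>' % (entities[name], escape(name)))
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(annotate("Christopher Thomas appears in the LSDIS ontology."))
```

Pure string matching like this cannot tell two people with the same name apart.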
Some questions -Problems with automatic identification -Disambiguation -Same name, different entities -Which “Christopher Thomas”? -Same entity, different roles -“Christopher Thomas” can be an entity in the LSDIS ontology and also in the Friendster FOAF ontology; the two descriptions are not yet merged.
Taxonomies -What is a taxonomy? -From the Greek ταξινομία, from the words taxis = order and nomos = law -Hierarchical classification of things -Mathematically, a taxonomy is a tree structure of classifications for a given set of objects
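The tree view of a taxonomy can be sketched with a parent table in which every class has at most one parent. The animal classes below are a made-up toy example, and `ancestors` and `is_a` are illustrative helper names.

```python
# Hypothetical toy taxonomy: each class maps to its single parent (None = root).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
}


def ancestors(node, parent=PARENT):
    """Return the chain of classes from the node's parent up to the root."""
    chain = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def is_a(node, candidate, parent=PARENT):
    """True if `node` is `candidate` or lies below it in the tree."""
    return node == candidate or candidate in ancestors(node, parent)


print(ancestors("dog"))       # ['mammal', 'animal']
print(is_a("dog", "animal"))  # True
print(is_a("dog", "bird"))    # False
```

Because each class has exactly one parent, the path to the root is unique, which is what makes the structure a tree.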
Ontologies -What is an Ontology? -In computer science, an ontology is the attempt to formulate an exhaustive and rigorous conceptual schema within a given domain, a typically hierarchical data structure containing all the relevant entities and their relationships and rules (theorems, regulations) within that domain
Machine Learning -What is Machine Learning? -an area of artificial intelligence concerned with the development of techniques which allow computers to "learn"
Machine Learning techniques –supervised learning: the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at several input-output examples of the function.
Machine Learning techniques –unsupervised learning: the algorithm models a set of inputs; labeled examples are not available. –reinforcement learning: the algorithm learns a policy of how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm.
Machine Learning techniques Classification –Supervised Learning –Reinforcement Learning –Artificial Neural Networks –Nearest Neighbor/Bayesian approaches –Group entities around a point of reference
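As one possible reading of the nearest-neighbor idea, the sketch below classifies a vector by copying the label of the closest labeled example (1-nearest-neighbor with Euclidean distance). The training points, the labels, and the function name `classify` are invented for illustration. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical labeled training examples: (feature vector, class label).
TRAINING = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]


def classify(vector, training=TRAINING):
    """Return the label of the training example closest to `vector`."""
    nearest = min(training, key=lambda example: math.dist(example[0], vector))
    return nearest[1]


print(classify((2.0, 1.0)))  # small
print(classify((7.0, 9.0)))  # large
```

The labeled examples act as the points of reference: a new entity is assigned to whichever reference point it lies nearest to.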
Machine Learning techniques Clustering –Unsupervised –Try to find functions that split a dataset in a meaningful way –Needs an evaluation function that tells what is meaningful and what is not.
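The role of the evaluation function can be shown with a deliberately small sketch. It splits one-dimensional, unlabeled values into two clusters by trying every cut point in sorted order and keeping the cut with the lowest within-cluster sum of squared deviations. This exhaustive search is an assumption chosen for brevity, not a standard clustering algorithm, and the data values are made up.

```python
def within_cluster_ss(cluster):
    """Evaluation function: sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def best_split(values):
    """Split the values into two groups that minimize the evaluation function.

    Expects at least two values.
    """
    ordered = sorted(values)
    best = None
    for i in range(1, len(ordered)):
        left, right = ordered[:i], ordered[i:]
        score = within_cluster_ss(left) + within_cluster_ss(right)
        if best is None or score < best[0]:
            best = (score, left, right)
    return best[1], best[2]


DATA = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]  # hypothetical unlabeled observations
print(best_split(DATA))  # ([0.8, 1.0, 1.2], [9.0, 9.5, 10.1])
```

Replacing `within_cluster_ss` with a different scoring function changes which split counts as meaningful, which is why the choice of evaluation function matters.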