Zvika Marx, Extending Ontology ‘s.


1 Zvika Marx, zvika@taykey.com Extending Ontology ‘s

2 Trend detection identifies, in real time, topics and trends that are currently hot in social network chatter.
[Chart: occurrence volume over time (hours), with peaks labeled: ‘42’ wins weekend box office; Adam Scott wins the Masters; last day for tax returns; Dish Network bids for Sprint]

3 Purpose of trend detection
• Real-time advertising in social media:
  o Use hot, preferably new (and not yet expensive) search terms, or terms reflecting interest
• Several interesting projects around social media campaigns are not reviewed in this talk:
  o Automated bidding
  o Automated campaign management
  o …

4 A trend is characterized by a particular term, or a group of terms, whose occurrence frequency peaks within some time window.
• Terms = phrases (n-grams) given in a pre-defined table.
• Additional purposes of terms:
  o Network of related terms to add to the ones directly identified (co-occurrence based)
  o Segmentation: different populations are interested in different terms
[Diagram, “Taykey term-system”, with example terms: Tax Returns, Adam Scott, Battle Droid, The Empire Strikes Back, Naboo, AT-AT walker, George, Darth Vader, Coruscant, Star Wars]
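The definition of a trend as a frequency peak within a time window can be illustrated with a small sketch. The slides do not say how peaks are detected; the trailing-window z-score rule, the function name `find_peaks`, and the parameters `window` and `z_threshold` below are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def find_peaks(counts, window=24, z_threshold=3.0):
    """Return indices of hours whose count stands out against the trailing window.

    Assumption (not from the slides): an hour is a peak when its count exceeds
    the trailing-window mean by z_threshold standard deviations.
    """
    peaks = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1.0 so a flat history does not flag tiny fluctuations.
        sigma = max(pstdev(history), 1.0)
        if counts[i] > mu + z_threshold * sigma:
            peaks.append(i)
    return peaks

# Hypothetical hourly occurrence counts for one term.
hourly = [5, 6, 4, 5, 7, 5, 6, 5, 40, 55, 30, 6]
print(find_peaks(hourly, window=8, z_threshold=3.0))
```

A trailing window adapts to each term's own baseline, so a term that is always frequent is not flagged merely for being frequent.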

5 New term discovery
• Main idea: similarity to existing terms
  o Accept (or flag as a candidate) if the average of the 10 top similarities passes some threshold
• Plus rules that exclude patterns such as:
  o [ ] “1.2 ghz”
  o [in ] “in london”
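The acceptance rule above can be sketched as follows. The threshold value, the function names, and the two regular expressions are hypothetical; the regexes are only guesses that happen to match the two examples on the slide, not the actual rules.

```python
import re

# Hypothetical exclusion patterns, modeled on the slide's two examples only.
EXCLUDE_PATTERNS = [
    re.compile(r"^\d+(\.\d+)?\s+[a-z]+$"),  # a number followed by a word, e.g. "1.2 ghz"
    re.compile(r"^in\s+\w+$"),              # "in" followed by one word, e.g. "in london"
]

def is_excluded(candidate):
    """True if the candidate matches any exclusion pattern."""
    text = candidate.lower().strip()
    return any(p.match(text) for p in EXCLUDE_PATTERNS)

def accept_candidate(candidate, similarities, top_k=10, threshold=0.3):
    """Accept if the mean of the top_k similarities reaches the threshold.

    `similarities` holds the candidate's similarity to each existing term.
    The threshold of 0.3 is an arbitrary placeholder.
    """
    if is_excluded(candidate):
        return False
    top = sorted(similarities, reverse=True)[:top_k]
    if not top:
        return False
    return sum(top) / len(top) >= threshold

print(accept_candidate("darth maul", [0.5] * 10 + [0.0] * 5))
print(accept_candidate("in london", [0.9] * 10))
```

Averaging over the ten best matches, rather than taking the single best one, makes the decision less sensitive to one accidental near-duplicate among the existing terms.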

6 Features: context phrases
Term candidate:
From the statistics of repeating patterns in text (the number of occurrences should pass several thresholds), extract a feature vector representing the candidate.
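As a rough illustration of context-phrase features, the sketch below counts the two- and three-word sequences immediately before and after each occurrence of a candidate and keeps those that repeat. The exact pattern shapes, the `_` placeholder for the candidate, and the single `min_count` threshold are assumptions; the slides mention several thresholds without specifying them.

```python
from collections import Counter

def context_features(tokens, term, min_count=2):
    """Count 2- and 3-word contexts adjacent to each occurrence of `term`.

    `tokens` is a list of words, `term` a tuple of words. The candidate itself
    is replaced by "_" in each feature. Only features seen at least
    `min_count` times are kept.
    """
    n = len(term)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != tuple(term):
            continue
        for width in (2, 3):
            if i - width >= 0:  # enough words to the left
                counts[" ".join(tokens[i - width:i]) + " _"] += 1
            if i + n + width <= len(tokens):  # enough words to the right
                counts["_ " + " ".join(tokens[i + n:i + n + width])] += 1
    return {f: c for f, c in counts.items() if c >= min_count}

text = "now watching star wars on tv . kids love watching star wars on tv"
print(context_features(text.split(), ("star", "wars")))
```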

7 Features: processing
• Features (as shown) are two-word and three-word combinations that occur often enough with enough existing (= training) terms
• “Semi-stopword” list:
  o Exclude features made solely of the list’s words
• Cleaning:
  o Most non-alphanumeric characters deleted
  o Numbers/digits replaced with a symbol
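The cleaning and filtering steps might look like the sketch below. The contents of the semi-stopword list, the choice of `#` as the digit symbol, and the decision to delete every non-alphanumeric character (the slide says only "most") are assumptions.

```python
import re

# Hypothetical semi-stopword list; the real list is not given in the slides.
SEMI_STOPWORDS = {"the", "a", "of", "in", "on", "is", "to", "and"}

def clean(text):
    """Lowercase, replace each run of digits with '#', delete other punctuation."""
    text = text.lower()
    text = re.sub(r"\d+", "#", text)       # numbers/digits -> one symbol
    text = re.sub(r"[^\w#\s]", "", text)   # delete remaining non-alphanumerics
    return " ".join(text.split())          # normalize whitespace

def keep_feature(feature):
    """Drop a feature only if every one of its words is a semi-stopword."""
    return any(word not in SEMI_STOPWORDS for word in feature.split())

print(clean("Season 02 x 10: The Fall!"))
print(keep_feature("of the"), keep_feature("watching the"))
```

Replacing digits with a single symbol lets patterns such as season and episode numbers collapse into one shared feature instead of many rare ones.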

8 New-term discovery setting
• Feature vectors are extracted for:
  o every training term (a sample of a few months)
  o every n-gram in recent (few days) data
• TF-IDF weights
  o where “document” = a training term’s feature vector
• Cosine similarity (a high score reflects a high proportion of shared features)
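A minimal sketch of this setting, with each training term's feature vector treated as one "document" for the IDF computation. The smoothed IDF formula (`log(N / df) + 1`), the function names, and the toy feature counts are assumptions; the slides do not state which TF-IDF variant is used.

```python
import math
from collections import Counter

def tfidf_vectors(term_features):
    """term_features: dict of training term -> Counter(feature -> count).

    Returns (weighted vectors, idf table). Each training term is one "document".
    """
    n_docs = len(term_features)
    df = Counter()
    for feats in term_features.values():
        df.update(feats.keys())  # document frequency: one count per term
    idf = {f: math.log(n_docs / d) + 1.0 for f, d in df.items()}
    vectors = {
        term: {f: c * idf[f] for f, c in feats.items()}
        for term, feats in term_features.items()
    }
    return vectors, idf

def weigh(feats, idf):
    """Weight a candidate's counts with the training IDF; unseen features are dropped."""
    return {f: c * idf[f] for f, c in feats.items() if f in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy data, invented for illustration.
training = {
    "star wars": Counter({"watching _": 3, "_ movie": 2}),
    "tax returns": Counter({"filing _": 4, "_ deadline": 1}),
}
vectors, idf = tfidf_vectors(training)
candidate = weigh(Counter({"watching _": 2, "_ movie": 1}), idf)
for term, vec in vectors.items():
    print(term, round(cosine(candidate, vec), 3))
```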

9 New-term discovery example
New-term candidate:
Similar terms:
Common features:
I’m watching Witches of East End “02 x 10 the fall of the house of Beauchamp”
= “marked as seen” in Portuguese
season number X episode number

10 Taykey term classification
*classes* ~ entity type:
• 'event', 'city', 'other location', 'art-piece', …
*subjects* ~ domain (multi-tag possible):
• 'fashion', 'consumer electronics', 'science', …
Classification helps in population segmentation and campaign matching.

11 Term auto-classification
• Multi-class classification to “classes” and to “subjects”
• Maximum likelihood
  o feature vectors as in term discovery
• Applies to:
  o discovered new term candidates
  o terms not categorized previously
  o existing-vs-calculated conflicts, for example in terms that change their meaning over time (these require manual examination)
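The slide names maximum likelihood without further detail. One possible reading, sketched below, is a multinomial model: each class gets a smoothed feature distribution estimated from its training terms, and a term is assigned to the class under which its feature counts are most likely. The add-one smoothing, the absence of class priors, and all names and toy data are assumptions.

```python
import math
from collections import Counter, defaultdict

def train(labeled):
    """labeled: list of (Counter of feature counts, class label) pairs."""
    class_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in labeled:
        class_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, vocab

def classify(feats, class_counts, vocab):
    """Return the class maximizing the log-likelihood of the known features.

    Returns None when the term shares no feature with the training data.
    """
    known = {f: c for f, c in feats.items() if f in vocab}
    if not known:
        return None
    best_label, best_ll = None, -math.inf
    for label, counts in class_counts.items():
        total = sum(counts.values()) + len(vocab)  # add-one smoothing denominator
        ll = sum(c * math.log((counts[f] + 1) / total) for f, c in known.items())
        if ll > best_ll:
            best_label, best_ll = label, ll
    return best_label

# Toy training data, invented for illustration.
labeled = [
    (Counter({"_ scored a": 3, "fan of _": 1}), "sports"),
    (Counter({"listening to _": 2, "fan of _": 2}), "music"),
]
class_counts, vocab = train(labeled)
print(classify(Counter({"_ scored a": 1}), class_counts, vocab))
```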

12 Term auto-classification results
• Classes: ~80%; Subjects: ~85% accuracy (vs. ~92% if tested on already classified terms)
• It looks as if some of those terms were left unclassified for a reason. One example:
  o A sasaeng fan is an excessively obsessed fan of the Hallyu wave (= South Korean pop culture, which has become increasingly popular since the late 90s). It was misclassified as ‘sports’ (rather than ‘lifestyle’). The classifier is misled by features that include the sport-related word ‘fan’.

13 Fine-grain classification
• We started exploring sets of terms that characterize more specific population segments
  o instead of “sports”: “football”, “baseball”, …
• Ongoing experiment: seed Taykey sports terms with “basketball” vs. “other sports” tags
  o Seed taken from freebase.com
• Information gain feature selection of the top “basketball” features
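Information gain for a binary tag such as “basketball” vs. “other sports” can be computed as in the sketch below, treating each feature as present or absent for a term. The feature strings and toy examples are invented for illustration; the actual top features are not preserved in this transcript.

```python
import math

def entropy(pos, neg):
    """Binary entropy, in bits, of a set with `pos` positive and `neg` negative labels."""
    total = pos + neg
    h = 0.0
    for k in (pos, neg):
        if k:
            p = k / total
            h -= p * math.log2(p)
    return h

def label_entropy(labels):
    pos = sum(labels)
    return entropy(pos, len(labels) - pos)

def information_gain(examples, feature):
    """examples: list of (set of features, bool label); label True = "basketball"."""
    with_f = [label for feats, label in examples if feature in feats]
    without_f = [label for feats, label in examples if feature not in feats]
    n = len(examples)
    conditional = (len(with_f) / n) * label_entropy(with_f) \
        + (len(without_f) / n) * label_entropy(without_f)
    return label_entropy([label for _, label in examples]) - conditional

# Toy examples with hypothetical features.
examples = [
    ({"_ dunk", "fan of _"}, True),
    ({"_ dunk"}, True),
    ({"fan of _", "_ goal"}, False),
    ({"_ goal"}, False),
]
features = {"_ dunk", "fan of _", "_ goal"}
ranked = sorted(features, key=lambda f: information_gain(examples, f), reverse=True)
print(ranked[0], information_gain(examples, "fan of _"))
```

In this toy data a feature that appears equally in both groups, like the one containing ‘fan’, scores zero and would be ranked last.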

14 Thank You, Zvika Marx, zvika@taykey.com

