1
An Introduction to Triple Scoring (WSDM Cup T2)
Meng Jiang
TA: Huan Gui
Slides: OR you can visit and find the slides and a complementary dataset (Freebase feature).
2
Task Definition (Data)
Train (.train)
  Profession: 515 pairs
    Barack Obama, Politician, 7
    Barack Obama, Law professor, 1
    Lady Gaga, Singer-songwriter, 7
    Lady Gaga, Fashion Designer, 3
  Nationality: 162 pairs
    George Santayana, United States of America, 6
    George Santayana, Spain, 5
    Leslie Cheung, Hong Kong, 7
    Leslie Cheung, Canada, 3
Candidates (.list)
  Profession: 200
    Baseball Manager/player
    Fashion Designer/Model
    Film (Art) Director
  Nationality: 100
    China/France/Spain (Chinese? French? Spanish?)
    United States of America (USA? US? U.S.? America?)
    England/Scotland/Wales/Northern Ireland/United Kingdom (UK?)
All pairs (.all)
  Profession: 499,244 pairs
  Nationality: 318,779 pairs
Wiki: 385,426 persons; 33,159,354 sentences (>= 3 for every person)
Freebase: available for all. Person (Entity) – Ontology ID – Type
3
How to Evaluate Your Software?
Learning: for every person/value, find a K-length vector v(.).
Testing: similarity(v(person), v(value)) in [0,1], corresponding to score(0-7)/7.
Ev. 1: Cross validation on .train, reported as Acc. ± Std. (10 times)
  Profession: Training 80%: 412 pairs; Testing 20%: 103 pairs
  Nationality: Training 80%: 130 pairs; Testing 20%: 32 pairs
Ev. 2: Human judgment, reported as Acc. (100 pairs by TAs)
  Profession: Training 100% of .train: 515 pairs; Testing on .all: 499,244 pairs
  Nationality: Training 100% of .train: 162 pairs; Testing on .all: 318,779 pairs
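The evaluation protocol can be sketched as below. This is a minimal sketch, not the official scorer: it assumes cosine similarity clipped to [0, 1] as the similarity measure, and it assumes a prediction counts as correct when it is within 2 of the true score. The names `predict_score`, `accuracy`, `repeated_holdout`, and the `fit` callback are hypothetical.

```python
import math
import random
import statistics

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_score(v_person, v_value):
    """Map a similarity clipped to [0, 1] onto an integer score in 0..7."""
    sim = max(0.0, min(1.0, cosine(v_person, v_value)))
    return round(sim * 7)

def accuracy(truth, predicted, tolerance=2):
    """Fraction of pairs predicted within `tolerance` of the truth (assumed metric)."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(truth, predicted))
    return hits / len(truth)

def repeated_holdout(triples, fit, runs=10, train_frac=0.8, seed=0):
    """Ev. 1: repeat a random 80/20 split `runs` times; return (mean, std) of accuracy.

    `fit(train)` is a hypothetical learner that returns a function
    (person, value) -> predicted score.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        shuffled = triples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        scorer = fit(train)
        preds = [scorer(p, v) for p, v, _ in test]
        accs.append(accuracy([s for _, _, s in test], preds))
    return statistics.mean(accs), statistics.pstdev(accs)
```

A constant predictor, for example `fit = lambda train: (lambda p, v: 5)`, gives a baseline to compare learned vectors against.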
4
What You Have for Learning
Weighted small sparse matrix (.train): Person by Value, entries in 0-1
Binary large sparse matrix (.all): Person by Value, entries 0/1
Methods: Non-negative matrix factorization, Singular Value Decomposition, Network embedding, etc.
Optimization function?
[Diagram: Person (few) by Value (few) factorized into Person (few) by K and K by Value (few); Person (full) by Value (full) factorized into Person (full) by K and K by Value (full)]
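One possible optimization function is the squared reconstruction error ||X - UV||^2 with U, V >= 0. The sketch below minimizes it with multiplicative updates in plain Python. It is an illustration under assumptions: unobserved entries are treated as zeros, the toy matrix uses only the .train examples, and the names `nmf` and `loss` are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Factorize non-negative x (persons by values) as u @ v with u, v >= 0.

    Rows of u are K-length person vectors; columns of v are K-length
    value vectors. Uses multiplicative updates for the squared error.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # v <- v * (u^T x) / (u^T u v)
        ut = transpose(u)
        num, den = matmul(ut, x), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # u <- u * (x v^T) / (u v v^T)
        vt = transpose(v)
        num, den = matmul(x, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return u, v

def loss(x, u, v):
    """Squared Frobenius reconstruction error ||x - u v||^2."""
    approx = matmul(u, v)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))

# Toy Person-by-Value matrix, scores scaled to [0, 1] as score / 7.
# Rows: Barack Obama, Lady Gaga. Columns: Politician, Law professor,
# Singer-songwriter, Fashion Designer.
X = [[7 / 7, 1 / 7, 0.0, 0.0],
     [0.0, 0.0, 7 / 7, 3 / 7]]
U, V = nmf(X, k=2)
```

Treating missing pairs as zeros is the simplest choice; a masked loss over observed pairs only would be the natural alternative for the weighted .train matrix.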
5
S1: Learning Freebase Features
Freebase feature dataset: Barack Obama m.02mjmr award.award_winner people.measured_person organization.organization_member award.ranked_item government.u_s_congressperson base.type_ontology.non_agent film.person_or_entity_appearing_in_film base.type_ontology.physically_instantiable tv.tv_personality base.liveuspoliticians.topic people.appointer user.robert.default_domain.my_favorite_things government.political_appointer music.featured_artist user.neothilic.default_domain.funny_guy symbols.name_source base.type_ontology.agent education.honorary_degree_recipient base.crime.topic base.nobelprizes.nobel_prize_winner medicine.diagnostic_test base.samplepro.topic base.type_ontology.abstract people.person base.politicalconventions.topic base.qualia.topic internet.blogger base.famouspets.pet_owner base.academia.topic user.robert.x2008_presidential_election.candidate government.polled_entity business.board_member common.topic base.poldb.topic base.type_ontology.animate base.schemastaging.person_extra base.todolists.topic award.award_nominee business.employer internet.social_network_user user.narphorium.people.topic biology.animal_owner government.us_president broadcast.producer base.ovguide.topic fictional_universe.person_in_fiction base.x2011internationalyearforpeopleofafricandescent.topic base.creativemindsatwork.topic base.cannapedia.topic influence.influence_node organization.organization_founder base.politicalconventions.convention_speaker base.coinsdaily.design base.duiattorneys.topic book.author base.firsts.first fictional_universe.fictional_character base.crime.lawyer base.qualia.recreational_drug_user tv.tv_program_guest base.politicalconventions.primary_candidate base.nobelprizes.topic user.robert.us_congress.topic base.mybase4.topic base.popstra.organization base.popstra.sww_base base.litcentral.named_person base.litcentral.topic government.politician user.robert.default_domain.presidential_candidate user.robert.x2008_presidential_election.topic 
user.colin.default_domain.twitter_topic base.propositions.proposition_issue base.schemastaging.government_position_held_extra base.firsts.topic base.famouspets.topic book.book_subject user.loveyou2madly.default_domain.famous_author visual_art.art_subject base.inaugurations.topic event.public_speaker base.endorsements.endorsee base.saturdaynightlive.topic music.composer people.family_member base.blackhistorymonth.topic book.poem_character music.artist base.schemastaging.context_name royalty.chivalric_order_officer base.popstra.topic celebrities.celebrity base.schemastaging.topic base.popstra.celebrity base.tagit.concept base.saturdaynightlive.person_impersonated_on_snl film.film_subject architecture.building_occupant base.popstra.company user.jamie.sunlight.legislator music.group_member base.inaugurations.inauguration_speaker base.propositions.topic base.politicalconventions.presidential_nominee user.narphorium.people.nndb_person media_common.quotation_subject
6
S1: Learning Freebase Features
Binary large sparse matrix: Person by Freebase-feature, entries 0/1
How to integrate the three matrices?
[Diagram: Person (full) by Feature (many) factorized into Person (full) by K and K by Feature (many), shown alongside the Person (few) by Value (few) and Person (full) by Value (full) matrices]
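One possible way to integrate the three matrices is a joint factorization that shares the person factor. The formulation below is a sketch, not the method prescribed by the slides. All symbols are assumptions: U holds K-length person vectors, V value vectors, F feature vectors, M is a 0/1 mask selecting the labeled .train pairs, and the weights alpha, beta, gamma balance the three terms.

```latex
\min_{U \ge 0,\; V \ge 0,\; F \ge 0}\;
  \alpha \,\bigl\lVert M \odot \bigl(X^{\mathrm{train}} - U V^{\top}\bigr) \bigr\rVert_F^2
  + \beta \,\bigl\lVert X^{\mathrm{all}} - U V^{\top} \bigr\rVert_F^2
  + \gamma \,\bigl\lVert X^{\mathrm{fb}} - U F^{\top} \bigr\rVert_F^2
```

Because U appears in every term, Freebase features can shape the vectors of persons that have no labeled pair in .train.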
7
S2. Learning with Text
Truth: (Adeyto, France, 7), (Adeyto, Germany, 1)
Sentences:
  [Adeyto] ( France ) .
  [Adeyto] ( born 1976 ) , French singer-songwriter , actress and director .
  [Adeyto] was born in Strasbourg , France , to a German father and French mother .
word2vec [Mikolov et al. NIPS’13]: “Distributed Representations of Words and Phrases and their Compositionality”
  Skip-gram model architecture: input, projection, output
  Hierarchical Softmax
  Negative sampling
  Subsampling of frequent words
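Two of the word2vec ingredients can be illustrated without any library. The sketch below generates skip-gram (center, context) pairs and computes the subsampling discard probability 1 - sqrt(t / f(w)) from the paper. The window size, the function names, and whitespace tokenization are assumptions for illustration; it does not train an embedding.

```python
import math
from collections import Counter

def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs for the skip-gram model."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

def discard_probability(tokens, t=1e-5):
    """Subsampling of frequent words: P(discard w) = 1 - sqrt(t / f(w)), floored at 0.

    f(w) is the relative frequency of w in `tokens`; t is the threshold.
    """
    total = len(tokens)
    freq = Counter(tokens)
    return {w: max(0.0, 1.0 - math.sqrt(t / (c / total))) for w, c in freq.items()}

tokens = ("[Adeyto] was born in Strasbourg , France , "
          "to a German father and French mother .").split()
pairs = list(skipgram_pairs(tokens, window=2))
```

With a window of 2, `France` is paired with `Strasbourg` and the commas around it, which is how nearby words end up shaping its vector.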
8
Pipeline
Enrich the type vocabulary (manually attaching frequent words)
Linking/unifying value candidates in the text
  [Adeyto] ( [France] ) .
  [Adeyto] ( born 1976 ) , [French] [singer-songwriter] , [actress] and [director] .
    unified: [France] [Singer-songwriter] , [Actress] and [Film Director]
  [Adeyto] was born in Strasbourg , [France] , to a [German] father and [French] mother .
Embedding (word2vec, etc.) for K-length vectors
Candidates (.list)
  Profession: 200
    Baseball Manager/player
    Fashion Designer/Model
    Film (Art) Director
  Nationality: 100
    China/France/Spain (Chinese? French? Spanish?)
    United States of America (USA? US? U.S.? America?)
    England/Scotland/Wales/Northern Ireland/United Kingdom (UK?)
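The linking/unifying step can be sketched with an alias table. This is a minimal sketch: the `ALIASES` table and the `link_values` function are hypothetical, the table covers only the Adeyto example, and matching is exact and case-sensitive.

```python
import re

# Hypothetical alias table: surface form -> canonical candidate from .list.
# In practice it would come from enriching the type vocabulary by hand.
ALIASES = {
    "France": "France", "French": "France",
    "Germany": "Germany", "German": "Germany",
    "singer-songwriter": "Singer-songwriter",
    "actress": "Actress",
    "director": "Film Director",
}

def link_values(sentence, aliases=ALIASES):
    """Replace every known surface form with its bracketed canonical value."""
    # Longest forms first so that a longer alias wins over a shorter prefix.
    forms = sorted(aliases, key=len, reverse=True)
    # The lookarounds skip text that is already bracketed or part of a longer word.
    pattern = re.compile(
        r"(?<![\w\[])(" + "|".join(map(re.escape, forms)) + r")(?![\w\]])"
    )
    return pattern.sub(lambda m: "[" + aliases[m.group(1)] + "]", sentence)

sentence = "[Adeyto] was born in Strasbourg , France , to a German father and French mother ."
linked = link_values(sentence)
```

After unification, `French` and `France` share one token, so the embedding step learns a single vector per candidate value.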
9
Type-Aware Factorization/Embedding
Typed sentences
  [$Person:Adeyto] ( [$Nation:France] ) .
  [$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] , [$Profession:Actress] and [$Profession:Film Director] .
  [$Person:Adeyto] was born in Strasbourg , [$Nation:France] , to a [$Nation:German] father and [$Nation:French] mother .
Factorization: rich non-negative matrices (co-existence in sentences)
  Person (almost full) by Profession (almost full)
  Person (almost full) by Nation (almost full)
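The co-existence matrices can be built by counting typed mentions per sentence. The sketch below assumes the `[$Type:Value]` markup shown on this slide; the function name `cooccurrence` and the dictionary-of-counters layout are hypothetical choices.

```python
import re
from collections import Counter

# Matches typed mentions such as [$Nation:France] -> ("Nation", "France").
MENTION = re.compile(r"\[\$(\w+):([^\]]+)\]")

def cooccurrence(sentences):
    """Count how often a person and a typed value share a sentence.

    Returns {value_type: Counter({(person, value): count})}, i.e. one
    sparse non-negative Person-by-Value matrix per value type.
    """
    counts = {}
    for sentence in sentences:
        mentions = MENTION.findall(sentence)
        persons = {name for typ, name in mentions if typ == "Person"}
        for typ, value in mentions:
            if typ == "Person":
                continue
            for person in persons:
                counts.setdefault(typ, Counter())[(person, value)] += 1
    return counts

typed = [
    "[$Person:Adeyto] ( [$Nation:France] ) .",
    "[$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] , "
    "[$Profession:Actress] and [$Profession:Film Director] .",
    "[$Person:Adeyto] was born in Strasbourg , [$Nation:France] , "
    "to a [$Nation:German] father and [$Nation:French] mother .",
]
matrices = cooccurrence(typed)
```

The resulting counts are non-negative, so they can be fed to a non-negative factorization to obtain K-length person and value vectors.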
10
S3. Learning with Meta Patterns
Typed sentences
  [$Person:Adeyto] ( [$Nation:France] ) .
  [$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] , [$Profession:Actress] and [$Profession:Film Director] .
  [$Person:Adeyto] was born in Strasbourg , [$Nation:France] , to a [$Nation:German] father and [$Nation:French] mother .
Segmentation: Meta Patterns for Precise Value
  [$Person] ( [$Nation] ) : 100% accuracy?
  [$Person] ( born [$Year] ) , [$Nation] : 100% accuracy?
  [$Nation] [$Profession] : 100% accuracy?
  [$Nation] [$Profession] , [$Profession] and [$Profession] (find context [$Person]) : 100% accuracy?
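Matching a meta pattern against typed sentences can be sketched with regular expressions. This is an illustration only: it assumes patterns are given by hand as whitespace-separated tokens, and the helpers `compile_meta_pattern` and `extract` are hypothetical. Discovering the patterns by segmentation is not covered.

```python
import re

def compile_meta_pattern(pattern):
    """Turn a meta pattern such as '[$Person] ( [$Nation] )' into a regex.

    Each [$Type] slot becomes a capturing group that matches one typed
    mention of that type; all other tokens must match literally.
    """
    parts = []
    for token in pattern.split():
        slot = re.fullmatch(r"\[\$(\w+)\]", token)
        if slot:
            parts.append(r"\[\$" + slot.group(1) + r":([^\]]+)\]")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))

def extract(pattern, sentence):
    """Return the tuple of slot values for every match of the meta pattern."""
    return [m.groups() for m in compile_meta_pattern(pattern).finditer(sentence)]

s1 = "[$Person:Adeyto] ( [$Nation:France] ) ."
s2 = ("[$Person:Adeyto] ( born 1976 ) , [$Nation:France] "
      "[$Profession:Singer-songwriter] , [$Profession:Actress] "
      "and [$Profession:Film Director] .")
nation_of = extract("[$Person] ( [$Nation] )", s1)
professions = extract("[$Nation] [$Profession] , [$Profession] and [$Profession]", s2)
```

The second pattern has no `[$Person]` slot, so the person still has to be found from the sentence context, as the slide notes.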
11
Summary
Learning how to do data-mining experiments/projects (cross-validation, evaluation, etc.)
Machine learning for low-dimensional representation
Selecting the best similarity/distance measure
S1: Learning vectors with Freebase features
S2: Learning vectors with Text (embedding, etc.)
S3: Learning precise values with Meta Pattern (segmentation)
Finally, how to merge results of the above to achieve the best performance?