EmojiNet: A Machine Readable Emoji Sense Inventory

Slides:



Advertisements
Similar presentations
Suleyman Cetintas 1, Monica Rogati 2, Luo Si 1, Yi Fang 1 Identifying Similar People in Professional Social Networks with Discriminative Probabilistic.
Advertisements

Linking Entities in #Microposts ROMIL BANSAL, SANDEEP PANEM, PRIYA RADHAKRISHNAN, MANISH GUPTA, VASUDEVA VARMA INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,
MICROSOFT OFFICE ACCESS 2007.
Intro Help! I have a Part-of- Speech Test and I need to study!!! Next Slide 
Part II. Statistical NLP Advanced Artificial Intelligence Part of Speech Tagging Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme Most.
Linked Sensor Data Harshal Patni, Cory Henson, Amit P. Sheth Ohio Center of Excellence in Knowledge enabled Computing (Kno.e.sis) Wright State University,
Analyzing the Social Media Footprint of Street Gangs 1 Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)Kno.e.sis 2 Center for Urban.
CS Word Sense Disambiguation. 2 Overview A problem for semantic attachment approaches: what happens when a given lexeme has multiple ‘meanings’?
The Visual Knowledge Builder: A Second Generation Spatial Hypertext Frank M. Shipman III Haowei Hsieh Preetam Maloor J. Michael Moore.
WSD using Optimized Combination of Knowledge Sources Authors: Yorick Wilks and Mark Stevenson Presenter: Marian Olteanu.
Classifying Business Messages on Facebook Bei Yu 1 and Linchi Kwok 2 School of Information Studies 1 David B. Falk College of Sport and Human Dynamics.
Sequencing Miss Regan. Blood Hound  Does anyone know what the Bloodhound project is?  Video 1 Video 1  Video 2 Video 2  Link to website Link to website.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
A Statistical and Schema Independent Approach to Identify Equivalent Properties on Linked Data † Kno.e.sis Center Wright State University Dayton OH, USA.
Copyright © 2010 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture. Multiple Ontologies in.
Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection Boanerges Aleman-Meza, Meenakshi Nagarajan,
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Part II. Statistical NLP Advanced Artificial Intelligence Applications of HMMs and PCFGs in NLP Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.
COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics – Bag of concepts – Semantic distance between two words.
Processing of large document collections Part 3 (Evaluation of text classifiers, applications of text categorization) Helena Ahonen-Myka Spring 2005.
A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge Ping Chen University of Houston-Downtown Wei Ding University of Massachusetts-Boston.
Krishnaprasad Thirunarayan, Pramod Anantharam, Cory A. Henson, and Amit P. Sheth Kno.e.sis Center, Ohio Center of Excellence on Knowledge-enabled Computing,
1 A study on automatically extracted keywords in text categorization Authors:Anette Hulth and Be´ata B. Megyesi From:ACL 2006 Reporter: 陳永祥 Date:2007/10/16.
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
Jiuling Zhang  Why perform query expansion?  WordNet based Word Sense Disambiguation WordNet Word Sense Disambiguation  Conceptual Query.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Mini-Project on Web Data Analysis DANIEL DEUTCH. Data Management “Data management is the development, execution and supervision of plans, policies, programs.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Objectives of Control The objectives of control are:  To ensure that all data are processed  To preserve the integrity of maintained data  To detect,
Search - on the Web and Locally Related directly to Web Search Engines: Part 1 and Part 2. IEEE Computer. June & August 2006.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Word Sense Disambiguation Reading: Chap 16-17, Jurafsky & Martin Instructor: Rada Mihalcea.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
Introduction to CL & NLP CMSC April 1, 2003.
An Effective Word Sense Disambiguation Model Using Automatic Sense Tagging Based on Dictionary Information Yong-Gu Lee
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
Unsupervised Word Sense Disambiguation REU, Summer, 2009.
Bootstrapping for Text Learning Tasks Ramya Nagarajan AIML Seminar March 6, 2001.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Usable Security – CS 6204 – Fall, 2009 – Dennis Kafura – Virginia Tech Policy Authoring Matthew Dunlop Usable Security – CS 6204 – Fall, 2009 – Dennis.
Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.
Introduction to Machine Learning Dmitriy Dligach.
LOGOS And Creating Your Own Company. Introduction You are about to open up your own business and need to come up with your own company logo because you.
2/10/2016Semantic Similarity1 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis.
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Second Language Learning From News Websites Word Sense Disambiguation using Word Embeddings.
Teacher’s View Spanish 2 students will be exploring real world Argentinian food, both commercialized (Subway) and authentic restaurants in Buenos Aires.
Word Embeddings to Enhance Twitter Gang Member Profile Identification Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)Kno.e.sis Wright.
Finding Street Gang Members on Twitter
EmojiNet: An Open Service and API for Emoji Sense Discovery
Coarse-grained Word Sense Disambiguation
Signals Revealing Street Gang Members on Twitter
EmojiNet: Building a Machine Readable Sense Inventory for Emoji
Market Intelligence Analysis
An Efficient Bit Vector Approach to Semantics-based
over Machine and Citizen Sensing
Bag-of-Visual-Words Based Feature Extraction
Bitcoin Created By: CoinSecure.in.
A Semantics-Based Measure of
Introduction to Azure Machine Learning Studio
Learning Emoji Embeddings Using Emoji Co-Occurrence Network Graph
Object Recognition Today we will move on to… April 12, 2018
A method for WSD on Unrestricted Text
PROJECTS SUMMARY PRESNETED BY HARISH KUMAR JANUARY 10,2018.
Converter for Azure and SharePoint Converts s into SharePoint list items 24/7 Creates SharePoint list items from s
Converter for IIS and SharePoint Converts s into SharePoint list items 24/7 Creates SharePoint list items from s
Text Mining Application Programming Chapter 9 Text Categorization
Presentation transcript:

EmojiNet: A Machine Readable Emoji Sense Inventory Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton OH, USA {sanjaya, lakshika, amit, derek}@knoesis.org Emoji symbols take on different meanings based on the context of a message as no rigid semantics are attached to emoji by design. Hence, for machines to understand emoji, a machine readable sense inventory that lists different emoji senses is needed. EmojiNet is a machine readable sense inventory which enables this. Motivation Real World Examples Laugh(N) Can’t stop laughing Anger(N) Angry as hell Costly(A) Can't buy class la Happy(N) Got all A’s but 1 Shot(N) Oooooooh shots fired! Work Hard(N) Up early on the grind Funny(A) That was damn hilarious! Kill(V) He tried to kill one of my brothers Money(N) Earn money when one register /w ur link Approach Table 1 – Emoji Data Available in Open Resources Step 1 – Extract Data from Open Web Resources and Integrate them based on Unicode Character and Image Representations Step 2 – Filter Senses and Part-of-Speech for each Emoji Sense to Populate EmojiNet Step 3 – Perform Word Sense Disambiguation to Assign Meanings to Emoji Senses For each emoji ei in EmojiNet, it records the octuple ei = (ui ci di Ki Ii Ri Hi Si), where ui is the Unicode of ei, ci is the short code name of ei, di is the description of ei, Ki is the set of keywords of ei, Ii is the set of images of ei, Ri is the set of related emoji of ei, Hi is the set of categories of ei, Si is the set of emoji senses of ei. Evaluation The Emoji Dictionary was merged with other open resources by matching the images in The Emoji Dictionary with images found in other resources. A nearest neighborhood-based image processing algorithm was used for this task and its accuracy was 98.42%. Two Word Sense Disambiguation (WSD) algorithms based on the Most Frequent Sense (MFS) and the Most Popular Sense (MPS) were used to assign meaning to emoji senses extracted from The Emoji Dictionary. Their combined WSD accuracy was 85.18%. Table 2 –Word Sense Disambiguation Statistics Table 3 – EmojiNet Statistics Emoji Resource u c d K I R H S Unicode Website   Emojipedia iEmoji Emoji Dictionary Correct Incorrect Total Noun 1,271 (83.28%) 255 (16.71%) 1,526 Verb 735 (84.00%) 140 (16.00%) 875 Adjective 725 (90.06%) 80 (9.93%) 805 2,731 (85.1 8%) 475 (14.81%) 3,206 Emoji Resource u c d K I R H S # of Emoji with each Feature 1,074 845 1,002 705 875 # of Records Stored for each Feature 8,069 28,370 9,743 8 3,206 EmojiNet at Work Let’s disambiguate the sense of the emoji in the tweets T1 and T2, where T1:Pray for my family God gained an angel today and T2: Hard to win, but we did it man Lets celebrate. EmojiNet lists two senses for and they are pray(V) and highfive(N). We extract words from sense definitions available in EmojiNet for the above two senses. They are: highfive(N) – {palm, high, hand, slide, celebrate, raise, person, head, five} and pray(V) – {worship, thanksgiving, saint, pray, higher, god, confession}. We then calculate the overlap of the words in tweets with the words extracted from the sense definitions in EmojiNet. This leads us to decide that the emoji in T1 refers to pray(V) and the emoji in T2 refers to highfive(N). Future Work Expand EmojiNet sense definitions with words extracted from tweets, using a word embeddings model trained on tweets with emoji. Evaluate the usability of EmojiNet using Emoji Sense Disambiguation and Emoji Similarity Finding tasks and expose EmojiNet as a web service. EmojiNet demo is available at http://emojinet.knoesis.org/ Reference – Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. In 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA. Acknowledgement – We acknowledge partial support from NIDA Grant No. 5R01DA039454-02: “Trending: Social Media Analysis to Monitor Cannabis and Synthetic Cannabinoid Use”, NIH Award: MH105384-01A1: “Modeling Social Behavior for Healthcare Utilization in Depression”, and Grant No. 2014-PS-PSN-00006 awarded by the Bureau of Justice Assistance. Link to Paper Link to Demo