Published by Cory Pitts. Modified over 9 years ago.
1
REACTION
REACTION Workshop 2011.01.06
Task 1 – Progress Report & Plans
Lisbon, PT and Austin, TX
Mário J. Silva
University of Lisbon, Portugal
2
Grants (paid by REACTION)
- Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)
- João Ramalho (BIC: Jan 1, 2011 – April 31, 2011)
3
Mining resources
Development of robust linguistic resources to process different types and genres of texts:
- knowledge resources about media personalities: recognizing and resolving references to named entities;
- sentiment lexicons and grammars: detecting the polarity of opinions about media personalities;
- annotated corpora: training different text classifiers and evaluating classification procedures.
4
Mining resources
- POWER – Political Ontology for Web Entity Retrieval
- SentiLex-PT01 – Sentiment Lexicon for Portuguese
- SentiCorpus-PT09 – Sentiment-annotated corpus of user comments on political debates
5
POWER
POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors and their roles in the political scene, their relationships and interactions.
The ontology focuses on describing:
- Politicians
- Political Institutions with different levels of authority (International, National, Regional, ...)
- Political Associations
- Political Affiliations and Endorsements
- Elections
- Mandates
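To make the listed concepts concrete, here is a minimal Python sketch of how a few of them might relate to each other. It is not the actual POWER schema: every class name, field name, and value below is a hypothetical illustration, and elections, endorsements, and candidate lists are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Institution:
    name: str
    level: str  # e.g. "International", "National", "Regional"


@dataclass
class Association:
    name: str


@dataclass
class Mandate:
    role: str
    institution: Institution
    start_year: int
    end_year: Optional[int] = None  # None while the mandate is ongoing


@dataclass
class Politician:
    name: str
    affiliations: List[Association] = field(default_factory=list)
    mandates: List[Mandate] = field(default_factory=list)

    def current_roles(self) -> List[str]:
        """Roles of all mandates that have no recorded end year."""
        return [m.role for m in self.mandates if m.end_year is None]


# Placeholder data only; no real actors or institutions are implied.
parliament = Institution(name="Example Parliament", level="National")
actor = Politician(
    name="Politician A",
    affiliations=[Association(name="Party X")],
    mandates=[
        Mandate("Deputy", parliament, start_year=2005, end_year=2009),
        Mandate("Deputy", parliament, start_year=2009),
    ],
)
```

Keeping mandates as separate records, each tied to an institution and a time span, is one way to capture that an actor's role changes over time.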
6
POWER
Currently, the ontology describes, from the Portuguese political scene:
- 587 Political actors
- 17 (editions of) Political Institutions
- 16 Political Associations
- 900 Mandates
- 1 Election
- 6 Candidate Lists
7
SentiLex-PT01
SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.
The sentiment entries correspond to human predicate adjectives.
The sentiment attributes described in SentiLex-PT01 concern:
- the predicate polarity,
- the target of sentiment, and
- the polarity assignment (performed either manually or automatically, by JALC).
8
SentiLex-lem-PT01
6,321 lemmas
abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC
atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN
bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN
brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
Recently made publicly available at: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
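The lemma entries follow a regular `lemma.KEY=value;KEY=value` pattern, so they can be read with a few lines of Python. The sketch below assumes the format is exactly as shown on the slide; the function name and the output keys are illustrative choices and not part of the SentiLex distribution.

```python
def parse_lemma_entry(line: str) -> dict:
    """Parse one entry such as 'abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN'."""
    lemma, sep, attr_text = line.strip().partition(".")
    if not sep:
        raise ValueError(f"no '.' separator in entry: {line!r}")
    # Attributes are ';'-separated KEY=value pairs.
    attrs = dict(pair.split("=", 1) for pair in attr_text.split(";"))
    return {
        "lemma": lemma,
        "pos": attrs["PoS"],             # part of speech
        "target": attrs["TG"],           # target of sentiment (HUM = human)
        "polarity": int(attrs["POL"]),   # -1, 0 or 1 in the slide examples
        "annotation": attrs["ANOT"],     # MAN = manual, JALC = automatic
    }


entry = parse_lemma_entry("abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC")
print(entry["lemma"], entry["polarity"], entry["annotation"])
```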
9
SentiLex-flex-PT01
25,406 inflected forms
abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN
abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN
abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN
abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN
bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN
bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN
bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN
bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN
brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC
brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC
Recently made publicly available at: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
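Each inflected-form entry pairs a surface form with its lemma, which makes a direct form-to-polarity lookup possible. The following Python sketch builds such an index from a few of the entries above and sums the polarities of known tokens. This naive lookup is only an illustration of how the file could be used, not the classification method used in the project; all function names are hypothetical.

```python
SAMPLE = [
    "abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN",
    "bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN",
    "brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC",
    "brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC",
]


def parse_flex_entry(line: str):
    """Return (form, lemma, gender_number, polarity) for one entry."""
    head, _, attr_text = line.strip().partition(".")
    form, _, lemma = head.partition(",")
    attrs = dict(pair.split("=", 1) for pair in attr_text.split(";"))
    return form, lemma, attrs["GN"], int(attrs["POL"])


def build_form_index(lines):
    """Map each inflected form to (lemma, polarity).

    Forms listed once per gender/number (e.g. 'brega') collapse to a
    single key; the last entry read wins.
    """
    index = {}
    for line in lines:
        form, lemma, _gn, polarity = parse_flex_entry(line)
        index[form] = (lemma, polarity)
    return index


def polarity_sum(tokens, index):
    """Sum the polarities of the tokens found in the index."""
    return sum(index[t.lower()][1] for t in tokens if t.lower() in index)


index = build_form_index(SAMPLE)
print(polarity_sum(["Ela", "é", "bem-educada"], index))
```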
10
SentiCorpus-PT09
SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 parliamentary elections.
- The collection is composed of 2,795 comments (~8,000 sentences).
- 3,537 sentences, from 736 comments (27% of the corpus), were manually labeled with sentiment information.
- Sentiment annotation involves several relevant dimensions, such as polarity, opinion target, target mention and verbal irony.
11
SentiCorpus-PT09
- The sentence is the minimum unit of analysis, but some annotations span a comment;
- Each sentence may convey different opinions;
- Each opinion may have different specific targets;
- The targets, which can be omitted in text, correspond to human entities;
- The entity mentions are classifiable into 7 syntactic-semantic categories;
- The opinionated sentences may be characterized according to their polarity and intensity (ranging from -2 to 2);
- Each opinionated sentence may have a literal or ironic interpretation.
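The annotation scheme above can be sketched as a small data structure. The Python below is a hypothetical representation, not the corpus's actual file format: the class and field names are assumptions, and the 7 mention categories are not named on the slides, so the mention type is left as a free-form string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Opinion:
    target: str                          # human entity the opinion is about
    polarity: int                        # polarity and intensity, from -2 to 2
    mention: Optional[str] = None        # None when the target is omitted in text
    mention_type: Optional[str] = None   # one of 7 categories (names assumed)
    ironic: bool = False                 # literal (False) or ironic (True) reading

    def __post_init__(self):
        if not -2 <= self.polarity <= 2:
            raise ValueError("polarity must be between -2 and 2")


@dataclass
class Sentence:
    text: str
    opinions: List[Opinion] = field(default_factory=list)

    def targets(self) -> List[str]:
        """Distinct opinion targets, in order of first appearance."""
        seen = []
        for opinion in self.opinions:
            if opinion.target not in seen:
                seen.append(opinion.target)
        return seen


# Placeholder example: one sentence carrying two opinions on two targets.
sentence = Sentence(
    text="Candidate A was clear, unlike his opponent.",
    opinions=[
        Opinion(target="Candidate A", polarity=1, mention="Candidate A",
                mention_type="name"),
        Opinion(target="Candidate B", polarity=-1, mention="his opponent",
                mention_type="definite description"),
    ],
)
```

Attaching opinions to the sentence, rather than giving the sentence a single label, reflects the point that one sentence may convey several opinions with different targets.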
13
Main findings
- The real challenge in performing opinion mining on user-generated content is correctly identifying the positive opinions:
  - Positive opinions are less frequent than negative opinions (20%)
  - Positive opinions are particularly exposed to verbal irony (11%)
- Other opinion mining challenges are related to the entity recognition and co-reference resolution sub-tasks:
  - Mentions of human targets are frequently made through pronouns, definite descriptions and nicknames.
  - The most frequent type of mention is the person name, but it only covers 36% of the analyzed cases.
14
Next steps
April 2011:
- POWER
  - Populating the ontology, using text-mining approaches
  - Internal release
- SentiLex-PT01
  - Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification
  - Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)
15
Next steps
August 2011:
- POWER
  - First release to the general public via SPARQL endpoint and web user interface
- SentiCorpus-PT09
  - Publicly available
- Analysis and (semi-automated) annotation of a collection of documents from industrial and social media, over a period of 6 months