1
Tricks for Statistical Semantic Knowledge Discovery: A Selectionally Restricted Sample
Marti A. Hearst, UC Berkeley
2
Goal: Acquire Semantic Information
3
► Something on Finin
4
Tricks I Like
► Lots o’ Text
► Unambiguous Cues
► Rewrite and Verify
5
Trick: Lots o’ Text ► Idea: words in the same syntactic context are semantically related. Hindle, ACL’90, “Noun classification from predicate-argument structure.”
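A minimal sketch of that idea (not Hindle's mutual-information formulation): treat nouns as related when they occur as objects of many of the same verbs. The (verb, noun) pairs below are invented for illustration.

from collections import defaultdict

# Hypothetical (verb, object-noun) pairs, e.g. extracted from a dependency parse.
pairs = [
    ("drink", "beer"), ("drink", "wine"), ("drink", "coffee"),
    ("brew", "beer"), ("brew", "coffee"), ("pour", "wine"),
]

contexts = defaultdict(set)          # noun -> set of verbs it appears with
for verb, noun in pairs:
    contexts[noun].add(verb)

def similarity(n1, n2):
    """Jaccard overlap of verb contexts; Hindle used an MI-weighted score."""
    union = contexts[n1] | contexts[n2]
    return len(contexts[n1] & contexts[n2]) / len(union) if union else 0.0

print(similarity("beer", "coffee"))  # 1.0: both drunk and brewed
print(similarity("beer", "wine"))    # 0.33: share only "drink"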
6
Trick: Lots o’ Text ► Idea: words in the same syntactic context are semantically related. Nakov & Hearst, ACL/HLT’08 “Solving Relational Similarity Problems Using the Web as a Corpus”
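A rough sketch of the relational-similarity idea, with invented data: characterize each noun pair by the verbs and prepositions that connect the two nouns in text (Nakov & Hearst mine these from web search snippets), then compare pairs by the similarity of those feature vectors.

from math import sqrt

# Hypothetical connector-word counts for each noun pair.
features = {
    ("dog", "bone"): {"chews": 12, "buries": 7, "eats": 9},
    ("cat", "mouse"): {"chases": 15, "catches": 11, "eats": 6},
    ("carpenter", "wood"): {"cuts": 14, "carves": 9, "sands": 8},
}

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Pairs whose connector profiles overlap are relationally similar.
print(cosine(features[("dog", "bone")], features[("cat", "mouse")]))
print(cosine(features[("dog", "bone")], features[("carpenter", "wood")]))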
7
Trick: Lots o’ Text ► Idea: bigger is better than smarter! Banko & Brill ACL’01: “Scaling to Very, Very Large Corpora for Natural Language Disambiguation”
8
Trick: Lots o’ Text ► Idea: apply web-scale n-grams to every problem imaginable. Lapata & Keller, HLT/NAACL ’04: “The Web as a Baseline: Evaluating the Performance of Unsupervised Web-Based Models for a Range of NLP Tasks”
► MT candidate selection
► Article suggestion
► Noun compound interpretation
► Noun compound bracketing
► Adjective ordering
(Across these tasks, web counts rate “> supervised” on some and “= supervised” on others.)
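In that spirit, a toy sketch of the web-as-baseline recipe: generate the candidate surface strings and keep whichever one the n-gram counts prefer. The ngram_count function is only a stand-in for a real web-hit or n-gram corpus lookup, and its counts are invented.

def ngram_count(phrase: str) -> int:
    # Stand-in for a web or n-gram corpus lookup; counts are made up.
    fake_counts = {"big red car": 120000, "red big car": 900}
    return fake_counts.get(phrase, 0)

def choose(candidates):
    # Pick the alternative that is most frequent as a literal string.
    return max(candidates, key=ngram_count)

# Adjective ordering as a pick-the-more-frequent-string problem:
print(choose(["big red car", "red big car"]))  # -> "big red car"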
9
Limitation ► Sometimes counts alone are too ambiguous. Solution ► Bootstrap from unambiguous contexts.
10
Trick: Use Unambiguous Context ► … to build statistics for ambiguous contexts. Hindle & Rooth, ACL ’91, “Structural Ambiguity and Lexical Relations”
Example: PP attachment: I eat spaghetti with sauce.
Bootstrap from unambiguous contexts: Spaghetti with sauce is delicious. I eat with a fork.
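A simplified sketch of the bootstrap (Hindle & Rooth actually use a smoothed lexical association score over a parsed corpus): count how often each verb and each noun takes the preposition in unambiguous positions, then compare those counts on the ambiguous cases. The counts below are invented.

from collections import Counter

verb_prep = Counter()   # (verb, prep) seen with no candidate noun in between
noun_prep = Counter()   # (noun, prep) seen where no verb attachment is possible

# Hypothetical unambiguous observations:
#   "I eat with a fork."             -> (eat, with) attaches to the verb
#   "Spaghetti with sauce is good."  -> (spaghetti, with) attaches to the noun
verb_prep[("eat", "with")] += 3
noun_prep[("spaghetti", "with")] += 5

def attach(verb, noun, prep):
    # Decide the ambiguous "verb noun prep ..." case by comparing raw counts.
    return "verb" if verb_prep[(verb, prep)] > noun_prep[(noun, prep)] else "noun"

print(attach("eat", "spaghetti", "with"))  # -> "noun" with these toy counts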
11
Trick: Use Unambiguous Context ► … to identify semantic relations (lexico-syntactic contexts). Hearst, COLING ’92, “Automatic Acquisition of Hyponyms from Large Text Corpora”
Example: Hyponym Identification
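A minimal regex sketch of one such pattern, “NP such as NP, NP ... (and|or) NP”; a real system matches over chunked noun phrases rather than single words and uses several additional patterns.

import re

SUCH_AS = re.compile(
    r"(?P<hyper>\w+) such as (?P<hypos>\w+(?:(?:, | and | or )\w+)*)"
)

sentence = "The agreement covers cereals such as wheat, barley and oats"
m = SUCH_AS.search(sentence)
if m:
    hypernym = m.group("hyper")
    for hyponym in re.split(r", | and | or ", m.group("hypos")):
        print(f"hyponym({hyponym}, {hypernym})")
# hyponym(wheat, cereals)  hyponym(barley, cereals)  hyponym(oats, cereals)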
12
Combine Tricks 1 and 2
13
Trick: Use Unambiguous Contexts + Lots o’ Text ► Combine lexico-syntactic patterns with occurrence counts. Kozareva, Riloff, Hovy, HLT-ACL’08, “Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs”.
14
Trick: Use Unambiguous Contexts + Lots o’ Text ► Combine (usually) unambiguous surface patterns with occurrence counts. Nakov & Hearst, HLT/EMNLP’05, “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”.
Surface feature: example (predicted bracketing)
► Left dash: cell-cycle analysis (left)
► Possessive marker: brain’s stem cell (right)
► Parentheses: growth factor (beta) (left)
► Punctuation: health care, provider (left)
► Abbreviation: tum. necr. (TN) factor (right)
► Concatenation: healthcare reform (left)
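A toy sketch of how two of those features (dash and possessive) could vote on the bracketing of a three-word compound; hits() stands in for a real web or n-gram count lookup and returns invented numbers.

def hits(query: str) -> int:
    # Stand-in for a search-engine or n-gram count; numbers are invented.
    fake = {'"brain-stem cells"': 120, '"brain stem-cells"': 8,
            '"brain stem\'s cells"': 15, '"brain\'s stem cells"': 3}
    return fake.get(query, 0)

def bracket(w1, w2, w3):
    # A dash or possessive that groups w1 with w2 votes left;
    # one that separates w1 from (w2 w3) votes right.
    left = hits(f'"{w1}-{w2} {w3}"') + hits(f'"{w1} {w2}\'s {w3}"')
    right = hits(f'"{w1} {w2}-{w3}"') + hits(f'"{w1}\'s {w2} {w3}"')
    return "left" if left > right else "right"

print(bracket("brain", "stem", "cells"))  # -> "left" with these toy counts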
15
Trick: Use Unambiguous Contexts + Lots o’ Text ► Identify a “protagonist” in each text to learn narrative structure. Chambers & Jurafsky, ACL’08, “Unsupervised Learning of Narrative Event Chains”.
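A very rough sketch of the scoring idea, with invented data: events are (verb, grammatical role of the shared protagonist) pairs, and event pairs that co-occur across documents more often than chance (high PMI) are chained together; Chambers & Jurafsky derive the events from parsing and coreference over a large corpus.

from collections import Counter
from itertools import combinations
from math import log

# Each document: events whose subject/object is the same protagonist (invented).
docs = [
    [("arrest", "obj"), ("charge", "obj"), ("convict", "obj")],
    [("arrest", "obj"), ("charge", "obj"), ("sentence", "obj")],
    [("hire", "obj"), ("promote", "obj")],
]

event_counts, pair_counts = Counter(), Counter()
for events in docs:
    unique = sorted(set(events))
    event_counts.update(unique)
    pair_counts.update(frozenset(p) for p in combinations(unique, 2))

total_events = sum(event_counts.values())
total_pairs = sum(pair_counts.values())

def pmi(e1, e2):
    joint = pair_counts[frozenset((e1, e2))]
    if joint == 0:
        return float("-inf")
    p_joint = joint / total_pairs
    p_e1, p_e2 = event_counts[e1] / total_events, event_counts[e2] / total_events
    return log(p_joint / (p_e1 * p_e2))

print(pmi(("arrest", "obj"), ("charge", "obj")))   # high: same narrative chain
print(pmi(("arrest", "obj"), ("promote", "obj")))  # -inf: never co-occur here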
16
Trick 3: Rewrite & Verify
17
Trick: Rewrite & Verify ► Check if alternatives exist in text. Nakov & Hearst, HLT/EMNLP’05, “Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution”.
Example: NP bracketing
Prepositional:
► stem cells in the brain (right)
► stem cells from the brain (right)
► cells from the brain stem (left)
Verbal:
► virus causing human immunodeficiency (left)
► pain associated with arthritis migraine (left)
Copula:
► office building that is a skyscraper (right)
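A toy sketch of the prepositional rewrites: generate the paraphrases that make each bracketing explicit and keep whichever side is better attested. hits() again stands in for a real count lookup and returns invented numbers.

def hits(query: str) -> int:
    # Stand-in for a web or corpus count; numbers are invented.
    fake = {'"stem cells in the brain"': 6100,
            '"stem cells from the brain"': 4500,
            '"cells from the brain stem"': 300}
    return fake.get(query, 0)

def paraphrase_vote(w1, w2, w3, preps=("in", "from", "of")):
    # right bracketing [w1 [w2 w3]] is paraphrased as "w2 w3 PREP the w1"
    right = sum(hits(f'"{w2} {w3} {p} the {w1}"') for p in preps)
    # left bracketing [[w1 w2] w3] is paraphrased as "w3 PREP the w1 w2"
    left = sum(hits(f'"{w3} {p} the {w1} {w2}"') for p in preps)
    return "right" if right > left else "left"

print(paraphrase_vote("brain", "stem", "cells"))  # -> "right"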
18
Trick: Use Lexical Hierarchies
► To improve generation of pseudo-words for WSD. Nakov & Hearst, HLT/NAACL’03, “Category-based Pseudo-Words”
► To classify nouns in noun compounds and thus determine the semantic relations between them. Rosario, Hearst, & Fillmore, ACL’02, “Descent of Hierarchy and Selection in Relational Semantics”
► To generate new (faceted) category systems. Stoica, Hearst, & Richardson, NAACL/HLT’07, “Automating Creation of Hierarchical Faceted Metadata Structures”
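A toy sketch in the spirit of the faceted-category use (Castanet): look up salient document terms in WordNet, keep a hypernym path for each, and treat well-populated shared ancestors as candidate facet categories. Assumes NLTK with the WordNet data installed; term selection, sense disambiguation, and tree compression are all glossed over.

# Requires: pip install nltk, then nltk.download("wordnet")
from collections import Counter
from nltk.corpus import wordnet as wn

terms = ["chicken", "salmon", "broccoli", "carrot"]  # e.g. from recipe titles

ancestor_counts = Counter()
for term in terms:
    synsets = wn.synsets(term, pos=wn.NOUN)
    if not synsets:
        continue
    path = synsets[0].hypernym_paths()[0]    # first sense, first path only
    ancestor_counts.update(s.name() for s in path)

# Ancestors shared by several terms are candidate facets (e.g. food, organism).
for node, count in ancestor_counts.most_common():
    if count >= 2:
        print(node, count)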
19
Example: Recipes (3500 docs)
20
Castanet Output (shown in Flamenco)
21
Castanet Output
23
Towards New Approaches to Semantic Analysis
24
Ideas ► Inducing Semantic Grammars Boggess, Agarwal, & Davis, AAAI’91, “Disambiguation of Prepositional Phrases in Automatically Labelled Technical Text”
25
Ideas ► Use Cognitive Linguistics. Hearst, ’90/’92, “Direction-Based Text Interpretation”.
Talmy’s Force Dynamics + Reddy’s Conduit Metaphor combine into the Path Model.
Solves: Was the person in favor of or opposed to the idea?
26
Using Cognitive Linguistics ► Talmy’s Theory of Force Dynamics. Talmy, “Force Dynamics in Language and Thought,” in Parasession on Causatives and Agentivity, Chicago Linguistic Society, 1985.
Describes how the interaction of agents with respect to force is lexically and grammatically expressed. Posits two opposing entities: Agonist and Antagonist. Each entity expresses an intrinsic force: towards rest or motion. The balance of the strengths of the entities determines the outcome of the event.
► Grammatical expression includes using a clause headed by “despite” to express a weaker antagonist.
27
Using Cognitive Linguistics ► Reddy’s Conduit Metaphor. Reddy, “The Conduit Metaphor – A Case of Frame Conflict in Our Language about Language,” in Metaphor and Thought, Ortony (Ed.), Cambridge University Press, 1979.
A thought is schematized as an object which is placed by the speaker into a container that is sent along a conduit. The receiver at the other end is the listener, who removes the objectified thought from the container and thus possesses it. Inferences that apply to conduits can be applied to communication.
► “Your meaning did not come through.”
► “I can’t put this thought into words.”
► “She is sending you some kind of message with that remark.”
28
Using Cognitive Linguistics ► Combine into the Path Model. Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed.), Lawrence Erlbaum Associates, 1992.
If an agent favors an entity or event, that agent can be said to desire the existence or “well-being” of that entity, and vice-versa. Thus if an agent favors an entity’s triumph in a force-dynamic interaction, then the agent favors that entity or event.
But: force dynamics does not have the expressive power for a sequence. Instead of focusing on the relative strength of two interacting entities, the model should represent what happens to a single entity through the course of its encounters with other entities. Thus the entity can be schematized as if it were moving along a path toward some destination or goal.
29
Using Cognitive Linguistics ► The Path Model. Hearst, “Direction-based Text Interpretation as an Information Access Refinement,” in Text-based Intelligent Systems, Jacobs (Ed.), Lawrence Erlbaum Associates, 1992.