1
Citances and What should our UI look like?
Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Supported by NSF DBI-0317510 and a gift from Genentech
2
Acquiring Labeled Data using Citances
3
A discovery is made … A paper is written …
4
That paper is cited … and cited … as the evidence for some fact(s) F.
5
Each of these in turn is cited for some fact(s) … until it is the case that all important facts in the field can be found in citation sentences alone!
6
Citances
Nearly every statement in a bioscience journal article is backed up with a citation.
It is quite common for papers to be cited 30-100 times.
The text around the citation tends to state biological facts. (Call these citances.)
Different citances will state the same facts in different ways, so can we use these for creating models of language expressing semantic relations?
7
Using Citances
Potential uses of citation sentences (citances): creation of training and testing data for semantic analysis, synonym set creation, database curation, document summarization, and information retrieval generally.
8
Issues for Processing Citances
Text span: Identifying the appropriate phrase, clause, or sentence that constitutes a citance, and correctly mapping citations when they are shown as lists or groups (e.g., “[22-25]”).
Grouping citances by topic: Citances that cite the same document should be grouped by the facts they state.
Normalizing or paraphrasing citances: Useful for IR, summarization, learning synonyms, relation extraction, question answering, and machine translation.
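The grouped-citation issue above (e.g., “[22-25]”) can be illustrated with a minimal sketch. It assumes numeric, bracketed markers with commas and ranges only; the function name `expand_citation_marker` is hypothetical and is not taken from the BioText system.

```python
import re


def expand_citation_marker(marker):
    """Expand a bracketed numeric marker such as "[22-25]" or "[3, 7-9]"
    into the individual reference numbers it covers.

    Assumption: markers are numeric, comma-separated, and may contain ranges.
    """
    inner = marker.strip().strip("[]")
    refs = []
    for part in inner.split(","):
        part = part.strip()
        if not part:
            continue
        # Accept either a hyphen or an en dash as the range separator.
        m = re.fullmatch(r"(\d+)\s*[-\u2013]\s*(\d+)", part)
        if m:
            start, end = int(m.group(1)), int(m.group(2))
            refs.extend(range(start, end + 1))
        else:
            refs.append(int(part))
    return refs


print(expand_citation_marker("[22-25]"))   # [22, 23, 24, 25]
print(expand_citation_marker("[3, 7-9]"))  # [3, 7, 8, 9]
```

With the marker expanded, a citance can be attached to each reference it covers, so a sentence citing “[22-25]” counts as a citance for all four papers.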
9
Related Work
Traditional citation analysis dates back to the 1960s (Garfield). It includes citation categorization, context analysis, and citer motivation.
Citation indexing systems, such as ISI’s SCI and CiteSeer.
Mercer and Di Marco (2004) propose to improve citation indexing using citation types.
Bradshaw (2003) introduces Reference Directed Indexing (RDI), which indexes documents using the terms in the citances citing them.
10
Related Work (cont.)
Teufel and Moens (2002) identify citances to improve summarization of the citing paper.
Nanba et al. (2000) use citances as features for classifying papers into topics.
A field related to citation indexing is the use of link structure and anchor text of Web pages. Applications include IR, classification, Web crawlers, and summarization.
11
Citances: Some preliminary results
Citances to a document align well with a hand-built curation.
Citances are good candidates for paraphrase creation.
12
Paraphrase Creation Algorithm
1. Extract the sentences that cite the target.
2. Mark the NEs of interest (genes/proteins, MeSH terms) and normalize.
3. Dependency parse (MiniPar).
4. For each parse, for each pair of NEs of interest:
   i. Extract the path between them.
   ii. Create a paraphrase from the path.
5. Rank the candidates for a given pair of NEs.
6. Select only the ones above a threshold.
7. Generalize.
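Step 4(i) of the algorithm above, extracting the path between each pair of NEs, can be sketched as follows. This is an illustration under stated assumptions, not the MiniPar-based implementation: the parse is represented as a hypothetical list `heads`, where `heads[i]` is the index of token i's head and -1 marks the root, and the example parse is written by hand.

```python
from itertools import combinations


def path_to_root(heads, i):
    """Return the token indices from token i up to the root of the tree."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, a, b):
    """Return the path from token a to token b through their lowest common ancestor."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    position_in_a = {node: k for k, node in enumerate(up_a)}
    for j, node in enumerate(up_b):
        if node in position_in_a:
            # Climb from a to the common ancestor, then descend to b.
            return up_a[: position_in_a[node] + 1] + up_b[:j][::-1]
    raise ValueError("tokens are not in the same tree")


def paths_for_entity_pairs(heads, entity_indices):
    """Extract one path for every unordered pair of marked entities."""
    return {
        (a, b): dependency_path(heads, a, b)
        for a, b in combinations(sorted(entity_indices), 2)
    }


# Hand-written parse (an assumption) of "NGF withdrawal induces Bim":
# NGF -> withdrawal -> induces (root) <- Bim
tokens = ["NGF", "withdrawal", "induces", "Bim"]
heads = [1, 2, -1, 2]
paths = paths_for_entity_pairs(heads, [0, 3])
print([tokens[i] for i in paths[(0, 3)]])  # ['NGF', 'withdrawal', 'induces', 'Bim']
```

Ranking, thresholding, and generalization (steps 5 to 7) are not shown, since the slides do not specify how they are computed.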
13
Creating a Paraphrase
Given the path from the dependency parse:
Restore the original word order.
Add words to improve grammaticality.
Bim … shown … be … following nerve growth factor withdrawal.
Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal.
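The two steps above can be sketched on the slide's own example. The slides do not say how the added words are chosen, so this sketch makes a simplifying assumption: it restores every skipped token that lies between the first and last path word. The function names and the set of path indices are hypothetical.

```python
def linearize_path(tokens, path_indices):
    """Put the path words back in sentence order, marking skipped spans with an ellipsis."""
    pieces = []
    prev = None
    for i in sorted(set(path_indices)):
        if prev is not None and i - prev > 1:
            pieces.append("…")
        pieces.append(tokens[i])
        prev = i
    return " ".join(pieces)


def fill_gaps(tokens, path_indices):
    """Restore the skipped words between the first and last path word, in brackets.

    Assumption: every skipped token is restored; a real system would be more selective.
    """
    kept = set(path_indices)
    lo, hi = min(kept), max(kept)
    return " ".join(
        tokens[i] if i in kept else "[" + tokens[i] + "]" for i in range(lo, hi + 1)
    )


sentence = "Bim has been shown to be upregulated following nerve growth factor withdrawal"
tokens = sentence.split()
# Hypothetical path indices, listed in path order rather than sentence order.
path = [0, 3, 5, 7, 11, 10, 9, 8]

print(linearize_path(tokens, path))
# Bim … shown … be … following nerve growth factor withdrawal
print(fill_gaps(tokens, path))
# Bim [has] [been] shown [to] be [upregulated] following nerve growth factor withdrawal
```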
14
Sample Sentences
NGF withdrawal from sympathetic neurons induces Bim, which then contributes to death.
Nerve growth factor withdrawal induces the expression of Bim and mediates Bax dependent cytochrome c release and apoptosis.
The proapoptotic Bcl-2 family member Bim is strongly induced in sympathetic neurons in response to NGF withdrawal.
In neurons, the BH3 only Bcl2 member, Bim, and JNK are both implicated in apoptosis caused by nerve growth factor deprivation.
15
Their Paraphrases
NGF withdrawal induces Bim.
Nerve growth factor withdrawal induces the expression of Bim.
Bim has been shown to be upregulated following nerve growth factor withdrawal.
Bim implicated in apoptosis caused by nerve growth factor deprivation.
They all paraphrase: Bim is induced after NGF withdrawal.
16
BioText User Interface Discussion