Extracting Recipes from Chemical Academic Papers


1 Extracting Recipes from Chemical Academic Papers
Lei Luo
Today I am going to give a presentation titled “Extracting Recipes from Chemical Academic Papers”. I will talk about what we have done as the domain analytics subteam on the LLNL project.

2 Extracting Recipes from Chemical Academic Papers
Chemicals Extraction: Tools, Results Comparison, Future Work. Recipes Extraction: Sample Results.
The final goal is to extract recipes from papers. That includes extracting chemical information, such as chemical names and synthesis parameters like temperature and pH value. It also includes extracting the complete recipes for specific chemicals. For the chemicals extraction part, I will talk about which tools we have explored, what results we have found, and future work. We have not done much work on extracting recipes yet, so I am only going to show some simple prototype results, and I will also talk about where I think we can go and how far.

3 Chemicals Extraction Tools
Brat, ChemTagger, ChemDataExtractor.
To extract chemicals, we explored three different tools: Brat, ChemTagger, and ChemDataExtractor.

4 Chemicals Extraction Brat
Web-based tool for text annotation; that is, for adding notes to existing text documents. Needs three things defined: a top-level annotation definition, a second-level annotation definition, and the original text file. Needs manual annotation.
Brat is a web-based tool for text annotation; that is, for adding notes to existing text documents. To work with Brat, we have to provide all three of these inputs ourselves.

5 Brat Top level annotation
This is what the top-level annotation file looks like. Here we define the categories we will use to tag our target words. In this case, they are ORG, PER, LOC, and MISC.

6 Brat Second level annotation
The next thing we need to provide is the location of each word we would like to tag and the category we would like to give it. For example, we tag “De Morgen” as an organization, and its location is characters 989 to 998 in the text.
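Such an annotation entry can also be read programmatically. The following is a minimal sketch, assuming the annotations are stored in Brat's standoff format, where a text-bound line consists of an ID, a tab, the category with start and end character offsets, a tab, and the tagged text. The sample line is built from the “De Morgen” example; the ID `T1` and the names `TextBound` and `parse_standoff_line` are illustrative, and discontinuous spans are not handled.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation: a category applied to a character span."""
    ann_id: str
    category: str
    start: int
    end: int
    text: str


def parse_standoff_line(line: str) -> TextBound:
    """Parse a line shaped like: ID<TAB>CATEGORY START END<TAB>TEXT."""
    ann_id, span, text = line.rstrip("\n").split("\t")
    category, start, end = span.split(" ")
    return TextBound(ann_id, category, int(start), int(end), text)


# Sample line built from the slide's example; the ID "T1" is an assumption.
ann = parse_standoff_line("T1\tORG 989 998\tDe Morgen")
print(ann.category, ann.start, ann.end, ann.text)
```

The offsets are consistent with the tagged text: the span from 989 to 998 covers nine characters, which is the length of “De Morgen”.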

7 Brat Original text file
The last thing we need is the original text file.

8 Brat Result
Here is the Brat result. It color-highlights the words we manually tagged. The goal of this project is to extract chemicals from text automatically, so a tool that requires us to pick them out by hand is of no use to us.

9 Chemicals Extraction ChemTagger
Phrase-based semantic NLP tool for parsing the language of chemical experiments. Takes a string as input and produces an XML document as output. Uses a combination of OSCAR4, domain-specific regex and English taggers to identify parts of speech.
The next tool we used is ChemTagger.

10 ChemTagger Web-based interface
It comes in two versions: a web-based one and Java source code for running it locally. This is the web-based version. We enter the text we want to extract chemicals from in this text box and then click the “Process Text” button.

11 ChemTagger Web-based interface
Here is the result. It color-highlights all the information we might be interested in, such as molecules, temperatures, and quantities.

12 ChemTagger Local
This is the code snippet that uses its Java API. It takes a text string as input and then calls the relevant API methods to produce the XML output.

13 ChemTagger Result – XML & Chemicals
Here is the XML output. We can see that the root of the tree is the document, followed by sentences and other nested tags. We can use the <CHEM> tag, which stands for chemicals, to extract all the chemical entities along with their properties.
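Pulling the chemical entities out of such a tree could look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree`. The XML string is a hand-written stand-in that only mirrors the structure described here (document, sentence, <CHEM>); the tool's real output is richer and its exact tag names may differ, which is why the tag is a parameter. The function name `extract_entities` is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hand-written stand-in for the tagger's XML output (an assumption, not real output).
SAMPLE_XML = """
<Document>
  <Sentence>
    <CHEM>copper sulfate</CHEM>
    <VerbPhrase>was dissolved in</VerbPhrase>
    <CHEM>water</CHEM>
  </Sentence>
</Document>
"""


def extract_entities(xml_text: str, tag: str = "CHEM") -> List[str]:
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_text.strip())
    entities = []
    for node in root.iter(tag):
        # Join nested text and collapse whitespace so nested tags are handled too.
        entities.append(" ".join("".join(node.itertext()).split()))
    return entities


chemicals = extract_entities(SAMPLE_XML)
print(chemicals)
```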

14 Chemicals Extraction ChemDataExtractor
Able to automatically extract chemical names, properties, and spectra from scientific papers. Uses machine learning, custom dictionaries, and rule-based parsing grammars. Able to resolve data interdependencies. Extracts data from tables.
Another tool we used is ChemDataExtractor.

15 ChemDataExtractor Web-based interface
It also has a web-based version and a Python API. Here is one sample result.

16 ChemDataExtractor Local
We can also use its API locally to extract chemical information.

17 ChemTagger vs ChemDataExtractor
Example 1
In the next few slides I am going to compare the results from ChemTagger and ChemDataExtractor. Both tools analyze the same text file, and here are the results they give us. ChemTagger seems to find more chemicals, such as water, but it also picks up non-chemical words, such as ADVANCE and pdf2.

18 ChemTagger vs ChemDataExtractor
Example 2
Here is another example. Again, ChemTagger seems to find more chemicals, but some are repeated, such as CZTSeLayer, and some are possibly not chemicals, such as EQE.

19 ChemTagger vs ChemDataExtractor
Example 3
Here is another result. Again, ChemTagger gives us repeated results and identifies non-chemical names, such as NY, CA, and Inc.

20 ChemTagger vs ChemDataExtractor
Example 4
Here is the last example. Because this text is cleaner than the previous ones, ChemTagger gives cleaner results.

21 ChemTagger vs ChemDataExtractor
Results
ChemTagger identifies chemicals and their properties. ChemDataExtractor tags chemicals. ChemTagger gives repeated chemicals. ChemTagger also tags non-chemicals. ChemDataExtractor seems to handle unclean text better than ChemTagger.
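A comparison like this can be made more systematic by counting repeats and overlaps between the two outputs. The sketch below is one way to do it in plain Python. The two input lists are made-up placeholders, not the tools' actual outputs, and the names `normalize` and `compare_outputs` are illustrative. Names are only whitespace-normalized, not lowercased, because case can be meaningful in chemistry (for example Co versus CO).

```python
from collections import Counter
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Collapse runs of whitespace; case is kept because it can carry meaning."""
    return " ".join(name.split())


def compare_outputs(tool_a: Iterable[str], tool_b: Iterable[str]) -> Dict[str, List[str]]:
    """Summarize repeats in tool A and the overlap between the two outputs."""
    a_counts = Counter(normalize(n) for n in tool_a)
    b_counts = Counter(normalize(n) for n in tool_b)
    a_set, b_set = set(a_counts), set(b_counts)
    return {
        "repeated_in_a": sorted(n for n, c in a_counts.items() if c > 1),
        "only_in_a": sorted(a_set - b_set),
        "only_in_b": sorted(b_set - a_set),
        "in_both": sorted(a_set & b_set),
    }


# Placeholder inputs for illustration only.
tool_a_output = ["water", "ethanol", "ethanol ", "NY"]
tool_b_output = ["ethanol", "acetone"]

report = compare_outputs(tool_a_output, tool_b_output)
for key, names in report.items():
    print(key, names)
```

Entries that appear only in one tool's output are the natural candidates for manual review, since they are where the tools disagree.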

22 Chemicals Extraction Near Future Work
Clean the results and combine. Chemical entities verification. Accuracy assessment.
For near-future work, I think we need to clean the results by removing repeated entries and non-chemicals, and then combine the results from the two tools. We also need to verify that the words that get picked up are really chemicals. We can check them against a chemical database such as PubChem. To assess the tools’ performance, we need to do an accuracy assessment: we manually annotate some text and compare the annotations with the tools’ results.
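The verification and accuracy-assessment steps could be sketched as follows. This is only an illustration under stated assumptions: the `known` set is a small local stand-in for a lookup against a chemical database such as PubChem (no real database is queried), all names in the sets are placeholders, and precision, recall, and F1 are used here as one common way to score extracted entities against manual annotations.

```python
from typing import Iterable, Set, Tuple


def verify(candidates: Iterable[str], known: Set[str]) -> Set[str]:
    """Keep only candidates found in a reference set of chemical names."""
    return {c for c in candidates if c in known}


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Score predicted entities against manually annotated (gold) entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder data: stand-ins for tool outputs, a database, and manual annotations.
tool_a = {"water", "ethanol", "NY"}
tool_b = {"ethanol", "acetone"}
known = {"water", "ethanol", "acetone", "methanol"}
gold = {"water", "ethanol", "acetone", "methanol"}

# Combine both tools (set union removes repeats), then drop unverified words.
combined = verify(tool_a | tool_b, known)
precision, recall, f1 = precision_recall_f1(combined, gold)
print(sorted(combined), precision, recall, round(f1, 3))
```

Taking the union favors recall; taking the intersection of the two tools' outputs instead would favor precision. Which is preferable depends on how costly a missed chemical is compared with a false one.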

23 Recipes Extraction Sample Recipe
We have not looked into extracting recipes much yet. Here is a simple example. This code snippet processes the XML output from ChemTagger: it extracts the nodes tagged “ActionPhrase” from the parse tree. Each such node corresponds to one step, that is, an action such as adding one chemical to another.
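The slide's snippet is not reproduced in this transcript, so the following is only a rough sketch of the same idea using Python's standard `xml.etree.ElementTree`. The XML string is a hand-written stand-in; the `type` attribute and the inner tag names (`CHEM`, `VB`, `TEMP`) are assumptions made for illustration, as is the function name `extract_steps`.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hand-written stand-in for the parse tree; attribute and inner tag names are assumed.
SAMPLE_XML = """
<Document>
  <Sentence>
    <ActionPhrase type="Dissolve">
      <CHEM>copper sulfate</CHEM> <VB>was dissolved in</VB> <CHEM>water</CHEM>
    </ActionPhrase>
    <ActionPhrase type="Heat">
      <VB>heated to</VB> <TEMP>80 C</TEMP>
    </ActionPhrase>
  </Sentence>
</Document>
"""


def extract_steps(xml_text: str) -> List[Tuple[str, str]]:
    """Return (action type, phrase text) for each ActionPhrase, in document order."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for node in root.iter("ActionPhrase"):
        phrase = " ".join("".join(node.itertext()).split())
        steps.append((node.get("type", ""), phrase))
    return steps


steps = extract_steps(SAMPLE_XML)
for number, (action, phrase) in enumerate(steps, start=1):
    print(number, action, phrase)
```

Because the nodes are visited in document order, the resulting list reads as an ordered sequence of steps, which is the skeleton of a recipe.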

24 Recipes Extraction Future Work
More literature review. From a large number of papers we can get many different recipes for making the same chemical. For each paper we can extract chemicals and synthesis parameters.
For future work, I think there are a couple of things we can do. First, there are not many papers on extracting recipes, so we need to find more of them to give us more ideas. Second, for a given chemical, we can find many papers from which we can extract the chemicals involved and their synthesis parameters.

25 Recipes Extraction Future Work
Build a database for chemicals. Use data mining to see under which conditions the chemical is more likely to be produced. Train machine learning models on examples of synthesis parameters and synthesis outcomes, and then use them to make predictions.
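As a very small illustration of the data-mining idea, the sketch below groups synthesis records by temperature range and computes the fraction of successful outcomes in each range. Everything in it is hypothetical: the records are invented placeholders rather than extracted data, temperature is just one example of a synthesis parameter, and the function name `success_rate_by_bin` and the bin width are arbitrary choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Invented placeholder records: (temperature in degrees C, synthesis succeeded?).
RECORDS = [(60, False), (65, False), (80, True), (85, True), (90, False)]


def success_rate_by_bin(
    records: Iterable[Tuple[float, bool]], bin_width: int = 20
) -> Dict[int, float]:
    """Map the lower edge of each temperature bin to its fraction of successes."""
    totals: Dict[int, int] = defaultdict(int)
    successes: Dict[int, int] = defaultdict(int)
    for temperature, succeeded in records:
        lower_edge = int(temperature // bin_width) * bin_width
        totals[lower_edge] += 1
        successes[lower_edge] += int(succeeded)
    return {edge: successes[edge] / totals[edge] for edge in sorted(totals)}


rates = success_rate_by_bin(RECORDS)
for edge, rate in rates.items():
    print(f"{edge}-{edge + 20} C: {rate:.2f}")
```

A trained model would replace this simple grouping once enough parameter and outcome pairs have been collected in the database.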

