Published by Brittney Bates. Modified over 6 years ago.
1
Learning Usage of English KWICly with WebLEAP/DSR
Takashi Yamanoue, Kagoshima University, Japan
Toshiro Minami, Kyushu Institute of Information Sciences & Kyushu University, Japan
Ian Ruxton, Kyushu Institute of Technology, Japan
Wataru Sakurai, University of Tsukuba, Japan
(Thank you, Mr. Chairman.) I’m Takashi Yamanoue from Kagoshima University, Japan. I would like to talk about “Learning Usage of English KWICly with WebLEAP/DSR”.
2
Contents
Motivation (Introduction)
WebLEAP/DSR: A New implementation of WebLEAP
Examples and Experiments for Evaluation
Related Work
Concluding Remarks
This talk consists of five parts: the introduction; WebLEAP/DSR, a new implementation of WebLEAP; examples and experiments for evaluation; related work; and concluding remarks.
3
I. Motivation
Difficulties in writing in English
Spelling → spell checker
Grammar → grammar checker
Usage → Corpus Linguistic Tools
Is it really used?
Writing is hard work, and it is even harder in a second language. We often cannot judge whether our sentences are appropriate. We already have spell checkers and grammar checkers. However, an expression can be grammatically correct and yet never be used by native speakers. A corpus and a concordance program help us in such cases.
4
Problems of Ordinary Corpus Linguistic Tools
Time-consuming
Needs hard work in order to make a good corpus
Copyright problem (often)
Outdated from the beginning
Tools are mainly for experts
Difficult to use for ordinary learners
A corpus is a large collection of sample sentences. Making a corpus is time-consuming. Hard work is needed to build a good corpus and to solve copyright problems. The corpus is often outdated from the beginning. Concordancers, the tools for using a corpus, are mainly for experts. Many of them are difficult for ordinary learners to use.
5
A Solution
Use of “Web-Corpus” = Using Web Documents as a Corpus
Maintenance free: exists as it is
Always new, reflects the current status of languages
A lot of applications/services are available on the Internet
To solve these problems, we use Web documents as a corpus. We call this kind of corpus a “Web-corpus”. The Web-corpus is maintenance free: it exists as it is. It is always new, and it reflects the current status of languages. In addition, a lot of applications and services are available on the Internet.
6
WebLEAP
Shows frequencies of phrases in the given sentence graphically
Using a search engine
New Features
KWIC (Key Word In Context)
Domain Specification
WebLEAP is a program that graphically shows the user the frequencies of the phrases in a given sentence. We added two new features to WebLEAP. One is KWIC (Key Word In Context). The other is domain specification.
7
WebLEAP
This figure shows the internal structure of WebLEAP. The sentence given by the user is decomposed into phrases by the word sequence generator. These phrases are sent to a search engine. The search engine returns result pages that include the frequency of each phrase. The frequency is extracted by the document analyzer. The frequencies are then shown to the user graphically by the user interface.
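The pipeline described here can be sketched roughly as follows. This is a minimal illustration, not the actual WebLEAP code: it assumes the word sequence generator enumerates every contiguous word sequence, and it replaces the search engine and document analyzer with a hypothetical `toy_hit_count` function over a tiny in-memory `TOY_CORPUS`.

```python
from typing import Callable, Dict, List


def word_sequences(sentence: str, min_len: int = 1) -> List[str]:
    """Return every contiguous word sequence (phrase) in the sentence.

    Assumption: the talk does not say which sequences WebLEAP generates,
    so this sketch simply enumerates all of them, shortest first.
    """
    words = sentence.split()
    phrases = []
    for length in range(min_len, len(words) + 1):
        for start in range(len(words) - length + 1):
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def phrase_frequencies(sentence: str,
                       hit_count: Callable[[str], int]) -> Dict[str, int]:
    """Look up a frequency for each phrase with the supplied hit_count."""
    return {phrase: hit_count(phrase) for phrase in word_sequences(sentence)}


# Hypothetical stand-in for the search engine and document analyzer.
TOY_CORPUS = [
    "use this software at your own risk",
    "you enter at your own risk",
]


def toy_hit_count(phrase: str) -> int:
    """Count corpus documents that contain the phrase as whole words."""
    padded = f" {phrase} "
    return sum(padded in f" {doc} " for doc in TOY_CORPUS)


freqs = phrase_frequencies("at your own risk", toy_hit_count)
for phrase, count in freqs.items():
    print(f"{count:3d}  {phrase}")
```

Passing the lookup in as a function keeps the phrase generation independent of any particular search engine, so a real Web query could be substituted for `toy_hit_count`.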
8
II. WebLEAP/DSR: A New implementation of WebLEAP
DSR: Distributed System Recorder
A Computer Assisted Teaching System, …
Recording and replaying every operation
Draw, Programming, Web, WebLEAP, …
WebLEAP
Basic: Frequencies, Graphically
New (using Google Web APIs): KWIC, Domain Specification
WebLEAP/DSR is a new implementation of WebLEAP. DSR is a distributed system recorder. It can be used as a computer assisted teaching system and as a benchmark test tool for distributed systems. It can record and replay users’ operations on DSR’s application programs on a distributed system. WebLEAP/DSR is an application program of DSR. By using the Google Web APIs, a Web service of Google, it can show a KWIC table of a phrase. It can also specify the domain of the source sentences.
9
WebLEAP Window
Input, setting, control, …
This is the WebLEAP window of WebLEAP/DSR. It is used to input the sentence and settings, and to control the outputs. When the user clicks the [eval] button after entering a sentence in this field, the draw window is shown.
10
Draw window, frequencies
This is the draw window. It shows the frequencies of the phrases in the input sentence graphically. The number in each colored bar is the frequency of the phrase shown above the bar. A pink bar indicates a low frequency, and a blue bar indicates a high frequency. When the user clicks a bar, for example this bar, the KWIC window is shown.
11
KWIC (Key Word In Context) window
This is the KWIC window, which shows the KWIC table. In these fields, the keyword corresponding to the bar clicked in the draw window is shown in bold letters, so we can see how the keyword is used in context. When the user clicks a URL field, for example this field, the web browser window is shown.
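A KWIC table of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the `kwic` and `format_kwic` functions and the sample documents are hypothetical, matching is plain substring search, and the real system takes its text from search engine results instead of a local list.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]


def kwic(keyword: str, documents: List[str], width: int = 20) -> List[Row]:
    """Return (left context, keyword, right context) for every match.

    Matching is a plain substring search, which is a simplification.
    """
    rows = []
    for doc in documents:
        start = doc.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = doc[max(0, start - width):start]
            right = doc[end:end + width]
            rows.append((left, keyword, right))
            start = doc.find(keyword, end)
    return rows


def format_kwic(rows: List[Row]) -> str:
    """Right-align the left context so the keyword lines up in one column."""
    width = max((len(left) for left, _, _ in rows), default=0)
    return "\n".join(f"{left:>{width}} [{kw}] {right}"
                     for left, kw, right in rows)


# Hypothetical sample documents.
docs = ["Use it at your own risk.", "Swim at your own risk here."]
rows = kwic("your own risk", docs)
print(format_kwic(rows))
```

Aligning the keyword in a single column is what lets the reader scan the words immediately to its left and right.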
12
Web browser Window
This is the web browser window. The page in this window includes the keyword and shows its context, like this.
13
Setting Window, for domain specification, …
This is the setting window. It is shown when the user clicks the setting button in the WebLEAP window. Here we can select the search engine used in the evaluation and set its search options. In this figure, we have selected Google as the search engine and are about to set the search domain as a search option for Google.
14
III. Examples and Experiments for Evaluation
Estimating the Appropriate Preposition
Comparing English: UK vs. US
We have experimented with a variety of cases. Let’s look at two of them. One is estimating the appropriate preposition. The other is comparing English in specific countries.
15
(1) Estimating the Appropriate Preposition
Estimating the preposition for “your own risk”
Let’s think about which preposition is the most appropriate for “your own risk”. Is it “by”? “With”? “At”? This figure shows the frequencies of “by your own risk”, “with your own risk” and “at your own risk”. The frequencies are 41, 138 and […]. It is easy to see that “at your own risk” is the most appropriate one. Now suppose that “at” did not come to mind at first. This figure shows that the frequencies of “by your own risk” and “with your own risk” are too small compared with the frequency of “your own risk”. In that case, we click the frequency bar that corresponds to “your own risk”.
16
Then this KWIC window is shown
Then this KWIC window is shown. The KWIC table shows how “your own risk” is used in each context. In this table, “at” is used in most cases. We can then ask for the frequency of “at your own risk” and confirm that it is the most appropriate expression.
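Reading the most common left-hand neighbour off a KWIC table can also be done mechanically. The sketch below is an assumption-laden illustration, not a WebLEAP feature described in the talk: `preceding_words` is a hypothetical helper, and the three sample sentences are invented for the example.

```python
from collections import Counter
from typing import List

PUNCTUATION = ".,;:!?\"'"


def preceding_words(phrase: str, documents: List[str]) -> Counter:
    """Count the word immediately before each occurrence of the phrase."""
    target = phrase.lower().split()
    counts: Counter = Counter()
    for doc in documents:
        # Lower-case and strip surrounding punctuation from each word.
        words = [w.strip(PUNCTUATION) for w in doc.lower().split()]
        # Start at 1 so that there is always a preceding word.
        for i in range(1, len(words) - len(target) + 1):
            if words[i:i + len(target)] == target:
                counts[words[i - 1]] += 1
    return counts


# Hypothetical sample sentences standing in for KWIC rows.
samples = [
    "Use this tool at your own risk.",
    "Enter at your own risk!",
    "Proceed with your own risk assessment.",
]
counts = preceding_words("your own risk", samples)
print(counts.most_common())
```

The candidate found this way would still need to be confirmed by asking for the frequency of the full phrase, as the talk describes.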
17
Comparing English: UK vs. US
English sentence in a specific English dialect
Non-native English speakers are sometimes confused when writing a sentence in a specific English dialect, such as British English or American English. WebLEAP/DSR can filter the Web corpus by a domain name in the page’s URL.
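Filtering by the domain name in a URL can be sketched with the Python standard library. This is a hedged illustration only: the talk says the search domain is set as a search option of the search engine, whereas the sketch filters a local list of hypothetical URLs, and the names `in_domain` and `filter_by_domain` are assumptions.

```python
from typing import List
from urllib.parse import urlparse


def in_domain(url: str, domain: str) -> bool:
    """True if the URL's host equals the domain or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_by_domain(urls: List[str], domain: str) -> List[str]:
    """Keep only the URLs whose host falls under the given domain."""
    return [url for url in urls if in_domain(url, domain)]


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.co.uk/flat",
    "http://www.example.us/apartment",
    "http://example.com/uk",
]
print(filter_by_domain(urls, "uk"))
print(filter_by_domain(urls, "us"))
```

Comparing against the host name and not the whole URL avoids counting pages that merely contain "uk" somewhere in their path.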
18
This figure shows WebLEAP outputs comparing two expressions, “living in a flat” and “living in an apartment”, in the UK domain and the US domain. It shows that “living in a flat” is used much more than “living in an apartment” in the UK domain, and that “living in an apartment” is used much more than “living in a flat” in the US domain.
19
IV. Related Work
Satoh’s system … webcorpus
SUIKO … detects wrong sentences
Applications using Google Web APIs
DSR: Distributed System Recorder
A Benchmarking tool for distributed systems
A Computer Assisted Teaching system
P2P, reliable multicast, …
Satoh’s system is similar to ours in that it also uses Web documents through a search engine. That system outputs the KWIC index of a keyword, whereas our system outputs not only KWIC but also a graphical representation of the frequencies of words or phrases. WebLEAP can also specify the domain of the Web-corpus. SUIKO detects wrong sentences in Japanese, but it does not show whether an expression is really used or not. There are other applications which use the Google Web APIs. Most of them provide only an interface to the search engine. DSR is a distributed system recorder. It can be used as a benchmarking tool and as a computer assisted teaching system.
20
V. Concluding Remarks
WebLEAP: A tool for helping with writing
Popularities of expressions
Frequencies from a search engine
KWIC
How the expression is used
Filling in the missing word
Domain specification
WebLEAP/DSR
An application of DSR
WebLEAP is a tool for helping with writing by showing the user the popularity of expressions. It uses a search engine to get the frequencies of the expressions. We added two new features to WebLEAP. One is KWIC and the other is domain specification. By using KWIC, the user can see how a given expression is used. The user can also fill in a missing word using KWIC. WebLEAP/DSR is an application of DSR.
21
Further Research Topics
Precision
Discrimination between native speakers and non-native speakers
Differences from region to region
Collaborative Writing
As the next step of this research, we would like to improve the precision of WebLEAP and make it support collaborative writing. We would like to distinguish native speakers’ expressions from non-native speakers’ expressions. We would also like to know the differences between sentences from region to region more precisely.
22
Acknowledgement
Google.com
We thank Google.com for making the Google Web APIs public and letting us use them.
23
Thank you for Listening!