Published by Sara Holmes. Modified over 9 years ago.
1
Towards a Language-Independent Universal Digital Library
Sameh Alansary (sameh.alansary@bibalex.org), Magdy Nagi (magdy.nagi@bibalex.org), Noha Adly (noha.adly@bibalex.org)
Bibliotheca Alexandrina
The Second International Conference on Universal Digital Libraries (ICUDL 2006), 17-19 November 2006, Alexandria, Egypt
2
Introduction
IT has made the full text of libraries' assets available digitally (independent of time, place, and copy) in universal digital libraries (UDL), e.g.:
- Nasser Digital Library.
- Million Book Project.
Digitization alone does not lead to "universality" in its fullest sense. A new dimension of universality should be added: independence of language.
3
Language dependency blocks information dissemination
Language dependency maintains language barriers: 80% of books and e-materials are written in English and 20% are written in other languages. If everyone could always read in their own mother tongue, this would help in:
- Dissemination of knowledge.
- Preservation of nationality and identity.
- Preventing cultural hegemony.
4
Attempts to break language barriers
Translation systems have been introduced (NLP).
Approaches: 1- Direct translation approach. 2- Transfer approach. 3- Interlingual approach.
Examples of systems:
- Google translation: http://www.google.ch/language_tools
- Fujitsu systems: http://www.fujitsu.com/global/services/translation
5
Drawbacks of MT systems
1- The quality of results is often inadequate.
2- They work for only a limited number of language combinations.
3- They place a heavy load on the network: to translate between only 10 languages, 10 grammars, 10 lexicons, 90 translation dictionaries, and 90 sets of translation rules are needed, plus semantic processing in each language.
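The figure of 90 follows from simple counting: a pairwise system needs one translation dictionary and one rule set per ordered language pair. The sketch below reproduces that arithmetic and sets it beside a pivot (interlingual) design. The function names are illustrative, and the pivot count assumes one converter into and one converter out of the shared format per language, which is an assumption and not a figure from the slides.

```python
def pairwise_resources(n_languages: int) -> int:
    """Ordered language pairs; each needs its own translation dictionary and rule set."""
    return n_languages * (n_languages - 1)


def pivot_resources(n_languages: int) -> int:
    """Assumed pivot design: one converter into and one out of the shared format per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 10, 15):
        print(n, pairwise_resources(n), pivot_resources(n))
```

The pairwise count grows quadratically with the number of languages, while the assumed pivot count grows linearly.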
6
Towards a universal system for knowledge representation
7
Some questions come to mind:
- How can we represent natural language materials in a language-independent format? (a format is required)
- What system is suitable for representing knowledge in the selected format? (a system is required)
- How is this system going to work?
8
Requirements for a universal representation of knowledge:
1- The content of the original material (meaning) must not be lost.
2- This universal format should be understandable by various platforms over the network.
3- This universal format should be decodable to any natural language.
9
UNL System
10
What is UNL? (1)
The Universal Networking Language (UNL) is an artificial language for computers to express information and knowledge that can be expressed in natural language.
R&D in UNL:
- Started in 1996 as an initiative of the UNU/IAS in Japan.
- Transferred to the UNDL Foundation in 2001.
- Development on 15 languages: Arabic, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Swahili.
11
What is UNL? (2)
It expresses the information or knowledge of natural language (NL) in the form of a semantic network with hyper-nodes.
Example: The boy who works here went to school
UNL expression:
{UNL}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do),here)
{/UNL}
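Each line of a UNL expression is a binary relation of the form label[:scope](source, target). As a minimal sketch of how such lines could be read by a program, the parser below splits a relation at the first comma that lies outside nested parentheses. The `Relation` type and `parse_relation` function are illustrative names, not part of any UNL tool, and the sketch assumes well-formed input.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    label: str            # relation name, e.g. "agt"
    scope: Optional[str]  # hyper-node id such as "01", or None for the main graph
    source: str
    target: str


def parse_relation(line: str) -> Relation:
    """Parse one binary relation of the form label[:scope](source, target)."""
    head, sep, rest = line.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"not a UNL relation: {line!r}")
    body = rest[:-1]  # drop the closing parenthesis of the relation itself
    label, _, scope = head.partition(":")
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # The first comma outside nested parentheses separates the two arguments.
            return Relation(label, scope or None, body[:i].strip(), body[i + 1:].strip())
    raise ValueError(f"expected two arguments: {line!r}")


if __name__ == "__main__":
    for text in ("agt(go(icl>move).@entry.@past, :01)", "plc:01(work(icl>do),here)"):
        print(parse_relation(text))
```

Tracking parenthesis depth matters because universal words may themselves contain commas, as in hear(icl>perceive(agt>person,obj>thing)).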
12
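UNL hypergraph
[Figure: hypergraph for "The boy who works here went to school". The node go(icl>move).@entry.@past has an agt arc to the hyper-node :01 and a plt arc to school(icl>institution). Inside the hyper-node :01, work(icl>do) has an agt arc to boy(icl>person).@entry and a plc arc to here.]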
13
The UNL System
- System: components.
- Formalism: knowledge representation.
14
The UNL-system components
[Figure: UNL language servers for Hindi, Japanese, Chinese, Arabic, Spanish, and English are connected over the Internet. Each language server has an EnConverter (EnCo) and a DeConverter (DeCo). The user works with UNL documents through the UNL Editor, the UNL Viewer, and the UNL Proxy.]
15
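A) Language servers:
[Figure: a UNL language server sits between natural language (NL) and UNL, next to a web server holding UNL documents. Its EnConverter turns natural language into UNL and its DeConverter turns UNL into natural language. Resources shown: UNL-language dictionary, knowledge base, co-occurrence dictionary, analysis rules, and generation rules.]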
16
B) UNL Tools: 1- UNL viewer. 2- UNL editor. 3- UNL verifier.
C) UNL Proxy Server: Searches for UNL documents on the web, sends them to the language server, and displays them in the user's chosen language.
17
Mechanism of conversion between NL and UNL
[Figure: natural language texts pass through the Annotation Editor to become annotated natural language texts, which the Universal Parser and the EnConverter turn into a UNL document; the DeConverter turns a UNL document back into natural language texts. Other components shown: UNL Verifier, UNL KB, web server (HTML+XML), UW Dictionary, Word Dictionary, Co-occurrence Dictionary, and grammatical rules.]
18
UNL as a formal language: How does it represent knowledge?
1- Universal words (UW): represent concepts. Example: boy(icl>person), hear(icl>perceive(agt>person,obj>thing))
2- Relations: 38 semantic relations can be distinguished. Example: agt, aoj, bas, con, coo, dur, etc.
3- Attributes: express the subjectivity of the speaker. Example: @past, @emphasis, @def, @not, etc.
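A node in a UNL expression combines these building blocks: a universal word (a headword plus constraints in parentheses) followed by zero or more attributes. The sketch below splits a node string into those parts. The `Node` class and `parse_node` function are hypothetical names, and the sketch assumes the simple shape headword(constraints).@attr1.@attr2 without node identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    headword: str                                         # e.g. "go"
    constraints: str = ""                                 # e.g. "icl>move"
    attributes: List[str] = field(default_factory=list)   # e.g. ["@entry", "@past"]


def parse_node(text: str) -> Node:
    """Split 'headword(constraints).@attr1.@attr2' into its three parts."""
    uw, *attrs = text.strip().split(".@")
    headword, sep, rest = uw.partition("(")
    if sep and not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {text!r}")
    constraints = rest[:-1] if sep else ""
    return Node(headword.strip(), constraints, ["@" + a for a in attrs])


if __name__ == "__main__":
    print(parse_node("go(icl>move).@entry.@past"))
    print(parse_node("hear(icl>perceive(agt>person,obj>thing))"))
```

Keeping attributes separate from the universal word reflects the division on this slide: the universal word names the concept, while attributes such as @past carry speaker-dependent information.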
19
4- Knowledge base (UNLKB): defines the universal words and provides linguistic knowledge of concepts.
20
Ibrahim Shihata UNL Arabic Center (ISUAC)
It was established at Bibliotheca Alexandrina. It is responsible for designing, implementing, and maintaining the various components of the Arabic language server.
The Arabic language server will be capable of:
- Enconverting Arabic texts to the universal format.
- Deconverting the universal materials produced by other language centers to Arabic.
21
The Achievements of the ISUAC A) Arabic language resources and tools. B) Developing tools. C) Arabic language-based universal materials.
22
A) Arabic language resources and tools:
1- The Arabic Dictionary: It is a repository of information for all UNL Arabic grammars.
Dictionary contents:
- Universal words (vocabulary of UNL).
- Head words (vocabulary of Arabic).
- Linguistic features (linguistic information about head words).
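The three parts of a dictionary entry can be pictured as one record that is indexed in both directions: by head word for enconversion and by universal word for deconversion. This is a minimal sketch under stated assumptions; the class name, the index function, and the feature tags "N" and "V" are hypothetical and do not reproduce the actual ISUAC dictionary format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DictionaryEntry:
    headword: str              # Arabic head word
    universal_word: str        # UNL universal word it maps to
    features: Tuple[str, ...]  # linguistic features (hypothetical tags)


def index_entries(
    entries: Iterable[DictionaryEntry],
) -> Tuple[Dict[str, List[DictionaryEntry]], Dict[str, List[DictionaryEntry]]]:
    """Build lookup tables by head word and by universal word."""
    by_headword: Dict[str, List[DictionaryEntry]] = {}
    by_uw: Dict[str, List[DictionaryEntry]] = {}
    for entry in entries:
        by_headword.setdefault(entry.headword, []).append(entry)
        by_uw.setdefault(entry.universal_word, []).append(entry)
    return by_headword, by_uw


if __name__ == "__main__":
    # The unvocalized form ولد can be read as the noun "boy" or the verb "was born".
    entries = [
        DictionaryEntry("ولد", "boy(icl>person)", ("N",)),
        DictionaryEntry("ولد", "born(obj>thing)", ("V",)),
    ]
    by_headword, by_uw = index_entries(entries)
    print([e.universal_word for e in by_headword["ولد"]])
```

A head word that maps to several universal words, as in the usage example, is the kind of ambiguity that the linguistic features and the conversion rules would have to resolve.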
23
2- Arabic EnConversion Rules: They are responsible for enconverting Arabic to UNL.
Arabic EnConversion Rules are able to:
1- Perform morphological analysis to extract the concepts that the Arabic words refer to.
2- Assign the exact semantic relation between concepts as expressed in the context of the Arabic sentence.
24
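Simulation of how the Enconverter works
Input sentence: ولد جمال عبد الناصر في 15 يناير 1918 في 18 شارع قنوات حي باكوس بالإسكندرية. (Gamal Abdel Nasser was born on 15 January 1918 at 18 Qanawat Street, Bacos district, Alexandria.)
Segmentation: ولد / جمال عبد الناصر / في / 15 / يناير / 1918 / في / 18 / شارع / قنوات / حي / باكوس / بال / إسكندرية / .
Rule actions and relations shown on the slide: delete, plc, tim, obj, tim, mod, plc, mod, plc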
25
UNL Network:
26
3- Arabic DeConversion Rules: They are responsible for generating Arabic sentences out of UNL networks.
Arabic DeConversion Rules are able to:
1- Select Arabic words that represent universal concepts.
2- Arrange the concepts of the UNL network in a syntactically well-formed sentence.
27
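Simulation of how the Deconverter works
UNL nodes: outcome(icl>result).@entry, description(icl>action), Egypt, collaboration(icl>action), scientist(icl>scholar).@entry, scholar(icl>person), more(aoj>thing), prominent(aoj>thing), accompany(agt>thing,obj>thing), 150, 1798, Bonaparte(iof>person), Egypt
Relations between the nodes: obj, aoj, mod, agt, and, obj, bas, tim, gol, agt, aoj
Selected Arabic words: بونابرت، صاحب، مرموق، عالم، باحث، 150، أكثر، 1798، مصر، تعاون، محصل، وصف
Inserted affixes and function words: ة، من، و، الذين، وا، في، إلى
Generated sentence: وصف مصر محصلة تعاون أكثر من 150 باحث و عالم مرموق الذين صاحبوا بونابرت في 1798 إلى مصر (The Description of Egypt is the outcome of the collaboration of more than 150 prominent scholars and scientists who accompanied Bonaparte to Egypt in 1798.)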
28
4- A Corpus for Modern Standard Arabic: A representative sample (100 million) that reflects the empirical usage of Modern Standard Arabic. It plays a principal role in enhancing and updating both the EnConversion and DeConversion rules.
29
B) Developing tools: 1- Integrated Development Environment (IDE)
31
2- Corpus analysis software (GATE)
32
C) Arabic language-based universal materials:
- Library of Alexandria: the Fourth Pyramid.
- Abu Simbel: The Temple of the Sun.
- Nasser Digital Library.
- The Encyclopaedia of Famous Persons.
33
An example of an Arabic Sentence in UNL (universal) format
35
وكان جمال عبد الناصر الابن الأكبر لعبد الناصر حسين الذي ولد في عام 1888 في قرية بني مر في صعيد مصر في أسرة من الفلاحين، ولكنه حصل على قدر من التعليم سمح له بأن يلتحق بوظيفة في مصلحة البريد بالإسكندرية، وكان مرتبه يكفي بصعوبة لسداد ضرورات الحياة.
(Gamal Abdel Nasser was the eldest son of Abdel Nasser Hussein, who was born in 1888 in the village of Bani Morr in Upper Egypt to a family of farmers, but who obtained a degree of education that allowed him to take a job in the postal service in Alexandria; his salary was barely enough to cover the necessities of life.)
Language-independent format:
{unl}
aoj(son(icl>person):0I.@def.@entry,Gamal Abdel Nasser(iof>person):00)
mod(son(icl>person):0I.@def.@entry,Abd El-Naser Hossain(iof>person):23.@topic)
aoj(old(aoj>thing):1J,son(icl>person):0I.@def)
man(old(aoj>thing):1J,most(icl>how):15)
obj(born(obj>thing):31.@past,Abd El-Naser Hossain(iof>person):23.@topic)
and(get(agt>thing,obj>thing):6S.@past.@contrast,born(obj>thing):31.@past)
scn(born(obj>thing):31.@past,family(icl>group):5Q)
plc(born(obj>thing):31.@past,village(icl>region):4D)
tim(born(obj>thing):31.@past,year(icl>period):3M)
mod(year(icl>period):3M,1888:41)
plc(village(icl>region):4D,upper Egypt(iof>place):58)
mod(village(icl>region):4D,Bani Morr(iof>village):4S)
mod(family(icl>group):5Q,farmer(icl>person):65.@pl.@def)
obj(get(agt>thing,obj>thing):6S.@past.@contrast,degree(icl>abstract thing):7N)
agt(allow(agt>thing,gol>thing,obj>thing):8M.@past,degree(icl>abstract thing):7N)
mod(degree(icl>abstract thing):7N,education(icl>activity):82.@def)
gol(allow(agt>thing,gol>thing,obj>thing):8M.@past,join(agt>person,obj>thing):9I.@present)
obj(allow(agt>thing,gol>thing,obj>thing):8M.@past,his(pos>he):97)
and(suffice(aoj>thing,obj>thing):CM.@present,join(agt>person,obj>thing):9I.@present)
obj(join(agt>person,obj>thing):9I.@present,job(icl>work):A7)
plc(job(icl>work):A7,postal service(icl>service):AN)
plc(postal service(icl>service):AN,Alexandria(iof>city):BB)
aoj(suffice(aoj>thing,obj>thing):CM.@present,salary(icl>money):BV)
mod(salary(icl>money):BV,his(pos>he):CB)
obj(suffice(aoj>thing,obj>thing):CM.@present,satisfy(agt>thing,obj>thing):DQ)
man(suffice(aoj>thing,obj>thing):CM.@present,hardly:DA)
obj(satisfy(agt>thing,obj>thing):DQ,demand(icl>wants):E6.@pl.@def)
mod(demand(icl>wants):E6.@pl.@def,life(icl>activity):EV.@def)
{/unl}
36
Is it going to work this way?
- Are there language servers ready to work?
- What about Arabic? Is the Arabic language server able to enconvert Arabic texts to the universal format? Is it also able to deconvert the universal materials back to Arabic?
- Are the universal materials deconvertible to other languages?
37
A proof of the concept
38
UNL-based Library Information System (UNL-LIS)
It is a system for searching digital library catalogs. It is built on the UNL KI, therefore:
- The query is in natural language (two languages).
- The answer is also in natural language (7 languages).
39
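UNL LIS Core Architecture
[Figure: the user's question in NL goes to a language server (Enco rules + Dic), where the enconversion process produces the question in UNL. The query engine answers it from the LIS, which holds MARC21 records loaded by the MARC21 importing process, together with the UNL KB, the encyclopedia, and the concept definitions. The answer in UNL goes to a language server (Deco rules + Dic), where the deconversion process produces the answer in NL.]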
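The data flow in this architecture can be sketched as three composed steps in which only the first and last depend on a natural language. The code below is an illustration under stated assumptions, not the UNL-LIS implementation: the function name, the type aliases, and the toy stand-ins in the usage example are all hypothetical.

```python
from typing import Callable, Dict, List

Enconverter = Callable[[str], str]        # natural-language text -> UNL expression
Deconverter = Callable[[str], str]        # UNL expression -> natural-language text
QueryEngine = Callable[[str], List[str]]  # UNL question -> UNL answers


def answer_question(
    question: str,
    question_lang: str,
    answer_lang: str,
    enconverters: Dict[str, Enconverter],
    deconverters: Dict[str, Deconverter],
    query_engine: QueryEngine,
) -> List[str]:
    """Run one question through enconversion, querying, and deconversion."""
    unl_question = enconverters[question_lang](question)  # language server: Enco rules + Dic
    unl_answers = query_engine(unl_question)              # operates on UNL only
    deconvert = deconverters[answer_lang]                 # language server: Deco rules + Dic
    return [deconvert(unl) for unl in unl_answers]


if __name__ == "__main__":
    # Toy stand-ins that only tag their input, to show the direction of data flow.
    enconverters = {"en": lambda text: f"UNL[{text}]"}
    deconverters = {"ar": lambda unl: f"ar:{unl}", "en": lambda unl: f"en:{unl}"}
    query_engine = lambda unl_question: [f"answer-to-{unl_question}"]
    print(answer_question("Who is Naguib Mahfouz?", "en", "ar",
                          enconverters, deconverters, query_engine))
```

Because the query engine in this sketch sees only UNL, supporting another answer language means registering one more deconverter and leaves the query step unchanged.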
40
Demo: Screen Shots
41
Screenshot annotations:
1. Enter query.
2. Press to search Encyclopedia.
3. Specify result's language (Arabic).
4. View results here (Naguib Mahfouz). Click for more information.
5. A link to the UNL document.
UNL output shown in the screenshot:
{unl}
agt(begin(agt>thing,obj>action):12.@past.@entry, Naguib Mahfouz(iof>person):0N.@topic)
obj(begin(agt>thing,obj>action):12.@past.@entry, writing(icl>action):18)
tim(begin(agt>thing,obj>action):12.@past.@entry, year old:1S.@past)
aoj(year old:1S.@past, Naguib Mahfouz(iof>person):0N.@topic)
qua(year old:1S.@past, 17)
plc(born(aoj>thing):00, Cairo(iof>city):08)
aoj(born(aoj>thing):00, Naguib Mahfouz(iof>person):0N.@topic)
tim(born(aoj>thing):00, 1911:0H)
{/unl}
[/S]
;;Time 1.4 Sec
;;Done!
{unl}
and(write(agt>thing,obj>thing):1K.@past.@entry, publish(agt>thing,obj>thing):0K.@past)
obj(write(agt>thing,obj>thing):1K.@past.@entry, novel(icl>tale):1B.@pl.@topic)
tim(write(agt>thing,obj>thing):1K.@past.@entry, before(icl>how(obj>thing)):1S)
aoj(more(icl>additional):1A, novel(icl>tale):1B.@pl.@topic)
qua(novel(icl>tale):1B.@pl.@topic, 10:16)
[/S]
46
Conclusion
47
Conclusion
- Independence of language is a very important dimension that should be considered in storing and retrieving texts for a UDL.
- The UNL system is a promising formalism for representing knowledge in a universal format.
- The ISUAC is less than 2 years old; however, it is one of the very active language centres in designing and implementing UNL materials and tools.
- The UNL LIS has proved the feasibility of the concept of language independence.
48
Thank you. Any questions are welcome.