+ Introduction to the Digitization of Hanguk Bulgyo Chonso Bo Kwang Han, Young Sik Hong, Keum Suk Lee, Yong Kyu Lee, Soon Il Hwang, Jae Soo Lee Institute.

1 + Introduction to the Digitization of Hanguk Bulgyo Chonso Bo Kwang Han, Young Sik Hong, Keum Suk Lee, Yong Kyu Lee, Soon Il Hwang, Jae Soo Lee Institute of Electronic Buddhist Texts & Culture Content Dongguk University 동국대학교

2 + 2/22 DONGGUK UNIVERSITY Contents
1. Introduction
2. Development Process
3. Missing Character Management System
4. Web Search System
5. Database Analysis
6. Achievement and Challenge

3 + 3/22 DONGGUK UNIVERSITY 1. Introduction: Institute of Electronic Buddhist Texts & Culture Content
- Aim: the digitization of Korean Buddhist manuscripts and of Buddhist culture content.
- History: established in 1997 by Ven. Bogwang together with professors in the departments of Buddhist Studies, Seon (Zen) Studies, Indian Philosophy and Computer Engineering.
- Recent works:
  - Digitization of Hanguk Bulgyo Chonso (1999-2007)
  - Digital Reconstruction of Angkor Wat (2006)
  - Hangul Tripitaka Retrieval System (2001- )

4 + 4/22 DONGGUK UNIVERSITY 1. Introduction: Hanguk Bulgyo Chonso (韓國佛敎全書)
- A series of books collecting all kinds of Korean Buddhist manuscripts written by monks, nuns and scholars from the Three Kingdoms period up to modern times.
- History:
  - Dongguk University Press started to collect and publish authentic Korean Buddhist manuscripts (1970).
  - Ten volumes of the series were published (1989).
  - Four appendices have been published so far (1990- ).

5 + 5/22 DONGGUK UNIVERSITY 1. Introduction: Digitization of Hanguk Bulgyo Chonso
- Aim: constructing a digital database for Hanguk Bulgyo Chonso and providing an Internet retrieval service for those who are interested in Korean Buddhism.
- History:
  - Pilot Project for the Digitization of Hanguk Bulgyo Chonso, covering a limited number of texts (1999).
  - Digitization of Hanguk Bulgyo Chonso, supported by Dongguk University (2000-2007).

6 + 6/22 DONGGUK UNIVERSITY 2. Development Process: Pilot Project for the Digitization of Hanguk Bulgyo Chonso
- Aim: digitizing the manuscripts written by National Preceptors Daegak (1055-1101) and Bojo (1158-1210) in the fourth volume of Hanguk Bulgyo Chonso, and constructing a Web search system for the texts.
- Systems developed in the Pilot Project:
  - Unicode-based text input system
  - Unicode-based text editing system
  - Unicode-based database system
  - Web search system

7 + 7/22 DONGGUK UNIVERSITY 2. Development Process: Main Project for the Digitization of Hanguk Bulgyo Chonso
- Organization (chart labels):
  - EBTI in Dongguk
  - Missing Character Management Team
  - Search System Developing Team
  - Database Constructing Team
  - Text Input Team

8 + 8/22 DONGGUK UNIVERSITY 2. Development Process: Main Project for the Digitization of Hanguk Bulgyo Chonso
- Digitization process:
  - Text input: two volumes a year (2000-2007).
  - Improving and updating the Missing Character Management System, the Web Search System and the Database Constructing System.
- Systems developed and updated in the Main Project:
  - Missing Character Management Solution
  - Automatic Index Creating Solution
  - Web Search Program

9 + 9/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Missing Character Management Solution
- Missing characters are Chinese characters that cannot be displayed on the Internet because Unicode covers only a limited number of Chinese characters.
- Each missing character is saved as an image file (BMP) and appears in Web pages in an XML format.
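The slide does not show the XML format itself, so the following is only a minimal sketch of the idea: a placeholder tag in the text is swapped for a reference to the stored BMP image when a page is rendered. The tag name `mc`, its `id` attribute, the ID scheme and the `missing/` directory are all assumptions made for illustration, not the project's actual markup.

```python
import re

# Hypothetical markup: <mc id="M0001"/> stands for one character without a Unicode code point.
MISSING_TAG = re.compile(r'<mc id="([A-Za-z0-9]+)"\s*/>')


def render_missing_characters(line: str, image_dir: str = "missing") -> str:
    """Replace every missing-character tag with an HTML <img> pointing at its BMP file."""

    def to_img(match: re.Match) -> str:
        char_id = match.group(1)
        return f'<img src="{image_dir}/{char_id}.bmp" alt="{char_id}">'

    return MISSING_TAG.sub(to_img, line)


if __name__ == "__main__":
    print(render_missing_characters('如<mc id="M0001"/>是'))
```

Keeping only an ID in the text means the image can be redrawn or replaced later without touching the stored text.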

10 + 10/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Diverse ways of searching Missing Characters
- By the Korean sound
- By the 214 historical radicals
- By the total number of strokes
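The three search keys above can be pictured as filters over a small registry. This is an illustrative sketch only: the record fields, the IDs and the sample entries are invented here, and the real system stores its registry in a database rather than in a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MissingChar:
    char_id: str   # hypothetical registry ID, also used to find the image file
    sound: str     # Korean reading
    radical: int   # index among the 214 historical radicals (1..214)
    strokes: int   # total number of strokes


# Made-up sample entries, for illustration only.
REGISTRY = [
    MissingChar("M0001", "가", 9, 7),
    MissingChar("M0002", "가", 85, 11),
    MissingChar("M0003", "나", 85, 11),
]


def find(registry: List[MissingChar],
         sound: Optional[str] = None,
         radical: Optional[int] = None,
         strokes: Optional[int] = None) -> List[MissingChar]:
    """Return the entries matching every criterion that was given."""
    return [
        c for c in registry
        if (sound is None or c.sound == sound)
        and (radical is None or c.radical == radical)
        and (strokes is None or c.strokes == strokes)
    ]


if __name__ == "__main__":
    print([c.char_id for c in find(REGISTRY, sound="가")])
    print([c.char_id for c in find(REGISTRY, radical=85, strokes=11)])
```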

11 + 11/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Missing Character Register Interface
- The Missing Character Register Interface is designed to manage the missing character database and to support diverse search methods.

12 + 12/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Extended Missing Character Register Interface
- The Missing Character Register Interface has been updated and extended so that missing characters can be entered and managed efficiently.
- Functions added: character ID search, and a modification tool for missing character images.

13 + 13/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Modification Tool for missing character images
- Missing character images can be improved with the modification tool.

14 + 14/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Statistics of Missing Characters
- 3,318 missing characters are registered in the missing character database for Hanguk Bulgyo Chonso.
- The frequency of missing characters is classified by the four-stroke historical Chinese radicals.

15 + 15/22 DONGGUK UNIVERSITY 3. Missing Character Management System: Future Challenge for Missing Character Management System
- Improving the Missing Character Management Interface.
- Offering diverse ways of searching missing characters.
- Improving search speed.
- Using this solution in other digitization projects for ancient manuscripts written in old Chinese characters.
  - The same solution is used in the Hangul Tripitaka Retrieval System.

16 + 16/22 DONGGUK UNIVERSITY 4. Web Search System: Features of the Hanguk Bulgyo Chonso Search System
- Four types of search methods:
  - Search by Keywords
  - Search by Page
  - Search by Title
  - Search by Stroke
- User-friendly interface: all search menus can be accessed from the main Web page.
- Easy to access: offers easy-to-use assistance for users with little computer experience.

17 + 17/22 DONGGUK UNIVERSITY 4. Web Search System: Hanguk Bulgyo Chonso Web Search System (architecture diagram)
- Web Tier: the main search page receives the user request and leads to four search pages:
  - Keyword Search: index_keyword.asp
  - Page Search: index_page.asp
  - Title Search: index_title.asp
  - Stroke Search: index_stroke.asp
- Data Tier:
  - Text Database (Unicode, XML)
  - Missing character (image file)
  - Image (image file)
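As a rough illustration of the Web tier, the mapping from search type to page can be written as a lookup table. The four page names come from the slide; the function and the short keys used to select a page are hypothetical, and the real pages are ASP scripts rather than Python.

```python
# Page names as listed on the slide; the dictionary keys are assumptions.
SEARCH_PAGES = {
    "keyword": "index_keyword.asp",
    "page": "index_page.asp",
    "title": "index_title.asp",
    "stroke": "index_stroke.asp",
}


def search_page_for(kind: str) -> str:
    """Return the page that handles the requested kind of search."""
    key = kind.strip().lower()
    if key not in SEARCH_PAGES:
        raise ValueError(f"unknown search type: {kind!r}")
    return SEARCH_PAGES[key]


if __name__ == "__main__":
    print(search_page_for("Stroke"))
```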

18 + 18/22 DONGGUK UNIVERSITY 4. Web Search System Search By Keyword 1. Select Volume Number

19 + 19/22 DONGGUK UNIVERSITY 4. Web Search System Search By Keyword 2. Type Keyword At the moment it works only with Korean.

20 + 20/22 DONGGUK UNIVERSITY 4. Web Search System Search By Keyword 3. Result Page
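The three keyword-search steps (select a volume, type a Korean keyword, read the result page) can be sketched as follows. Everything here is an assumption made for illustration: the in-memory corpus, its page numbers and text lines are invented, and the Korean-to-Chinese keyword table merely stands in for whatever lookup the real system performs in its database.

```python
from typing import Dict, List, NamedTuple, Tuple


class TextLine(NamedTuple):
    volume: int
    page: int
    text: str


# Hypothetical keyword table: Korean reading -> Chinese keyword.
READINGS: Dict[str, str] = {"보살": "菩薩", "중생": "衆生"}

# Invented sample lines, not quotations from Hanguk Bulgyo Chonso.
CORPUS: List[TextLine] = [
    TextLine(4, 1, "菩薩發菩提心"),
    TextLine(4, 2, "一切衆生"),
    TextLine(5, 1, "菩薩如是"),
]


def search_by_keyword(corpus: List[TextLine], readings: Dict[str, str],
                      volume: int, korean_keyword: str) -> List[Tuple[int, str]]:
    """Return (page, text) for every line of the volume that contains the keyword."""
    chinese = readings.get(korean_keyword)
    if chinese is None:
        return []  # the keyword is not in the keyword table
    return [(line.page, line.text)
            for line in corpus
            if line.volume == volume and chinese in line.text]


if __name__ == "__main__":
    print(search_by_keyword(CORPUS, READINGS, 4, "보살"))
```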

21 + 21/22 DONGGUK UNIVERSITY 4. Web Search System Search By Page 1. Select Volume Number

22 + 22/22 DONGGUK UNIVERSITY 4. Web Search System Search By Page 2. Type Page Number

23 + 23/22 DONGGUK UNIVERSITY 4. Web Search System Search By Page 3. Result Page

24 + 24/22 DONGGUK UNIVERSITY 4. Web Search System Search By Title 1. Select Volume Number

25 + 25/22 DONGGUK UNIVERSITY 4. Web Search System Search By Title 2. Title List is displayed

26 + 26/22 DONGGUK UNIVERSITY 4. Web Search System Search By Title 3. Select Title

27 + 27/22 DONGGUK UNIVERSITY 4. Web Search System Search By Title 4. Result Page

28 + 28/22 DONGGUK UNIVERSITY 4. Web Search System Search By Stroke 1. Select Stroke Number (Select Stroke Five)

29 + 29/22 DONGGUK UNIVERSITY 4. Web Search System Search By Stroke 2. Keyword List is displayed

30 + 30/22 DONGGUK UNIVERSITY 4. Web Search System Search By Stroke 3. Select Keyword (Select 加敎 )

31 + 31/22 DONGGUK UNIVERSITY 4. Web Search System Search By Stroke 4. Result Page

32 + 32/22 DONGGUK UNIVERSITY 4. Web Search System: Hanguk Bulgyo Chonso
- Hanguk Bulgyo Chonso can be searched via the homepage of the Institute of Electronic Buddhist Texts & Culture Content at Dongguk University: http://ebtc.dongguk.ac.kr
- The Hanguk Bulgyo Chonso CD-ROM was produced to celebrate the 100th anniversary of Dongguk University.

33 + 33/22 DONGGUK UNIVERSITY 5. Database Analysis: Hanguk Bulgyo Chonso Database
- Microsoft SQL Server 2000 was used to build the relational database of Hanguk Bulgyo Chonso.
- The texts of Hanguk Bulgyo Chonso are tagged in XML before being saved into the Hanguk Bulgyo Chonso database.
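The slides do not give the tag set, so the sketch below only illustrates the general step of wrapping one line of text in XML with its location before it is stored. The element name `line` and the attributes `vol`, `page`, `col` and `n` are hypothetical, as is the sample text.

```python
import xml.etree.ElementTree as ET


def tag_line(volume: int, page: int, column: str, line_no: int, text: str) -> str:
    """Wrap one line of text in a (hypothetical) XML element carrying its location."""
    element = ET.Element("line", {
        "vol": str(volume),
        "page": str(page),
        "col": column,
        "n": str(line_no),
    })
    element.text = text
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    tagged = tag_line(4, 528, "a", 1, "如是我聞")
    print(tagged)
    # Reading the tagged line back recovers both the text and its location.
    parsed = ET.fromstring(tagged)
    print(parsed.attrib["page"], parsed.text)
```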

34 + 34/22 DONGGUK UNIVERSITY 5. Database Analysis (diagram: text, tagged information and database tables)
- Information stored:
  - Chinese title info
  - Korean title list
  - Text data (line by line)
  - Column/page/note info
  - Korean keyword list
  - Chinese keyword list
  - Keyword occurrences (per volume)
  - Keyword occurrences (whole)
- Database tables:
  - edocdata 1 ~ 14
  - keyword_index 1 ~ 14
  - Idx_keyword_index
  - ekeyword
  - hkeyword
  - tag_jmok_table
  - tag_hjmok_list

35 + 35/22 DONGGUK UNIVERSITY 5. Database Analysis: Database Size (pie chart)
- Database size: 617 MB
- Text: 26.0%
- Keyword Index: 73.9%
- Title Index: 0.1%

36 + 36/22 DONGGUK UNIVERSITY 5. Database Analysis: Database Size in detail
Category | Table | Table Rows | Table Size (KB)
Text | Text table 1~14 | 885,011 | 160,344
Text | Total | 885,011 | 160,344
Keyword Index | Chinese Keyword List | 58,245 | 2,952
Keyword Index | Korean Keyword List | 58,245 | 1,736
Keyword Index | Integrated Keyword Index | 6,595,732 | 225,520
Keyword Index | Volume Keyword Index 1~14 | 6,595,732 | 226,128
Keyword Index | Total | 13,307,954 | 456,336
Title Index | Korean Title Index | 326 | 56
Title Index | Chinese Title Index | 1,576 | 392
Title Index | Chinese Title Index Summary | 1,576 | 48
Title Index | Total | 3,478 | 496
Total | | 14,196,443 | 617,176
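The percentages on the database-size chart follow from the subtotals in this table. A short check, using only the three subtotal sizes reported above (the variable names are illustrative):

```python
# Subtotal sizes in KB, as reported in the table.
SIZES_KB = {
    "Text": 160_344,
    "Keyword Index": 456_336,
    "Title Index": 496,
}

total_kb = sum(SIZES_KB.values())

# Share of each part, in percent, rounded to one decimal place as on the chart.
shares = {name: round(100 * kb / total_kb, 1) for name, kb in SIZES_KB.items()}

if __name__ == "__main__":
    print(total_kb)
    for name, share in shares.items():
        print(f"{name}: {share}%")
```

The sum reproduces the 617,176 KB grand total, and the rounded shares agree with the 26.0%, 73.9% and 0.1% shown on the chart.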

37 + 37/22 DONGGUK UNIVERSITY 5. Database Analysis: Text Database Size according to period (pie chart)
- Text database size: 160 MB
- Silla: 20.3%
- Goryeo: 22.6%
- Joseon: 28.4%
- Appendix: 28.6% in the slide text, 28.8% on the chart label

38 + 38/22 DONGGUK UNIVERSITY 5. Database Analysis: Keyword Database Size according to period (pie chart)
- Keyword database size: 226 MB
- Silla: 25.6%
- Goryeo: 20.6%
- Joseon: 23.4%
- Appendix: 30.4%

39 + 39/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (one word) in Hanguk Bulgyo Chonso (pie chart)
- Top 10 keywords: 有(유), 者(자), 無(무), 故(고), 如(여), 三(삼), 中(중), 法(법), 所(소), 生(생)
- Others: 86.07%

40 + 40/22 DONGGUK UNIVERSITY 5. Database Analysis Top 10 Keywords in each volume

41 + 41/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (one word)
Rank | Keyword | Number of Occurrences | Occurrence Rate (%)
1 | 有(유) | 125,552 | 1.90
2 | 者(자) | 121,579 | 1.84
3 | 無(무) | 120,911 | 1.83
4 | 故(고) | 100,972 | 1.53
5 | 如(여) | 82,155 | 1.25
6 | 三(삼) | 79,343 | 1.20
7 | 中(중) | 75,650 | 1.15
8 | 法(법) | 74,990 | 1.14
9 | 所(소) | 72,147 | 1.09
10 | 生(생) | 66,186 | 1.00
Total | | 919,485 | 13.94
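The slides do not state how the occurrence rate is defined. The figures are consistent with dividing each count by the 6,595,732 keyword occurrences reported for the whole database, so the sketch below assumes that definition; the function name is illustrative.

```python
# Total number of keyword occurrences reported for the database.
# Using it as the denominator is an assumption inferred from the published figures.
TOTAL_OCCURRENCES = 6_595_732


def occurrence_rate(count: int, total: int = TOTAL_OCCURRENCES, digits: int = 2) -> float:
    """Return count as a percentage of total, rounded to the given number of digits."""
    return round(100 * count / total, digits)


if __name__ == "__main__":
    print(occurrence_rate(125_552))          # 有(유)
    print(occurrence_rate(919_485))          # top-10 total
    print(occurrence_rate(1_239, digits=3))  # 如何是(여하시), reported to three decimals
```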

42 + 42/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (two words) in Hanguk Bulgyo Chonso (pie chart)
- Top 10 keywords: 菩薩(보살), 如是(여시), 一切(일체), 第二(제이), 衆生(중생), 第三(제삼), 分別(분별), 煩惱(번뇌), 差別(차별), 如來(여래)
- Others: 98.46%

43 + 43/22 DONGGUK UNIVERSITY 5. Database Analysis Top 10 Keywords in each volume

44 + 44/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (two words)
Rank | Keyword | Number of Occurrences | Occurrence Rate (%)
1 | 菩薩(보살) | 16,421 | 0.25
2 | 如是(여시) | 16,117 | 0.24
3 | 一切(일체) | 15,769 | 0.24
4 | 第二(제이) | 10,188 | 0.15
5 | 衆生(중생) | 9,214 | 0.14
6 | 第三(제삼) | 7,190 | 0.11
7 | 分別(분별) | 6,932 | 0.11
8 | 煩惱(번뇌) | 6,771 | 0.10
9 | 差別(차별) | 6,606 | 0.10
10 | 如來(여래) | 6,349 | 0.10
Total | | 101,557 | 1.54

45 + 45/22 DONGGUK UNIVERSITY 5. Database Analysis Top 10 Keywords (three words) in Hanguk Bulgyo Chonso

46 + 46/22 DONGGUK UNIVERSITY 5. Database Analysis Top 10 Keywords in each volume

47 + 47/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (three words)
Rank | Keyword | Number of Occurrences | Occurrence Rate (%)
1 | 如何是(여하시) | 1,239 | 0.019
2 | 一切法(일체법) | 1,197 | 0.018
3 | 作麽生(자마생) | 1,085 | 0.016
4 | 阿彌陀(아미타) | 1,009 | 0.015
5 | 彌陀佛(미타불) | 929 | 0.014
6 | 無分別(무분별) | 906 | 0.014
7 | 善男子(선남자) | 902 | 0.014
8 | 補特伽(보특가) | 887 | 0.013
9 | 三摩地(삼마지) | 826 | 0.013
10 | 曹溪宗(조계종) | 800 | 0.012
Total | | 9,780 | 0.148

48 + 48/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (four words) in Hanguk Bulgyo Chonso (pie chart)
- Top 10 keywords: 阿彌陀佛(아미타불), 補特伽羅(보특가라), 遍計所執(변계소집), 發菩提心(발보리심), 無分別智(무분별지), 三世諸佛(삼세제불), 瑜伽師地(유가사지), 毗鉢舍那(비발사나), 八萬四千(팔만사천), 波羅蜜多(바라밀다)
- Others: 99.93%

49 + 49/22 DONGGUK UNIVERSITY 5. Database Analysis Top 10 Keywords in each volume

50 + 50/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 10 Keywords (four words)
Rank | Keyword | Number of Occurrences | Occurrence Rate (%)
1 | 阿彌陀佛(아미타불) | 863 | 0.013
2 | 補特伽羅(보특가라) | 762 | 0.012
3 | 遍計所執(변계소집) | 664 | 0.010
4 | 發菩提心(발보리심) | 374 | 0.006
5 | 無分別智(무분별지) | 322 | 0.005
6 | 三世諸佛(삼세제불) | 320 | 0.005
7 | 瑜伽師地(유가사지) | 316 | 0.005
8 | 毗鉢舍那(비발사나) | 292 | 0.004
9 | 八萬四千(팔만사천) | 280 | 0.004
10 | 波羅蜜多(바라밀다) | 279 | 0.004
Total | | 4,472 | 0.068
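The presentation does not describe how keyword occurrences were counted. One simple way to produce counts of this kind, shown here purely as an assumed stand-in and not as the project's Automatic Index Creating Solution, is to slide a window over each line and count every substring that appears in a given keyword list. The sample lines and keyword list are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_keywords(lines: Iterable[str], keywords: Iterable[str]) -> Counter:
    """Count how often each listed keyword occurs in the lines (overlaps are counted)."""
    # Group keywords by length so each window size is scanned once per line.
    by_length: Dict[int, Set[str]] = {}
    for keyword in keywords:
        by_length.setdefault(len(keyword), set()).add(keyword)

    counts: Counter = Counter()
    for line in lines:
        for n, group in by_length.items():
            for start in range(len(line) - n + 1):
                chunk = line[start:start + n]
                if chunk in group:
                    counts[chunk] += 1
    return counts


if __name__ == "__main__":
    sample_lines = ["阿彌陀佛阿彌陀佛", "一切法無分別"]  # invented sample text
    sample_keywords = ["阿彌陀佛", "阿彌陀", "一切法", "法"]
    for keyword, count in count_keywords(sample_lines, sample_keywords).items():
        print(keyword, count)
```

Because overlapping matches are counted, a four-character keyword such as 阿彌陀佛 also adds to the count of the three-character keyword 阿彌陀 it contains, which would explain why related entries appear in both the three-word and four-word tables.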

51 + 51/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 5 Keywords (one word) according to Period
Silla | Rate | Goryeo | Rate | Joseon | Rate
故(고) | 0.63% | 者(자) | 0.46% | 無(무) | 0.40%
有(유) | 0.56% | 無(무) | 0.37% | 者(자) | 0.35%
者(자) | 0.55% | 有(유) | 0.35% | 有(유) |
無(무) | 0.50% | 故(고) | 0.30% | 人(인) | 0.32%
如(여) | 0.37% | 中(중) | 0.26% | 山(산) | 0.25%

52 + 52/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 5 Keywords (two words) according to Period
Silla | Rate | Goryeo | Rate | Joseon | Rate
菩薩(보살) | 0.09% | 如是(여시) | 0.04% | 衆生(중생) | 0.04%
第二(제이) | 0.07% | 菩薩(보살) | 0.04% | 一切(일체) | 0.03%
一切(일체) | 0.06% | 第二(제이) | 0.03% | 如來(여래) | 0.03%
如是(여시) | 0.06% | 一切(일체) | 0.03% | 菩薩(보살) | 0.03%
分別(분별) | 0.05% | 第一(제일) | 0.03% | 大師(대사) | 0.03%

53 + 53/22 DONGGUK UNIVERSITY 5. Database Analysis: Top 5 Keywords (three words) according to Period
Silla | Rate | Goryeo | Rate | Joseon | Rate
一切法(일체법) | 0.013% | 如何是(여하시) | 0.012% | 阿彌陀(아미타) | 0.011%
無分別(무분별) | 0.007% | 作麽生(자마생) | 0.012% | 彌陀佛(미타불) | 0.011%
無自性(무자성) | 0.006% | 華嚴經(화엄경) | 0.004% | 金剛山(금강산) | 0.004%
善男子(선남자) | 0.006% | 法華經(법화경) | 0.003% | 一切法(일체법) | 0.004%
薩婆多(살파다) | 0.005% | 善知識(선지식) | 0.003% | 如來禪(여래선) | 0.004%

54 + 54/22 DONGGUK UNIVERSITY 5. Database Analysis
The Outcome of DB Established
- DB size: 878 MB
- Number of keywords: 58,245
- Number of keyword occurrences: 6,595,732
Features of DB Analysis
- Identifying top keywords according to stroke
- Identifying top keywords according to period
- Clarifying the change of top keywords according to period

55 + 55/22 DONGGUK UNIVERSITY 6. Achievement and Challenge: Technological Achievement
- A Missing Character Management System for characters that cannot be displayed on the Internet because Unicode covers only a limited number of Chinese characters.
- A database construction system for ancient manuscripts written in old Chinese characters.
- A Web search system for ancient manuscripts written in old Chinese characters.

56 + 56/22 DONGGUK UNIVERSITY 6. Achievement and Challenge: Cultural and Academic Achievement
- Developing a digitizing solution for Korean cultural heritage.
- Offering a Web search service for anyone who is interested in Korean Buddhism.
- Developing a way to build a digital library for ancient manuscripts.
- Offering Korean Buddhist texts to scholars all over the world.

57 + 57/22 DONGGUK UNIVERSITY 6. Achievement and Challenge: Future Challenge
- Constructing an English-based keyword search module.
- Developing a solution that ensures compatibility between the missing character graphic files and the extended special fonts developed for and loaded in various word processors, such as Microsoft Word.

58 + 58/22 DONGGUK UNIVERSITY 6. Achievement and Challenge: Hanguk Bulgyo Chonso CD-ROM
- The database was built with Microsoft Access (a simplified DBMS) for easy distribution.
- The Windows search system was adapted from the Web Search System.
- An English user interface and an English-based keyword search module will be developed and loaded in the near future.

