Preserving Our Past and Present for the Future Generations
Raj Reddy
Carnegie Mellon University
June 15, 2017
Keynote Speech at International Conference on Digital Library and Knowledge
Zhejiang University, Hangzhou, China
Preserving Our Cultural Heritage for Future Generations
- Preserving Culture
- Much is Freely Available on the Web
  - Wikipedia: Donation Model
  - New Trend: Free Online Access and Pay for Physical Copy
  - Neural Networks and Deep Learning by Michael Nielsen, http://neuralnetworksanddeeplearning.com
- Except Commercially Published Copyrighted Knowledge
  - Most Important Works are Copyrighted and Must Be Preserved!
  - Books, Newspapers, Music, Movies and Video, and Paintings
- Multiple Media Now All Accessible in Digital Representation
- Online Digital Access
  - Instantly Available to Anyone, Anywhere in the World, in Any Language, and Searchable, Findable and Processable by Humans and Intelligent Agents
  - PDF/A
- Future Generations?
  - 10 to 1000 Generations?
  - Next Millennium: Y3K (40 Generations?)
Preserving Our Past: CADAL and The Million Book Project
- Y2K: Technology for Large-Scale Scanning Became Available Around 2000
  - No Need to Cut Up Books
  - OCR Over 99% Correct
  - Not Orphan Languages
- Pre-1923 Manuscripts from Libraries in US, China and India
- MOU Between CMU and ZJU Signed 2002
- Scanned Over a Million Books in China
- Google Scanning of Books Started Around 2007
  - Over 20 Million Scanned but Not Accessible Pending Litigation
- We Also Need to Continue Preservation of the Past and Expand to Newspapers, Music, Movies, Paintings and Software
Preserving Our Present: Accessing and Archiving Born Digital Content
- At present, there is no cogent plan for saving born-digital entities for future generations.
- Unlike webpages, copyright-protected objects are being lost forever, except for a few best sellers.
- GRAND CHALLENGE: Create a Digital Archive of All Born Digital Content
  - Books, newspapers, paintings, music, movies, software, etc., from now in perpetuity
  - Instantly available to anyone, anywhere in the world, in any language, and searchable, findable and processable by humans and intelligent agents
- Requires a Government Ordinance under which all born-digital content is captured and archived before it is irretrievably lost.
The Main Bottleneck: Copyright Laws
- Incompatible with Speed of Progress in Information Age
- Different Countries Have Different Policies
- Some Countries Have Compulsory Licensing
- Status Not Always Knowable
  - Life + 50 Years
  - Mickey Mouse Law
- Some Content Not Digital
Universal Copyright Summit
- Fixed Copyright Term of 100 Years from the Date of Creation
  - Require Registration, Self-Archiving, and Renewal Every 10 Years
  - Life + 50 and Other Arcane Laws Superseded
  - For Options Such as Royalties per Number of Accesses
- Online Global Copyright Registry
  - OPT-IN: Owner Can Reduce the Copyright Period
- Public Domain Works
  - Orphan and/or Abandoned Works and Government Publications
- Digital Depository
  - All Media that Enjoy Copyright Protection: Books, Music, Movies, Newspapers, Paintings
  - Ordinance requiring all publishers of all media to submit a digital copy (along with the usually required physical copies) to the National Archive of the Country
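The fixed-term proposal on this slide can be sketched as a small date calculation. This is only an illustration: the slide gives the 100-year term and the 10-year renewal interval, but it does not say what happens when a renewal is missed. The sketch assumes that a lapsed renewal ends protection and that the expiry day itself is unprotected. The names `add_years` and `is_protected` are hypothetical.

```python
from datetime import date

TERM_YEARS = 100     # fixed term from the date of creation (from the slide)
RENEWAL_YEARS = 10   # renewal interval (from the slide)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_protected(created: date, last_renewed: date, today: date) -> bool:
    """Return True while the work is inside the 100-year term and renewed.

    Assumption (not stated on the slide): a missed renewal ends protection.
    For a work that was never renewed, pass the registration date as
    `last_renewed`.
    """
    within_term = today < add_years(created, TERM_YEARS)
    renewal_current = today < add_years(last_renewed, RENEWAL_YEARS)
    return within_term and renewal_current
```

Under these assumptions, a national archive could apply the same check when deciding whether a deposited work may be released to the public.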
Technical Issues
How to Better Preserve Our Culture, Heritage and Creative Works
- Establish a Digital Cultural Conservancy of the World
  - A Thousand-Year Archive of All Books, Newspapers, Music, Movies, and Paintings
- How Can We Be Sure We Can Read or View Books, Newspapers, Music, Movies, Paintings etc. Created Long Ago?
  - PDF/A
- In a World Where File Formats Change Monthly, How Can We Retrieve and Experience Archived Content?
  - Format Conversion Tools
  - VMware
- Who Pays to Maintain Our Ability to Access Artifacts?
- Stakeholders:
  - Government
  - Libraries
  - Media
  - eCommerce Companies
  - Authors and Creators (of Books, Movies, Music, Newspapers, Paintings etc.)
National Digital Archive of China (NDAC)
- Establish National Digital Archives of China as part of State Administration of Cultural Heritage
  - Industry and universities may provide technology, training and management support
  - Government shall provide space, personnel, equipment and operating costs of NDAC
- Digital Depository for All Born Digital Copyrighted Works
  - Software and tools to enable drag, drop, and click submission of the copyrighted work to the NDAC without leaving the office or home
  - Such material would be released to the public only when the work is out of copyright, or when requested by the author or publisher on an opt-in basis
PDF/A: ISO Standard for Archiving Digital Content
- PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents.
- PDF/A differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding) and encryption.[1]
- The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.
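As a rough illustration of how an archive might triage incoming files, the sketch below looks for the PDF/A identification that conforming files declare in their XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties). It is a heuristic under stated assumptions, not a validator: it assumes the XMP packet is stored uncompressed and readable as plain bytes, and it only reports what a file claims about itself. The function name `declared_pdfa_level` is hypothetical.

```python
import re
from typing import Optional

# XMP can serialize a property as an attribute (pdfaid:part="1")
# or as an element (<pdfaid:part>1</pdfaid:part>); match both forms.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def declared_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared PDF/A level such as '1b', or None if none is found.

    This reads the file's own claim only; it does not check embedded fonts,
    encryption, or any other PDF/A requirement.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conformance = _CONF.search(pdf_bytes)
    suffix = conformance.group(1).decode("ascii").lower() if conformance else ""
    return part.group(1).decode("ascii") + suffix
```

A file that passes this check would still need a full conformance validator before being accepted for long-term preservation.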
Orphan Works and Public Lending Right
- Orphan and Abandoned Works Problem
  - State Usually Inherits the IP
- Define a Creative Work as “Orphan” when it is generating little or no revenue for the creator.
- A Creative Work is considered “Abandoned” when attempts to locate owners of unclaimed works through letters and newspaper announcements are unsuccessful.
- An “Opt-in Process” may be provided if there is an incentive to receive a royalty for every access, analogous to the Public Lending Right for paying royalties to authors in the UK and other countries.
Value Added Services of NDAC to Enhance the Income of Authors and Creators
National Digital Archives of China may undertake value added services of access and distribution, such as:
- Find Copyrighted Content through Discovery and Search Tools
- Enable Buying and Selling of Copyrighted Content through eCommerce
- Enable Dynamic Pricing Based on Market Demand
- Pay Royalties to Authors and Creators when Copyrighted Content is Borrowed under Public Lending Right
Summary:
- Discoverability by Search
- eCommerce Using Alibaba and Amazon-like Services
- Market Pricing
- Public Lending Right
Next Steps
- Set Up a Cloud-Based Server to Serve as Global Digital Archive
- Government to Approve Establishment of an Archive for National Digital Depository of Copyrighted Works, Replacing the Current Physical Depository
- Define Metadata for Digital Works: Ease of Use for Self-Archiving by Authors
- Self-Archiving
  - https://en.wikipedia.org/wiki/Self-archiving
  - but not free for 100 years unless placed in the Public Domain
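The metadata step above could start from something as small as the record sketched here. The slide does not specify any fields, so every field name is an assumption chosen for illustration, as are the names `DepositRecord`, `make_record`, and `to_json`. The SHA-256 digest is likewise an added assumption: it lets an archive confirm later that the stored bytes are unchanged.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DepositRecord:
    """Hypothetical minimal metadata for one self-archived work."""
    title: str
    creator: str
    media_type: str       # e.g. "book", "newspaper", "music", "movie", "painting"
    created: str          # date of creation, ISO 8601 (YYYY-MM-DD)
    public_domain: bool   # True if the owner placed the work in the public domain
    sha256: str           # hex digest of the deposited file's bytes


def make_record(title: str, creator: str, media_type: str, created: str,
                content: bytes, public_domain: bool = False) -> DepositRecord:
    """Build a record and fingerprint the deposited content."""
    digest = hashlib.sha256(content).hexdigest()
    return DepositRecord(title, creator, media_type, created, public_domain, digest)


def to_json(record: DepositRecord) -> str:
    """Serialize the record as JSON, keeping non-ASCII titles readable."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)
```

Keeping the record this small is a deliberate choice for the sketch: the slide stresses ease of use for authors, and each extra required field makes self-archiving harder.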
Conclusions and Action Items
- IKCEST, the UNESCO International Knowledge Center for Engineering Science and Technology, may convene an International Summit on Universal Copyright Policies to discuss and establish Terms and Conditions of a Universal Copyright Law
  - A Global Copyright Registry
  - A Digital Depository at National Digital Archives
  - Compulsory Licensing of Orphan Works
  - While respecting the rights of authors and creative artists
- Establish National Digital Archives of China as part of State Administration of Cultural Heritage
  - With Responsibility for Acquisition, Preservation and Access to All Born-Digital Copyrighted Content, Including Books, Newspapers, Music, Movies and Videos, and Paintings
  - Establish scanning equipment, computers, petabytes of storage, and software tools for Management and Preservation