October 1, 1999 Two Catalysts for Qualitative Change Richard Snodgrass.

1 October 1, 1999 Two Catalysts for Qualitative Change Richard Snodgrass

2 October 1, 1999, SGB Meeting, Richard T. Snodgrass. Location: City and State, 2000 BCE; Longitude, 1773 CE; GPS + cell phone, 1999 CE

3 Confluences: Underlying technologies –Highly accurate atomic clocks –Geosynchronous satellites –Advances in micro-circuitry –Proliferation of cell phones. Demonstrated need. Catalyst: companies able to produce in quantity at low price. Qualitative change.

4 The Vision: The ACM Computing Portal. A web-based repository of bibliographic information –contains information on all papers and books in the computing literature –contains a pointer to the digitized version, if available

5 Objectives: Qualitatively increase the effectiveness of scientific research into computing. Continue to place ACM as the premier scientific and educational organization for computing. Increase service of ACM and the SIGs to the scientific community. Provide a concrete illustration of the scope of computer science.

6 Presentation: Components –Bibliographic Entries –Abstracts and Keywords –Full Text and Bitmapped Images –Citation Linking. Demonstration. Realizing the Computing Portal –Revisit the components. The Next Step.

7 Step 1: Bibliographic Entries. Collect all bibliographic entries from all computer science journals, conferences, workshops, technical bulletins, and books. –Over the period from 1940 to 2000, then continuing –Approximately 1M entries –Provide free searching on the web –Provide citations in multiple formats: HTML, BibTeX, refer, Word, XML, ...
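The idea of serving one bibliographic entry in several citation formats can be sketched as follows. This is only an illustration, not the portal's actual software: the `Entry` fields, the function names, and the sample record are all hypothetical, and the XML element names are invented for the example. The BibTeX and refer field tags used (`author`, `title`, `booktitle`, `year`; `%A`, `%T`, `%B`, `%D`) are standard ones.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Entry:
    """Hypothetical minimal record for one conference paper."""
    key: str
    authors: list[str]
    title: str
    venue: str
    year: int


def to_bibtex(e: Entry) -> str:
    """Render the entry as a BibTeX @inproceedings record."""
    authors = " and ".join(e.authors)
    return (
        f"@inproceedings{{{e.key},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{e.title}}},\n"
        f"  booktitle = {{{e.venue}}},\n"
        f"  year = {{{e.year}}}\n"
        f"}}"
    )


def to_refer(e: Entry) -> str:
    """Render the entry in refer format, one %A line per author."""
    lines = [f"%A {a}" for a in e.authors]
    lines += [f"%T {e.title}", f"%B {e.venue}", f"%D {e.year}"]
    return "\n".join(lines)


def to_xml(e: Entry) -> str:
    """Render the entry as XML; the element names are made up here."""
    authors = "".join(f"<author>{escape(a)}</author>" for a in e.authors)
    return (
        f"<entry key={quoteattr(e.key)}>{authors}"
        f"<title>{escape(e.title)}</title>"
        f"<venue>{escape(e.venue)}</venue>"
        f"<year>{e.year}</year></entry>"
    )


# Placeholder record, not a real publication.
sample = Entry("doe99", ["Jane Doe", "John Roe"],
               "An Example Paper", "Proc. Example Conf.", 1999)
print(to_bibtex(sample))
print(to_refer(sample))
print(to_xml(sample))
```

Keeping a single internal record and generating every output format from it means a correction made once shows up in all formats.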

8 Step 2: Abstracts and Keywords. Collect keywords, and later, abstracts, for all entries. Copyright restrictions on some abstracts?

9 Step 3: Full Text and Images. Collect full text of each available paper and book for –use in searching –developing classification maps and lexicons –other analyses

10 Step 3, cont. Encourage acquisition of a digitized version of each paper in web-accessible digital libraries (e.g., the ACM DL) –Collect a bit-mapped image of each page of each paper to retain formatting, equations, and figures. –Each paper can then be reproduced as an exact copy. –Can provide structure on full text: sections, figures, citations in running prose

11 Step 4: Citation Linking. Start with full text of paper’s bibliography. Out-linking: identify bibliographic entries of papers referenced by the paper. In-linking: identify bibliographic entries of papers referencing the paper. Use for citation analysis, knowledge diffusion studies.
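Out-linking starts from the raw text of a paper's bibliography and has to find the matching bibliographic entry for each reference. A minimal sketch of one possible approach, matching on normalized titles, is given below. The matching rule, the function names, and the sample data are assumptions made for illustration; the slides do not say how matching would be done, and a real system would need much more robust matching.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def out_links(reference_strings, entries):
    """Map each raw reference string to the key of the first entry whose
    normalized title occurs inside it, or None when nothing matches.

    `entries` maps an entry key to its title (hypothetical layout).
    """
    index = {normalize(title): key for key, title in entries.items()}
    links = {}
    for ref in reference_strings:
        norm_ref = normalize(ref)
        links[ref] = next(
            (key for title, key in index.items() if title and title in norm_ref),
            None,
        )
    return links


# Placeholder data, not real publications.
entries = {"doe99": "An Example Paper", "roe98": "Another Study of Things"}
refs = [
    "J. Doe. An example paper. Proc. Example Conf., 1999.",
    "A. Unknown. A missing work. 1997.",
]
for ref, key in out_links(refs, entries).items():
    print(key, "<-", ref)
```

References that come back as `None` are the ones that would still need manual attention, or a new bibliographic entry.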

13 Demonstration

14 Papers with “wavelet”

23 INSPEC

25 Some Numbers: 5300 SIG members lost last year (52.1K → 46.8K, > 10%); 10 years remaining of lifetime for the average SIG; $13.6M total SIG fund balance (over required); $290 per member (over required fund balance); $377K per SIG fund balance (over required)

26 Step 1: Bibliographic Entries. Propose that each SIG be responsible for ensuring correctness of relevant entries –relevance based on SIG interests –reduce overlap between SIGs. Software provided to SIGs –data entry, validation, conversion –presentation (HTML, BibTeX, …, XML) –searching –precomputed lists (e.g., bibliographic home page for every author)

27 Step 1: Bibliographic Entries. 1M entries / 36 SIGs ≈ 28K, or roughly 30K, entries per SIG –e.g., SIGMOD: approximately 50K entries. Many resources –DBLP: 2^17 (130K) entries –Propose that ACM donate the ACM Guide to Computing Literature: 300K entries –Collection of Computer Science Bibliographies: 930K entries

28 Step 2: Keywords and Abstracts. May need copyright permission, negotiated by ACM HQ. Collection of Computer Science Bibliographies has 100K abstracts.

29 Step 3: Full Text and Bitmapped Images. Full text is used for searching and citation linking in the Computing Portal. Bit-mapped images, stored in a Digital Library, are used to display and print the actual paper. Propose SIGs fund populating entire ACM Digital Library. –PDF files containing encapsulated TIFF and OCRed full text –99% accuracy –$1.25 per page –Could go to SGML or XML, 99.9% accuracy: $8-$10 per page.

30 Populating ACM DL. 1991-1998 already in DL. Journals: about 110K pages. Conferences –1985-1990: 76K pages –pre-1985: about 200K pages. Newsletters –120K pages. Total: 500K pages at $600K –$20K per SIG
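The totals on this slide can be checked with a little arithmetic. The sketch below assumes the $1.25 per page rate from the Step 3 slide and the 36 SIGs from the Step 1 slide; whether the speaker derived the rounded $600K and $20K figures exactly this way is an assumption.

```python
# Page counts from the slide, in thousands of pages.
PAGES_K = {
    "journals": 110,
    "conferences 1985-1990": 76,
    "conferences pre-1985": 200,
    "newsletters": 120,
}
COST_PER_PAGE = 1.25  # dollars per page (PDF with TIFF + OCR), from the Step 3 slide
NUM_SIGS = 36         # from the Step 1 slide


def estimate(pages_k, cost_per_page, num_sigs):
    """Return (total pages in K, total cost in $K, cost per SIG in $K)."""
    total_pages_k = sum(pages_k.values())
    total_cost_k = total_pages_k * cost_per_page
    return total_pages_k, total_cost_k, total_cost_k / num_sigs


pages, cost, per_sig = estimate(PAGES_K, COST_PER_PAGE, NUM_SIGS)
print(f"{pages}K pages, ${cost:.1f}K total, ${per_sig:.1f}K per SIG")
```

The unrounded results (506K pages, about $632K in total, about $17.6K per SIG) are in line with the slide's rounded figures of 500K pages, $600K, and $20K per SIG.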

31 Step 3: Full Text, cont. ACM papers: 500K pages, or about 40K papers –This represents perhaps 5% of the total of 1M papers. For remaining conference proceedings and journals –Offer URL into their DL in exchange for full text, only for searching. ACM Computing Portal provides valuable entry into their DL, enhancing their revenue stream. –Offer full CD-ROM package at cost in exchange for inclusion in CD-ROM and use of full text for searching. –Pay for digitization out of conference profits. –SIGs pay for integration: $0.25 - $0.50 per page.

32 Step 3: Full Text, cont. Use standard IR indexing and search techniques on full text. Partner with DL and IR research efforts to come up with new search strategies. Search software provided to each SIG.
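One of the standard IR indexing techniques is the inverted index, which maps each term to the set of documents containing it. The sketch below is a minimal illustration with conjunctive (AND) queries; the slides do not name a specific technique, so this choice, the function names, and the sample texts are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Placeholder full texts, not real papers.
docs = {
    "p1": "Wavelet transforms for image compression",
    "p2": "Query optimization in databases",
    "p3": "Wavelet-based query processing",
}
index = build_index(docs)
print(sorted(search(index, "wavelet")))        # papers mentioning "wavelet"
print(sorted(search(index, "wavelet query")))  # papers mentioning both terms
```

A production search service would add ranking, stemming, and stop-word handling on top of this basic structure.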

33 Step 4: Citation Linking. Manual out-linking –about $5-$6 per paper, or $0.30 per page of digitized text. Can be done semi-automatically for much less, if the appropriate linking software is developed. In-linking is simply a database search. All bibliographic entries must be present.
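The remark that in-linking is simply a database search can be illustrated as follows: once out-links are stored as (citing, cited) pairs, the papers that reference a given paper are found with a single query. The two-table schema, the function names, and the sample keys are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few placeholder entries and out-links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entry (key TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE cites (
            citing TEXT REFERENCES entry(key),
            cited  TEXT REFERENCES entry(key)
        );
    """)
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?)",
        [("a", "Paper A"), ("b", "Paper B"), ("c", "Paper C")],
    )
    # Out-links: b cites a, c cites a, c cites b.
    conn.executemany(
        "INSERT INTO cites VALUES (?, ?)",
        [("b", "a"), ("c", "a"), ("c", "b")],
    )
    return conn


def in_links(conn, key):
    """Return the keys of all entries that cite `key`."""
    rows = conn.execute(
        "SELECT citing FROM cites WHERE cited = ? ORDER BY citing", (key,)
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(in_links(conn, "a"))  # entries citing paper "a"
```

The query can only return citing papers that are themselves in the database, which is why all bibliographic entries must be present.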

34 SIG-Specific Portals. Possibly provide CD-ROMs, containing the relevant portion of the ACM CS Portal, to members of the SIG –especially useful for international members, or those working at home or traveling. Web-based Portals –some papers hosted on ACM server (clearly labeled as to source), with copies of papers provided for a fee –URL to other DLs

35 Open Architecture. Free searching via web interface, including full text search, at ACM site and SIG portals. Bibliographic data and full text available for other search engines, and for use in research in information retrieval, knowledge propagation, and other disciplines. Portal should be available for mirroring, on both geographical and institutional bases. Encourage digitization of corpus.

36 Previous Efforts. SIGDA CD-ROM Project –9 CD-ROMs –$1.5M project –SGML, proprietary display software on CD-ROM. POPL CD-ROM –10 years of POPL, given out as a SIGPLAN member benefit –PDF files. Many conferences distribute CD-ROMs of papers.

37 Previous Efforts, cont. SIGMOD Anthology –10 CD-ROMs (later 1-2 DVD-ROMs), $105K –SIGMOD, PODS, KDD, VLDB, ICDE, SSDBM, COMAD, ... –SIGMOD Record, Data Engineering Bulletin –TODS, VLDB Journal –Given out as member benefit. SIGMOD DiSC yearly CD-ROM –1999: 2 CD-ROMs, about $30K per year –all relevant conferences and workshops for that year, plus ancillary material such as PowerPoint presentations, audio, video –Given out as a member benefit (Consumer Reports model)

39 The Next Step. SGB Portal Committee –Determine appropriate data format(s). –Negotiate coverage of corpus among SIGs. –Identify appropriate software (paper incorporation, search, citation linking, notification). –Specify new infrastructure to be developed. –Propose specific projects for opportunity fund consideration. –Work with Pubs Board, e.g., interaction with Computing Reviews, CoRR. –Work with DL and IR research communities. –Identify new capabilities.

40 SGB Portal Committee: Rick Snodgrass (University of Arizona, CS), chair; Steve Cunningham (Cal State University-Stanislaus, CS); Carol Hutchins (Courant Institute of Math. Sci. Library); Bob Krovetz (NEC Research Institute); Michael Ley (University of Trier, CS); Andreas Paepcke (Stanford University); Kathy Preas (KP Pubs on CDROM); Bernie Rous (ACM Publications); Charles Viles (Univ. of North Carolina, Info and Lib Sci)

41 Individual SIG Commitments. Collect and capture SIG-relevant bibliographic entries, abstracts, and keywords, in appropriate format. Allocate funds to populate the ACM DL: journals, conference and workshop proceedings, SIG newsletter. –Roughly $20K for each SIG –SIGDA matching funds: $50K. Negotiate with steering committees of associated conferences and workshops.

42 SGB Opportunities. Use opportunity fund to subsidize content development in areas associated with low-fund SIGs. –SIG would request allocation for CSP content development. –Propose that SGB would then control accessibility of created material. Use opportunity fund to subsidize infrastructure. –software acquisition and development/customization –Such proposals would originate at the CSP editorial board.

43 ACM HQ Commitments. Donate entries from ACM Guide to Computing Literature. Negotiate cross-use agreements with associated societies. Acquire full text of books copyrighted by ACM. Provide hardware and software to host CSP. Provide staff to manage CSP, with content provided by SIGs.

44 ACM HQ Opportunities. Integrate CSP with CoRR. Provide print and CD-ROM versions of the expanded ACM Guide to Computing Literature. Fully populated DL. Increased visibility of ACM.

45 The ACM Computing Portal. Free searchable access to the entire computer science corpus. Links to a fully populated ACM DL and to other DLs. Capability to purchase papers and to register queries. Possibly ancillary SIG-provided benefits, such as CD-ROMs and SIG-specific portals.

46 Confluences: Underlying technologies –Inexpensive scanning, OCR, disk space, high capacity CD-ROM and DVD-ROM, and widely available WWW access. Demonstrated need. Catalysts: SIG Governing Board, ACM Council, ACM Publications Board, HQ staff. Qualitative change.

