Professor C. Lee Giles


Professor C. Lee Giles
David Reese Professor, IST; Graduate Professor, CSE
Adjunct Professor: Princeton, Pennsylvania, Columbia, Pisa, Trento
Graduated over 30 PhDs
Published over 600 papers with nearly 40,000 citations and an h-index of 95; most use machine learning, deep learning, and AI
Intelligent and specialty search engines; cyberinfrastructure for science, academia, and government; big data; deep learning
Modular, scalable, robust, automatic science- and technology-focused cyberinfrastructure and search engine creation and maintenance
Large heterogeneous data and information systems
Specialty science and technology search engines for knowledge discovery and integration
  CiteSeerX (all scholarly documents, initial focus on computer science) (NSF funded)
  MathSeer (new math search engine) (Sloan funded)
  BBookX (book generation, question generation) (TLT funded)
Scalable intelligent tools/agents/methods/algorithms
  Information, knowledge, and data integration
  Information and metadata extraction; entity recognition
  Pseudocode, table, figure, chemical formula, equation, and name extraction
  Unique search, knowledge discovery, information integration, and data mining algorithms
  Text in the wild: machine reading, deep learning
Strong collaboration record: Lockheed-Martin, FAST, Raytheon, IBM, Ford, Alcatel-Lucent, Smithsonian, Internet Archive, DARPA, Yahoo, Dow Chemical, NSF, Sloan, Mellon

My work on neural networks
Over 100 papers on NNs
International Neural Network Society Dennis Gabor Award
IEEE Computational Intelligence Society Pioneer Award in Neural Networks
Taught the first Neural Networks course at Princeton (1994)
NN interests and publications:
  Text in the wild
  Compression (we beat Google)
  Recurrent neural networks as automata and grammars
  Recurrent neural network verification
  Neural networks in information retrieval and education

Millions of hits daily
Half a million PDF downloads daily (180M annually)
2nd most attacked site at Penn State

Automatic Metadata Information Extraction (IE) - CiteSeerX
[Figure: extraction pipeline diagram. Labels: PDF, Converter, Text, IE; Header (title, authors, affiliations, abstract), Body, Citations, Table, Figure, Formulae; Databases, Search index]
Many other open-source academic document metadata extractors are available: recent JCDL workshop, metadata hackathon, JCDL tutorial 2016
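To make the idea of header and citation extraction concrete, here is a minimal sketch that pulls a title, authors, abstract, and reference strings out of already-converted plain text. It is not the CiteSeerX extractor: the function name `extract_metadata`, the layout assumptions (title on the first line, authors on the second, literal "Abstract" and "References" headings, bracketed citation numbers), and the sample text are all hypothetical and chosen only for illustration.

```python
import re

# Assumed layout conventions (hypothetical, for illustration only).
SECTION_HEADING = re.compile(r"^\d+(\.\d+)*\s+[A-Z]")   # e.g. "1 Introduction"
CITATION = re.compile(r"^\[(\d+)\]\s*(.+)$")            # e.g. "[1] A. Author. ..."


def extract_metadata(text):
    """Heuristically extract header fields and citations from plain text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = {"title": "", "authors": [], "abstract": "", "citations": []}

    # Header: assume the first line is the title and the second lists authors.
    if lines:
        record["title"] = lines[0]
    if len(lines) > 1:
        parts = re.split(r"\s*,\s*|\s+and\s+", lines[1])
        record["authors"] = [p for p in parts if p]

    lowered = [ln.lower() for ln in lines]

    # Abstract: everything after "Abstract" up to the next section heading.
    if "abstract" in lowered:
        body = []
        for ln in lines[lowered.index("abstract") + 1:]:
            if SECTION_HEADING.match(ln) or ln.lower() == "references":
                break
            body.append(ln)
        record["abstract"] = " ".join(body)

    # Citations: bracket-numbered lines after the "References" heading.
    if "references" in lowered:
        for ln in lines[lowered.index("references") + 1:]:
            match = CITATION.match(ln)
            if match:
                record["citations"].append(match.group(2))

    return record


# Made-up sample document used only to exercise the sketch.
SAMPLE = """Learning Finite Automata with Toy Networks
A. Author, B. Writer and C. Scholar

Abstract
We study a toy problem.

1 Introduction
Body text goes here [1].

References
[1] A. Author. An earlier toy paper. 2001.
[2] B. Writer. Another toy paper. 2005.
"""

if __name__ == "__main__":
    for field, value in extract_metadata(SAMPLE).items():
        print(field, "->", value)
```

Fixed layout rules like these break on real PDFs, so treat the sketch as a picture of the input and output of the task rather than of any production method.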

Deep Learning End-to-End Scene Text Reading
Typical pipeline
A dozen papers in prestigious AI and computer vision conferences
Funded by NSF Expedition

Hybrid Deep Compression
[Figure: standard method vs. ours]
Design an iterative, RNN-based hybrid estimator for decoding instead of using transformations.
Replaces the dequantizer and inverse encoding transform modules with a function approximator.
The neural decoder is a single-layer RNN with 512 units.
An iterative refinement algorithm learns an iterative estimator of this function approximator.
Exploits both causal and non-causal information to improve low-bit-rate reconstruction.
Applies to any image decoding problem.
Handles a wide range of bit rate values.
Uses a multi-objective loss function for image compression.
Uses a new annealing schedule, i.e., an annealed stochastic learning rate.
Achieved a +0.971 dB gain over the Google neural model on the Kodak test set.
Ororbia, Mali, DCC ‘19
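The slide reports the gain in dB without naming the metric. Assuming it refers to peak signal-to-noise ratio (PSNR), a common reconstruction-quality measure in image compression, the sketch below shows how PSNR is computed and what a given dB gain implies for mean squared error. The function names and the 8-bit peak value of 255 are assumptions made for illustration, not details from the slides.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE). Assumes 8-bit pixels by default."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    reconstructed = [54, 55, 60, 67, 69, 62, 64, 71]
    print("PSNR (dB):", round(psnr(original, reconstructed), 2))
    print("MSE ratio for +0.971 dB:", round(mse_ratio_for_gain(0.971), 4))
```

Under that PSNR assumption, a gain just under 1 dB corresponds to roughly a one-fifth reduction in mean squared error at the same bit rate.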

Compression system - Google
Model diagram for a single iteration i of the shared recurrent neural network (RNN) architecture [Toderici ‘15, Toderici ‘16]

Grammatical Inference - RNNs
Extract grammar rules from trained RNNs for verification. Wang, AAAI VNN ‘19
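One well-known family of rule-extraction approaches quantizes the network's hidden state space and explores it breadth-first to build a finite automaton, which can then be checked against the network. The slide does not say which method is used, so the sketch below is only an illustration of that general idea. The "RNN" is a hypothetical one-unit recurrent update with hand-set weights (not a trained model) that tracks whether a binary string has an even number of 1s.

```python
import math
from collections import deque
from itertools import product

ALPHABET = "01"
H0 = 0.99  # assumed initial hidden state (hypothetical)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rnn_step(h, symbol):
    """Toy recurrent update: '0' keeps the state, '1' flips it across 0.5."""
    sign = 1.0 if symbol == "0" else -1.0
    return sigmoid(sign * 10.0 * (h - 0.5))


def rnn_accepts(string):
    h = H0
    for symbol in string:
        h = rnn_step(h, symbol)
    return h > 0.5


def quantize(h, bins=2):
    """Map a hidden state in (0, 1) to a discrete bin index."""
    return min(bins - 1, int(h * bins))


def extract_dfa(bins=2):
    """Breadth-first search over quantized hidden states to build a DFA."""
    start = quantize(H0, bins)
    representative = {start: H0}  # one concrete hidden state per bin
    transitions = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in ALPHABET:
            h_next = rnn_step(representative[state], symbol)
            nxt = quantize(h_next, bins)
            transitions[(state, symbol)] = nxt
            if nxt not in representative:
                representative[nxt] = h_next
                queue.append(nxt)
    accepting = {s for s, h in representative.items() if h > 0.5}
    return start, transitions, accepting


def dfa_accepts(dfa, string):
    start, transitions, accepting = dfa
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in accepting


def agrees_up_to(dfa, max_len):
    """Check that the DFA and the toy RNN agree on every string up to max_len."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if dfa_accepts(dfa, s) != rnn_accepts(s):
                return False
    return True


if __name__ == "__main__":
    dfa = extract_dfa()
    print("transitions:", dfa[1])
    print("agrees on strings up to length 8:", agrees_up_to(dfa, 8))
```

The extracted transition table plays the role of the grammar rules: it is small enough to inspect by hand, and comparing it with the network on enumerated strings is a simple form of verification.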

“The future ain’t what it used to be.”
Yogi Berra, catcher/philosopher, NY Yankees
For more information:
http://clgiles.ist.psu.edu
https://en.wikipedia.org/wiki/Lee_Giles
giles@ist.psu.edu
Why not use a deep learner?