Professor C. Lee Giles
David Reese Professor, IST; Graduate Professor, CSE
Adjunct Professor: Princeton, Pennsylvania, Columbia, Pisa, Trento
Graduated over 30 PhDs
Published over 600 papers with nearly 40,000 citations and an h-index of 95; most involve machine learning, deep learning, and AI
Intelligent and specialty search engines; cyberinfrastructure for science, academia, and government; big data; deep learning
Modular, scalable, robust, automatic science- and technology-focused cyberinfrastructure and search engine creation and maintenance
Large heterogeneous data and information systems
Specialty science and technology search engines for knowledge discovery & integration
  CiteSeerx (all scholarly documents; initial focus on computer science) (NSF funded)
  MathSeer (new math search engine) (Sloan funded)
  BBookX (book generation, question generation) (TLT funded)
Scalable intelligent tools/agents/methods/algorithms
Information, knowledge, and data integration
Information and metadata extraction; entity recognition
Pseudocode, table, figure, chemical formula, equation, and name extraction
Unique search, knowledge discovery, information integration, and data mining algorithms
Text in the wild: machine reading, deep learning
Strong collaboration record: Lockheed-Martin, FAST, Raytheon, IBM, Ford, Alcatel-Lucent, Smithsonian, Internet Archive, DARPA, Yahoo, Dow Chemical, NSF, Sloan, Mellon
My work on neural networks
Over 100 papers on NNs
International Neural Network Society Dennis Gabor Award
IEEE Computational Intelligence Society Pioneer Award in Neural Networks
Taught the first neural networks course at Princeton (1994)
NN interests and publications:
  Text in the wild
  Compression (we beat Google)
  Recurrent neural networks as automata & grammars
  Recurrent neural network verification
  Neural networks in information retrieval and education
Millions of hits daily
Half a million PDF downloads daily (180M annually)
2nd most attacked site at Penn State
Automatic Metadata Information Extraction (IE) - CiteSeerX
[Pipeline diagram: PDF → text converter → IE modules for the header (title, authors, affiliations, abstract), body (tables, figures, formulae), and citations → databases and search index]
Many other open-source academic document metadata extractors are available; see the recent JCDL workshop, the metadata hackathon, and the JCDL tutorial (2016). A minimal header-extraction sketch follows below.
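As a rough illustration of the kind of header metadata extraction such pipelines perform, here is a minimal, heuristic Python sketch. The field names, rules, and sample input are assumptions chosen for illustration; this is not the CiteSeerX extractor.

# Minimal heuristic sketch of header metadata extraction from the plain text
# of a scholarly PDF. Illustrative only; not the CiteSeerX implementation.
import re

def extract_header(text: str) -> dict:
    """Guess title, authors, and abstract from the first page of extracted text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else ""
    # Heuristic: the author line often follows the title and contains
    # comma-separated capitalized names.
    authors = []
    for ln in lines[1:5]:
        candidates = [c.strip() for c in ln.split(",")]
        if candidates and all(re.match(r"^[A-Z][\w.\- ]+$", c) for c in candidates if c):
            authors = candidates
            break
    # Heuristic: the abstract is the block following a line that says "Abstract".
    abstract = ""
    for i, ln in enumerate(lines):
        if ln.lower().startswith("abstract"):
            abstract = " ".join(lines[i + 1:i + 6])
            break
    return {"title": title, "authors": authors, "abstract": abstract}

if __name__ == "__main__":
    sample = "Deep Learning for X\nJane Doe, John Smith\nSome University\nAbstract\nWe study ..."
    print(extract_header(sample))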
Deep Learning End-to-End Scene Text Reading
Typical pipeline: detect text regions, then recognize the text in each region (see the sketch below)
A dozen papers in prestigious AI and computer vision conferences
Funded by an NSF Expedition
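For orientation, the following is a minimal Python sketch of the detect-then-recognize structure of a typical scene-text pipeline. The detector and recognizer here are placeholder stubs; the systems in the cited papers use trained deep networks, and all function names and shapes below are assumptions.

# Schematic two-stage scene-text pipeline: detect text regions, then recognize
# each crop. Detector/recognizer are stubs standing in for trained models.
import numpy as np

def detect_text_regions(image: np.ndarray) -> list:
    """Placeholder detector returning (x, y, w, h) boxes. A real detector
    would be a convolutional network predicting box geometry."""
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 4)]  # dummy single box

def recognize_text(crop: np.ndarray) -> str:
    """Placeholder recognizer. A real recognizer would be a CNN + RNN
    decoding a character sequence, typically trained with CTC loss."""
    return "<text>"

def read_scene_text(image: np.ndarray) -> list:
    results = []
    for (x, y, w, h) in detect_text_regions(image):
        crop = image[y:y + h, x:x + w]
        results.append({"box": (x, y, w, h), "text": recognize_text(crop)})
    return results

if __name__ == "__main__":
    img = np.zeros((64, 128, 3), dtype=np.uint8)
    print(read_scene_text(img))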
Hybrid Deep Compression
[Figure: reconstructions, standard method vs. ours]
Design an iterative, RNN-based hybrid estimator for decoding instead of using transformations
Replaces the dequantizer and inverse encoding transform modules with a function approximator
The neural decoder is a single-layer RNN with 512 units
An iterative refinement algorithm learns an iterative estimator of this function approximator (see the sketch below)
Exploits both causal & non-causal information to improve low-bit-rate reconstruction
Applies to any image decoding problem
Handles a wide range of bit rate values
Uses a multi-objective loss function for image compression
Uses a new annealing schedule, i.e., an annealed stochastic learning rate
Achieved a +0.971 dB gain over the Google neural model on the Kodak test set
Ororbia, Mali, DCC '19
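To make the iterative, RNN-based decoding idea concrete, here is a heavily simplified numpy sketch of a recurrent estimator that refines a reconstruction over several passes instead of applying a fixed dequantizer and inverse transform. The dimensions, update rule, step count, and random weights are illustrative assumptions and do not reproduce the DCC '19 model.

# Simplified iterative recurrent decoder: a learned recurrent estimator
# refines the patch reconstruction from a quantized code over several steps.
import numpy as np

rng = np.random.default_rng(0)

CODE_DIM, HIDDEN_DIM, PATCH_DIM, STEPS = 64, 512, 256, 8

# Randomly initialized parameters stand in for trained weights.
W_in  = rng.standard_normal((HIDDEN_DIM, CODE_DIM)) * 0.01
W_rec = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.01
W_out = rng.standard_normal((PATCH_DIM, HIDDEN_DIM)) * 0.01

def iterative_decode(code: np.ndarray) -> np.ndarray:
    """Refine a patch reconstruction over STEPS recurrent updates."""
    h = np.zeros(HIDDEN_DIM)
    x_hat = np.zeros(PATCH_DIM)
    for _ in range(STEPS):
        h = np.tanh(W_in @ code + W_rec @ h)   # recurrent state update
        x_hat = x_hat + W_out @ h              # additive refinement of the estimate
    return x_hat

if __name__ == "__main__":
    quantized_code = rng.integers(-2, 3, size=CODE_DIM).astype(float)
    print(iterative_decode(quantized_code).shape)  # (256,)

Each pass adds a correction to the running estimate, which is the sense in which the decoder is iterative rather than a single fixed inverse transform.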
Compression system (Google)
Model diagram for a single iteration i of the shared recurrent neural network (RNN) architecture [Toderici '15, Toderici '16]
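For intuition about the Toderici-style iterative scheme, here is a schematic numpy sketch: at each iteration the encoder codes the current residual into bits, the decoder reconstructs its contribution, and the next iteration encodes what remains. The toy linear maps, binarizer, and shapes below are assumptions standing in for the shared RNN modules of the actual architecture.

# Schematic iterative residual coding loop (toy stand-in for the RNN modules).
import numpy as np

rng = np.random.default_rng(1)
D, BITS, ITERS = 256, 32, 4

E = rng.standard_normal((BITS, D)) * 0.05   # toy "encoder"
G = rng.standard_normal((D, BITS)) * 0.05   # toy "decoder"

def binarize(z: np.ndarray) -> np.ndarray:
    return np.sign(z)                        # {-1, +1} codes

def compress(x: np.ndarray) -> np.ndarray:
    residual = x.copy()
    reconstruction = np.zeros_like(x)
    for _ in range(ITERS):
        bits = binarize(E @ residual)        # code the current residual
        reconstruction += G @ bits           # add this iteration's contribution
        residual = x - reconstruction        # what later iterations must capture
    return reconstruction

if __name__ == "__main__":
    x = rng.standard_normal(D)
    x_hat = compress(x)
    print(float(np.mean((x - x_hat) ** 2)))  # error after ITERS passes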
Grammatical Inference - RNNs
Extract grammar rules from trained RNNs for verification (Wang, AAAI VNN '19)
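One classic way to extract an automaton from a trained RNN is to quantize its continuous hidden state and record the transitions the quantized states make on each input symbol. The sketch below illustrates that idea on a toy, untrained RNN; the quantization grid, alphabet, and network are arbitrary assumptions, and this is not the AAAI VNN '19 method itself.

# Toy extraction of a finite-state transition table from an RNN by
# quantizing its hidden state and recording (state, symbol) -> state moves.
import numpy as np

rng = np.random.default_rng(2)
HIDDEN, ALPHABET = 4, ["0", "1"]

W_h = rng.standard_normal((HIDDEN, HIDDEN)) * 0.5
W_x = {a: rng.standard_normal(HIDDEN) * 0.5 for a in ALPHABET}

def step(h: np.ndarray, symbol: str) -> np.ndarray:
    return np.tanh(W_h @ h + W_x[symbol])

def quantize(h: np.ndarray, bins: int = 3) -> tuple:
    """Map the continuous state into a discrete cell (an extracted 'state')."""
    return tuple(np.digitize(h, np.linspace(-1, 1, bins)))

def extract_transitions(strings: list) -> dict:
    transitions = {}
    for s in strings:
        h = np.zeros(HIDDEN)
        state = quantize(h)
        for sym in s:
            h = step(h, sym)
            nxt = quantize(h)
            transitions[(state, sym)] = nxt
            state = nxt
    return transitions

if __name__ == "__main__":
    for (state, sym), nxt in extract_transitions(["0101", "1100", "0011"]).items():
        print(state, f"--{sym}-->", nxt)

Treating each quantized cell as a state and each recorded entry as a transition yields a finite-state approximation of the network's dynamics, which can then be minimized or checked against a specification for verification.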
"The future ain't what it used to be." (Yogi Berra, catcher/philosopher, NY Yankees)
For more information:
http://clgiles.ist.psu.edu
https://en.wikipedia.org/wiki/Lee_Giles
giles@ist.psu.edu
Why not use a deep learner?