The Database and Info. Systems Lab, University of Illinois at Urbana-Champaign. RankFP: A Framework for Rank Formulation and Processing. Hwanjo Yu, Seung-won Hwang, Kevin Chen-Chuan Chang.

RankFP: A Framework for Rank Formulation and Processing
Hwanjo Yu, Seung-won Hwang, Kevin Chen-Chuan Chang
The Database and Info. Systems Lab, University of Illinois at Urbana-Champaign

The Context: AIMing to the Top
Enabling ad-hoc ranking in data retrieval
[Figure: the query "top-3 houses" goes through Rank Formulation, giving: select * from houses order by [ranking function F] limit 3. Rank Processing evaluates this query and returns the ranked results.]
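To make the query on this slide concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table columns (price, size), the sample rows, and the linear form of F are assumptions for illustration; the slides do not specify a schema or a particular F at this point.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, price REAL, size REAL)")
conn.executemany(
    "INSERT INTO houses (id, price, size) VALUES (?, ?, ?)",
    [(1, 300.0, 120.0), (2, 250.0, 90.0), (3, 400.0, 200.0), (4, 150.0, 60.0)],
)

def top_k(conn, w_price, w_size, k):
    """Return the ids of the k houses with the highest score.

    The ranking function is assumed to be linear:
    F = w_price * price + w_size * size.
    """
    rows = conn.execute(
        "SELECT id FROM houses ORDER BY (? * price + ? * size) DESC LIMIT ?",
        (w_price, w_size, k),
    ).fetchall()
    return [row[0] for row in rows]

# Assumed weights: cheaper is better, larger is better.
top3 = top_k(conn, -1.0, 2.0, 3)
```

The point of the sketch is only that a numeric F fits directly into an order by clause, so the database can evaluate the ranking itself.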

Problem: Enabling Ad-hoc Ranking
To enable ad-hoc ranking, we observe two major barriers:
• Usability: Ranking should be "user-friendly", so that ordinary users can easily specify their ranking criteria
• Efficiency: Ranking should be "DB-friendly", that is, amenable to efficient processing
→ We propose a framework combining user-friendly formulation and DB-friendly processing.

Our Insight: Combining Usability and Efficiency
We combine a qualitative model for usability with a quantitative model for efficiency.
Qualitative model
• The query condition is represented as a relative ordering of objects
• User-friendly: it relieves the user from specifying an absolute score for each object
• Example: [image] > [image]
Quantitative model
• The query condition is represented as a mapping F of objects to absolute numerical scores
• DB-friendly: it yields an absolute score for each object
• Example: F([image]) = 0.9, F([image]) = 0.5

Our Solution: RankFP (RANK Formulation and Processing)
• For usability, we propose a qualitative formulation front-end, which enables rank formulation by ordering samples
• For efficiency, we learn a quantitative ranking function F, which is readily expressible using the order by clause in SQL
[Figure: Rank Formulation starts from an unordered sample S, over which the user gives a ranking R*. The ranking R_F induced by F is checked against R* over S. If no: Function Learning learns a new F and Sample Selection generates a new S. If yes: the ranking function F is handed to Rank Processing, which processes Q: select * from houses order by F limit k and returns the ranked results.]
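The iterative loop on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the learner is a simple perceptron on pairwise differences (swapped in for the SVM the slides mention), sample selection is plain random sampling, the user is simulated by a hidden scoring rule, and all names (rankfp_loop, learn, simulated_user, the houses data) are hypothetical.

```python
import random

def score(w, u):
    """Linear score of feature vector u under weights w."""
    return sum(a * b for a, b in zip(w, u))

def rank(w, items):
    """Order items by descending score (the ranking R_F induced by F)."""
    return sorted(items, key=lambda u: score(w, u), reverse=True)

def learn(w, ordered, epochs=100):
    """Function Learning: perceptron updates on pairwise differences.

    `ordered` is the user's ranking R* of the sample, best first.
    (Assumption: a perceptron stands in for the SVM learner.)
    """
    w = list(w)
    for _ in range(epochs):
        mistakes = 0
        for i, hi in enumerate(ordered):
            for lo in ordered[i + 1:]:
                d = [a - b for a, b in zip(hi, lo)]
                if score(w, d) <= 0:  # pair is not yet ordered correctly
                    w = [wi + di for wi, di in zip(w, d)]
                    mistakes += 1
        if mistakes == 0:
            break
    return w

def rankfp_loop(data, user_order, sample_size=5, max_rounds=20, seed=0):
    """Alternate sample selection and function learning until R_F agrees with R*."""
    rng = random.Random(seed)
    w = None  # no ranking function learned yet
    for _ in range(max_rounds):
        sample = rng.sample(data, sample_size)  # Sample Selection (random here)
        r_star = user_order(sample)             # user orders the sample
        if w is not None and rank(w, sample) == r_star:
            return w                            # "yes" branch: F is accepted
        start = w if w is not None else [0.0] * len(data[0])
        w = learn(start, r_star)                # "no" branch: learn a new F
    return w

def simulated_user(sample):
    """Hypothetical user who secretly prefers high feature 0 and low feature 1."""
    return sorted(sample, key=lambda u: u[0] - u[1], reverse=True)

# Hypothetical two-feature objects.
houses = [[5.0, 1.0], [4.0, 2.0], [3.0, 3.0], [2.0, 1.0],
          [1.0, 4.0], [6.0, 3.0], [2.0, 6.0], [4.0, 5.0]]
learned_w = rankfp_loop(houses, simulated_user)
```

Stopping as soon as one sample agrees is a simplification; it does not guarantee that the learned F orders the whole database the way the user would.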

Implementation
[Figure: system components: an interface showing sampled top results, an SVM learner, a check of R_F against R*, and PostgreSQL evaluating order by F to return the top-k results.]

Task 1: Rank Formulation Front-end (Ranking → Classification)
• Challenge: Unlike a conventional learning problem of classifying objects into groups, we need to learn a function that induces a desired ordering of all objects
• Solution: Transform ranking into classification on pairwise differences [Herbrich2000] and adopt a learning algorithm (e.g., SVM) to learn a pairwise classification function F
[Figure: ranking view: c > b > d > e > a. Classification view: the pairwise differences a - b, b - c, c - d, d - e, a - c, … are classified by F, a binary classifier produced by the learning algorithm.]
[Herbrich2000] R. Herbrich et al. Large margin rank boundaries for ordinal regression. MIT Press, 2000.
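The ranking-to-classification transformation can be illustrated with a short sketch. It builds one labeled difference vector per ordered pair (plus its mirror image), which is the kind of training set a binary classifier such as an SVM would consume. The feature vectors and the function name pairwise_examples are hypothetical; only the ranking c > b > d > e > a comes from the slide.

```python
def pairwise_examples(ranking, features):
    """Turn a ranking (object ids, best first) into labeled difference vectors.

    For every pair where `hi` is ranked above `lo`, emit
    (features[hi] - features[lo], +1) and the mirrored (lo - hi, -1).
    """
    examples = []
    for i, hi in enumerate(ranking):
        for lo in ranking[i + 1:]:
            diff = [a - b for a, b in zip(features[hi], features[lo])]
            examples.append((diff, +1))
            examples.append(([-d for d in diff], -1))
    return examples

# Hypothetical two-dimensional feature vectors for the five objects.
features = {
    "a": [1.0, 0.0],
    "b": [3.0, 1.0],
    "c": [4.0, 2.0],
    "d": [2.0, 1.0],
    "e": [1.0, 1.0],
}
ranking = ["c", "b", "d", "e", "a"]  # ranking view from the slide
examples = pairwise_examples(ranking, features)
```

Five ranked objects give ten ordered pairs, so the sketch produces twenty labeled examples; the number of pairs grows quadratically with the sample size, which is one reason to learn from a small sample rather than the whole database.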

Task 2: Rank Processing Back-end (Classification → Ranking)
• Challenge: While classification applies to each pair of objects, we need to efficiently rank the entire database.
• Solution: We develop a duality under which the pairwise classification function F also serves as a global per-object ranking function.
Suppose the ranking function F is linear. Then, going from the classification view to the ranking view:
$F(u_i - u_j) > 0 \Rightarrow F(u_i) - F(u_j) > 0 \Rightarrow F(u_i) > F(u_j)$
[Figure: classification view: F(a - b)? F(a - c)? F(a - d)? … over objects a, b, c, d, e. Ranking view: rank with F(·), e.g., F(c) > F(b) > F(d) > …]
Further: Optimization of Top-k Order-by [SIGMOD'05]
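The duality for a linear F can be checked numerically with a small sketch. The weights and feature vectors below are made up for illustration; they are chosen so that the induced order matches the slide's example, F(c) > F(b) > F(d) > …

```python
def F(w, u):
    """Linear ranking function: dot product of weights and features."""
    return sum(wi * ui for wi, ui in zip(w, u))

def diff(u, v):
    """Component-wise difference u - v."""
    return [x - y for x, y in zip(u, v)]

# Hypothetical learned weights and object features.
w = [2.0, -1.0]
objects = {
    "a": [1.0, 3.0],
    "b": [3.0, 1.0],
    "c": [4.0, 1.0],
    "d": [2.0, 1.0],
    "e": [1.0, 1.0],
}

# Ranking view: one score per object, then a single sort.
ranked = sorted(objects, key=lambda name: F(w, objects[name]), reverse=True)

# Classification view: each pairwise decision F(u_i - u_j) > 0
# agrees with comparing the per-object scores F(u_i) > F(u_j).
consistent = all(
    (F(w, diff(objects[i], objects[j])) > 0)
    == (F(w, objects[i]) > F(w, objects[j]))
    for i in objects
    for j in objects
    if i != j
)
```

Because F is linear, F(u_i - u_j) equals F(u_i) - F(u_j), so the database never has to evaluate pairs: scoring each object once and sorting is enough.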

Conclusion: Summary
To support ranking for data retrieval, we develop RankFP, an iterative learning and processing framework, combining:
• Usability: a learning front-end that enables qualitative rank formulation
• Efficiency: transformation of the pairwise classification function into a global ranking function for efficient processing

Thank You!
For more information: The AIM Project: