Selected Applications of Transfer Learning
Qiang Yang (杨强)
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology, Hong Kong
http://www.cse.ust.hk/~qyang
Case 1: Target Class Changes (Target Transfer Learning)
Training: a 2-class problem. Testing: a 10-class problem. Traditional methods fail.
Solution: find out what is not changed between training and testing.
Our Work
Cross-Domain Learning: TrAdaBoost (ICML 2007), Co-Clustering based Classification (SIGKDD 2007), TPLSA (SIGIR 2008), NBTC (AAAI 2007)
Translated Learning: cross-lingual classification (WWW 2008), cross-media classification (NIPS 2008)
Unsupervised Transfer Learning: self-taught clustering (ICML 2008)
We have published several papers on cross-domain transfer learning in top conferences in the past year, including ICML, AAAI, KDD, PKDD, SIGIR, and WWW. One of our papers won the PKDD 2007 Best Student Paper Award.
Our Work (cont.)
Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu. Translated Learning. In Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), Vancouver, British Columbia, Canada, December 8, 2008.
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Cross-Domain Spectral Learning. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), Las Vegas, Nevada, USA, August 24-27, 2008, pp. 488-496.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Self-taught Clustering. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland, July 5-9, 2008, pp. 200-207.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. Boosting for Transfer Learning. In Proceedings of the 24th International Conference on Machine Learning (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007, pp. 193-200.
Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. Co-clustering based Classification for Out-of-domain Documents. In Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), San Jose, California, USA, August 12-15, 2007, pp. 210-219.
Dou Shen, Jian-Tao Sun, Qiang Yang, and Zheng Chen. Building Bridges for Web Query Classification. In Proceedings of the 29th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), Seattle, USA, August 6-11, 2006, pp. 131-138.
Query Classification and Online Advertisement
ACM KDDCUP 2005 Winner; SIGIR 2006; ACM Transactions on Information Systems, 2006.
Joint work with Dou Shen, Jian-Tao Sun, and Zheng Chen.
QC as Machine Learning
Inspired by the KDDCUP 2005 competition: classify a query into a ranked list of categories. Queries are collected from real search engines; target categories are organized in a tree, with each node being a category.
Related Work
Query Classification/Clustering: classify Web queries by geographical locality [Gravano 2003]; classify queries by functional type [Kang 2003]; Beitzel et al. studied topical classification as we do, but with manually classified data [Beitzel 2005]; Beeferman and Wen worked on query clustering using clickthrough data [Beeferman 2000; Wen 2001].
Document/Query Expansion: borrow text from extra data sources, using hyperlinks [Glover 2002], implicit links from query logs [Shen 2006], or existing taxonomies [Gabrilovich 2005]. Query expansion [Manning 2007]: global methods are independent of the queries; local methods use relevance feedback or pseudo-relevance feedback.
Target-transfer Learning in QC
A classifier, once trained, stays constant.
Target classes before: Sports, Politics (European, US, China).
Target classes now: Sports (Olympics, Football, NBA), Stock Market (Asian, Dow, Nasdaq), History (Chinese, World).
How do we allow the target to change? Application: advertisements come and go, but our query-to-target mapping need not be retrained. We call this the target-transfer learning problem.
Solution: Query Enrichment + Staged Classification with a Bridging Classifier
Step 1: Query Enrichment
Each query is enriched with textual information (full text, title, and snippet of its search results) and with category information.
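A minimal sketch of the enrichment step. The result format (dicts with `title`, `snippet`, and `category` fields) and the helper name `enrich_query` are illustrative assumptions, not the paper's actual interface:

```python
def enrich_query(query, search_results, max_results=5):
    """Enrich a short query with text and category evidence taken from
    search-engine results (hypothetical format: each result is a dict
    with 'title', 'snippet', and 'category' keys)."""
    texts, categories = [query], []
    for r in search_results[:max_results]:
        texts.append(r.get("title", ""))
        texts.append(r.get("snippet", ""))
        if r.get("category"):
            categories.append(r["category"])
    # The enriched text feeds a text classifier; the categories feed
    # the category-based classifier in the staged scheme.
    return {"text": " ".join(t for t in texts if t), "categories": categories}

results = [
    {"title": "2008 Olympics", "snippet": "Beijing Olympic games schedule",
     "category": "Sports"},
    {"title": "Olympic history", "snippet": "History of the Olympic movement",
     "category": "Sports/History"},
]
enriched = enrich_query("olympics", results)
```

The short query alone carries almost no features; after enrichment it becomes a small pseudo-document that standard text classifiers can handle.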
Step 2: Bridging Classifier
We wish to avoid repeating training whenever the target taxonomy changes. Solution: connect the target taxonomy and the queries by taking an intermediate taxonomy as a bridge.
Bridging Classifier (cont.)
How to connect? Combine the relation between the target category C^T and each intermediate category C_i^I, the relation between the query q and C_i^I, and the prior probability of C_i^I:
p(C^T | q) ∝ Σ_i p(C^T | C_i^I) · p(q | C_i^I) · p(C_i^I)
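A numeric sketch of the bridging combination. The category names and probability tables below are toy assumptions for illustration only:

```python
def bridge_score(target_cat, query, p_target_given_inter, p_query_given_inter, p_inter):
    """Score p(C^T | q) ∝ Σ_i p(C^T | C_i^I) · p(q | C_i^I) · p(C_i^I),
    summing over all intermediate categories C_i^I."""
    return sum(
        p_target_given_inter[(target_cat, ci)]
        * p_query_given_inter[(query, ci)]
        * p_inter[ci]
        for ci in p_inter
    )

# Toy intermediate taxonomy with two categories (illustrative numbers).
p_inter = {"I_sports": 0.5, "I_finance": 0.5}
p_target_given_inter = {
    ("Sports", "I_sports"): 0.9, ("Sports", "I_finance"): 0.1,
    ("Stocks", "I_sports"): 0.1, ("Stocks", "I_finance"): 0.9,
}
p_query_given_inter = {
    ("olympics", "I_sports"): 0.8, ("olympics", "I_finance"): 0.05,
}

scores = {
    t: bridge_score(t, "olympics", p_target_given_inter, p_query_given_inter, p_inter)
    for t in ("Sports", "Stocks")
}
```

Only p(C^T | C_i^I) depends on the target taxonomy, so when the target classes change, just this table is recomputed; the query-to-intermediate model stays untouched.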
Category Selection for the Intermediate Taxonomy
To reduce complexity, select only the most useful intermediate categories, e.g. by Total Probability, TP(C_i^I) = Σ_j p(C_j^T | C_i^I), or by Mutual Information.
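A sketch of Total-Probability-based pruning; the category names and probabilities are toy assumptions:

```python
def total_probability(inter_cat, target_cats, p_target_given_inter):
    """TP(C_i^I) = Σ_j p(C_j^T | C_i^I): how strongly an intermediate
    category relates to the whole target taxonomy."""
    return sum(p_target_given_inter[(ct, inter_cat)] for ct in target_cats)

def select_categories(inter_cats, target_cats, p_target_given_inter, top_n):
    """Keep only the top_n intermediate categories by Total Probability."""
    ranked = sorted(
        inter_cats,
        key=lambda ci: -total_probability(ci, target_cats, p_target_given_inter),
    )
    return ranked[:top_n]

inter_cats = ["I_sports", "I_finance", "I_misc"]
target_cats = ["Sports", "Stocks"]
p_tgi = {
    ("Sports", "I_sports"): 0.9, ("Stocks", "I_sports"): 0.1,
    ("Sports", "I_finance"): 0.1, ("Stocks", "I_finance"): 0.9,
    ("Sports", "I_misc"): 0.05, ("Stocks", "I_misc"): 0.05,
}
selected = select_categories(inter_cats, target_cats, p_tgi, top_n=2)
```

Intermediate categories that contribute little to any target class (here `I_misc`) are dropped, shrinking the sum in the bridging score.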
Experiment: Data Sets & Evaluation
ACM KDDCUP: started in 1997, the KDD Cup is the leading data mining and knowledge discovery competition in the world, organized by ACM SIGKDD.
ACM KDDCUP 2005 task: categorize 800K search queries into 67 categories. Three awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award. Participation: 142 registered groups; 37 solutions submitted from 32 teams. Evaluation data: 800 queries randomly selected from the 800K query set; 3 human labelers labeled the entire evaluation query set. Evaluation measures: Precision and Performance (F1). We won all three awards.
Results of the Bridging Classifier
Performance of the bridging classifier with different granularities of the intermediate taxonomy. Using the bridging classifier allows the target classes to change freely, with no need to retrain the classifier.
Summary: Target-Transfer Learning
Query → (classify) → intermediate classes → (similarity) → target classes.
Cross-Domain Learning: Input and Output
Here is an example of concept-domain transfer. In a task of classifying airplanes versus cars, the training data are all cartoons, while the test data are all real objects. Cartoon objects and real objects share the key features of the target concepts, but they also have many distinct features. The difficulty is finding the key features that describe both cartoon and real objects.
Case 1: Source vs. Target
Source domain: many labeled instances. Target domain: few labeled instances. The two domains share the same feature representation and the same (binary) classes Y, but have different P(X, Y) distributions.
TrAdaBoost = Transfer AdaBoost
Given: insufficient labeled data from the target domain (primary data), plus labeled data following a different distribution (auxiliary data). The auxiliary data are weaker evidence for building the classifier. Training uses source + target data, starting from uniform weights.
In ICML 2007, we published a paper that solves the problem of using a large amount of auxiliary data to help classification in a different domain. We model our algorithm with boosting theory, so it is called Transfer AdaBoost (TrAdaBoost). The basic idea: with only very few target training examples, we do not know exactly where the decision boundary lies. Auxiliary data may help us find it, but they are of low quality and may mislead us; for example, a boundary fitted to the auxiliary data may misclassify the target training data. So we have to correct that mistake.
TrAdaBoost (cont.)
Our idea is to adjust the weights of misclassified examples: increase the weights of the misclassified target (primary) data, and decrease the weights of the misclassified source (auxiliary) data, so that the decision boundary moves in the right direction. Because the algorithm is modeled within boosting theory, we can prove an error bound for it.
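The asymmetric weight update above can be sketched as follows (a sketch of one boosting round in the spirit of Dai et al., ICML 2007; the function name and argument layout are illustrative, not the paper's code):

```python
import math

def tradaboost_update(weights, errors, is_source, epsilon_t, n_source, n_iters):
    """One TrAdaBoost-style weight update.
    weights:   current example weights
    errors:    |h(x_i) - y_i| in {0, 1} for the weak hypothesis h
    is_source: True for auxiliary (source-domain) examples
    epsilon_t: weighted error of h on the *target* data (must be < 0.5)
    """
    # Source (auxiliary) data use a fixed multiplier < 1.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_iters))
    # Target data use the usual AdaBoost multiplier.
    beta_t = epsilon_t / (1.0 - epsilon_t)
    new_w = []
    for w, e, src in zip(weights, errors, is_source):
        if src:
            # Misclassified source examples are DOWN-weighted: they
            # disagree with the target concept, so trust them less.
            new_w.append(w * beta_src ** e)
        else:
            # Misclassified target examples are UP-weighted, as in AdaBoost.
            new_w.append(w * beta_t ** (-e))
    return new_w

# One misclassified source example, one misclassified target example,
# one correctly classified target example.
new_w = tradaboost_update(
    weights=[1.0, 1.0, 1.0],
    errors=[1, 1, 0],
    is_source=[True, False, False],
    epsilon_t=0.25, n_source=100, n_iters=10,
)
```

After the update, the misclassified source example loses influence while the misclassified target example gains it, which is exactly how the boundary "moves in the right direction".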
TrAdaBoost (cont.): experimental performance (results figure).
Transfer Learning in Sensor Network Tracking
Received-Signal-Strength (RSS) based localization in an indoor WiFi environment: a mobile device measures signal strengths from several access points (e.g. -30 dBm, -40 dBm, and -70 dBm from access points 1-3), and we must infer its location (location_x, location_y). Where is the mobile device?
Distribution Changes
The mapping function f learned in the offline phase can become out of date (e.g. between day-time and night-time periods). Recollecting the WiFi data is very expensive. How can we adapt the model?
Transfer Learning in Wireless Sensor Networks Transfer across time Transfer across space Transfer across device
Latent Space based Transfer Learning (Spatial Transfer)
Transferring localization models across space [Pan, Yang et al., AAAI 2008]. Given: labeled data collected in Area A, unlabeled data in Area B, and only a few labeled data collected in Area B. Goal: construct a localization model for the whole area (Areas A and B).
Transfer across Time
Area: 30 x 40 (81 grids). Six time periods: 12:30am-01:30am, 08:30am-09:30am, 12:30pm-01:30pm, 04:30pm-05:30pm, 08:30pm-09:30pm, 10:30pm-11:30pm.
Baselines: LeMan, a static mapping function learned from offline data; LeMan2, which relearns the mapping function from a few online data; LeMan3, which combines offline and online data into a single training set.
Transfer Knowledge via Latent Manifold Learning
(Diagram: labeled WiFi data from the two domains are connected through a shared latent manifold, along which knowledge is propagated.)
VIP Recommendation in Tencent Weibo
Properties: friendship relations come from Tencent QQ, the largest instant messaging network. Challenges: (1) data sparsity: limited neighbors for most users, motivating knowledge transfer; (2) heterogeneous links: symmetric friendship vs. asymmetric following; (3) large data: 1 billion users and tens of billions of links.
Social Relation based Transfer (SORT)
VIP recommendation based on: (1) X: friendship on QQ; (2) S1: user following relations on Tencent Weibo; (3) S2: VIP following relations on Tencent Weibo.
Other Applications
Social App Recommendation in Tencent Qzone: Qzone (http://qzone.qq.com) is the largest social network in China.
Video Recommendation in Tencent Video: rating prediction using four types of auxiliary data: (1) binary ratings; (2) social networks; (3) context; (4) video content.
Activity Recognition
With sensor data collected on mobile devices: location (GPS, WiFi, RFID) and context (location, weather, etc., from GPS, RFID, Bluetooth, etc.). Various models can be used: non-sequential models (Naive Bayes, SVM, ...) and sequential models (HMM, CRF, ...).
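As a concrete instance of the non-sequential case, here is a minimal Naive Bayes recognizer over discrete sensor observations (a stdlib-only sketch; the sensor tokens and activity labels below are invented examples, and a real system would also handle continuous features and sequential models such as HMMs/CRFs):

```python
import math
from collections import Counter, defaultdict

class NaiveBayesAR:
    """Minimal Naive Bayes activity recognizer: each example is a list
    of discrete sensor tokens, each label an activity name."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)              # class counts for the prior
        self.counts = defaultdict(Counter)   # per-class token counts
        self.vocab = set()
        for obs, label in zip(X, y):
            for token in obs:
                self.counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, obs):
        n = sum(self.prior.values())

        def log_posterior(c):
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for token in obs:
                # Laplace smoothing so unseen tokens don't zero out a class.
                lp += math.log((self.counts[c][token] + 1) / (total + len(self.vocab)))
            return lp

        return max(self.classes, key=log_posterior)

X = [["gps_moving", "speed_high"], ["gps_moving", "speed_high"],
     ["rfid_toothbrush", "bathroom"], ["rfid_toothbrush", "bathroom"]]
y = ["driving", "driving", "tooth_brushing", "tooth_brushing"]
clf = NaiveBayesAR().fit(X, y)
```

The same interface could be swapped for an SVM or a sequential model without changing how the sensor readings are collected.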
Activity Recognition: Input & Output (Vincent Zheng, A* Sg)
Input: context and locations (time, history, current/previous locations, duration, speed) plus object-usage information. A trained AR model is learned from calibration data (calibration tool: VTrack, http://www.cse.ust.hk/~vincentz/Vtrack.html). Output: predicted activity labels, e.g. running, walking, tooth brushing, having lunch.
Datasets: MIT PlaceLab
http://architecture.mit.edu/house_n/placelab
MIT PlaceLab Dataset (PLIA2) [Intille et al., Pervasive 2005]. Activities: common household activities.
Cross-Domain Activity Recognition [Zheng, Hu, Yang, Ubicomp 2009]
Challenge: a new domain of activities (e.g. cleaning, indoor, laundry, dishwashing) with no labeled data. Cross-domain activity recognition transfers available labeled data from source activities to help train the recognizer for the target activities.
How to Use the Similarities?
Example: sim("Make Coffee", "Make Tea") = 0.6. A source-domain labeled example <sensor reading SS, "Make Coffee"> becomes pseudo training data for the target domain: <SS, "Make Tea", 0.6>, where the similarity mined from the Web serves as the instance weight. The weighted pseudo-labeled data are then used to train a weighted SVM classifier.
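The pseudo-labeling step can be sketched as below (the data layout and the `pseudo_label` helper are illustrative assumptions; the paper feeds the resulting weighted set to a weighted SVM, e.g. via per-sample weights):

```python
def pseudo_label(source_data, similarity, threshold=0.3):
    """Turn source-domain labeled data into weighted pseudo training data
    for the target domain.
    source_data: list of (sensor_features, source_activity) pairs
    similarity:  dict mapping (source_activity, target_activity) pairs
                 to Web-mined similarity scores in [0, 1]
    """
    pseudo = []
    for features, src_activity in source_data:
        for (src, tgt), sim in similarity.items():
            if src == src_activity and sim >= threshold:
                # The instance keeps its sensor readings, receives the
                # target label, and is weighted by the mined similarity.
                pseudo.append((features, tgt, sim))
    return pseudo

source_data = [(["kitchen", "kettle", "cup"], "Make Coffee")]
similarity = {
    ("Make Coffee", "Make Tea"): 0.6,
    ("Make Coffee", "Laundry"): 0.1,   # below threshold, ignored
}
pseudo = pseudo_label(source_data, similarity)
```

A weighted classifier (such as an SVM with per-sample weights) then trusts each pseudo-example in proportion to how similar its source activity is to the target activity.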
Calculating Activity Similarities
How similar are two activities? Use Web search results and traditional IR similarity metrics: TF-IDF weighting with cosine similarity. Example: mined similarities between the activity "sweeping" and "vacuuming", "making the bed", and "gardening".
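A stdlib-only sketch of the TF-IDF cosine computation. Here the two "documents" are token lists standing in for text mined from Web search results for each activity name (the tokens below are invented, and the IDF is computed over just these two documents for simplicity):

```python
import math
from collections import Counter

def tfidf_cosine(doc_a, doc_b):
    """Cosine similarity between two token lists under TF-IDF weighting."""
    docs = [Counter(doc_a), Counter(doc_b)]
    vocab = set(doc_a) | set(doc_b)

    def idf(term):
        df = sum(1 for d in docs if term in d)
        return math.log(2 / df) + 1.0   # smoothed IDF over the two docs

    vecs = [{t: d[t] * idf(t) for t in vocab} for d in docs]
    dot = sum(vecs[0][t] * vecs[1][t] for t in vocab)
    norms = [math.sqrt(sum(v * v for v in vec.values())) for vec in vecs]
    return dot / (norms[0] * norms[1]) if norms[0] and norms[1] else 0.0

sweeping = "clean floor broom dust".split()
vacuuming = "clean floor vacuum dust".split()
gardening = "plant soil flower seed".split()

sim_close = tfidf_cosine(sweeping, vacuuming)
sim_far = tfidf_cosine(sweeping, gardening)
```

Activities whose Web descriptions share vocabulary ("sweeping" and "vacuuming") score high, while unrelated activities score near zero; these scores become the pseudo-label weights.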