Finding the Hidden Scenes Behind Android Applications Joey Allen Mentor: Xiangyu Niu CURENT REU Program: Final Presentation 7/16/2014.


Previous Work
- Crawled Google Play Store
  - Scraped descriptions, authors, and categories of applications
- Applied LDA model
  - Descriptions
  - Permissions
- Applied author-topic model
  - Descriptions

APPIC Framework
Figure 1. Flow chart of the APPIC framework.
1. User requests to download app A.
2. Description, category, and permissions are filtered.
3. Category is assigned to C_a.
4. Embedded topic models auto-tag the description, S_a, and the permissions, p_a.
5. C_a, S_a, and p_a are compared.
   - If they all match, the app is considered safe.
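The comparison in the last step can be sketched roughly as follows. The slides do not say what counts as a "match", so this sketch assumes the declared category C_a must appear among both the description tags S_a and the permission tags p_a; the function name and the example tags are hypothetical.

```python
def is_consistent(category, description_tags, permission_tags):
    """Return True when the declared category appears in both tag sets.

    Assumption (not from the slides): "match" means the category C_a is
    among the description tags S_a and among the permission tags p_a.
    """
    c = category.strip().lower()
    s = {tag.strip().lower() for tag in description_tags}
    p = {tag.strip().lower() for tag in permission_tags}
    return c in s and c in p


# Hypothetical example: a game whose description and permissions both look like a game.
safe = is_consistent("Games", ["games", "puzzle"], ["Games", "media"])
# Hypothetical example: a game whose description reads like a finance app.
suspicious = not is_consistent("Games", ["finance"], ["games"])
```

An app that fails this check would be flagged for a closer look rather than declared malicious outright.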

LDA MODEL
Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora [1]. The LDA model creates topics, each of which is a distribution over words. The words in a document can then be compared to the set of topics, and a category can be chosen for the document.
Figure 2. Graphical representation of the LDA model [1].
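To illustrate the idea of comparing a document's words to a set of topics, here is a minimal sketch. It is not LDA inference: it assumes the topics have already been learned and simply picks the topic under which the document's words have the highest log-likelihood. The topic names, word probabilities, and the `floor` smoothing value are all made up for illustration.

```python
import math


def best_topic(words, topics, floor=1e-6):
    """Pick the topic whose word distribution best explains the words.

    `topics` maps a topic name to a dict of word -> probability.
    Words missing from a topic get the small probability `floor`
    (an assumption made here to avoid log(0)).
    """
    scores = {}
    for name, dist in topics.items():
        scores[name] = sum(math.log(dist.get(w, floor)) for w in words)
    return max(scores, key=scores.get)


# Hypothetical topics, each a distribution over words.
topics = {
    "games": {"play": 0.4, "level": 0.3, "score": 0.3},
    "finance": {"bank": 0.5, "account": 0.3, "score": 0.2},
}

category = best_topic(["play", "level", "score"], topics)
```

In practice a library implementation of LDA would both learn the topics and infer a per-document topic mixture; this sketch only shows the final "choose a category" step.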

Author-Topic Model
The author-topic model is a generative model for documents that extends LDA to include authorship information [2]. Each author has a distribution over topics, and each topic has a distribution over words.
Figure 3. Graphical model of the author-topic model [2].
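The generative story of the author-topic model [2] can be sketched as below: for each word, an author is drawn uniformly from the document's authors, a topic is drawn from that author's topic distribution, and a word is drawn from that topic's word distribution. The author names, topic names, and probabilities are hypothetical, and the sketch covers only generation, not the inference used to fit the model.

```python
import random


def generate_document(authors, theta, phi, n_words, seed=0):
    """Sample words following the author-topic generative process.

    theta: author -> {topic: probability}
    phi:   topic  -> {word: probability}
    """
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        x = rng.choice(authors)                       # author, chosen uniformly
        topic_names, topic_weights = zip(*theta[x].items())
        z = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        vocab, word_weights = zip(*phi[z].items())
        words.append(rng.choices(vocab, weights=word_weights, k=1)[0])
    return words


# Hypothetical parameters for two authors and two topics.
theta = {
    "dev_a": {"games": 0.9, "finance": 0.1},
    "dev_b": {"games": 0.2, "finance": 0.8},
}
phi = {
    "games": {"play": 0.5, "level": 0.5},
    "finance": {"bank": 0.6, "account": 0.4},
}

doc = generate_document(["dev_a", "dev_b"], theta, phi, n_words=5)
```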

Calculating Results
- User reads the application description.
- Compare APPIC tags with the author's tags.
- CI = Correct Inference; II = Incorrect Inference.
- APPIC finds app in wrong category: CI + 1.
- APPIC incorrectly categorizes application: II + 1.
- APPIC and author incorrectly categorize app: II + 1.
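The tallying rules above can be sketched as a small counter. The outcome labels are hypothetical names for the three cases on the slide, and the accuracy formula CI / (CI + II) is an assumption; the slides do not state how the reported percentages were computed.

```python
# Hypothetical labels for the three outcomes listed on the slide.
OUTCOME_TO_COUNTER = {
    "appic_finds_wrong_category": "CI",  # correct inference
    "appic_incorrect": "II",             # incorrect inference
    "both_incorrect": "II",              # incorrect inference
}


def tally(outcomes):
    """Count correct (CI) and incorrect (II) inferences."""
    counts = {"CI": 0, "II": 0}
    for outcome in outcomes:
        counts[OUTCOME_TO_COUNTER[outcome]] += 1
    return counts


def accuracy(counts):
    """Assumed definition: share of correct inferences among all inferences."""
    total = counts["CI"] + counts["II"]
    return counts["CI"] / total if total else 0.0


counts = tally([
    "appic_finds_wrong_category",
    "appic_finds_wrong_category",
    "appic_finds_wrong_category",
    "both_incorrect",
])
```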

LDA Results (Descriptions)

LDA Results (Permissions)

AT Results (Descriptions)

Comparison of Results

Topic Model      Results
LDA (3 Tags)     83%
LDA (2 Tags)     64%
Author-topic     58%
PLDA [3]         88% [3]

Topic Model      Results
LDA (4 Tags)     34%
PLDA [3]         77% [3]

Conclusion
- LDA performed better than AT at categorizing descriptions.
  - More tags increase accuracy but decrease efficiency.
- AT model was not as accurate in categorizing applications.
  - Useful for finding authors that create similar apps.

Future Work
- Find a better method to calculate accuracy.
- Learn a different method to categorize permissions.
  - Dependencies between permissions and descriptions.
- Modify AT model.

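Author-Topic Model (Modified)
Figure: graphical model of the modified author-topic model. Labels in the figure: D, document; N_d; T; A; w, word; z, topic; ϕ (with β), topic distribution over words; θ (with α), distribution of permissions over topics; x and p_d, permissions, with a uniform distribution of documents over permissions.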

References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022.
[2] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004, pp. 487–494.
[3] Y. Yang, J. S. Sun, and M. W. Berry, “APPIC: Finding The Hidden Scene Behind Description Files for Android Apps.”