

Intelligent Database Systems Lab. Visualizing Topic Flow in Students’ Essays. Authors: Stephen T. O’Rourke, Rafael A. Calvo, and Danielle S. McNamara (2011, EST). Presenter: Wu, Min-Cong.

Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

Motivation: Writing essays is an important learning activity. Visualization can help people assess and improve the quality of essays.

Objectives: This paper presents a novel document visualization technique and a measure of quality based on the average semantic distance between parts of a document.

Methodology: Mathematical Framework. To visualize a document, its dimensionality must first be reduced. Visualizing topic flow: (1) a term-by-paragraph matrix is built, after stop words and low-frequency words are removed and stemming is applied; (2) a topic model is created from the matrix with NMF; (3) the topic model is projected into a 2-dimensional space, which gives a visualization of the document’s paragraphs and helps identify features in the topic model of the document. Quantifying topic flow: a term-by-sentence matrix is built and a topic model is created from it.
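The preprocessing in step (1) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline. The stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), the tokenizer, and the `min_count` threshold for "low-frequency" words are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "it", "this", "that"}

def crude_stem(word):
    # Toy suffix stripper standing in for a real stemmer (assumption).
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_by_paragraph(paragraphs, min_count=2):
    """Return (terms, matrix) where matrix[i][j] counts term i in paragraph j."""
    tokenized = []
    for p in paragraphs:
        words = re.findall(r"[a-z]+", p.lower())
        tokenized.append([crude_stem(w) for w in words if w not in STOP_WORDS])
    totals = Counter(t for toks in tokenized for t in toks)
    # Drop low-frequency terms (fewer than min_count occurrences in the whole essay).
    terms = sorted(t for t, c in totals.items() if c >= min_count)
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[counts[j][t] for j in range(len(paragraphs))] for t in terms]
    return terms, matrix
```

Replacing the term-by-paragraph split with a term-by-sentence split gives the matrix used for quantifying topic flow.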

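Methodology: Visualizing Topic Flow (term-by-paragraph matrix). Rows are terms i and columns are paragraphs p_1, …, p_n. Each entry is weighted with log-entropy, which combines the term’s frequency in paragraph j (local weight) with the term’s entropy across the document (global weight). In the standard log-entropy form this is $a_{ij} = \left(1 + \frac{\sum_{k=1}^{n} p_{ik} \log p_{ik}}{\log n}\right) \log(\mathrm{tf}_{ij} + 1)$, where $p_{ik} = \mathrm{tf}_{ik} / \sum_{l=1}^{n} \mathrm{tf}_{il}$ and $n$ is the number of paragraphs. The larger a term’s log-entropy weight, the more important the term.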
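Below is a minimal sketch of the log-entropy weighting. It assumes the standard local weight log(tf + 1) and the entropy-based global weight from the LSA literature. The function name `log_entropy` and the list-of-lists matrix layout are illustrative assumptions. A term concentrated in one paragraph gets a global weight of 1. A term spread evenly over all paragraphs gets a weight near 0.

```python
import math

def log_entropy(counts):
    """Apply log-entropy weighting to a term-by-paragraph count matrix.

    counts[i][j] is the frequency of term i in paragraph j.
    Local weight:  log(tf_ij + 1)
    Global weight: 1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i
    """
    n = len(counts[0])
    weighted = []
    for row in counts:
        gf = sum(row)  # global frequency of the term across all paragraphs
        entropy = 0.0
        for tf in row:
            if tf > 0:
                p = tf / gf
                entropy += p * math.log(p)
        # With a single paragraph log(n) is 0, so fall back to a global weight of 1.
        g = 1.0 + entropy / math.log(n) if n > 1 else 1.0
        weighted.append([g * math.log(tf + 1) for tf in row])
    return weighted
```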

Methodology: Visualizing Topic Flow (NMF dimensionality reduction). NMF approximates the m×n term-by-paragraph matrix X by the product of an m×r term-by-topic matrix W and an r×n topic-by-paragraph matrix H, where r is the number of latent topics: $X \approx WH$ (e.g. $X_{6\times 2} \approx W_{6\times 3} H_{3\times 2}$). The factors are found by minimizing the squared Frobenius norm of the error, $\|X - WH\|_F^2$, subject to $W, H \ge 0$.
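The factorization can be sketched with the multiplicative update rules of Lee and Seung, one common way to minimize ‖X − WH‖_F². The source does not say which NMF solver the authors used. The solver, the random initialization, `iters`, and the `eps` guard against division by zero are therefore assumptions. Pure Python keeps the example dependency-free; a real implementation would use a numerical library.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, r, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative X (m x n) into W (m x r) and H (r x n) with X ~ WH.

    Lee-Seung multiplicative updates; up to the small eps guard, they do not
    increase the squared Frobenius error ||X - WH||_F^2.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(r)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(r)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)] for i in range(m)]
    return W, H

def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((X[i][j] - WH[i][j]) ** 2 for i in range(len(X)) for j in range(len(X[0])))
```

For the slide's example shape, `nmf(X, 3)` on a 6×2 matrix returns a 6×3 W and a 3×2 H. Each column of H holds one paragraph's topic weights.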

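Methodology: Visualizing Topic Flow (2-dimensional representation). Distances between paragraphs are stored in a triangular paragraph-by-paragraph distance table (rows P_i, columns P_j). Multidimensional scaling (MDS), a technique for visualizing similarities, maps the paragraphs into two dimensions. It uses an iterative majorization algorithm (least squares) to minimize a loss function, stress, between the dissimilarities of the paragraph vectors and the approximated distances in the low-dimensional space: $\sigma(X) = \sum_{i<j} \left(\delta_{ij} - d_{ij}(X)\right)^2$.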
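Below is a minimal sketch of metric MDS by SMACOF, the standard iterative majorization algorithm for least-squares stress. It assumes unit weights and random initialization; the authors' exact settings are not given on the slide. Each iteration applies the Guttman transform, which never increases raw stress.

```python
import math
import random

def smacof(D, dim=2, iters=300, seed=0):
    """Metric MDS by SMACOF (iterative majorization) with unit weights.

    D is a symmetric n x n dissimilarity matrix. Returns n points in `dim`
    dimensions that locally minimize raw stress sum_{i<j} (D_ij - d_ij(X))^2.
    """
    n = len(D)
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        # Guttman transform for unit weights: X_new = (1/n) * B(X) X
        dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j and dist[i][j] > 1e-12:
                    B[i][j] = -D[i][j] / dist[i][j]
            B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)]
             for i in range(n)]
    return X

def raw_stress(D, X):
    """Sum of squared differences between target and embedded distances."""
    n = len(D)
    return sum((D[i][j] - math.dist(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))
```

Passing in the paragraph-by-paragraph distance table, filled in symmetrically, yields one 2-D point per paragraph for the flow plot.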

Methodology: Visualizing Topic Flow (visualizing flow). Each paragraph is drawn as a node, and a link connects it to the next paragraph, from the introduction to the conclusion. The diameter of the grid equals the maximum possible distance between any two paragraphs. Comparing a low-grade and a high-grade essay, the high-grade essay shows better flow because (1) its consecutive paragraphs appear close together and (2) its introduction and conclusion are similar. The degree to which the path deviates from a circle is a further visual cue.
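The two visual cues can also be read off numerically. The hypothetical helper below is an illustration, not part of the described method. It assumes the 2-D paragraph coordinates are given in essay order and builds the node-link edges between consecutive paragraphs. It then reports the mean step length between linked paragraphs and the distance from the introduction to the conclusion.

```python
import math

def flow_path_features(points):
    """Summarize the node-link path drawn through paragraph coordinates.

    points[k] is the 2-D position of paragraph k in essay order; the first
    point is the introduction and the last is the conclusion.
    """
    edges = list(zip(range(len(points) - 1), range(1, len(points))))  # link k -> k+1
    steps = [math.dist(points[a], points[b]) for a, b in edges]
    return {
        "edges": edges,
        # Small when consecutive paragraphs sit close together.
        "mean_step": sum(steps) / len(steps),
        # Small when the introduction and conclusion are similar.
        "intro_to_conclusion": math.dist(points[0], points[-1]),
    }
```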

Methodology: Quantifying Topic Flow. The measure DI compares the semantic distances between consecutive pairs of sentences or paragraphs with a double average over all pairs of sentences or paragraphs. DI ≤ 0 indicates a random topic flow; DI > 0 indicates the presence of topic flow.
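The slide does not reproduce the exact DI formula. The sketch below therefore assumes the simplest form consistent with its description: the average distance over all pairs minus the average distance over consecutive pairs. It also assumes cosine distance between topic vectors as the semantic distance. Under this form, DI is positive when consecutive parts are closer than an average pair, and a random order gives DI near 0 in expectation.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def distance_index(vectors, dist=cosine_distance):
    """Assumed DI: mean distance over all pairs minus mean over consecutive pairs."""
    n = len(vectors)
    consecutive = [dist(vectors[k], vectors[k + 1]) for k in range(n - 1)]
    all_pairs = [dist(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(all_pairs) / len(all_pairs) - sum(consecutive) / len(consecutive)
```

In this sketch, `vectors` would be the per-sentence or per-paragraph topic vectors, for example the columns of H from the NMF step.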

Experiment: Evaluation 1, Flow and Grades (experiment dataset). Dataset: 120 essays written for assignments by undergraduate students at Mississippi State University, graded on a scale of 1 to 6. Subsets: High, 67 essays (grades 1 to 3); Low, 53 essays (grades 3.2 to 6). Number of topics: k = 5. Each essay averages 726.60 (114.37) words, 40.03 (8.29) sentences, and 5.55 (1.32) paragraphs (standard deviations in parentheses).

Experiment: Evaluation 1, Flow and Grades (measuring topic flow). [Table: topic flow measured with each dimensionality reduction technique, with differences marked p < 0.05 or p > 0.05.] Topic flow is less present using either of the dimensionality reduction techniques.

Experiment: Evaluation 1, Flow and Grades (measuring topic flow). The correlation between topic flow and essay grades is measured.
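The slide does not name the correlation statistic. As an assumption, the sketch below computes Pearson's r between per-essay topic flow scores and grades. The inputs in the check are placeholders, not the study's data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```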

Experiment: Evaluation 2, Supporting Assessment (methodology). (1) The inter-rater agreement of the tutors with two expert raters is measured. (2) The two tutors independently mark assignments, some with the map and some without. Hypothesis: with the map, essays can be subjectively assessed faster, more accurately, and more consistently.
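The agreement statistic is not named on the slide. Two common choices are shown here as assumptions. The first is simple percent agreement. The second is Cohen's kappa, which corrects for the agreement two raters would reach by chance. Marks are treated as categorical labels.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two raters give the same mark."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters' categorical marks."""
    n = len(a)
    po = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both raters assigned marks independently at their own rates.
    pe = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if pe == 1:
        return 1.0  # both raters used one identical category throughout
    return (po - pe) / (1 - pe)
```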

Experiment: Evaluation 2, Supporting Assessment (essay subset preparation). The remaining 40 essays were divided into two subsets of 20 essays each (Subset 1 and Subset 2), to be assessed according to the MASUS procedure.

Experiment: Evaluation 2, Supporting Assessment (results). Rater 1 is a native English speaker; Rater 2 is a non-native English speaker. The results are adjusted to eliminate the effect of essay length.

Conclusions: With the aid of the topic flow visualization, tutors assess essays faster, more accurately, and more consistently.

Comments. Advantages: the topic flow visualization helps tutors assess essays faster, more accurately, and more consistently. Applications: document visualization.