Instructor: Prof. Louis Chauvel


Advanced Statistical Analysis: Text analysis with Stata (txttool), R (R.temis) and Hyperbase (15 Feb 2019)

This session
- General references
- What's the matter? Text(ual) analysis, lexicometry, text mining, …
- Stata tools
- R tools
- Hyperbase

References: Stata advanced manual
Set of references, "as usual": http://www.louischauvel.org/stata_manuel_advanced.pdf
Plus more recent …

Main references
Find them online at http://www.a-z.lu/
Almost none recently, apart from:
Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications
Authors: Gary Miner, John Elder IV, Andrew Fast, Thomas Hill, Robert Nisbet, and Dursun Delen
Too much, too heavy, too general, but it is the reference …

See also
Find this online at: https://mhealth.jmir.org/2018/4/e101/

What is to be done?
- Open, long answers to a question / issue / interview
- Typical case: 30-100 interviews lasting from several minutes to 1 hour or more
- In general, you personally know your sample
- And you have some additional indicators on who the respondents are
- Description of the contents of speech (content / matter / style …) to understand major cleavages through what people say
- A quantitative extension of qualitative research

Typical processing: data management and data processing
Data management
- Clean the text (lower case, punctuation, quotes, ???)
- Format the data (differently for each software)
- Import the data into the software (many issues)
- "Stopwords" and lemmatization (removing grammatical inflection)
- Stemming (see the Porter Stemmer)
Data processing
- Dictionary and sub-counts of words → what they speak of, and who
- Concordance / correspondence of words → coherence of words / people
- Factor & cluster analyses → contrasts and groupings of words / people
- … interpretations …
Links:
https://tartarus.org/martin/PorterStemmer/
https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
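The data-management steps above (cleaning, stopwords, stemming, word counts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list is a tiny made-up sample, `crude_stem` is a hypothetical suffix-stripping helper and not the Porter algorithm, and the sample sentence is invented. It is not the syntax used by txttool or R.temis.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "that", "we"}


def crude_stem(word):
    """Very rough suffix stripping. This is NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least 3 characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lower-case, keep alphabetic tokens, drop stopwords, stem, count."""
    text = text.lower()
    # Keeping only runs of a-z removes punctuation and quotes
    # (it also splits words at apostrophes).
    tokens = re.findall(r"[a-z]+", text)
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


# Invented sample sentence, for illustration only.
sample = "The judges judged the Constitution, and the Constitution is judging us."
counts = bag_of_words(sample)
print(counts.most_common())
```

Running one such count per person gives the word-by-person frequency table used in the data-processing step.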

Typical processing: data processing
Main issue → proximity of words and of texts → we need a matrix of relations (proximities) → many solutions (differences, ratios)
A relatively common solution:
- Consider the frequency table with rows = words and columns = persons
- Add 0.5 to the frequency in each cell (to handle zeros)
- Take the log of the frequency
- Compute the mean of log(fr) per row
- Keep the difference between log(fr) and the row mean of log(fr)
- Positive and negative values are a good indicator of attraction/repulsion between word and person
Then: factor & cluster analyses
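The row-centred log-frequency recipe above can be written out as a short Python sketch. The function name `log_deviation` and the two-word, two-person table are invented for illustration; only the arithmetic (add 0.5, take the log, subtract the row mean) comes from the slide.

```python
import math


def log_deviation(freq):
    """freq maps each word to its list of counts, one per person.

    Returns, for each word, log(count + 0.5) minus the row mean of
    log(count + 0.5). Positive values suggest attraction between the
    word and the person, negative values repulsion.
    """
    result = {}
    for word, row in freq.items():
        logs = [math.log(f + 0.5) for f in row]  # +0.5 avoids log(0)
        mean = sum(logs) / len(logs)             # mean of log(fr) per row
        result[word] = [x - mean for x in logs]  # deviation from row mean
    return result


# Invented toy table: rows = words, columns = two persons.
freq = {"law": [4, 0], "court": [2, 2]}
scores = log_deviation(freq)
for word, row in scores.items():
    print(word, [round(v, 3) for v in row])
```

In this toy table "law" is used only by the first person, so its scores are symmetric around zero (positive for the first person, negative for the second), while "court" is used equally by both and scores zero in each column. The resulting matrix is what would then be passed to factor and cluster analyses.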

Software
Qualitative-research oriented:
- NVivo
- ATLAS.ti
- Etc.
Quantitative-method oriented:
- Stata and R commands
- Hyperbase (FR)
- WordStat (for Stata)
- TextAnalyst

Typical case: Neil Gorsuch
22 U.S. Senators (R & D): opening statements at the U.S. Associate Justice hearings
https://www.youtube.com/watch?v=RlJEXiZONrQ
https://www.congress.gov/115/chrg/shrg28638/CHRG-115shrg28638.htm

Typical case: Neil Gorsuch
- 22 U.S. Senators: opening statements at the U.S. Associate Justice hearings
- 22 extensive transcripts (10 minutes)
- We know their names. GDPR issues (https://eugdpr.org/)? No, it is public…
- In the dataset: name, d/r (political party) and transcript
You love U.S. politics?
https://en.wikipedia.org/wiki/Neil_Gorsuch_Supreme_Court_nomination#Committee
https://en.wikipedia.org/wiki/United_States_Senate_Committee_on_the_Judiciary#Members,_115th_Congress
See the texts here:
http://www.louischauvel.org/Gorsuch.doc
https://www.youtube.com/watch?v=RlJEXiZONrQ

Raw Material

PART 1: Stata and text analysis
- You need Stata 13 or later …
- Long string variables have almost no length limitation
- Copy-paste is a simple way to import data
- So… an important Stata SSC module: ssc install txttool
- It provides a Porter stemming option (stem) and counts of words (bag)
- The rest is the usual multidimensional descriptive analysis (factor and cluster)
Example, Stata syntax: http://www.louischauvel.org/gorsuch.do
We proceed now!

PART 2: R and text analysis
- R.temis, a new (V2) R package (TExt MIning Solution): https://cran.r-project.org/web/packages/R.temis/R.temis.pdf
- First install the latest version of RStudio (with the latest version of R)
- Install the package R.temis
- Additional formatting requirements
Example, R script: http://www.louischauvel.org/gorsuch.R
http://www.louischauvel.org/Rtemis_FR.docx
https://rtemis.hypotheses.org/
https://cran.r-project.org/web/packages/R.temis/index.html
We proceed now!

PART 3: Hyperbase and text analysis
Available for free here: http://ancilla.unice.fr/
Pros: free, robust, appropriate for multilingual contexts
Cons: old, in French, and at some point you have to go back to Part 1 (Stata)

Main references
Find them online at http://www.a-z.lu/
https://www.stata-journal.com/sjpdf.html?articlenum=dm0077
http://ancilla.unice.fr/bases/manuel.pdf