Real-Time Tweet Analysis W/ Maltego Carbon 3.5.3

Slides:



Advertisements
Similar presentations
Creating Collaborative Partnerships
Advertisements

Real-Time Tweet Analysis W/ Maltego Carbon 3.5.3
NHnetWORKS December 14,  Facebook is a global Social Networking website that is operated and privately owned by Facebook, Inc.  Users can add.
By Lee Betancourt Director of Communications and Public Relations Jane Myers Public Relations, Communications and Social Media Coordinator Social Media.
WHAT IS SOCIAL MEDIA SAYING ABOUT YOU? MAKE IT WORK FOR YOUR CAREER SARA MEANEY PARTNER, VICE PRESIDENT COMET BRANDING – HANSON DODGE CREATIVE.
Web 2.0: Concepts and Applications 5 Connecting People.
Web 2.0: Concepts and Applications 5 Connecting People.
Frank Yu Australian Bureau of Statistics Unstructured Data 1.
Information | Analytics | Expertise SOCIAL MEDIA INTELLIGENCE Practical Strategies for Using Social Media to Enhance Security AUGUST 2014 © 2014 IHS IHS.
What is the Internet? Internet: The Internet, in simplest terms, is the large group of millions of computers around the world that are all connected to.
IVITA Workshop Summary Session 1: interactive text analytics (Session chair: Professor Huamin Qu) a) HARVEST: An Intelligent Visual Analytic Tool for the.
TC2-Computer Literacy Mr. Sencer February 4, 2010.
Visualization Tools for Twitter A review and analysis of visualization tools in the Twitter domain By Joseph Vincze.
More than words: Social networks’ text mining for consumer brand sentiments A Case on Text Mining Key words: Sentiment analysis, SNS Mining Opinion Mining,
TECHNOLOGICAL ENABLERS TO ASSIST YOUR LIBRARY'S MARKETING STRATEGIES: THE POWER OF SOCIAL MEDIA PRESENTED BY MS MOSHIANE RAMAUBE MS MANDISA LAKHENI.
TAG-Org Websites 1. Why Websites ? Branding: Since it's our website, we can set the design and build the awareness of our brand. To create our own Online.
What is the Internet? Internet: The Internet, in simplest terms, is the large group of millions of computers around the world that are all connected to.
FROM SOCIAL TO EDUCATION USES By: Elite Educators.
Introduction to Text and Web Mining. I. Text Mining is part of our lives.
Nobody’s Unpredictable Ipsos Portals. © 2009 Ipsos Agenda 2 Knowledge Manager Archway Summary Portal Definition & Benefits.
Social Media Social media is not new – Opinion pages – Telephone – Others? What is new – Fewer intermediaries – Individual ability to broadcast – Massive.
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
OCLC Online Computer Library Center 1 Social Media and Advocacy.
Using Social Media for Fundraising and Communication with Supporters Lindsay Boyle – Communications & Research Coordinator Claire Chapman – Information.
What Is Text Mining? Also known as Text Data Mining Process of examining large collections of unstructured textual resources in order to generate new.
Don’t Follow me : Spam Detection in Twitter January 12, 2011 In-seok An SNU Internet Database Lab. Alex Hai Wang The Pensylvania State University International.
Building a Social Media Presence Participants will look at the BCPS social media outlets (Twitter, Facebook, Flickr, Vimeo, Instagram, blogs) and relevant.
 Smartphones – iPhone, Android, Blackberries, etc  Tablets – iPad, Android, Windows, Google, etc.  Computers Basically anything that can connect to.
Social Media & Social Networking 101 Canadian Society of Safety Engineering (CSSE)
1© 2015 IBM Corporation Unlocking the power of the API economy Client Briefing Nov.
Getting Started Telligent or SharePoint (or Hybrid)?
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
Athabasca University COMP 683 Introduction to Learning and Knowledge Analytics Project Analytics Model Darin Hobbs
Data mining in web applications
Internet Business Associate v2.0
Ing. Athanasios Podaras, Ph.D
SOCIAL MEDIA Etimesgut Halk Eğitim Merkezi - Ankara.
Introduction to Information and Communication Technologies
Contextual Intelligence as a Driver of Services Innovation
Social Media Measurement Tools
Memory Standardization
Personalized Social Image Recommendation
Summary Presented by : Aishwarya Deep Shukla
Business in a Connected World
Power of Social Media Analytics
Engaging Audiences With Social Media
MID-SEM REVIEW.
Computer Assisted Language Learning & Multimedia Language Learning
Social Media For All.
Social Media Data Mining
SOCIAL MEDIA MARKETING
Mining the Data Charu C. Aggarwal, ChengXiang Zhai
Overview Social media applications inform, educate, and entertain people through online (multi-)media A social networking application allows users to create.
Information Retrieval and Web Search
Weichuan Dong Qingsong Liu Zhengyong Ren Huanyang Zhao
Social Knowledge Mining
Multi-Dimensional Data Visualization
Web 2.0 Technologies and Community Building Online by
SOCIAL MEDIA Etimesgut Halk Eğitim Merkezi - Ankara.
Social Media Strategies for Sharing Agricultural Knowledge
Speech Capture, Transcription and Analysis App
Building Relationships One Tweet at a Time
Lesson 2: Internet Communication
Text Mining & Natural Language Processing
Ben Jones - S Rebecca Hunter - S
Text Mining & Natural Language Processing
Social Media Management
Information Retrieval and Web Search
The Internet and Electronic mail
Helpful Things To Know For Successful Digital Marketing Strategy Presented By:- Abhinav Shashtri.
Presentation transcript:

Real-Time Tweet Analysis W/ Maltego Carbon 3.5.3 (and Maltego Chlorine 3.6.0)

Overview Self-intros Twitter Facts Maltego Carbon Facts Your ideas for data extractions Twitter Facts Internet as Database Maltego Carbon Facts Tweet Analyser (sic) “Machine” Human “Sensor Networks” Event Graphing “Tweet Analyzer” Data Extraction as a Jumping-Off Point to Further Research Computer-Enhanced Data Mining Content Mining Structure Mining Assertability and Qualifiers Your ideas for research

Self-Intros Experiences with social media platforms? Areas of research interest? Particular topics you want addressed, questions you want answered? Your ideas to “seed” data extractions #hashtags @mentions @names Keywords Phrases Names Events, and others

Twitter Facts So-called “SMS of the Internet”: “short message service”, 140 characters, culture of “status updates” Multilingual Platform: Available in 33 languages (URL Encode/Decode sometimes needed for some languages) Linguistic Sub-communities/Subgraphs: Identification of linguistic sub- communities in various networks Those on Twitter: 500 million+ users (as of late 2014), hundreds of millions of Tweets a day 8% automated or robot accounts (“Twitterbots”); also automated sensor accounts; also cyborg accounts (part-human, part-automation) Those not on Twitter: Blocked in N. Korea, China, and Iran; individual Tweets censored from certain countries and regions at the requests of governments

Twitter Facts (cont.) Tweets: Text, abbreviations, shortened URLs, images, and videos; used complementarily with online sites (highly linked) Microblogging Grammar and Syntax: @, #, and others; replies; retweets; @mentions; labeled conversations on a shared topic; favorites; embed Tweets on another Web page Synchronic Conversations: The assumptions of (near) real-time interactivity and relational intimacy across social and parasocial relationships, distances, cultures, and identities Volatile Micro(nano)blogging Messaging: “Bursty” popularity but fading / decaying within hours (brief temporal scales, fleeting user attention), based on “survival analysis” Seems like Ephemera, but Not: Archival of Tweets by the Library of Congress (not sure how usable, findable) Public messages may be quickly deleted but are always already recorded and captured

Data Extractions from Twitter Twitter Facts (Cont.) Data Extractions from Twitter Public (Released) Data Only: Twitter application programming interfaces (APIs) allow access to public data only, not private data Two Types of Data Extractions: Slice-in-time (cross-sectional) or continuous data (both rate-limited by Twitter’s API) Whitelisting: Need to be white-listed (with a verified account) for enhanced API access Historical Twitter data beyond a week or so generally requires going with a Twitter-approved commercial company to do the extraction

Internet as Database Web 2.0: The Social Web Social networking sites (Facebook, LinkedIn) Microblogging (Twitter) Blogging Wikis Content sharing sites (YouTube, Flickr, Vimeo, SlideShare, and others) Collaborative encyclopedias (Wikipedia) Surface Web (and Internet) http networks Content networks Technological understructures Hidden or Deep Web …

Maltego Carbon Facts Penetration (“Pen”) Testing Tool Mapping URLs and “http networks” Reconnaissance on the understructure of web presences and technologies used Geolocation of online contents (GPS coordinates to online content) Extractions of social networks on Facebook and Twitter Conversions of various types of online contents to other related information De-aliasing identities Tying an individual to phone numbers and emails Parameter-setting: 12 entities – 10K entities (results) Caveats: Noisy data, challenges with disambiguation, challenges with knowing how large of a sample was collected (from the amount available)

Maltego Carbon Facts (cont.) Machines and Transforms: Data extractions and visualizations “machines”—sequences of scripted data extractions “transforms”—converting one type of information to other types Relationships of online contents (expressed as undirected 2D graphs) Web-based Application Programming Interfaces: Use of web-based application programming interfaces (APIs) of various social media platforms Versions: Commercial vs. (limited) community versions Company: Created by Paterva, a S. African software company

Tweet Analyzer “Machine”

Tweet Analyzer Machine (Cont.) Dynamic and continuous iterated extractions Pay attention to the status or progress bar because some analyses take some time to get started. The sentiments (positive, negative, and neutral) do not show up until a sufficient number of messages are collected. Text-seeded Links Tweet topics, social media accounts (“Twits”), URLs (uniform resource locators), and digital contents on the Web and Internet Clusters related (potentially similar) Tweets Outputs data as various types of 2D graphs (static and dynamic) and as entity lists in tables (partially exportable from Maltego as .xlsx files)

The AlchemyAPI Runs an automated sentiment analysis tool (by AlchemyAPI, which uses both a linguistic and statistical-based analysis of language and built off of using a Web corpus of 200 billion words as a training corpus) against the Tweets captured by Maltego Carbon / Chlorine in a streaming way AlchemyAPI, which is part of IBM Watson (recently acquired), retrains its cloud- based (software as a service) algorithm monthly on Web-extracted data (which is mostly unstructured data) The API can identify over 100 languages (for cross-lingual analysis); there are eight (8) main languages supported for most AlchemyAPI services New services being introduced include machine vision, particularly object recognition and facial recognition in collected images (based on “deep learning” techniques)

The AlchemyAPI (cont.) Messaging is classified as positive (close to +1), negative (close to -1), or neutral (close to or equal to 0) based on semantics, co-occurring words in proximity, and statistical analysis (probabilities, levels of certitude or confidence) Probably TMI: IBM Watson may enable “personality insights” from the extracted textual data The AlchemyAPI is offered as a web-based service for app developers to enable various data extractions and analytics (on an apparent micropayment context)

A Brief History of AlchemyAPI Founded in 2005 Has 40,000+ developers around the world Is used in 36+ countries Has three main APIs: AlchemyLanguage, AlchemyVision, and AlchemyData (news service) AlchemyLanguage enables named entity extraction, sentiment analysis, keyword extraction, relations extraction, and taxonomy creation (classification of text documents based on thousands of categories and subcategories) AlchemyVision enables image analysis and tagging; face detection (along with the identification of gender, age, and other identifying information) AlchemyData enables the extraction of named entities, events, locations, dates, and other relevant information for text summarization of news events

Human “Sensor Networks” Use of each human “node” in a network as a sharer of information Benefitting from human presence and locational coverage (location-aware devices and applications) Benefitting from human sensing (awareness) and intelligence Filtered through perception, cognition, emotion, and thought (mental processing) Benefitting from smart device sensing Enhanced with photographic-, audio-, and video-recording capabilities; enhanced with location-aware capabilities Thought to have value in unfolding emergency or crisis situations Theoretically and practically possible to have city-wide / region-wide / country-wide and broader electronic situational awareness by drawing on a number of electronic data streams (public and private)

Event Graphing Eventgraphs: Data visualizations of time-bounded occurrences or “events” including information about participating individuals, messaging, audio, video, and other related files Topics of Tweet Conversations: Most popular topics around a word or phrase or symbol or equation (any keyword “string”); making mental connections that were not apparent before Entities and Egos: Social networks and individuals interacting around the particular topic “Mayor(s) of the hashtag” (egos and entities), those most influential and active Sub-groups / islands / clusters around an event Pendants, whiskers, and isolates

Event Graphing (cont.) Seeding for the “Event” Data Extraction: Defined #hashtags (and variants) around an event (whether formal or informal) or phenomenon or campaigns or movements; select keywords (words or phrases without the hashtag); select social accounts @names

“Tweet Analyzer” Data Extraction as a Jumping-Off Point to Further Research A “breadth-and-depth” search (mapping the network and then drilling down on various aspects of the graph that is of-interest, such as particular nodes, clusters, messages, links, or other aspects) Examples: Mapping targeted ego neighborhoods and networks Identifying geographical locations linked to online Tweet discourses Identifying geographical locations linked to online accounts and entities Identifying images, videos, and URLs linked to particular discourses (based on campaigns or movements or events)

Computer-Enhanced Data mining Content Mining of Digital Contents and Messaging Structure Mining of Social Networks and Content Networks CORE: text, imagery, videos, audio, URLs, and others Sentiment analysis (expressed feelings, beliefs, attitudes, direction of opinion, strength of opinion, polarity, inferences on purpose, and others; obvious and latent) Content analysis (of messages) Word-sense disambiguation Semantic analysis Frequency counts (word clouds) …via machine-reading and human “close reading” CORE: egos and entities (individuals and groups; humans, cyborgs, sensors and ‘bots); social media platform accounts for various purposes Relationships (formal links): Follower- following / friend Relationships (interaction-based links): Emergent networks around issues, Twitter campaigns, and others (actual interactions) …via machine data visualization and human analysis

Assertability and Qualifiers The Social Medium Platform and its Constituencies: What different types of assertions can you make about data on a particular type of social media platform? Its users? Its regionalisms? Its cultures? Its jargon? What are They Saying? What does the messaging mean? How is the multimedia messaging understood along with the text messaging (more easily machine-processed and even easier to disambiguate through human processing than audio / imagery / video / web pages / other)? What is the relevance of the sentiment—positive, negative, and neutral? How far can you generalize about online conversations? What can you assert about meaning or intention? And what does the talk suggest about possible behaviors?

Assertability and Qualifiers (cont.) Size of Data Extraction: How do you know how much of what is available was actually captured? (no N = all, no API-enabled knowledge of % of data captured vs. amount of data actually available) Spatial Mapping: Given the sparsity of geolocational information in microblogging messages and the locational inaccuracy of what may be shared, what sorts of “digital maps” may be drawn around conversations related to certain issues? How would confidence in such information be measured? How would error rates be understood?

Assertability and Qualifiers (cont.) Egos and Entities: What can you generalize about individuals and groups ascribing to particular ideas? What can you assert about the human or group (or ‘bot or cyborg) identities behind social media accounts? Trending Issues: What can you assert about how issues “trend” on various social media platforms? When is continuous sampling desirable (as with dynamic data)? When is slice-in- time sampling desirable (as with more static data)? Are there space-time interactions that may be captured?

Your Ideas for Research?

Contact and Conclusion Dr. Shalin Hai-Jew iTAC, K-State 212 Hale Library 785-532-5262 shalin@k-state.edu Resource: Conducting Surface Web-Based Research with Maltego Carbon (on Scalar)