
I looked through many APIs to figure out what I wanted to use and how I wanted to develop this Twitterbot. My early attempts consisted of developing grammars using natural language APIs, or n-grams using other machine learning APIs. I was also exploring my options for what I wanted the bot to do, and I had envisioned a couple of things: parsing news pages, web crawling, and making up stories, or generating fantasy phrases based on Dungeons and Dragons modules from PDFs.

My initial corpus consisted of smaller, free, text-based news articles and short stories. I had to parse through HTML and pick out decent articles to train with. There were errors in the parsing and in the text itself, along with uninteresting information such as the programming descriptions.

Even though there were issues with parsing and with using the Markov chains, I still got some neat results. I decided I liked the fantasy approach from my original idea. The PDFs for the Dungeons and Dragons books, however, had a ton of bad data such as dice rolls, and in general made for an unfriendly corpus to parse.
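For anyone unfamiliar with Markov chains for text, here is a minimal word-level version in plain Python. It is a stand-in sketch of the general technique, not the code the bot runs; the function names `build_chain` and `generate` and the sample sentence are made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, max_words=30, rng=None):
    """Random-walk the chain, starting from a randomly chosen state."""
    rng = rng or random.Random()
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # dead end: nothing ever followed this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical sample text, only to show the shape of the output.
sample = (
    "the wizard walked into the forest and the wizard walked into the tower "
    "and the dragon flew into the forest"
)
chain = build_chain(sample)
print(generate(chain))
```

Because followers are stored with repeats, a word that follows a state more often is proportionally more likely to be picked, which is why the balance of the training text matters.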

Amazon Web Services Elastic Compute Cloud (EC2) instance running Windows Server 2012
Python 3.5 and libraries: Tweepy, Markovify, BeautifulSoup
Create a corpus of the Lord of the Rings series and an equal part of Harry Potter, which was about four of the books (to balance out the probability)
Set a timeout for every hour and let it go!

Developing the Corpus:
Find the books as a text file to make metadata removal easier? Nope!
Find the e-books then? Yeah!
Convert the e-book to a text file!
Clean up the metadata!
Remove the page, title, and chapter wording, leaving only sentence structure
Remove all quotation marks and other grammatical symbols that would cause odd output
Remove other miscellaneous data that would be bad for the training model
Run the Markov chain algorithm and find sentence candidates under 140 characters, per the Twitter restriction, then post every hour!
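The cleanup and candidate-selection steps could look roughly like the following. This is a sketch under assumptions: the regular expressions, the function names `clean_corpus` and `pick_tweet`, and the sample text are invented for illustration, and the real patterns would depend on how the e-book came out after conversion to text.

```python
import re

TWEET_LIMIT = 140  # the character limit named in the text


def clean_corpus(raw):
    """Drop page/chapter lines and strip quotation marks and stray symbols."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        # Assumed metadata patterns: "Page 12", "CHAPTER ONE", bare page numbers.
        if not line or re.fullmatch(r"(?i)(page\s+\d+|chapter\b.*|\d+)", line):
            continue
        # Remove straight and curly double quotes, brackets, asterisks, underscores.
        line = re.sub(r"[\"“”()\[\]*_]", "", line)
        kept.append(line)
    # Collapse whitespace so the sentences read as ordinary running prose.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def pick_tweet(make_sentence, tries=100, limit=TWEET_LIMIT):
    """Call make_sentence until it returns a sentence that fits the limit."""
    for _ in range(tries):
        candidate = make_sentence()
        if candidate and len(candidate) <= limit:
            return candidate
    return None  # no usable candidate; skip this hour's post


# Hypothetical sample of converted e-book text.
raw = """CHAPTER ONE
Page 1
"You shall not pass," said the wizard.
The boy (who lived) raised his wand.
2
"""
print(clean_corpus(raw))
```

`pick_tweet` takes any zero-argument function that returns a sentence or `None`, so whatever generator the bot uses can be plugged in. Markovify also has a `make_short_sentence` method that takes a maximum character count and does a similar length check on its own.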

The bot posts new and interesting plotlines, deleted scenes, remixes, lots of interesting phrases, and comments that make me think of Mystery Science Theater 3000 (and occasionally gibberish).