I was looking through many APIs to figure out what I wanted to use and how I wanted to develop this Twitterbot. My early attempts consisted of developing.

1

2 I was looking through many APIs to figure out what to use and how to develop this Twitterbot. My early attempts consisted of developing grammars with natural language APIs, or n-grams with other machine learning APIs. I was exploring what I wanted the bot to do and had envisioned a couple of things: parsing news pages, web crawling, and making up stories; or generating fantasy phrases based on Dungeons and Dragons modules from PDFs.

3 My initial corpus consisted of smaller, free, text-based news articles and short stories. I had to parse through the HTML and figure out which articles were decent enough to train with. There were errors in parsing and in the text itself, as well as uninteresting information such as the programming descriptions.
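As an illustration of the HTML parsing step, here is a minimal sketch that pulls visible text out of a page while skipping script and style contents. It uses Python's standard `html.parser` module as a stand-in so it runs without extra installs; it is not the parsing code behind the bot. The names `ArticleTextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A filter like this removes markup, but it cannot tell a good article from a dull one; that selection still has to be done by hand or with extra heuristics.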

4 Even though there were issues with parsing and with using Markov chains, I still got some neat results. I decided I liked the fantasy approach from my original idea. However, the PDFs of the Dungeons and Dragons books had a ton of bad data, such as dice rolls, and were in general an unfriendly corpus to parse.
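For readers unfamiliar with Markov chains for text, the sketch below shows the basic idea at the word level: record which word follows each short run of words, then walk those records at random to produce a new sentence. It is a simplified, standard-library illustration, not the Markovify implementation and not the bot's code. The function names, the `<s>` and `</s>` markers, and the default `state_size` of 2 are assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(sentences, state_size=2):
    """Map each tuple of `state_size` words to the words seen after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        # Pad with markers so the chain knows where sentences begin and end.
        padded = ["<s>"] * state_size + words + ["</s>"]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i:i + state_size])
            chain[state].append(padded[i + state_size])
    return chain


def generate(chain, state_size=2, max_words=40, rng=random):
    """Walk the chain from the start state until an end marker or max_words."""
    state = ("<s>",) * state_size
    words = []
    while len(words) < max_words:
        options = chain.get(state)
        if not options:  # empty or untrained chain
            break
        next_word = rng.choice(options)
        if next_word == "</s>":
            break
        words.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(words)
```

A larger `state_size` tends to copy the source text more closely, while a smaller one mixes sources more freely and drifts toward nonsense.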

5 Amazon Web Services Elastic Compute Cloud (EC2) instance
Windows Server 2012
Python 3.5 and libraries: Tweepy, Markovify, BeautifulSoup
Create a corpus of the Lord of the Rings series and an equal part of Harry Potter, which was about four of the books (to balance out the probability)
Set a timer for every hour and let it go!
Developing the corpus:
Find the books as text files to make metadata removal easier? Nope!
Find the e-books then? Yeah!
Convert each e-book to a text file
Clean up the metadata
Remove the page, title, and chapter wording, leaving only sentence structure
Remove all quotation marks and other grammatical symbols that would cause odd output
Remove other miscellaneous data that would be bad for the training model
Run the Markov chain algorithm, find sentence candidates under 140 characters per the Twitter restriction, then post every hour!
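The cleanup and length-check steps on this slide could look roughly like the sketch below. The regular expressions are guesses at what chapter, page, and symbol noise looks like in a converted e-book; real exports vary, so treat the patterns and the names `clean_corpus` and `tweetable` as hypothetical. The 140-character limit comes from the slide.

```python
import re

# Assumed shapes for heading and page lines, e.g. "CHAPTER ONE", "Page 13", "12".
HEADING = re.compile(r"^\s*(chapter\s+\w+|page\s+\d+|\d+)\s*$", re.IGNORECASE)
# Assumed set of symbols that tend to produce odd output when left in.
UNWANTED_SYMBOLS = re.compile(r'["“”()\[\]*_]')


def clean_corpus(raw_text):
    """Drop heading and page lines, strip unwanted symbols, collapse whitespace."""
    kept = []
    for line in raw_text.splitlines():
        if not line.strip() or HEADING.match(line):
            continue
        line = UNWANTED_SYMBOLS.sub("", line)
        kept.append(" ".join(line.split()))
    return " ".join(kept)


def tweetable(candidates, limit=140):
    """Keep only non-empty candidate sentences that fit within the limit."""
    return [c for c in candidates if 0 < len(c) <= limit]
```

In the stack listed on the slide, Markovify's `make_short_sentence(140)` can handle the length check while generating; the explicit filter here only makes that step visible.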

6

7 The bot posts new and interesting plotlines, deleted scenes, remixes, lots of interesting phrases, and comments that make me think of Mystery Science Theater 3000 (and occasionally gibberish).

