WEB TEXT DAY 34 - 11/14/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University.

Presentation transcript:

WEB TEXT DAY 34 - 11/14/14 LING 3820 & 6820 Natural Language Processing Harry Howard Tulane University

Course organization
• The syllabus is under construction.
• Chapter numbering
  • 3.7. How to deal with non-English characters
  • 4.5. How to create a pattern with Unicode characters
  • 6. Control
14-Nov-2014, NLP, Prof. Howard, Tulane University

Open Spyder

Twitter Review

Finding text on the web

Firefox: Tools > Web Developer > Page Source
Safari: Preferences > Advanced > Show Develop menu, then Develop > Show Page Source
• If someone asked you how to do something …. By all means, you still need pictures, even video. But there's nothing to replace the specificity that comes from the alphabet. Use labels. Use words.

We need
• requests
• % pip install feedparser
• % pip install BeautifulSoup4

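Get the text

    import requests
    from bs4 import BeautifulSoup

    # The page address did not survive in the transcript; supply it here.
    url = ''
    # Download the page and keep its HTML as text.
    html = requests.get(url).text
    # Parse the HTML with Python's built-in parser.
    soup = BeautifulSoup(html, 'html.parser')
    # Print the text of the <div> whose class is "entry-body".
    print(soup.find("div", {"class": "entry-body"}).text)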
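To see what the find("div", {"class": "entry-body"}) step is doing, here is a minimal sketch that runs without a network connection. It is not the slide's method: it swaps BeautifulSoup for the standard library's html.parser, and the class DivTextExtractor, the function div_text, and the sample_html page are all made up for illustration.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first <div> carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # how many <div> levels deep we are inside the target
        self.done = False   # set once the target <div> has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            # A nested <div> inside the target: track it so we know when to stop.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def div_text(html, css_class):
    """Return the stripped text of the first <div> with the given class."""
    parser = DivTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical page standing in for the downloaded HTML.
sample_html = """
<html><body>
  <div class="sidebar">Archives</div>
  <div class="entry-body"><p>Use labels.</p> <div><p>Use words.</p></div></div>
  <div class="footer">Contact</div>
</body></html>
"""

text = div_text(sample_html, "entry-body")
print(text)
```

The sidebar and footer text are skipped; only the text nested under the entry-body div is kept, which is the same selection the BeautifulSoup call makes on the real page.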

Install feedparser by hand
• click on the Downloads button
• choose the .zip file
• $ cd /Users/harryhow/Downloads/feedparser
• $ python setup.py install

Get the RSS feed

    from bs4 import BeautifulSoup
    import feedparser

    url = 'feed://feeds.feedblitz.com/sethsblog'
    # Download and parse the feed.
    fp = feedparser.parse(url)
    print("Fetched %s entries from '%s'" % (len(fp.entries), fp.feed.title))

    # Keep the title, plain-text content and link of every entry.
    blog_posts = []
    for e in fp.entries:
        blog_posts.append({'title': e.title,
                           'content': BeautifulSoup(e.content[0].value, 'html.parser').get_text(),
                           'link': e.links[0].href})
    print(blog_posts[0]['content'])
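The same loop can be tried offline on a small feed held in a string. This is only a sketch under stated assumptions: it replaces feedparser and BeautifulSoup with the standard library (xml.etree.ElementTree and html.parser), the helper strip_tags and the sample_rss feed are invented for illustration, and the post body is read from the RSS description element rather than from the content field that feedparser exposes.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(fragment):
    """Return the fragment's text with all markup removed."""
    stripper = TagStripper()
    stripper.feed(fragment)
    stripper.close()
    return "".join(stripper.chunks)


# Hypothetical RSS 2.0 feed; the post bodies are HTML escaped inside <description>.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>&lt;p&gt;Use &lt;b&gt;labels&lt;/b&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>&lt;p&gt;Use words.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
"""

# Parse the XML and locate the channel and its items.
channel = ET.fromstring(sample_rss).find("channel")
items = channel.findall("item")
print("Fetched %s entries from '%s'" % (len(items), channel.findtext("title")))

# Build the same list of dictionaries as the feedparser version.
blog_posts = []
for item in items:
    blog_posts.append({"title": item.findtext("title"),
                       "content": strip_tags(item.findtext("description")),
                       "link": item.findtext("link")})
print(blog_posts[0]["content"])
```

Each entry ends up as a dictionary with title, content and link keys, so later text statistics can work on blog_posts[i]['content'] without caring how the feed was fetched.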

Next time
• something else, maybe a quiz