Team web space
Local access
  \\up.ist.local\TeamWebsites\ist402sp17\section2\Team01
  \\up.ist.local\TeamWebsites\ist402sp17\section2\Team02
  ...
  \\up.ist.local\TeamWebsites\ist402sp17\section2\Team10
Web access
  http://teams.up.ist.psu.edu/ist402sp17/section/Team01
  http://teams.up.ist.psu.edu/ist402sp17/section/Team10
Visualization of Trending
Topic Trending in Conferences
Trends
Can be used to analyze:
  Newspaper stories
  Emails
  Online forum discussions
  Product reviews
  Microblogs
  Category change
Streamgraph
Requirement
  Categories, frequencies (weights), time
  Categories and frequencies can be hard to get and need advanced algorithms
    E.g., topics of news articles, conference topics
Benefits
  Clear trending patterns
  Comparison among different categories/topics
Issues
  Dominant shapes may be misleading.
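To make the streamgraph requirement concrete, here is a minimal sketch of how categories, weights, and time steps turn into stacked layers. It assumes a symmetric (centered) baseline, which is one common streamgraph layout; the slides do not say which layout is intended. The topic names and counts are made up for illustration.

```python
# Hypothetical data: one list of weights per category, one value per time step.
counts = {
    "topic A": [2, 4, 6, 4],
    "topic B": [3, 3, 2, 5],
    "topic C": [1, 2, 4, 1],
}


def stream_layers(series):
    """Return {category: [(lower, upper), ...]} with one pair per time step.

    Assumption: symmetric baseline, so the whole stack is centered on zero
    and the bottom edge at each step is -(sum of all weights) / 2.
    """
    names = list(series)
    steps = len(series[names[0]])
    totals = [sum(series[name][t] for name in names) for t in range(steps)]

    lower = [-total / 2 for total in totals]
    layers = {}
    for name in names:
        # Each layer sits on top of the previous one; its thickness is its weight.
        upper = [low + weight for low, weight in zip(lower, series[name])]
        layers[name] = list(zip(lower, upper))
        lower = upper
    return layers


layers = stream_layers(counts)
for name, bands in layers.items():
    print(name, bands)
```

Each layer's position depends on the layers beneath it, so a thick layer pushes its neighbors around. That is one way a dominant shape can mislead the reader.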
Python 3: Extracting Online Data
Importance of Online Data
Data related to our first programming assignment
  http://police.psu.edu/daily-crime-log
Online data is dynamic.
Online data can be voluminous.
Online data is usually "well" structured.
Can you extract online data easily and automatically?
Example: Penn State Crime Logs
Each record has a well-defined HTML structure.

<div class="views-field views-field-title">
  <span class="views-label views-label-title">Incident #: </span>
  <span class="field-content">PSU201700961</span>
</div>
<div class="views-field views-field-field-occurred">
  <span class="views-label views-label-field-occurred">Occurred: </span>
  <span class="field-content"><span class="date-display-single">03/15/2017
    <div class="date-display-range">
      <span class="date-display-start" property="dc:date" datatype="xsd:dateTime" content="2017-03-15T02:42:00-04:00">2:42 AM </span>
      to
      <span class="date-display-end" property="dc:date" datatype="xsd:dateTime" content="2017-03-15T03:05:00-04:00">3:05 AM</span>
    </div></span></span>
</div>
<div class="views-field views-field-field-location">
  <span class="views-label views-label-field-location">Location: </span>
  <span class="field-content">Cunningham Hall</span>
</div>
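The record's date spans also carry machine-readable timestamps in their content attributes, which are easier to work with than the displayed "2:42 AM" text. The sketch below reads them using only the standard library. The slides do not name a parsing package, so html.parser is an assumption here, and datetime.fromisoformat needs Python 3.7 or later.

```python
from datetime import datetime
from html.parser import HTMLParser

# The "Occurred" field from the record shown above.
RECORD_HTML = """
<div class="views-field views-field-field-occurred">
  <span class="views-label views-label-field-occurred">Occurred: </span>
  <span class="field-content"><span class="date-display-single">03/15/2017
    <div class="date-display-range">
      <span class="date-display-start" property="dc:date" datatype="xsd:dateTime" content="2017-03-15T02:42:00-04:00">2:42 AM </span>
      to
      <span class="date-display-end" property="dc:date" datatype="xsd:dateTime" content="2017-03-15T03:05:00-04:00">3:05 AM</span>
    </div></span></span>
</div>
"""


class DateAttrParser(HTMLParser):
    """Collect the ISO 8601 timestamps stored on property="dc:date" tags."""

    def __init__(self):
        super().__init__()
        self.timestamps = {}  # "start" / "end" -> datetime

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("property") == "dc:date" and "content" in attrs:
            # The class is "date-display-start" or "date-display-end".
            key = (attrs.get("class") or "").replace("date-display-", "")
            self.timestamps[key] = datetime.fromisoformat(attrs["content"])


parser = DateAttrParser()
parser.feed(RECORD_HTML)
parser.close()

start = parser.timestamps["start"]
end = parser.timestamps["end"]
duration_minutes = (end - start).total_seconds() / 60
print(start.isoformat(), end.isoformat(), duration_minutes)
```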
Our Goal
Extract all records from a page and save them to a CSV file.
Basic Idea
Extract information based on HTML tags
  We need a package to parse the HTML code.
  Extract individual categories
  Build a dataframe based on all data from all categories
  Export the dataframe to a CSV file.
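The steps above can be sketched end to end with the standard library alone. This is an illustration under stated assumptions, not the exercise solution. It uses html.parser in place of whichever parsing package the exercise uses, and a list of dicts written with csv.DictWriter in place of a dataframe. The second record in the sample is a made-up placeholder, and FieldParser, pairs_to_rows, and rows_to_csv are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser

# First record is from the slide above; the second is a made-up placeholder.
SAMPLE_HTML = """
<div class="views-field views-field-title">
  <span class="views-label views-label-title">Incident #: </span>
  <span class="field-content">PSU201700961</span></div>
<div class="views-field views-field-field-location">
  <span class="views-label views-label-field-location">Location: </span>
  <span class="field-content">Cunningham Hall</span></div>
<div class="views-field views-field-title">
  <span class="views-label views-label-title">Incident #: </span>
  <span class="field-content">EXAMPLE-0002</span></div>
<div class="views-field views-field-field-location">
  <span class="views-label views-label-field-location">Location: </span>
  <span class="field-content">Example Hall</span></div>
"""


class FieldParser(HTMLParser):
    """Collect (label, value) pairs from views-label / field-content spans."""

    def __init__(self):
        super().__init__()
        self.pairs = []   # finished (label, value) pairs, in page order
        self._stack = []  # role of each open <span>: "label", "value", or None
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "views-label" in classes:
            self._stack.append("label")
            self._label = ""
        elif "field-content" in classes:
            self._stack.append("value")
            self._value = ""
        else:
            self._stack.append(None)  # nested span, e.g. inside a date field

    def handle_endtag(self, tag):
        if tag != "span" or not self._stack:
            return
        if self._stack.pop() == "value":
            label = self._label.strip().rstrip(":").strip()
            value = " ".join(self._value.split())  # collapse whitespace
            self.pairs.append((label, value))

    def handle_data(self, data):
        if "value" in self._stack:
            self._value += data
        elif "label" in self._stack:
            self._label += data


def pairs_to_rows(pairs, first_label="Incident #"):
    """Group pairs into one dict per record; a new record starts at first_label."""
    rows = []
    for label, value in pairs:
        if label == first_label or not rows:
            rows.append({})
        rows[-1][label] = value
    return rows


def rows_to_csv(rows, out):
    """Write rows to a file-like object, one column per category."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


parser = FieldParser()
parser.feed(SAMPLE_HTML)
parser.close()
rows = pairs_to_rows(parser.pairs)

buffer = io.StringIO()  # swap for open("records.csv", "w", newline="") to save a file
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Grouping on the "Incident #" label assumes every record starts with that field, as the sample record does. A real page may need a sturdier record boundary, such as the element that wraps each record.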
Exercise
Follow the exercise instructions:
  \\up.ist.local\Courses\Spring2017\IST402\InClassExercisesResources\Week10_Python3