Data Extraction using Web Scraping


1 Data Extraction using Web Scraping
Ishaan Agrawal, Cisco Systems India Pvt. Ltd.

2 Points to cover
About the task
What is Web Scraping
DITA Tags – HTML Mapping
How it works
Challenges faced and best practices for writers

3 About the task
Problem statement: Extract commands from configuration guides and command reference guides.
Use case: Identify the delta (difference): command reference content that is missing on different platforms.
Aim: Speed up the process by automating the extraction of commands from guides.
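The use case above amounts to a set difference between two command lists. A minimal sketch, assuming the commands for each platform have already been extracted as plain strings; the function names `normalize` and `command_delta` are hypothetical and do not come from the slides:

```python
def normalize(command: str) -> str:
    """Collapse runs of whitespace so formatting differences do not count as a delta."""
    return " ".join(command.split())


def command_delta(reference: list[str], other: list[str]) -> list[str]:
    """Return the commands present in `reference` but missing from `other`, sorted."""
    reference_set = {normalize(c) for c in reference if c.strip()}
    other_set = {normalize(c) for c in other if c.strip()}
    return sorted(reference_set - other_set)


if __name__ == "__main__":
    # Illustrative data only; real lists would come from the extracted .txt files.
    platform_a = ["clear configuration lock", "show version"]
    platform_b = ["show  version"]
    for missing in command_delta(platform_a, platform_b):
        print(missing)
```

The comparison is case-sensitive on purpose, since the slides do not say whether command case matters.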

4 What is Web Scraping
Web scraping (also called web data extraction or web harvesting) is a technique for extracting large amounts of data from web pages and saving it to your local machine.

5 DITA Tags – HTML Mapping
DITA:
<synph> <kwd> clear configuration lock </kwd> </synph>
<synph> <kwd> clear </kwd> <kwd> configuration </kwd> <kwd> lock </kwd> </synph>
HTML Output:
<span class="synph"><span class="kwd">clear</span> <span class="kwd">configuration</span> <span class="kwd">lock</span></span>
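Given this mapping, a command can be recovered from the published HTML by collecting the `kwd` spans inside each `synph` span. The following is a rough sketch using BeautifulSoup, not the presenter's actual script; the function name `extract_commands` is an assumption, and it handles both tagging styles (one `kwd` for the whole command, or one `kwd` per word):

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = (
    '<span class="synph"><span class="kwd">clear</span> '
    '<span class="kwd">configuration</span> '
    '<span class="kwd">lock</span></span>'
)


def extract_commands(html: str) -> list[str]:
    """Return one command string per <span class="synph"> found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    commands = []
    for synph in soup.find_all("span", class_="synph"):
        words = []
        for kwd in synph.find_all("span", class_="kwd"):
            # split() drops the padding whitespace that DITA authors may leave in a tag
            words.extend(kwd.get_text().split())
        if words:
            commands.append(" ".join(words))
    return commands


if __name__ == "__main__":
    print(extract_commands(SAMPLE_HTML))
```

Joining the words with single spaces means both DITA variants shown on the slide yield the same command string.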

6 DITA Tags – HTML Mapping Example

7 How it works
Programming language: Python
BeautifulSoup – a Python package
HTML source code
Validate book URL and extract list of chapters from the TOC
Iterate chapter by chapter and extract commands
Create a .txt file and write the extracted commands in it
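The steps above could be wired together roughly as follows. This is a sketch under several assumptions, not the presenter's implementation: the TOC is assumed to be an HTML page whose links point to chapters (a real guide would likely need a narrower selector), commands are assumed to be marked up as `synph`/`kwd` spans, and every function name here is hypothetical.

```python
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def is_valid_book_url(url: str) -> bool:
    """Basic syntactic check: an http(s) URL with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch(url: str) -> str:
    """Download a page and return its HTML source."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def chapter_links(toc_html: str, base_url: str) -> list[str]:
    """Collect absolute chapter URLs from the TOC, in order, without duplicates."""
    soup = BeautifulSoup(toc_html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):  # assumption: every TOC link is a chapter
        url = urljoin(base_url, anchor["href"])
        if url not in links:
            links.append(url)
    return links


def extract_commands(chapter_html: str) -> list[str]:
    """Return one command per <span class="synph">, built from its kwd spans."""
    soup = BeautifulSoup(chapter_html, "html.parser")
    commands = []
    for synph in soup.find_all("span", class_="synph"):
        words = []
        for kwd in synph.find_all("span", class_="kwd"):
            words.extend(kwd.get_text().split())
        if words:
            commands.append(" ".join(words))
    return commands


def scrape_book(book_url: str, out_path: str, fetch_page=fetch) -> list[str]:
    """Validate the URL, walk the chapters, and write unique commands to a .txt file."""
    if not is_valid_book_url(book_url):
        raise ValueError(f"not a valid book URL: {book_url!r}")
    commands = []
    for chapter_url in chapter_links(fetch_page(book_url), book_url):
        for command in extract_commands(fetch_page(chapter_url)):
            if command not in commands:
                commands.append(command)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write("\n".join(commands) + "\n")
    return commands
```

Passing `fetch_page` as a parameter keeps the network call replaceable, so the parsing logic can be exercised against saved HTML without hitting the live site.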

