1
Data Extraction using Web Scraping
Ishaan Agrawal, Cisco Systems India Pvt. Ltd.
2
Points to cover
About the task
What is Web Scraping
DITA Tags – HTML Mapping
How it works
Challenges faced and best practices for writers
3
About the task
Problem statement: Extract commands from configuration guides and command reference guides.
Use case: Identify the delta (difference) in command reference content across platforms, that is, the commands missing on each platform.
Aim: Speed up the process by automating the extraction of commands from the guides.
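The use case above amounts to a set difference between the commands extracted for two platforms. The following is a minimal sketch under that assumption, not the presenter's actual tooling. The helper names `load_commands` and `missing_commands` and the sample command sets are hypothetical and chosen only for illustration.

```python
def load_commands(path):
    """Read one command per line from a text file into a set (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        # Collapse internal whitespace so "show  version" and "show version" match.
        return {" ".join(line.split()) for line in handle if line.strip()}


def missing_commands(reference, target):
    """Return the commands present in `reference` but absent from `target`, sorted."""
    return sorted(set(reference) - set(target))


# Illustrative data only; these sets do not come from any real guide.
platform_a = {"clear configuration lock", "show version", "show running-config"}
platform_b = {"show version"}

print(missing_commands(platform_a, platform_b))
```

Running the comparison in both directions gives the content missing on each platform.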
4
What is Web Scraping
Web scraping (also called web data extraction or web harvesting) is a technique for extracting large amounts of data from web pages and saving it to your local machine.
5
DITA Tags – HTML Mapping
DITA source
<synph> <kwd> clear configuration lock </kwd> </synph>
<synph> <kwd> clear </kwd> <kwd> configuration </kwd> <kwd> lock </kwd> </synph>
HTML output
<span class="synph"><span class="kwd">clear</span> <span class="kwd">configuration</span> <span class="kwd">lock</span></span>
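Because each DITA `<synph>` and `<kwd>` element becomes a `<span>` with a matching class, a command can be recovered by joining the `kwd` spans inside each `synph` span. The following is a minimal sketch using BeautifulSoup, assuming the published HTML follows the mapping shown on this slide. The function name `extract_commands` is hypothetical, and real guides may use additional classes or nesting that this sketch does not handle.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the HTML output shown on the slide.
HTML = (
    '<p><span class="synph"><span class="kwd">clear</span> '
    '<span class="kwd">configuration</span> '
    '<span class="kwd">lock</span></span></p>'
)


def extract_commands(html):
    """Return one command string per <span class="synph"> element."""
    soup = BeautifulSoup(html, "html.parser")
    commands = []
    for synph in soup.find_all("span", class_="synph"):
        # Collect the text of every keyword span inside this syntax phrase.
        keywords = [
            kwd.get_text(strip=True)
            for kwd in synph.find_all("span", class_="kwd")
        ]
        if keywords:
            # Join the keywords and normalize any internal whitespace.
            commands.append(" ".join(" ".join(keywords).split()))
    return commands


print(extract_commands(HTML))
```

Both tagging styles from the DITA source (one `<kwd>` holding the whole command, or one `<kwd>` per word) yield the same command string with this approach.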
6
DITA Tags – HTML Mapping Example
7
How it works
Programming language: Python
BeautifulSoup – a Python package
HTML source code
Validate the book URL and extract the list of chapters from the TOC
Iterate chapter by chapter and extract the commands
Create a .txt file and write the extracted commands to it
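The steps above can be sketched as a small pipeline. This is an illustrative outline under several assumptions, not the presenter's script: it assumes the TOC page links to chapters with ordinary `<a href>` elements and that commands appear in `synph`/`kwd` spans. All function names (`is_valid_url`, `fetch`, `chapter_urls`, `extract_commands`, `scrape_book`) are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup


def is_valid_url(url):
    """Basic syntactic check: an http(s) scheme and a host are present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch(url):
    """Download a page and return its HTML source as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def chapter_urls(toc_html, base_url):
    """Collect unique chapter URLs from the TOC page, in document order."""
    soup = BeautifulSoup(toc_html, "html.parser")
    urls = []
    for link in soup.find_all("a", href=True):
        # Resolve relative links and drop "#fragment" parts.
        url = urldefrag(urljoin(base_url, link["href"])).url
        if url not in urls:
            urls.append(url)
    return urls


def extract_commands(html):
    """Return one command per <span class="synph">, joined from its kwd spans."""
    soup = BeautifulSoup(html, "html.parser")
    commands = []
    for synph in soup.find_all("span", class_="synph"):
        words = [k.get_text(strip=True) for k in synph.find_all("span", class_="kwd")]
        if words:
            commands.append(" ".join(" ".join(words).split()))
    return commands


def scrape_book(book_url, out_path, fetch_page=fetch):
    """Validate the URL, walk the chapters, and write the commands to a .txt file."""
    if not is_valid_url(book_url):
        raise ValueError(f"Not a valid book URL: {book_url!r}")
    commands = []
    for url in chapter_urls(fetch_page(book_url), book_url):
        commands.extend(extract_commands(fetch_page(url)))
    with open(out_path, "w", encoding="utf-8") as handle:
        for command in commands:
            handle.write(command + "\n")
    return commands
```

Passing the page fetcher as a parameter (`fetch_page`) is a design choice made here so the pipeline can be exercised against saved HTML without network access.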