Published by Wilfred Turner. Modified over 9 years ago.
1
Apartment Cloud
Noah Callaway, Zac Fleischmann, Zak Nelson, Brandon Zahl
2
Aspirations / Reality
Aspirations: Aggregate apartment listings from all across the internet to create a… …simple, one-stop apartment search.
Reality: Aggregate apartment listings from top sites (Washington state only) to create a… …mostly one-stop apartment search. …mostly simple.
3
Building It
- Brandon: site-specific extractors, statistics
- Noah: server configuration, front-end development
- Zac: site-specific extractors, advanced search
- Zak: crawler / aggregator, commute distance feature
4
Page Extraction Statistics
Columns: Extractor Name, Files Crawled, Listings Found, Extraction Errors, % error-free
Rent.com 49073250100
ApartmentRatings.com 78557231198.4
Craigslist.com 979427739196.6
MyNewPlace.com 73929017091.6
5
Extraction Accuracy Statistics
Extractor Name        TP   TN   FP  FN  Precision  Recall  F-score
Rent.com              281  135  12  0   0.959      1.000   0.979
ApartmentRatings.com  39   0    1   0   0.975      1.000   0.987
Craigslist            63   147  3   9   0.955      0.875   0.913
MyNewPlace.com        186  -    10  44  0.949      0.809   0.873
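As a worked check, the precision, recall, and F-score columns can be recomputed from the TP, FP, and FN counts on this slide. The sketch below assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-score as their harmonic mean); the `Counts` class and function names are illustrative and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives, and false negatives for one extractor."""
    tp: int
    fp: int
    fn: int


def precision(c: Counts) -> float:
    # Fraction of extracted listings that were correct.
    return c.tp / (c.tp + c.fp)


def recall(c: Counts) -> float:
    # Fraction of real listings that the extractor found.
    return c.tp / (c.tp + c.fn)


def f_score(c: Counts) -> float:
    # Harmonic mean of precision and recall (F1).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


# Counts as reported on the slide (TN is not needed for these metrics).
counts = {
    "Rent.com": Counts(tp=281, fp=12, fn=0),
    "ApartmentRatings.com": Counts(tp=39, fp=1, fn=0),
    "Craigslist": Counts(tp=63, fp=3, fn=9),
    "MyNewPlace.com": Counts(tp=186, fp=10, fn=44),
}

for name, c in counts.items():
    print(f"{name}: P={precision(c):.3f} R={recall(c):.3f} F={f_score(c):.3f}")
```

Run as written, the printed values should agree with the slide to three decimal places, which is a quick way to confirm the counts were read correctly.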
6
Experiment Conclusion
- Much higher accuracy on the structured pages than on unstructured Craigslist
- Craigslist is a candidate for machine learning
- Machine learning would likely do worse on the other sites
7
What we learned
- How to configure Amazon Web Services with a LAMP stack
- How to create a web application with AJAX
- How to use Jobo and Nutch for web crawling
- How to parse HTML for pertinent data
- The considerations of starting a web business
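To illustrate what parsing HTML for pertinent data can look like, here is a minimal sketch using Python's standard `html.parser` module. It is not the project's extractor: the class names `listing-title` and `listing-rent` and the sample markup are hypothetical, chosen only to show the idea of picking fields out of a structured listing page.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects the text of elements whose class marks a field of interest."""

    # Hypothetical CSS class names mapped to output field names.
    FIELDS = {"listing-title": "title", "listing-rent": "rent"}

    def __init__(self):
        super().__init__()
        self.listing = {}
        self._current = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_data(self, data):
        # Store the first non-blank text seen after a matching start tag.
        if self._current and data.strip():
            self.listing[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Stop capturing once any element closes.
        self._current = None


# Hypothetical sample page fragment.
SAMPLE = (
    '<div class="listing">'
    '<h2 class="listing-title">2BR near campus</h2>'
    '<span class="listing-rent">$950</span>'
    "</div>"
)

parser = ListingParser()
parser.feed(SAMPLE)
print(parser.listing)
```

Pages with consistent markup like this are the easy case; free-form posts, as on Craigslist, give a parser like this far less to hold on to.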
8
Unexpected Outcomes
- Amazon Web Services was slower than a $7/month virtual server
- Most of the large listing sites were surprisingly easy to extract data from
- Aggregating information from the web is legally tricky
9
Things We’d Do Differently
- Better version control
- More pre-coding design
- More quality control and testing
- More extensible extractors (maybe an existing HTML parser)
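One possible shape for more extensible extractors is a small registry that maps each site's hostname to its own extraction function, so adding a site means adding one function. This is a sketch under assumptions, not how the project was built: the `register` decorator, the `extract` dispatcher, and the hostname `example-listings.test` are all hypothetical, and the extractor body is a placeholder.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse

# An extractor takes a page's HTML and returns a list of listing records.
Extractor = Callable[[str], List[dict]]

_REGISTRY: Dict[str, Extractor] = {}


def register(hostname: str):
    """Decorator that registers an extractor for one site's hostname."""
    def decorator(func: Extractor) -> Extractor:
        _REGISTRY[hostname] = func
        return func
    return decorator


@register("example-listings.test")  # hypothetical site
def extract_example(html: str) -> List[dict]:
    # Placeholder: a real extractor would parse the HTML here.
    return [{"source": "example-listings.test", "raw_length": len(html)}]


def extract(url: str, html: str) -> List[dict]:
    """Dispatch a crawled page to the extractor registered for its host."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    extractor = _REGISTRY.get(host)
    if extractor is None:
        raise LookupError(f"no extractor registered for {host!r}")
    return extractor(html)


print(extract("http://www.example-listings.test/a", "<html></html>"))
```

Keeping the dispatch separate from the per-site logic also makes each extractor easy to test on its own, which fits the quality-control point above.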
10
Demo