Data Connections -- Options, Functionality, and Performance
Objective Use VBA code to extract and store data from web pages when the underlying database is not accessible. Step one – Accessing the web pages Step two – Extracting the data
Accessing the Web Pages Internet Explorer Object WinHttpRequest Object XMLHttpRequest object
Document Object Model (DOM) Regular Expressions Retrieving the Data
IE Object Dim objIE As Object Dim varTables As Variant Set objIE = CreateObject("InternetExplorer.Application") URL = " objIE.Visible = False objIE.navigate URL Do Until Not objIE.Busy DoEvents Loop While objIE.Document.ReadyState <> "complete" Wend Set varTables = objIE.Document.all.tags("TABLE")
Advantages View web page in action Credentials Background tasks DOM Disadvantages Hangs up Can interfere with other browsers More resources IE Object
WinHttpRequest Object Dim winHttpReq As Object Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1") URL = " requesttimeout=5000&SDate=" & ProcDate winHttpReq.SetTimeouts , , , winHttpReq.Open "GET", URL, False winHttpReq.SetCredentials “username", “password", HTTPREQUEST_SETCREDENTIALS_FOR_SERVER winHttpReq.Send result = winHttpReq.responseText Set winHttpReq = Nothing
Advantages More timeout control Waits for webpage to complete Disadvantages Cannot see what is returned May need Credentials No DOM WinHttpRequest Object
Time Trial IE ObjectWinHttpRequest Object Using two different versions of the Scan Error Tracking program, we ran three separate scans of 542 zip codes. Execution time is in seconds.
DOM vs RegEx The DOM is preferred when parsing web pages where the data is in uniform locations, using tables and rows. Regular Expressions work best when attempting to find data on a page where the location of the information, or the structure of the page is not known in advance.