Let’s leave the fascinating world of algebra and Matlab for a while and let’s see what exciting we can do from the Excel level. Microsoft Excel has this ‘thanks God’ programming toolkit known as VBA (Visual Basic for Application) that adds a soul of coding into a mortal body of spreadsheets. If you are an advanced programmer and you are asked or forced to deal with solving problems in Excel/VBA, no doubt you will find programming in VBA as a return to the kindergarten. Saying shortly, VBA is all about fun and relax at work.
Great. As I’m working over a special section of QuantAtRisk.com devoted to From Zero to Hero bootcamp for those of You who wish to brush up or master own skill in VBA and Matlab programming in the application to the quantitative finance and risk management, today I want you to jump with me into the open ocean of VBA application for a specific task that I found very helpful at some stage of my work with the data. Therefore, let’s define the problem first.
The problem is about our quick ability to download some data from the webpage of our choice (e.g. regarding a recent fund or investment option performance) directly into Excel, and next, to look for and extract exactly what we would like to save for our further quantitative analysis.
Since VBA codes for solving this problem are easily spread around the Web, it wouldn’t be challenging to dedicate another article treating the same problem in a new frame. And it’s not my style. Because I love to challenge myself in my private life, I even more love to challenge the problem I wish to solve. I don’t need to look too far as the problems I find around challenge me equally.
1. Case Study
We hunt for the data that are displayed by the data provider within a web-frame, and all VBA codes aimed at getting access to that frame simply fail. Let’s consider the example of InvestSMART.com.au webpage. It’s an Australian portal feeding for some funds its data from the Morningstar.com.au data provider. We are looking for the latest investment option performance of, say, OnePath OA PS – OP Australian Shares EF of the APIR code MMF0022AU:
and we want to download into Excel the Total Return numbers for 1 Month, 3 Month, and 1 Year performance. In case when the standard ways of getting this information do not work, for example, utilizing the classical variables of doc.body.innerText or doc.body.outerText as in the epic approach:
Dim iFrm As HTMLIFrame Dim doc As MSHTML.HTMLDocument For iterator = 0 To appIE.Document.all.Length - 1 If TypeName(appIE.Document.all(iterator)) = "HTMLIFrame" Then Set iFrm = appIE.Document.all(iterator) Set doc = iFrm.Document MsgBox doc.body.outerText End If Next
we need to get another way round this problem. One of them is the application of Google Search for the fund performance based on the investment option code.
2. Limit Google Search down to the first listing, download the webpage directly into Excel
That title says everything. We can use Google.com to perform the search for the same investment option among all webpages of InvestSMART as indexed by Google up to date. This solution seems to be more elegant and getting straight to the point. To make this post worth substantially more, imagine, that your task is to repeat the data acquisition process for a long list of options you have in your Excel worksheet:
We not only look for the way of fetching the required performance numbers but also to automate the whole process to work in the background (or overnight). Given the active Excel worksheet constructed as displayed above, we write VBA code (a macro) making use of Google Search engine. We read in the APIR codes one by one from column B and allow VBA to search the Google by the keyword constructed, in general, as: “InvestSMART”+APIRCode:
'Get Fund Performance directly into Excel utilizing VBA and Google ' '(c) 2013 QuantAtRisk.com, by Pawel Lachowicz Sub CheckAPIRCodes() WB = ActiveWorkbook.Name WS = ActiveSheet.Name 'Declaration of Variables Dim i, j, k, r As Long Dim cellstr, cellstr2, cellstr3, cellstr4, str1, str2, extractedHTML As String Dim NameB, NameS, MyStr, GoogleSearchPath, SearchString As String Dim d1, d2, d3, d4, d5, d6, APIRCode As Variant Dim Nstr, Mstr, c1, c2, c3, c4, c5, c6 As Long Dim SearchingError As Boolean Dim iStart, iEnd As Integer Dim ie As InternetExplorer Dim RegEx As RegExp, RegMatch As MatchCollection Dim iedoc, pDisp As Object SearchingError = False Set ie = New InternetExplorer Set RegEx = New RegExp 'Find number of all rows with codes in WS (including header) Dim Nrow As Long Nrow = ActiveSheet.Range("B" & Rows.Count).End(xlUp).Row '--Main Loop over the list of all APIR codes as specified in column B For r = 1 To Nrow Application.Wait (Now + TimeValue("0:00:05")) SearchingError = False 'Read APIR Code from the list (row number = r) Workbooks(WB).Worksheets(WS).Activate APIRCode = Workbooks(WB).Worksheets(WS).Cells(r, 2).Value '--GOOGLE Search for APIR Code in InvestSMART Set ie = New InternetExplorer Set RegEx = New RegExp 'Search in Google for 'SearchString' keyword GoogleSearchPath = "https://www.google.com.au/search?q=" 'The keyword SearchString = "InvestSMART+" & APIRCode ie.Navigate GoogleSearchPath & SearchString 'Loop until IE page is full loaded Do Until ie.ReadyState = READYSTATE_COMPLETE Loop 'Set regular expression MyStr = "" MyStr = ie.Document.body.innerText Set RegMatch = Nothing Set RegMatch = RegEx.Execute(MyStr) If RegMatch.Count > 0 Then ie.Navigate RegMatch(0) Do Until ie.ReadyState = READYSTATE_COMPLETE Loop 'show Internet Explorer ie.Visible = True Set iedoc = ie.Document 'Extract the first webpage from the Google search; if failed, 'handle the error extractedHTML = "" On Error GoTo ErrHandler: extractedHTML = iedoc.getElementById("search").innerHTML ErrHandler: ie.Quit Mstr = 0 Mstr = Len(extractedHTML) Resume GoOn: GoOn: If (Mstr = 0) Then SearchingError = True Else SearchingError = False End If If (SearchingError = False) Then 'find the first href as this will be the first link, 'add 1 to encompass the quote iStart = InStr(1, extractedHTML, "href=", vbTextCompare) + Len("href=") + 1 'locate the next quote as this will be the end of the href iEnd = InStr(iStart, extractedHTML, Chr(34), vbTextCompare) 'extract text extractedHTML = Mid(extractedHTML, iStart, iEnd - iStart) Nstr = Len(extractedHTML) iStart = 0 iStart = InStr(1, extractedHTML, "&", vbTextCompare) If (iStart > 0) Then str1 = Left(extractedHTML, iStart) str2 = Right(extractedHTML, Nstr - iStart - 4) extractedHTML = str1 & str2 End If End If str1 = Left(extractedHTML, 29) If (str1 <> "https://www.investsmart.com.au") Then SearchingError = True End If Else 'have the best sex ever End If
In line #29 the code counts the number of elements listed in column B. This solution works only for the lists having no blank cells between the elements.
The practice with the code also reveals that Google protects itself from the robotic-like requests sent from our VBA level if the time of sending a search request to Google is spaced by a few seconds only. Therefore, the smart step to avoid this obstacle is to add a time delay of about 5 seconds (see line #35). The code works perfectly under Windows environment as it uses Internet Explorer, so please don’t be misled by a screenshot taken from Apple’s MacOS. To the best of my knowledge, there are problems in running this code in Apple due to the lack of right socket replacing IE.
The last step is natural. If Google found and limited its first search entry to the webpage of InvestSMART and requested APIR code, we fetch and download the page directly into Excel. The following solution (among many many available on-line) I found to work the best:
'---- If InvestSMART with APIR code has been found by Google, 'extract performance numbers If (SearchingError = False) Then 'Download extractedHTML into Excel (new workbook) Application.Workbooks.Open (extractedHTML) NameB = ActiveWorkbook.Name NameS = ActiveSheet.Name
where extractedHTML variable stores the exact web address of the page with the data we are interested in. Check this code separately how does it work with different domains. A little bit of reconnaissance here will help you to discover how this process is powerful for Excel. In short, the extractedHTML webpage is opened in Excel as a new worksheet!
Given that unique experience, we are left with a basic data manipulation based on the data extracted from the Internet:
For i = 1 To 50 Workbooks(NameB).Worksheets(NameS).Activate cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i, 1).Value cellstr2 = Left(cellstr, 16) If (cellstr2 = "Fund Performance") Then cellstr4 = cellstr For j = 1 To 20 Workbooks(NameB).Worksheets(NameS).Activate cellstr = Workbooks(NameB).Worksheets(NameS).Cells(i+1,j).Value If (cellstr = "1 Month") Then c1 = j If (cellstr = "3 Month") Then c2 = j If (cellstr = "1 Year") Then c3 = j Next j For j = i + 1 To i + 5 cellstr = Workbooks(NameB).Worksheets(NameS).Cells(j, 1).Value cellstr2 = Left(cellstr, 12) If (cellstr2 = "Total Return") Then k = j 'read data from InvestSMART '1M d1 = Workbooks(NameB).Worksheets(NameS).Cells(k, c1).Value '3M (Q) d2 = Workbooks(NameB).Worksheets(NameS).Cells(k, c2).Value '1Y d3 = Workbooks(NameB).Worksheets(NameS).Cells(k, c3).Value 'save in a working sheet Workbooks(WB).Worksheets(WS).Activate Workbooks(WB).Worksheets(WS).Cells(r, 3).Value = d1 Workbooks(WB).Worksheets(WS).Cells(r, 4).Value = d2 Workbooks(WB).Worksheets(WS).Cells(r, 5).Value = d3 Workbooks(WB).Worksheets(WS).Cells(r, 6).Value = cellstr4 Workbooks(WB).Save End If Next j End If Next i 'Workbooks(NameB).Close End If 'ie.Quit Set RegEx = Nothing Set ie = Nothing Next r Workbooks(WB).Worksheets(WS).Activate Range("A1").Select End Sub
what accomplishes our mutual VBA-Google efforts into the final result of a “Master and his dog” game of throwing and fetching the ball:
1 comment
For the VBA code to work you need to select the References, which you have not described. Best would be an example excel file.