This post is short but informative, and you can pick up a few good Unix/Linux tricks along the way. The goal is defined in the title, so what is the quickest solution? We will use Python in a Unix-based environment. As you will see, a single line of Unix commands is enough to do the basic text-file processing we need. If you try to do the same in Windows... well, good luck!
First, we need to get through the FTP gate of NASDAQ heaven. It is sufficient to log on as an anonymous user, giving your email address as the password. In fact, any fake email will do the job. Let’s begin coding in Python:
    # How to Get a List of all NASDAQ Securities as a CSV file using Python?
    # +tested in Python 3.5.0b2, Mac OS X 10.10.3
    #
    # (c) 2015 QuantAtRisk.com, by Pawel Lachowicz

    import os

    os.system("curl --ftp-ssl anonymous:jupi@jupi.com "
              "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt "
              "> nasdaq.lst")
Here we use the os module from Python’s Standard Library and the Unix command curl. The latter connects to the FTP server of the NASDAQ exchange, fetches the file nasdaqlisted.txt stored in the SymbolDirectory directory, and saves it in our current folder under the name nasdaq.lst. During that process curl prints its progress information to the terminal, e.g.:
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   162  100   162    0     0    125      0  0:00:01  0:00:01 --:--:--   125
    100  174k  100  174k    0     0  23409      0  0:00:07  0:00:07 --:--:-- 39237
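The same download can also be done without shelling out to curl, using ftplib from the Standard Library. The sketch below is only an illustration: the function name fetch_listing and its ftp_factory parameter are hypothetical (they are not part of the original script), and it assumes the server still accepts anonymous FTP logins at the host and path used above.

```python
import ftplib

HOST = "ftp.nasdaqtrader.com"   # host taken from the curl call above
DIRECTORY = "SymbolDirectory"
FILENAME = "nasdaqlisted.txt"


def fetch_listing(email="jupi@jupi.com", ftp_factory=ftplib.FTP):
    """Return the lines of nasdaqlisted.txt fetched over anonymous FTP.

    ftp_factory is a hypothetical hook that lets the function be tried
    without a network connection; by default it is ftplib.FTP.
    """
    lines = []
    with ftp_factory(HOST) as ftp:
        # Anonymous login: any email-like string serves as the password
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(DIRECTORY)
        # retrlines calls the callback once per line, newline stripped
        ftp.retrlines("RETR " + FILENAME, lines.append)
    return lines
```

Writing the returned lines to nasdaq.lst, one per line, gives a local copy to work on. A file fetched this way may not start with the same lines as the curl output, so inspect it before applying the line offsets used later in this post.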
Now, to inspect the content of the downloaded file, we can run one extra line of Python code:
    os.system("head -20 nasdaq.lst")
    print()
which displays the first 20 lines of the file:
    Symbol|Security Name|Market Category|Test Issue|Financial Status|Round Lot Size
    AAIT|iShares MSCI All Country Asia Information Technology Index Fund|G|N|N|100
    AAL|American Airlines Group, Inc. - Common Stock|Q|N|N|100
    AAME|Atlantic American Corporation - Common Stock|G|N|D|100
    AAOI|Applied Optoelectronics, Inc. - Common Stock|G|N|N|100
    AAON|AAON, Inc. - Common Stock|Q|N|N|100
    AAPC|Atlantic Alliance Partnership Corp. - Ordinary Shares|S|N|N|100
    AAPL|Apple Inc. - Common Stock|Q|N|N|100
    AAVL|Avalanche Biotechnologies, Inc. - Common Stock|G|N|N|100
    AAWW|Atlas Air Worldwide Holdings - Common Stock|Q|N|N|100
    AAXJ|iShares MSCI All Country Asia ex Japan Index Fund|G|N|N|100
    ABAC|Aoxin Tianli Group, Inc. - Common Shares|S|N|N|100
    ABAX|ABAXIS, Inc. - Common Stock|Q|N|N|100
As you can see, we are not interested in the first 8 lines of our file. Before cleaning up that mess, let’s inspect the “happy ending” as well:
    os.system("tail -5 nasdaq.lst")
    print()
displaying
    ZVZZT|NASDAQ TEST STOCK|G|Y|N|100
    ZWZZT|NASDAQ TEST STOCK|S|Y|N|100
    ZXYZ.A|Nasdaq Symbology Test Common Stock|Q|Y|N|100
    ZXZZT|NASDAQ TEST STOCK|G|Y|N|100
    File Creation Time: 0624201511:02|||||
Again, we notice that the last line does not make our housewarming party any merrier.
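The head and tail checks can also be done in Python itself rather than through os.system. A minimal sketch, in which the helper names head and tail are hypothetical and simply mirror the Unix commands:

```python
from collections import deque
from itertools import islice


def head(lines, n=20):
    """Return the first n items of an iterable of lines (like head -n)."""
    return list(islice(lines, n))


def tail(lines, n=5):
    """Return the last n items of an iterable of lines (like tail -n)."""
    # A bounded deque keeps only the most recent n items as it is filled
    return list(deque(lines, maxlen=n))
```

Both helpers accept any iterable, so an open file object can be passed directly, for example head(open("nasdaq.lst")). Neither reads the whole file into memory at once.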
Given that information, we employ a heavy but smart one-liner that chains the immortal Unix commands tail, cat, and sed in a pipe (pipeline). The next call in our Python code performs three miracles in one shot. Have a look:
    os.system("tail -n +9 nasdaq.lst | cat | sed '$d' | sed 's/|/ /g' > "
              "nasdaq.lst2")
If you view the output file nasdaq.lst2, you will see that its content is exactly what we wanted, i.e.:
    $ echo; head nasdaq.lst2; echo "..."; tail nasdaq.lst2

    AAIT iShares MSCI All Country Asia Information Technology Index Fund G N N 100
    AAL American Airlines Group, Inc. - Common Stock Q N N 100
    AAME Atlantic American Corporation - Common Stock G N D 100
    AAOI Applied Optoelectronics, Inc. - Common Stock G N N 100
    AAON AAON, Inc. - Common Stock Q N N 100
    AAPC Atlantic Alliance Partnership Corp. - Ordinary Shares S N N 100
    AAPL Apple Inc. - Common Stock Q N N 100
    AAVL Avalanche Biotechnologies, Inc. - Common Stock G N N 100
    AAWW Atlas Air Worldwide Holdings - Common Stock Q N N 100
    AAXJ iShares MSCI All Country Asia ex Japan Index Fund G N N 100
    ...
    ZNGA Zynga Inc. - Class A Common Stock Q N N 100
    ZNWAA Zion Oil & Gas Inc - Warrants G N N 100
    ZSAN Zosano Pharma Corporation - Common Stock S N N 100
    ZSPH ZS Pharma, Inc. - Common Stock G N N 100
    ZU zulily, inc. - Class A Common Stock Q N N 100
    ZUMZ Zumiez Inc. - Common Stock Q N N 100
    ZVZZT NASDAQ TEST STOCK G Y N 100
    ZWZZT NASDAQ TEST STOCK S Y N 100
    ZXYZ.A Nasdaq Symbology Test Common Stock Q Y N 100
    ZXZZT NASDAQ TEST STOCK G Y N 100
The command
    tail -n +9 nasdaq.lst
prints the file starting from line 9, i.e. it skips the first eight lines. Next, we push that output into a pipe and pass it through cat, which simply copies it unchanged (this step is redundant and could be dropped). The output is then processed by two sed commands: the first removes the last line, and the second replaces every “|” with a space. Finally, the processed output is saved as the file nasdaq.lst2. The power of Unix in a single line. After 15 years of using it I’m still smiling to myself doing that :)
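For comparison, the same cleaning step can be sketched in pure Python. Instead of counting lines, this version finds the header row and the footer by their content. It assumes, based on the listings above, that the header starts with `Symbol|` and the footer with `File Creation Time`; the function name clean_listing is hypothetical.

```python
def clean_listing(lines):
    """Return the data rows of a nasdaqlisted-style file as lists of fields.

    Everything up to and including the header row is skipped, and the
    'File Creation Time' footer is dropped.
    """
    rows = []
    seen_header = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not seen_header:
            # Skip anything before the header, then the header itself
            seen_header = line.startswith("Symbol|")
            continue
        if not line or line.startswith("File Creation Time"):
            continue
        rows.append(line.split("|"))
    return rows
```

Matching on content rather than on a fixed offset means the function keeps working if the number of leading lines changes, at the price of depending on the exact header and footer text.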
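All right. What is left? Getting a list of tickers and storing it in a CSV file. Piece of cake. Since the fields are now separated by spaces, the first whitespace-delimited field of every line is the ticker, so we employ the Unix command awk in the following way: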
    os.system("awk '{print $1}' nasdaq.lst2 > nasdaq.csv")
    os.system("echo; head nasdaq.csv; echo '...'; tail nasdaq.csv")
which returns
    AAIT
    AAL
    AAME
    AAOI
    AAON
    AAPC
    AAPL
    AAVL
    AAWW
    AAXJ
    ...
    ZNGA
    ZNWAA
    ZSAN
    ZSPH
    ZU
    ZUMZ
    ZVZZT
    ZWZZT
    ZXYZ.A
    ZXZZT
i.e. an isolated list of NASDAQ tickers stored in the nasdaq.csv file. From this point, you can read it into a pandas DataFrame as follows:
    import pandas as pd

    data = pd.read_csv("nasdaq.csv", index_col=None, header=None)
    data.columns = ["Ticker"]
    print(data)
displaying
         Ticker
    0      AAIT
    1       AAL
    2      AAME
    3      AAOI
    4      AAON
    5      AAPC
    ...

    [3034 rows x 1 columns]
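One caveat when reading ticker lists with pandas: by default read_csv treats certain strings, such as "NA", as missing values, so a symbol spelled that way would silently become NaN. The sketch below shows one way to guard against it. The function name read_tickers is hypothetical, and the "NA" entry in the demo is made up for illustration; the post does not check whether such a symbol appears in the real list.

```python
import io

import pandas as pd


def read_tickers(source):
    """Read a one-column ticker file, keeping every symbol as a plain string."""
    return pd.read_csv(
        source,
        header=None,
        names=["Ticker"],
        dtype=str,
        keep_default_na=False,  # do not turn strings such as "NA" into NaN
    )


# In-memory demo; pass "nasdaq.csv" instead to read the file from this post
demo = read_tickers(io.StringIO("AAPL\nNA\nZU\n"))
print(demo["Ticker"].tolist())
```

Passing names=["Ticker"] also sets the column label in the same call, so no separate assignment to the columns attribute is needed.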
That’s it.
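One refinement worth sketching: the ticker list above still contains NASDAQ test securities such as ZVZZT and ZXZZT, which carry a Y in the Test Issue column of the original file. The hypothetical helper below (real_tickers is not part of the original script) reads the pipe-delimited lines with the csv module and keeps only the rows flagged N. It assumes the first line it receives is the header row shown earlier.

```python
import csv


def real_tickers(lines):
    """Yield symbols from pipe-delimited listing lines, skipping test issues.

    Expects the header row first. Rows whose 'Test Issue' field is not 'N'
    (test securities and the file footer) are skipped.
    """
    reader = csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("Test Issue") == "N":
            yield row["Symbol"]
```

Because the footer line has an empty Test Issue field, it is dropped by the same condition, so no separate step is needed to remove the last line.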
In the following post, I will use that list to fetch the stock trading data and analyse the distribution of extreme values, the gateway to predicting extreme and heavy losses for every portfolio holder (part 2 of 3). Stay tuned!
DOWNLOADS
nasdaqtickers.py
1 comment
Hi, thank you for this post, but I was wondering what the difference is between the two files available in the FTP directory. To be specific, what is the difference between “nasdaqlisted.txt” and “nasdaqtraded.txt”? Thank you for your help!