Comprehensive Book Review: Python for Finance Cookbook, 2nd Ed. by Eryk Lewinson

Thanks to the courtesy of Packt Publishing, I had the pleasure of receiving, reading, and studying the new release of Python for Finance Cookbook, the book by Eryk Lewinson. This is the second (and probably the last) edition, according to the author himself. Therefore, it must be solid and memorable. I do think Eryk made every effort to deliver a masterpiece in a neat and condensed form.

Since 2015, the Python language has gained extreme popularity in its direct application in data science, machine learning, numerical computations, science, and finance. It provides an opportunity to write clear code, strongly inclined towards the English language in its syntax, making it an appealing go-to solution for advanced prototyping of ideas, models, and products. Thus, it is not surprising that many C++ quants (quantitative analysts) who code heavily for the needs of hedge funds, investment firms, or other financial institutions discovered Python as a brilliant remedy for all their problems.

Eight years later, the landscape in computational finance and banking has changed dramatically, and Python leads the way, competing with other languages such as R, C/C++, Java, and Rust. Given that, it would be unwise to resist the trend rather than pick up the fundamentals of Python, including its must-know libraries: NumPy, pandas, SciPy, seaborn, scikit-learn, and Matplotlib, just to name a few. For a newcomer this quickly becomes a nightmare: where to start? What’s the best strategy to become a good Python programmer? Well, life proves that, as usual, it takes time, dedication, perseverance, and focus to master the language at a level that gives you freedom and flexibility in its use. Now, how do you blend it with finance and its wide range of applications? Well, here come the cookbooks: individually crafted books targeting specific topics and delivering pinpoint solutions.

And honestly, Eryk’s book fills many blind spots and gaps in the current Python-related literature. Across 741 pages and 15 chapters, the author takes you from the basics to the most up-to-date subjects: data providers, data processing, visualisation, exploration, forecasting, model construction, testing, backtesting, and trading, closing with advances in machine and deep learning. Quite a lot in a single book!

Let me highlight the key points of the book, supplemented with extras that I personally think could be added to the cookbook or explored further by the reader.

Chapter 1. Acquiring Financial Data

In this chapter, you will learn about five of the most attractive free or partially free financial data providers. We begin with Yahoo! Finance (legacy!) and load the data into DataFrames, the entry point to any data processing. We also cover the former Quandl, now Nasdaq Data Link, a source of a wide range of financial and non-financial data. This can be further extended with Intrinio, which provides access to intraday historical data, real-time stock/option prices, financial statement data and fundamentals, company news, earnings-related information, IPOs, and economic data such as the Gross Domestic Product (GDP), unemployment rate, federal funds rate, etc. Alpha Vantage is another popular data vendor providing high-quality financial data. Eryk illustrates everything with handy and carefully selected examples, and the code is ready to use after minor modifications.

When it comes to cryptocurrency data, the book uses CoinGecko as an example, but CryptoCompare.com is also worth mentioning. Its Python API is rich and allows for free and fast crypto-data exploration. I would add a section on the Binance exchange itself, which stores (in zip files) historical time-series data from trading at various frequencies. If this is not sufficient, and you wish to go beyond simple price series, the Deribit API allows you to access live data on crypto options and futures. More on that can be found soon in my new book, Cryptocurrencies and Crypto-Derivatives with Python.

A strong point of this and all other chapters of Eryk’s cookbook is that they always contain additional subsections called “There’s more…”, where you can find supplementary material that lets you go deeper into the main subject and serves as a palette of new directions to choose from if you are looking for more.

Having data in hand is where ideas are born. Ideas about what data carry, mean, and tell you. Then, you are ready to make the next move.

Chapter 2. Data Preprocessing

Eryk built this chapter around some rudimentary topics, such as converting prices to returns, adjusting the returns for inflation, changing the frequency of time series data, different ways of imputing missing data, changing currencies, and different ways of aggregating trade data. Believe it or not, the concept of working with return series (rates) rather than price series (levels) is quite often misunderstood and misused in finance. You will learn about the differences and conversions. What I found missing but worth mentioning are so-called compressed returns, i.e., a model that addresses explosive rates (e.g., interest rates) such that a switch between relative and absolute shocks is possible given market conditions.
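To make the price-versus-return distinction concrete, here is a minimal sketch in pandas. The prices are made up for illustration and the variable names are my own, not the book's. It shows the two usual conventions, simple and log returns, and that both recover the same total return when aggregated correctly.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only.
prices = pd.Series([100.0, 102.0, 99.96, 104.958], name="close")

# Simple (arithmetic) returns: P_t / P_{t-1} - 1
simple_returns = prices.pct_change().dropna()

# Log (continuously compounded) returns: ln(P_t / P_{t-1})
log_returns = np.log(prices / prices.shift(1)).dropna()

# Simple returns compound multiplicatively; log returns add up over time.
total_from_simple = (1.0 + simple_returns).prod() - 1.0
total_from_log = np.exp(log_returns.sum()) - 1.0
```

Summing simple returns directly would give a slightly different (and wrong) total, which is one of the typical misuses mentioned above.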

The adjustment for inflation, a very topical subject these days, is also described. You can learn how to use CPI data for the US, but other countries’ rates are also in scope.
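The arithmetic behind an inflation adjustment is short enough to sketch. This assumes the inflation rate has already been derived from a CPI series for the same period. The function name and the numbers are hypothetical, not taken from the book.

```python
def real_return(nominal: float, inflation: float) -> float:
    """Deflate a nominal return by inflation over the same period.

    Uses the exact relation (1 + nominal) = (1 + real) * (1 + inflation).
    """
    return (1.0 + nominal) / (1.0 + inflation) - 1.0


# Hypothetical inputs: an 8% nominal return in a period with 5% inflation.
example = real_return(0.08, 0.05)
```

Note that the result is a little below the naive difference of 3 percentage points, because the adjustment is a ratio rather than a subtraction.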

Eryk briefly touches on changing the frequency of time series data. There is a mention of realized volatility and its annualized counterpart. However, this topic is much wider than what you will find in this subsection and should be explored more. Also, the following part, on the imputation of time series (filling the missing values in date gaps), is limited to pandas’ backward- and forward-fill functions, which is too simplistic and should be avoided in financial applications at any cost. Very recently, the PRA, which regulates UK banks, called for more advanced in-house data imputation techniques, and a long-gap filling methodology (e.g., for gaps longer than 5 days) should reflect the local level of correlation between assets while preserving a certain level of volatility. It is obvious that any flat-filling removes or neglects certain characteristics of the asset’s behavior.
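The point about flat-filling can be seen in a few lines. The sketch below uses a made-up price series with a three-day gap and pandas' `ffill`. It is an illustration of the problem only, not a proposal for a proper imputation method.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with a three-day gap (positions 2 to 4).
prices = pd.Series([100.0, 101.0, np.nan, np.nan, np.nan, 98.0, 99.0])

# Forward fill repeats the last known price across the gap.
filled = prices.ffill()
returns = filled.pct_change().dropna()

# Inside the gap the returns are exactly zero ...
gap_returns = returns.loc[2:4]
# ... and the whole move shows up as one jump on the first observed day.
jump_after_gap = returns.loc[5]
```

The filled series therefore shows no volatility during the gap and an artificial single-day shock right after it, which is exactly the kind of distortion a correlation-aware method is meant to avoid.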

I really enjoyed the section on different ways of aggregating trade data using pandas’ groupby function. Grouping and binning time series are broad topics in time-series analysis, and, especially in quant trading, the reader should go beyond what the book mentions. The treatment of series with gaps, averaging within the bins, or exploring the data distribution in a tight bin span (e.g., 30 seconds) often leads you to solutions that require more data gymnastics than a single Python function. You need to think like a pro statistician rather than a programmer.
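As a small illustration of binning with groupby, the sketch below aggregates a handful of made-up trades into 30-second bars and computes a volume-weighted average price per bar. The column names and data are hypothetical and this is not the book's recipe.

```python
import pandas as pd

# Hypothetical trades; timestamps, prices and sizes are made up.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2023-01-02 09:30:05", "2023-01-02 09:30:20",
        "2023-01-02 09:30:40", "2023-01-02 09:31:10",
    ]),
    "price": [100.0, 101.0, 102.0, 103.0],
    "size": [10, 30, 20, 40],
})

# Assign each trade to the start of its 30-second bin.
trades["bin"] = trades["time"].dt.floor("30s")
trades["notional"] = trades["price"] * trades["size"]

bars = trades.groupby("bin").agg(
    volume=("size", "sum"),
    notional=("notional", "sum"),
    n_trades=("price", "count"),
)
# Volume-weighted average price per bar.
bars["vwap"] = bars["notional"] / bars["volume"]
```

Notice that bins without any trades simply do not appear in the result. Deciding how to treat those empty bins is the kind of question that needs statistical judgment rather than one more function call.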

Chapter 3. Visualising Financial Time-Series

Working with data is hard to do without visualising it on the screen. A single picture can convey more information than a thousand words, and at the very least it gives you a view of a third or fourth dimension from different angles. In this chapter, Eryk provides very useful examples of different datasets illustrated using Python graphical libraries such as matplotlib, seaborn, plotly, altair, plotnine (a Python take on R’s ggplot2), and bokeh. Although you won’t learn all the functions of each library from A to Z, the examples in the book will guide you in the right direction.

Honestly, there is only one recipe for success in graphical representation of data in Python: make a deal with yourself to sit down and practice, practice, practice. The more variations of code you write and plots you make, the better you become in this area. Eryk’s examples are enjoyable.

The development of new graphical libraries has not stopped. For example, the declarative altair library is beautiful, but I found it challenging to switch entirely from matplotlib. Read and find something that works for you.

Chapter 4. Exploring Financial Time-Series Data

The chapter begins with outlier detection using rolling statistics, followed by the use of the Hampel filter. In Value-at-Risk (VaR) measure estimation, outliers often mean VaR breaches. Banks are concerned about these events and turn their quant analytics teams upside down. The aim is to protect capital against grey and black swans for internal portfolios. Rolling measures are the starting points where the outcomes of models or executed trades are brought to light.
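For readers who have not met the Hampel filter before, the idea can be sketched in plain NumPy: compare each point with the median of a window centred on it, and flag it when the deviation exceeds a multiple of the scaled median absolute deviation (MAD). The function name, window size, and data below are my own assumptions, not the book's code.

```python
import numpy as np


def hampel_outliers(x, half_window=3, n_sigmas=3.0):
    """Flag points far from the local median, measured in scaled-MAD units."""
    x = np.asarray(x, dtype=float)
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        lo = max(0, i - half_window)
        hi = min(x.size, i + half_window + 1)
        window = x[lo:hi]  # the window is truncated at the edges
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        # If the MAD is zero, any deviation from the median gets flagged.
        flags[i] = np.abs(x[i] - med) > n_sigmas * k * mad
    return flags


# Hypothetical series with one obvious spike at position 4.
series = [1.0, 1.1, 0.9, 1.0, 8.0, 1.05, 0.95, 1.0, 1.1]
flags = hampel_outliers(series)
```

Because it relies on medians rather than means, the filter is not dragged towards the spike it is trying to detect, which is its main advantage over a rolling mean and standard deviation.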

This motivates Eryk to move to the topic of changepoints in time series, where a changepoint can be defined as a point in time when the probability distribution of a process or time series changes, for example, when there is a change in the mean of the series. From this, the application of the Hurst exponent to the detection of changing patterns in time series is illustrated and nicely explained. The latter is a classical textbook example of a model often used in algo-trading. While not as promising as the Kalman filter, it is a smooth entry into the broad subject of classifying trading patterns (uptrend, downtrend, trading sideways).
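One common simplified estimator of the Hurst exponent fits the scaling of lagged differences: the standard deviation of x(t + lag) - x(t) grows roughly like lag^H. The sketch below implements that idea in NumPy on a simulated random walk. It is an assumption on my part that this matches the spirit of the book's recipe, and more careful estimators (e.g., rescaled range) exist.

```python
import numpy as np


def hurst_exponent(series, max_lag=20):
    """Estimate H from the log-log slope of std(x[t+lag] - x[t]) versus lag."""
    x = np.asarray(series, dtype=float)
    lags = np.arange(2, max_lag)
    tau = [np.std(x[lag:] - x[:-lag]) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(tau), 1)
    return float(slope)


# Simulated random walk: the estimate should land near 0.5.
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(5000))
h = hurst_exponent(random_walk)
```

Values noticeably above 0.5 are usually read as trending behaviour and values below 0.5 as mean reversion, but the estimate is sensitive to the choice of `max_lag` and to the sample length.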

The section investigating stylized facts of asset returns will help you grasp and apply the statistical properties of time series. The examples illustrating the use of autocorrelation and partial autocorrelation are clear and helpful.

Chapter 5. Technical Analysis and Building Interactive Dashboards

Studying time-series analysis, reading books, and trying out your own code to discover latent variables or series characteristics can easily take up a few months, so it is better to combine the knowledge from the previous chapters and turn it into profits. In Chapter 5, Eryk takes you step by step through the basic construction of technical indicators (MACD, RSI, etc.) and their visual representation, using TA-Lib. You will learn how to plot price series as candlesticks in various ways: knowledge that is usually scattered across the internet is now gathered in a single book chapter!
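To show what sits behind an indicator such as MACD, here is a minimal pandas sketch using the conventional 12/26/9 spans. It is written without TA-Lib, so the first values may differ from what TA-Lib returns because of how the moving averages are initialised. The function name and input data are hypothetical.

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line, signal line and histogram from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    line = ema_fast - ema_slow
    signal_line = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"macd": line, "signal": signal_line, "hist": line - signal_line})


# Hypothetical steadily rising prices.
close = pd.Series([float(100 + i) for i in range(60)])
out = macd(close)
```

In a steady uptrend the fast average stays above the slow one, so the MACD line is positive. Crossovers between the MACD and signal lines are what traders typically watch.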

Eryk finishes with an exemplary use of the streamlit library, which allows for the creation of simple web-based, standalone apps for interactive data visualization, selection, grouping, and pivoting. Streamlit is Python’s response to R’s Shiny, and while it unfortunately still falls behind in the scope of its functionality, it’s easy to pick up entirely in one day! For those of you who wish for something more advanced, Plotly Dash is the next bus stop to consider.

Chapter 6. Time-Series Analysis and Forecasting

Here, you will dive deeper into time-series analysis. This chapter is a must-read. First, Eryk covers time-series decomposition, taking care of seasonality in the data. Next, he moves to testing for stationarity in time series and correcting for the lack of it. You will find the right tools here to deal with non-stationarity, a problem that is often neglected when data are fed into machine learning models and mishandled in autoregressive modeling of price/return series.

Exponential smoothing methods are explored sufficiently, followed by modeling time series with ARIMA class models. This classical approach in TSA is now clearly illustrated in Python. Of course, I wish I could see more of the math behind the different models in this section, but I understand that there is plenty of literature on the theory and very few practical examples, so this chapter is just what you need if you are looking for a go-to recipe.
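Since I mentioned wanting more of the math, here is the recursion behind simple exponential smoothing written out in plain Python: level_t = alpha * y_t + (1 - alpha) * level_{t-1}. Initialising the level with the first observation is one common choice that I assume here, and the numbers are made up.

```python
def simple_exp_smoothing(values, alpha):
    """Return the smoothed levels; the last one is the one-step-ahead forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumption: start from the first observation
    levels = [level]
    for y in values[1:]:
        level = alpha * y + (1.0 - alpha) * level
        levels.append(level)
    return levels


levels = simple_exp_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
forecast = levels[-1]
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one gives a smoother, slower-moving level.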

In that sense, the chapter is not exhaustive, but it lays the groundwork you need to take off.

Chapter 7. Machine Learning-Based Approaches to Time-Series Forecasting

Take another deep breath and let’s dive even deeper. This chapter provides a fantastic introduction to the world of machine learning with Python examples. ML is still a hot topic (not to be confused with the AI solution of ChatGPT!). There has always been a desire to forecast the next price action when trading assets. If only we could know the future, the next price appreciation or depreciation in the next 1, 2, or 5 seconds or minutes, everyone could be rich! Fortunately, ML has given us the tools to indulge our whims.

A new era has begun. People are exploring ML algorithms combined with financial chaotic data to increase their confidence in making the next move on this multidimensional chessboard of greed and ego.

If you are interested in learning recipes related to validation methods for time series, feature engineering for time series, time series forecasting as reduced regression, forecasting with Meta’s Prophet, and AutoML for time series forecasting with the PyCaret library, then follow Eryk in this chapter. With nearly 70 pages of high-quality solutions and step-by-step examples, everything you need to begin with ML in Python is here. I have learned a lot from Eryk myself! Take your time to read this chapter over the course of a week or two, and then move on to Sebastian Raschka’s book on ML. The groundwork laid by Eryk will help you get a good grip on the more advanced concepts described by Sebastian. It’s a perfect combo for all beginners to ML in Python!
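The core idea of time-series validation is that training data must always precede the test data. Below is a minimal, dependency-free sketch of an expanding-window (walk-forward) split. The function name and parameters are hypothetical, and in practice scikit-learn's `TimeSeriesSplit` provides similar behaviour.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Return (train_indices, test_indices) pairs in which train precedes test."""
    splits = []
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(0, test_start))
        test = list(range(test_start, test_end))
        splits.append((train, test))
    return splits


splits = expanding_window_splits(n_samples=10, n_splits=3, test_size=2)
```

Each fold trains on everything observed so far and tests on the next block, so no information from the future leaks into the model, unlike with a shuffled k-fold split.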

Chapter 8. Multi-Factor Models

Finance as practised by banks and asset management firms would not be complete without multi-factor models. These are simple models that take into account a limited set of market observables, examine their relationships, and are used to describe a quantity under investigation as a clever proxy. Two-, three-, four-, and five-factor models are illustrated, beginning with the CAPM.

The OLS regression method is widely applied, and its outcomes are discussed very well by Eryk. I have found something new here, namely notes on the Fama-MacBeth regression. In short, it aims to estimate the factor models for multiple assets at once, using cross-sectional (panel) data. Following this approach, you can (a) estimate the portfolios’ exposure to the risk factors and learn how much those factors drive the portfolios’ returns; and (b) understand how much taking a given risk is worth by knowing the premium that the market pays for the exposure to a certain factor.
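The two steps (a) and (b) can be sketched with NumPy on simulated data. Everything below is an illustrative assumption of mine: a single factor, made-up parameters, and plain OLS in both passes. The book's own recipe may be organised differently.

```python
import numpy as np

rng = np.random.default_rng(42)
n_periods, n_assets = 600, 25
true_betas = np.linspace(0.5, 1.5, n_assets)
premium = 0.01  # hypothetical per-period factor premium

# Simulated factor and asset returns: r_{t,i} = beta_i * f_t + noise.
factor = premium + 0.05 * rng.standard_normal(n_periods)
noise = 0.02 * rng.standard_normal((n_periods, n_assets))
returns = factor[:, None] * true_betas[None, :] + noise

# Step (a): one time-series regression per asset to estimate its exposure.
X = np.column_stack([np.ones(n_periods), factor])
coefs, *_ = np.linalg.lstsq(X, returns, rcond=None)  # shape (2, n_assets)
betas = coefs[1]

# Step (b): one cross-sectional regression per period on the estimated betas,
# then average the slopes over time to get the factor risk premium.
Z = np.column_stack([np.ones(n_assets), betas])
lambdas, *_ = np.linalg.lstsq(Z, returns.T, rcond=None)  # shape (2, n_periods)
risk_premium = lambdas[1].mean()
t_stat = risk_premium / (lambdas[1].std(ddof=1) / np.sqrt(n_periods))
```

The time series of period-by-period slopes is what gives Fama-MacBeth its standard errors: the t-statistic comes from the variability of those slopes over time rather than from a single pooled regression.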

Chapter 9. Modeling Volatility with GARCH Class Models

ARIMA (Autoregressive Integrated Moving Average) models cannot account for volatility that is not constant over time. Box-Cox transformations can be used to adjust for modest changes in volatility. In this chapter, Eryk focuses on conditional heteroskedasticity, a phenomenon in which an increase in volatility is correlated with a further increase in volatility.

He describes in great detail the history, theory, and practical aspects of modeling stock return volatility with ARCH and GARCH models, forecasting volatility using GARCH models, multivariate volatility forecasting with the CCC-GARCH model, and forecasting the conditional covariance matrix using DCC-GARCH.

The material is presented in such a way that everyone can understand its importance, for example in risk management and computational risk applications. The GARCH model gives you a future estimate of the asset’s volatility. This is far different from knowing its price one step ahead, but the confidence intervals may narrow your field of view, allowing for better trading decisions.
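To show how a GARCH(1,1) model produces that future volatility estimate, here is the variance recursion sigma2_{t+1} = omega + alpha * eps_t^2 + beta * sigma2_t in NumPy. The parameters and residuals are hypothetical and not estimated from data. In practice they would be fitted with a dedicated library such as arch.

```python
import numpy as np


def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    The returned array has one more element than the input: the last entry
    is the one-step-ahead variance forecast.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    eps = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(eps.size + 1)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the long-run variance
    for t in range(eps.size):
        sigma2[t + 1] = omega + alpha * eps[t] ** 2 + beta * sigma2[t]
    return sigma2


# Hypothetical parameters and return shocks.
sigma2 = garch11_variance([0.01, -0.03, 0.005], omega=1e-6, alpha=0.1, beta=0.85)
next_vol = float(np.sqrt(sigma2[-1]))
```

The large shock in the second period pushes the conditional variance up, and the `beta` term makes that increase decay only gradually, which is the volatility clustering the chapter is about.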

Chapter 10. Monte Carlo Simulations in Finance

Doing quantitative finance without Monte Carlo simulation is like going to a casino without money. And, to make the story funnier, apart from its name Monte Carlo simulation has nothing to do with the casinos of Monte Carlo. It is all about using computers to generate millions of possible scenarios of future events, estimating the probabilities of the outcomes, and eventually making wiser decisions.

Eryk illustrates these concepts with some heavy stuff, i.e., derivative pricing models employing the Monte Carlo approach. This is a great introduction to simulations in Python! These pieces of code are a must-study.
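As a taste of what such code looks like, here is a minimal sketch that prices a European call by simulating terminal prices under geometric Brownian motion and compares the result with the Black-Scholes closed form. The inputs are hypothetical, the function names are mine, and the book's recipes go further (e.g., to path-dependent and American options).

```python
import math

import numpy as np


def mc_european_call(s0, strike, rate, sigma, maturity, n_paths=200_000, seed=0):
    """Discounted average payoff over simulated terminal prices (GBM)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    s_t = s0 * np.exp(drift + sigma * math.sqrt(maturity) * z)
    payoff = np.maximum(s_t - strike, 0.0)
    return float(math.exp(-rate * maturity) * payoff.mean())


def bs_call(s0, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call, used here as a sanity check."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    d2 = d1 - sigma * math.sqrt(maturity)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


mc_price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0)
bs_price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
```

The two prices should agree up to the simulation error, which shrinks with the square root of the number of paths. That slow convergence is why variance-reduction techniques matter in practice.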

Chapters 11-15

In the remaining chapters, the author delves into Asset Allocation, Backtesting Trading Strategies, Applied Machine Learning using credit default identification, touches on Advanced Concepts for ML Projects, and concludes with 55 pages on Deep Learning in Finance. I was highly impressed with the depth of research Eryk conducted and the effort he put into all these chapters.

Summary

In summary, the most important takeaway from Eryk’s book is the wide spectrum of Python examples accompanied by a quality description of the need, importance, meaning, and solutions that can be proposed within Python. However, I did notice that short notes on computational time constraints or requirements, which matter for many projects, were missing. Despite this minor point, the book is a fantastic read and a solid Python-based textbook, cookbook, and go-to book for those seeking specific solutions in quant finance and beyond.

Verdict

Highly recommended!

