There are not too many creative opportunities to leave a person applying for a quant role dumbfounded with the interview question directly related to his work resume. First of all, we want to test his quant skillset, check his ability to read a code, analyse the model, listen to his judgement on the correctness of assumptions, execution, and a selection of proposed modifications. This post is based on actual events. Here is a story of Mike$^*$.
During my preps to the interview with a candidate for a leading position in risk department, I spent a quality of time analysing his 5+ page long Resume which listed a number of professional positions around the world he held. The most striking feature was that he worked within his last two financial institutions for nearly two years and two years only. Since we were looking for someone who would stay at the proposed position much longer than that, I said:
Mike, from your Resume I learned that you had been starting your job positions in Jan 2009, Jun 2012, Jun 2014, and May 2016, respectively. Now you apply for a new role. Let’s assume we will hire you in May 2018, that’s another two year long position. I can see a pattern. And that pattern worries me. I have quickly applied a Logistic Regression model to your data and with the use of Machine Learning I asked computer “What the probability that Mike will leave us in May 2020?!”. Here is what computer predicted…
Next, I handed him the following Python code with the outputs:
import pandas as pd import numpy as np import statsmodels.api as sm from sklearn.linear_model import LogisticRegression from sklearn import metrics import warnings warnings.filterwarnings("ignore") # Data taken from Mike's Resume # Mike did not work at his new workplace more than 24 months y = np.array([False, True, True, True, True]) # Timestamps when he started his new roles [years] X = np.array([2008.083, 2012.500, 2014.500, 2016.417, 2018.417]) # Logistic Regression Model logit_model = sm.Logit(y, X) result = logit_model.fit() print(result.summary2())
Optimization terminated successfully. Current function value: 0.499590 Iterations 5 Results: Logit ============================================================== Model: Logit No. Iterations: 5.0000 Dependent Variable: y Pseudo R-squared: 0.002 Date: 2018-04-17 06:29 AIC: 6.9959 No. Observations: 5 BIC: 6.6053 Df Model: 0 Log-Likelihood: -2.4979 Df Residuals: 4 LL-Null: -2.5020 Converged: 1.0000 Scale: 1.0000 ----------------------------------------------------------------- Coef. Std.Err. z P>|z| [0.025 0.975] ----------------------------------------------------------------- x1 0.0007 0.0006 1.2418 0.2143 -0.0004 0.0018 ==============================================================
# Prediction for Mike leaving a new role in May 2020 # we add May 2020 with a flag True meaning that Mike # will not work more than 24 months as a out-of-sample test set X_train, X_test, y_train, y_test = X.reshape(-1, 1), np.array([2020.417]), \ y.reshape(-1, 1), np.array([True]) logreg = LogisticRegression() logreg.fit(X_train, y_train) # training computer we recognize the patterns # prediction y_pred = logreg.predict(X_test.reshape(-1, 1)) print("\nAccuracy of logistic regression classifier on test set: \ {:.2f}%".format(100*logreg.score(X_test.reshape(-1, 1), y_test)))
Accuracy of logistic regression classifier on test set: 100.00%
I continued:
Computer tells me you gonna leave our bank in May 2020 and it is 100% sure. Given the model and code, what would you do to change the outcome? Can you tell me what is wrong with this model? Is it implemented correctly? If yes, why do you think so? If not, where is a catch?
First, Mike was laughing as he felt surprised a lot by this questions. This is that sort of challenge to approach the candidate in such a fancy way I like most. But you know, it’s always good to break the boredom and try something new.
Try to find all problems with this interview question. There are few. Drop me your answers in the comments. Good luck!
$^*$ – the name changed intentionally