When working with financial time-series, especially in trading and the subsequent analysis of data trends, we often wish to rebin (resample) our data into a new time-series that tells us the average value of the underlying records, which are themselves characterized by high volatility in time.
The following algorithm in Matlab does the job pretty well:
% Function rebins any time-series [x,y] where x is a time vector
% and y stores time-series values from the current (regular or
% irregular) sampling. A new time-series sampling is dt.
%
% Input: x  : time vector [1xN]
%        y  : data vector [1xN]
%        dy : data error vector [1xN]
%        dt : new rebinning time (dt > current sampling)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
%
% Example: x=1:1:1001;            % assume 1001 trading days
%          y=rand(1,1001);        % simulate pseudo data
%          dy=0.1*rand(1,1001);   % simulate pseudo data errors (if required)
%          dt=50;                 % rebin data from 1 day sampling down to 50 day intervals
%          [rx,ry]=rebin(x,y,dy,dt);          % or [rx,ry]=rebin(x,y,[],dt);
%          plot(x,y,'color',[0.7 0.7 0.7]);   % plot original time-series
%          hold on; plot(rx,ry,'ro-');        % plot down-rebinned time-series

function [rx,ry]=rebin(x,y,dy,dt)
    if(isempty(dy))
        dy=ones(1,length(y));     % no errors given: all weights equal
    end
    rdata=[];
    j=1;
    t2=x(1);                      % right edge of the current bin
    while(j<=length(x))
        i=j;
        if((x(i)+dt)>x(end))
            break                 % less than dt of data left: trailing points are dropped
        else
            t2=t2+dt;
        end
        sa=0; wa=0; il=0;         % weighted sum, sum of weights, points in bin
        while(x(i)<t2)
            sa=sa+(y(i)/dy(i)^2);
            wa=wa+(1/dy(i)^2);
            i=i+1;
            il=il+1;
        end
        ry=sa/wa;                 % weighted mean of the bin
        rx=t2-dt;                 % bin is stamped with its left edge
        if(il>=1)
            rdata=[rdata; rx ry];
        end
        j=j+il;
    end
    rx=rdata(:,1);
    ry=rdata(:,2);
end
Example
As an example we will use the daily rate of return of AAPL (Apple Inc.) from Jan-1984 to Apr-2011 (download Matlab’s aaplr.mat MAT-file). Below, the short set of command lines allows us to execute the rebinning process of the return series,
clear all; close all; clc;

load aaplr.mat
plot(aaplR)

x=1:length(aaplR);   % 6855 days
y=aaplR;

plot(x,y,'color',[0.7 0.7 0.7]);

% rebin the data with dt=25 days step (one trading month)
dt=25;
dy=[];   % we have no information on data errors

[rx,ry]=rebin(x,y,dy,dt);

% overplot results and display first 500 days
hold on; plot(rx,ry,'ro-');
xlim([0 500]); ylim([-0.125 0.15]);
and plot both time-series, original and rebinned with a new bin time of 25 days for the first 500 days of trading:
where the red line denotes the new, rebinned time-series with a binning time of 25 trading days. The function (the algorithm) computes a weighted mean of the data points falling into each half-open interval $[t,t+dt)$, with weights $1/dy^2$. If we do not specify the input data error vector $dy$, as in the example above, all weights are equal and we get a simple mean as a consequence.
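Written out, with $\sigma_i$ standing for the error $dy_i$ and the sums running over the points with $t \le x_i < t+dt$, the quantity accumulated in the inner loop of the function (`sa/wa`) is the weighted mean below. The symbols $\bar{y}$ and $\sigma_i$ are notation introduced here for convenience, not taken from the original code.

$$
\bar{y} = \frac{\sum_i y_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}
$$

For example, two points $y=(1,3)$ with equal errors give $\bar{y}=2$, while errors $\sigma=(1,2)$ give $\bar{y}=(1/1+3/4)/(1/1+1/4)=1.4$, which pulls the bin value toward the more precise point.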
8 comments
Thanks, I just added last data point handling. Maybe not perfect but I think good enough.
Krzysztof
if((x(i)+dt)>x(end)) % last data point
rx=x(end);
lasty = y(i:end); % index by position i, not by the time value x(i)
ry=mean(lasty);
rdata=[rdata; rx ry];
break
Looks good!!
I was playing with the rebin algo and came across some strange behaviour. See below.
For the matrix 1;3;4;4;6;6 and dt=2 I was hoping to get
ry = 2 4 6 and rx = 2 4 6 but I'm getting ry = 2 4 and rx = 1 3
so it seems it is missing the last data point and creates a future leak. Can you comment on this?
Krzysztof
>> cc=[1;3;4;4;6;6]
cc =
1
3
4
4
6
6
>> x=1:length(cc)
x =
1 2 3 4 5 6
>> dt=2
dt =
2
>> dy=[]
dy =
[]
>> [rx,ry]=rebin(x,cc,dy,dt)
rx =
1
3
ry =
2
4
>>
Good observation. You can get rx=[2 4] if you change the line rx=t2-dt (line #48 of the original listing) to rx=(2*t2-dt)/2, i.e. the bin midpoint. But I programmed this algo (on purpose) to reject some data points at the end, as defined by the test if((x(i)+dt)>x(end)) (line #31). It was over 7 years ago and I had some reasons for doing that. You can fix this by redefining the test conditions.
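The two changes discussed here, midpoint time stamps and keeping the trailing partial bin, can be sketched together as follows. This is an illustrative variant, not the original `rebin`: the `edges` vector, the mask `in` and the unit errors in `dy` are assumptions made for the example, and the toy data are the ones from the question above.

```matlab
% Illustrative variant, NOT the original rebin(): weighted bin means with
% midpoint time stamps, keeping the trailing partial bin.
x  = 1:6;                     % time vector
y  = [1 3 4 4 6 6];           % data vector
dy = ones(size(y));           % assumed unit errors (equal weights)
dt = 2;                       % new bin width

w     = 1 ./ dy.^2;           % inverse-variance weights
edges = x(1):dt:(x(end)+dt);  % bin edges; the last edge lies beyond x(end)
rx = []; ry = [];
for b = 1:numel(edges)-1
    in = (x >= edges(b)) & (x < edges(b+1));   % half-open bin [t, t+dt)
    if any(in)                                 % skip empty bins
        rx(end+1,1) = edges(b) + dt/2;         % bin midpoint
        ry(end+1,1) = sum(w(in).*y(in)) / sum(w(in));
    end
end
% rx = [2;4;6], ry = [2;4;6]
```

Because the last edge is placed beyond `x(end)`, no data points are rejected at the end. A midpoint stamp still sits before the last point of its bin, so for strictly causal use the right edge `edges(b+1)` may be the safer label.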
OK, I see now: your algorithm just averages the data over each subperiod dt and creates a new time series from those values. Initially I thought that it picks data points every dt period and makes a linear regression between them, so I think I know the answer to my question.
Krzysztof
I wanted to use your rebinned data for training AI algos for my FOREX system to see if it improves the results. For more info see:
http://www.trade2win.com/boards/trading-software/105880-3rd-generation-nn-deep-learning-deep-belief-nets-restricted-boltzmann-machines.html
However, I then run into the following problem for a real-time application. If I down-sample the data with dt as you suggest (e.g. 60), then in real time I need the data values at times t+1, t+2, t+3…t+59 for my model to predict properly, as it was trained on rebinned data. So how can I obtain those data points?
Krzysztof
Hi Krzysztof, thx for your comment. The algo I described here works as a down-sampler only, so if you have an unevenly sampled FX tick time-series and you want to get a new one sampled every 60 sec, you can use this algo. Yes, it averages all data points in every new (e.g.) dt=60 sec bin. It has some extra value added because you can overplot the standard deviation for each new bin.
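For the real-time question above, one possible approach (an assumption, not something the original algorithm does) is to label each bin with its last sample rather than its left edge, so that every rebinned value depends only on data already observed. For regularly sampled data this amounts to a trailing moving average read out every dt samples. A minimal sketch using Matlab's `movmean` (available from R2016a), with illustrative variable names:

```matlab
y  = [1 3 4 4 6 6];     % regularly sampled toy data, one value per time step
dt = 2;                 % bin length in samples

% Trailing window: the current sample plus the previous dt-1 samples,
% so nothing after time t enters the value at time t.
trailing = movmean(y, [dt-1 0]);

% Reading the trailing mean every dt samples reproduces non-overlapping
% bins, each labelled with the index of its last sample.
rx = dt:dt:numel(y);
ry = trailing(rx);      % ry = [2 4 6]
```

Between read-outs, `trailing(t)` is defined at every step t, which gives a value at t+1, t+2, … without waiting for the next full bin. Whether such intermediate values suit a model trained on non-overlapping bins would need to be validated separately.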