Financial Time-Series Rebinning

When working with financial time-series, especially in trading and the subsequent analysis of data trends, we often wish to rebin (resample) the data into a new time-series that carries information on the average value of the underlying, highly volatile data records over fixed time intervals.

The following algorithm in Matlab does the job pretty well:

% Function rebins any time-series [x,y], where x is a time vector
%   and y stores the time-series values at the current (regular or
%   irregular) sampling. dt is the new sampling interval.
%
%  Input: x  : time vector [1xN]
%         y  : data vector [1xN]
%         dy : data error vector [1xN]
%         dt : new rebinning time (dt > current sampling)
%
% (c) 2013, QuantAtRisk.com, by Pawel Lachowicz
%
% Example: x=1:1:1001; % assume 1001 trading days
%          y=rand(1,1001); % simulate pseudo data
%          dy=0.1*rand(1,1001); % simulate pseudo data errors (if required)
%          dt=50; % rebin data from 1 day sampling down to 50 day intervals
%          [rx,ry]=rebin(x,y,dy,dt);
%      or  [rx,ry]=rebin(x,y,[],dt); % without data errors
%          plot(x,y,'color',[0.7 0.7 0.7]); % plot original time-series
%          hold on; plot(rx,ry,'ro-'); % plot down-rebinned time-series

function [rx,ry]=rebin(x,y,dy,dt)
    if(isempty(dy))
        dy=ones(1,length(y));
    end
    rdata=[];
    j=1;
    k=1;
    t2=x(1);
    while(j<=length(x))
        i=j;
        if((x(i)+dt)>x(end))
            break
        else
            t2=t2+dt;
        end
        % accumulate weighted sums over the points in [t2-dt, t2)
        sa=0;
        wa=0;
        ea=0;
        il=0;
        while(x(i)<t2)
            sa=sa+(y(i)/dy(i)^2);
            wa=wa+(1/dy(i)^2);
            i=i+1;
            il=il+1;
        end
        ry=sa/wa;
        rx=t2-dt;
        if(il>=1)
            rdata=[rdata; rx ry];
        end
        j=j+il;
    end
    if(isempty(rdata)) % series shorter than one full bin
        rx=[]; ry=[];
        return
    end
    rx=rdata(:,1);
    ry=rdata(:,2);
end
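For readers working outside Matlab, the same loop can be sketched in Python. This is a port written for illustration, not the author's code. Indices are 0-based, x is assumed sorted in ascending order, and dt must be positive. Passing None (or an empty sequence) for dy plays the role of dy=[]. The original stopping rule is kept, so a final incomplete bin is dropped.

```python
from typing import List, Optional, Sequence, Tuple


def rebin(x: Sequence[float], y: Sequence[float],
          dy: Optional[Sequence[float]], dt: float) -> Tuple[List[float], List[float]]:
    """Python port of the Matlab rebin(): inverse-variance weighted mean of y
    over consecutive bins [t2 - dt, t2) laid out from x[0].

    Assumes x is sorted ascending and dt > 0. Empty bins are skipped, and the
    loop stops once the first unbinned point plus dt lies beyond x[-1]."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    if dy is None or len(dy) == 0:
        dy = [1.0] * len(y)              # equal weights -> plain arithmetic mean
    rx: List[float] = []
    ry: List[float] = []
    j = 0                                # index of the first unbinned point
    t2 = x[0]                            # running bin edge
    while j < len(x):
        if x[j] + dt > x[-1]:            # same stopping test as in the Matlab code
            break
        t2 += dt                         # current bin is [t2 - dt, t2)
        sa = wa = 0.0                    # weighted sum of y and sum of weights
        i = j
        while x[i] < t2:                 # safe: x[-1] >= t2 whenever we get here
            w = 1.0 / dy[i] ** 2
            sa += w * y[i]
            wa += w
            i += 1
        if i > j:                        # keep non-empty bins only
            rx.append(t2 - dt)           # label each bin by its start time
            ry.append(sa / wa)
        j = i
    return rx, ry


rx, ry = rebin([1, 2, 3, 4, 5, 6], [1, 3, 4, 4, 6, 6], None, 2)
print(rx, ry)  # [1, 3] [2.0, 4.0]
```

Unlike the Matlab version, this sketch returns two empty lists instead of failing when the series is shorter than one full bin.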

Example

As an example, we will use the daily returns of AAPL (Apple Inc.) from Jan-1984 to Apr-2011 (download Matlab’s aaplr.mat MAT-file). The short set of commands below lets us rebin the return series,

clear all; close all; clc;

load aaplr.mat
plot(aaplR)

x=1:length(aaplR); % 6855 days
y=aaplR;
plot(x,y,'color',[0.7 0.7 0.7]);

% rebin the data with dt=25 days step (roughly one trading month)
dt=25;
dy=[]; % we have no information on data errors

[rx,ry]=rebin(x,y,dy,dt);

% overplot results and display first 500 days
hold on; plot(rx,ry,'ro-');
xlim([0 500]);
ylim([-0.125 0.15]);

and plot both time-series, the original and the one rebinned with a new bin time of 25 days, for the first 500 days of trading:

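[Figure: AAPL daily returns (grey) with the 25-day rebinned series overplotted (red), first 500 trading days]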

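where the red line denotes the new rebinned time-series with a bin time of 25 trading days. The function computes an inverse-variance weighted mean of the data points falling into each interval $[t, t+dt)$, namely $\bar{y} = \sum_i (y_i/dy_i^2) \big/ \sum_i (1/dy_i^2)$, and labels the bin with its start time $t$. If we do not specify the input data error vector $dy$, as in the example above, all weights are equal and we get a simple arithmetic mean. Empty bins are skipped. The loop stops as soon as the first not-yet-binned point plus $dt$ lies beyond the last timestamp, so the final incomplete bin is not returned.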
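The weighted mean described above weights each point by $1/dy_i^2$. This can be checked on a single bin with a few lines of Python. The function name weighted_mean is ours and is not part of the original code. With no errors supplied every weight is 1 and the result is the arithmetic mean. A point with a ten times smaller error receives a hundred times larger weight.

```python
from typing import Optional, Sequence


def weighted_mean(y: Sequence[float], dy: Optional[Sequence[float]] = None) -> float:
    """Inverse-variance weighted mean: sum(y_i / dy_i**2) / sum(1 / dy_i**2).
    With dy omitted every weight is 1, i.e. the arithmetic mean."""
    if dy is None:
        dy = [1.0] * len(y)
    weights = [1.0 / e ** 2 for e in dy]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


print(weighted_mean([1.0, 3.0]))              # 2.0, the plain mean
print(weighted_mean([1.0, 3.0], [0.1, 1.0]))  # about 1.02: the first point dominates
```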

Comments
  1. Thanks, I just added handling of the last data point. Maybe not perfect, but I think it is good enough.

    Krzysztof

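    % replaces the "if((x(i)+dt)>x(end)) break" test at the top of the outer loop
    if((x(i)+dt)>x(end)) % last data point
        rx=x(end);
        ry=mean(y(i:end)); % unweighted mean of the remaining points (index by i, not by x(i))
        rdata=[rdata; rx ry];
        break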

  2. I was playing with the rebin algorithm and came across some strange behaviour. See below.

    For the vector 1;3;4;4;6;6 and dt=2 I was hoping to get
    ry = 2 4 6 and rx = 2 4 6, but I'm getting ry = 2 4 and rx = 1 3,
    so it seems to miss the last data points and to introduce a future leak (the value labelled rx=1 already includes the observation at x=2). Can you comment on this?

    Krzysztof

    >> cc=[1;3;4;4;6;6]
    cc =
    1
    3
    4
    4
    6
    6
    >> x=1:length(cc)
    x =
    1 2 3 4 5 6
    >> dt=2
    dt =
    2
    >> dy=[]
    dy =

    []

    >> [rx,ry]=rebin(x,cc,dy,dt)

    rx =

    1
    3

    ry =

    2
    4

    >>

    1. Good observation. You can get rx=[2 4] (the bin centres) if you change line #48 of the code from rx=t2-dt; to rx=(2*t2-dt)/2;. However, I programmed this algorithm on purpose to reject some data points at the end, as defined by the test in line #31, if((x(i)+dt)>x(end)). It was over 7 years ago and I had some reasons for doing that. You can change this by redefining the test condition.

  3. OK, I see now: your algorithm just averages the data over each subperiod dt and creates a new time-series from those values. Initially I thought it picked data points every dt period and fitted a linear regression between them, so I think I now know the answer to my question.

    Krzysztof

  4. I wanted to use your rebinned data to train AI algorithms for my FOREX system, to see whether it improves the results. For more info see:

    http://www.trade2win.com/boards/trading-software/105880-3rd-generation-nn-deep-learning-deep-belief-nets-restricted-boltzmann-machines.html

    However, I then run into the following problem in a real-time application. If I downsample the data with dt as you suggest (e.g. 60), then in real time I need the data values at t+1, t+2, t+3, …, t+59 for my model to predict properly, since it was trained on rebinned data. So how can I obtain those data points?

    Krzysztof

    1. Hi Krzysztof, thanks for your comment. The algorithm I described here works as a downsampler only. If you have an unevenly sampled FX tick time-series and want a new one sampled every 60 sec, you can use it. Yes, it averages all data points within each new bin of (e.g.) dt=60 sec. It has some extra value because you can also compute and overplot the standard deviation for each new bin.
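The thread above raises four practical points:

- labelling bins by their centre (rx=(2*t2-dt)/2);
- keeping the final incomplete bin;
- the look-ahead that appears when a bin is stamped with its start time;
- overplotting a per-bin standard deviation.

The sketch below is a hypothetical Python variant, rebin_stats, that covers all four. It is not the author's algorithm. It uses unweighted means and assigns each point to bin floor((x - x[0])/dt). It also treats a bin as complete once its end time is not later than the last timestamp, so it can keep a bin that the x(j)+dt test in rebin would drop.

```python
import math
from typing import Dict, List, Sequence


def rebin_stats(x: Sequence[float], y: Sequence[float], dt: float,
                label: str = "start", keep_partial: bool = False) -> List[Dict[str, float]]:
    """Hypothetical variant of rebin() (not the original algorithm).

    Groups y into bins [x[0] + k*dt, x[0] + (k+1)*dt) and returns, per non-empty
    bin, its time stamp t, unweighted mean, sample standard deviation and count n.
    label chooses the stamp: bin 'start', 'centre' or 'end'.
    keep_partial=True also returns the last bin even if it ends after x[-1]."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    offsets = {"start": 0.0, "centre": 0.5, "end": 1.0}
    if label not in offsets:
        raise ValueError("label must be 'start', 'centre' or 'end'")
    groups: Dict[int, List[float]] = {}
    for xi, yi in zip(x, y):
        k = int((xi - x[0]) // dt)            # index of the bin holding xi
        groups.setdefault(k, []).append(yi)
    rows: List[Dict[str, float]] = []
    for k in sorted(groups):
        start = x[0] + k * dt
        if start + dt > x[-1] and not keep_partial:
            continue                          # bin still open at the last timestamp
        vals = groups[k]
        mean = sum(vals) / len(vals)
        if len(vals) > 1:
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / (len(vals) - 1))
        else:
            std = float("nan")                # undefined for a single point
        rows.append({"t": start + offsets[label] * dt, "mean": mean,
                     "std": std, "n": len(vals)})
    return rows


for row in rebin_stats([1, 2, 3, 4, 5, 6], [1, 3, 4, 4, 6, 6], 2, label="end"):
    print(row)
```

Stamping each bin with its end time (label="end") means a value is reported only once all of its inputs have been observed. This is a common way to avoid look-ahead when the rebinned series feeds a real-time model. Floor-based bin assignment on floating-point timestamps can misplace points that sit exactly on a bin edge, so integer or rounded timestamps are safer.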
