Supplementary material for our paper "Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes" (submitted to Bioinformatics, 2010)

Authors: Marco Grzegorczyk and Dirk Husmeier

This ReadMe.txt file describes how our Matlab implementation of the regularized cpBGe model can be used. The cpBGe model is a 'class 2' model, which was proposed in Grzegorczyk and Husmeier (NIPS 22, 2009, 682-690). Details are given in our supplementary paper. The implementation here is the one proposed in our Bioinformatics main paper, with two methodological improvements based on dynamic programming and regularization by information sharing among genes.

This ReadMe.txt file consists of two parts:
Part 1 deals with the input and output arguments.
Part 2 gives a worked toy example.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PART 1 - OUR IMPLEMENTATION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Note that our Matlab implementation of the regularized cpBGe model requires the 'Statistics Toolbox' for Matlab.

There is only one function; it has the following syntax:

[Run] = PROC_cpBGe_regularized(data, steps, p, k);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Input arguments:

1) 'data' is the data matrix, where each row corresponds to a network node (gene) and each column corresponds to a time point.
E.g.: data(i,t) is the realisation of node X_i at time point t.

2) 'steps' is the number of Gibbs sampling steps to be performed.

3) 'p' and 'k' are the hyperparameters of the negative binomial distribution underlying the point process prior for changepoints (0<p<1).

Output argument:

'Run' is a Matlab structure that contains the states sampled during the Gibbs sampling simulation. How can we access the output of the i-th (i>1) step? We have to look at:

LOG_SCORE = Run.log_score(i-1); % see 1)
DAG = Run.dag{i-1}; % see 2)
NODE_VEC = Run.Node_Vec{i-1}; % see 3)
MATRIX = Run.matrix{i-1}; % see 4)

1) LOG_SCORE is the log-score (i.e.
log(likelihood*prior)) of the state (DAG,NODE_VEC,MATRIX), where

2) DAG is an N-by-N matrix, where N is the number of nodes.
DAG(k,j)=1 means that there was an edge from X_k to X_j, and
DAG(k,j)=0 means that there was no edge from X_k to X_j
in the i-th sampled graph.

3) NODE_VEC is a vector of length N, where N is the number of nodes. NODE_VEC(j) is the cluster to which the j-th node has been allocated.

4) MATRIX is the N-by-(m-1) allocation vector matrix, where N is the number of nodes and m is the number of observations (time points).

Please note that in a dynamic Bayesian network (with a time lag of 1) the first time point t=1 cannot be scored, since there are no realisations of the potential parent nodes at t=0. That is why MATRIX has (m-1) columns only. The first column corresponds to the second time point t=2, and the last column corresponds to the last time point t=m.

The i-th row of MATRIX gives the allocation vector of the i-th cluster.
E.g.: MATRIX(i,t)=k means that the (t+1)-th time point has been allocated to the k-th mixture component (segment) for the i-th cluster.

To find out which nodes belong to the k-th cluster, one can use:

nodes_in_cluster_k = find(NODE_VEC==k);

Please note that 'nodes_in_cluster_k' may be an empty set if there is no node in cluster k. This happens if the number of clusters c is lower than the number of nodes, symbolically c<N.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PART 2 - A WORKED TOY EXAMPLE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% We assume that the true network has N=3 nodes
% and two edges:
% X1 -> X2
% X1 -> X3
% and that we have sampled (X1,X2,X3) at
% m=20 time points.
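To make the output conventions from Part 1 concrete, here is a small illustrative sketch of how changepoint locations can be read off from a sampled NODE_VEC and MATRIX. It is written in Python for illustration only (the toolbox itself is Matlab), and all values below are hypothetical toy values, not toolbox output.

```python
# Illustrative post-processing (not part of the Matlab toolbox):
# recover the changepoint locations of node j from NODE_VEC and MATRIX.

def changepoints_for_node(node_vec, matrix, j):
    """Return the time points after which node j's segment changes.

    node_vec[j-1] is the cluster of node j (1-based, as in Matlab);
    matrix[c-1] is the allocation vector of cluster c, whose t-th
    entry (0-based) refers to time point t+2, because the first
    column of MATRIX corresponds to time point t=2.
    """
    alloc = matrix[node_vec[j - 1] - 1]  # allocation vector of node j's cluster
    cps = []
    for t in range(1, len(alloc)):
        if alloc[t] != alloc[t - 1]:
            # entry t-1 is time point t+1, entry t is time point t+2,
            # so the segment changes after time point t+1
            cps.append(t + 1)
    return cps

# Hypothetical sampled state: N=3 nodes in c=2 clusters,
# m=20 time points -> 19 allocation columns per cluster.
node_vec = [1, 1, 2]
matrix = [
    [1]*9 + [2]*10,         # cluster 1: one changepoint after t=10
    [1]*5 + [2]*9 + [3]*5,  # cluster 2: changepoints after t=6 and t=15
]
print(changepoints_for_node(node_vec, matrix, 1))  # -> [10]
print(changepoints_for_node(node_vec, matrix, 3))  # -> [6, 15]
```

This is the same bookkeeping (the "(t+1)-th time point" offset) that the worked example below performs via co-allocation heatmaps.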
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% To make the results reproducible:
rand('state',13234);

% First, we generate some 'class 1' data:
% We assume that there is one single changepoint after t=10
% that is common to all three nodes X1, X2, and X3
% ('class 1' data).

data_1 = zeros(3,20);

% Data for X1 (no parents):
data_1(1,:) = [randn(1,10)-2, randn(1,10)+2];

data_1(2,1) = randn(1);
data_1(3,1) = randn(1);

for t=2:10
    data_1(2,t) = (-1)*data_1(1,t-1) + 0.3*randn(1);
    data_1(3,t) = (-2)*data_1(1,t-1) + 0.3*randn(1);
end

% changepoint after t=10 (simply switch the signs of the coefficients)
for t=11:20
    data_1(2,t) = (+1)*data_1(1,t-1) + 0.3*randn(1);
    data_1(3,t) = (+2)*data_1(1,t-1) + 0.3*randn(1);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% We also generate some 'class 2' data:
% We assume that
% (i) there is a changepoint after t=10 that is common to X1 and X2, and
% (ii) there are two changepoints for X3, after t=6 and after t=15
% ('class 2' data).

data_2 = zeros(3,20);

% Data for X1 (no parents):
data_2(1,:) = [randn(1,10)+1, randn(1,10)-1];

data_2(2,1) = randn(1);
data_2(3,1) = randn(1);

% Data for X2:
for t=2:10
    data_2(2,t) = (-1)*data_2(1,t-1) + 0.3*randn(1);
end

% For X2 there is a changepoint after t=10:
for t=11:20
    data_2(2,t) = (+1)*data_2(1,t-1) + 0.3*randn(1);
end

% Data for X3:
% there are two changepoints, after t=6 and after t=15
for t=2:6
    data_2(3,t) = (+2)*data_2(1,t-1) + 0.3*randn(1);
end
for t=7:15
    data_2(3,t) = (-2)*data_2(1,t-1) + 0.3*randn(1);
end
for t=16:20
    data_2(3,t) = (+2)*data_2(1,t-1) + 0.3*randn(1);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Set the parameters of the negative binomial...
p = 0.0001;
k = 2;

% ...and let's perform 10 Gibbs sampling steps:
steps = 10;

% Run the Gibbs sampler for data_1 and data_2:
[Run_1] = PROC_cpBGe_regularized(data_1, steps, p, k);
[Run_2] = PROC_cpBGe_regularized(data_2, steps, p, k);

% During the MCMC simulation the step index i can be seen
% on the screen (i=1 to i=steps+1).
% The MCMC simulations for data_1 and data_2 should
% finish within a few minutes.

% We look at the marginal edge posterior probabilities:
[N,m] = size(data_1);

DAG_1 = zeros(N,N);
DAG_2 = zeros(N,N);

for i=2:(steps+1)
    DAG_1 = DAG_1 + Run_1.dag{i};
    DAG_2 = DAG_2 + Run_2.dag{i};
end

DAG_1 = DAG_1/steps;
DAG_2 = DAG_2/steps;

DAG_1
DAG_2

% It can be seen that the correct edges X1->X2 and X1->X3
% have obtained the highest posterior probabilities.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% There are three plot types that may be helpful:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% First, we monitor the log-scores:
figure(1)
clf
subplot(2,1,1)
plot(Run_1.log_score)
subplot(2,1,2)
plot(Run_2.log_score)

% For both data sets a plateau is reached. There is no indication
% AGAINST convergence.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Second, we look at a heatmap of the connectivity structure
% between nodes. That is, we want to visualise the clusters of
% nodes:

C_1 = zeros(N,N);
C_2 = zeros(N,N);

for i=2:(steps+1)
    VEC_1 = Run_1.Node_Vec{i};
    VEC_2 = Run_2.Node_Vec{i};
    for j=1:N
        for k=1:N
            if (VEC_1(j)==VEC_1(k))
                C_1(j,k) = C_1(j,k)+1;
            end
            if (VEC_2(j)==VEC_2(k))
                C_2(j,k) = C_2(j,k)+1;
            end
        end
    end
end

C_1 = C_1/steps;
C_2 = C_2/steps;

figure(2)
clf
subplot(1,2,1)
imagesc(C_1,[0,1]); colormap(gray)
subplot(1,2,2)
imagesc(C_2,[0,1]); colormap(gray)

% In the second figure there are two panels with heatmaps
% for data_1 (left panel) and data_2 (right panel).
% A grey shading is used to indicate
% the connectivity probabilities between nodes
% (black = 0 and white = 1).

% From the left panel (for data_1) it can be seen that the three
% nodes have been allocated to the same cluster. The heatmap is
% white, indicating high connectivities between the three nodes
% (i.e. the number of clusters is c=1).

% From the right panel (for data_2) it can be seen that the third
% node is separated from the first two nodes X1 and X2. That is,
% the nodes X1 and X2 form the first cluster, and X3 is separated
% from them in a second cluster. It follows that the number of
% clusters is c=2.

% We note that these findings are consistent with the data
% generation mechanisms described above.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Finally, we want to find out which changepoints have been
% inferred for the nodes in the clusters:

[N,m] = size(data_1);
m = m-1; % the first time point t=1 is not allocated

C1_1 = zeros(m,m); C1_2 = zeros(m,m); C1_3 = zeros(m,m);
C2_1 = zeros(m,m); C2_2 = zeros(m,m); C2_3 = zeros(m,m);

for i=2:(steps+1)

    mat_1 = Run_1.matrix{i};
    vec_1 = Run_1.Node_Vec{i};
    mat_2 = Run_2.matrix{i};
    vec_2 = Run_2.Node_Vec{i};

    index_1_1 = vec_1(1); index_1_2 = vec_1(2); index_1_3 = vec_1(3);
    index_2_1 = vec_2(1); index_2_2 = vec_2(2); index_2_3 = vec_2(3);

    VEC_1_1 = mat_1(index_1_1,:);
    VEC_1_2 = mat_1(index_1_2,:);
    VEC_1_3 = mat_1(index_1_3,:);
    VEC_2_1 = mat_2(index_2_1,:);
    VEC_2_2 = mat_2(index_2_2,:);
    VEC_2_3 = mat_2(index_2_3,:);

    for x=1:m
        for y=1:m
            if (VEC_1_1(x)==VEC_1_1(y))
                C1_1(x,y) = C1_1(x,y)+1;
            end
            if (VEC_1_2(x)==VEC_1_2(y))
                C1_2(x,y) = C1_2(x,y)+1;
            end
            if (VEC_1_3(x)==VEC_1_3(y))
                C1_3(x,y) = C1_3(x,y)+1;
            end
            if (VEC_2_1(x)==VEC_2_1(y))
                C2_1(x,y) = C2_1(x,y)+1;
            end
            if (VEC_2_2(x)==VEC_2_2(y))
                C2_2(x,y) = C2_2(x,y)+1;
            end
            if (VEC_2_3(x)==VEC_2_3(y))
                C2_3(x,y) = C2_3(x,y)+1;
            end
        end
    end
end

C1_1 = C1_1/steps;
C1_2 = C1_2/steps;
C1_3 = C1_3/steps;
C2_1 = C2_1/steps;
C2_2 = C2_2/steps;
C2_3 = ...
C2_3/steps;

figure(3)
clf
subplot(3,2,1)
imagesc(C1_1,[0,1]); colormap(gray)
subplot(3,2,2)
imagesc(C2_1,[0,1]); colormap(gray)
subplot(3,2,3)
imagesc(C1_2,[0,1]); colormap(gray)
subplot(3,2,4)
imagesc(C2_2,[0,1]); colormap(gray)
subplot(3,2,5)
imagesc(C1_3,[0,1]); colormap(gray)
subplot(3,2,6)
imagesc(C2_3,[0,1]); colormap(gray)

% The plot is arranged as a 3-by-2 matrix.
% The first column corresponds to data_1, and
% the second column corresponds to data_2.
% The rows correspond to the three network nodes:
% first row  -> X1
% second row -> X2
% third row  -> X3

% In each of the six panels there is a heatmap,
% and a grey shading is used to indicate
% the connectivity probabilities between time points
% (black = 0 and white = 1).

% From the left column (for data_1) it can be seen that
% the three nodes X1, X2 and X3 share the same changepoint
% after t=10.

% From the right column (for data_2) it can be seen that
% the first two nodes X1 and X2 share the same changepoint
% after t=10. For node X3 two changepoints, after t=6 and t=15,
% have been inferred.

% We note that these findings are consistent with the data
% generation mechanisms described above:
% data_1 -> class 1 data
%   (all nodes share the same changepoint after t=10)
% data_2 -> class 2 data
%   (the nodes X1 and X2 share the same changepoint after t=10, and
%   node X3 has two changepoints, after t=6 and t=15).

% END OF WORKED EXAMPLE
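As a footnote to the worked example: the six co-allocation matrices above (C1_1, ..., C2_3) are all built by the same pattern, namely counting how often two time points fall into the same segment, averaged over the Gibbs samples. A compact, illustrative re-implementation of that pattern, in Python rather than Matlab and with hypothetical toy inputs:

```python
# Illustrative sketch (not part of the Matlab toolbox): generic
# co-allocation matrix over sampled allocation vectors.

def coallocation(alloc_vectors):
    """Given one allocation vector per Gibbs sample, return the matrix
    C with C[x][y] = fraction of samples in which time points x and y
    were allocated to the same segment."""
    n_samples = len(alloc_vectors)
    m = len(alloc_vectors[0])
    C = [[0.0] * m for _ in range(m)]
    for vec in alloc_vectors:
        for x in range(m):
            for y in range(m):
                if vec[x] == vec[y]:
                    C[x][y] += 1.0 / n_samples
    return C

# Two hypothetical sampled allocation vectors for 4 scorable time points:
samples = [[1, 1, 2, 2],
           [1, 1, 1, 2]]
C = coallocation(samples)
print(C[0][1])  # -> 1.0 (always in the same segment)
print(C[1][2])  # -> 0.5 (together in half of the samples)
```

Plotting such a matrix as a heatmap (white = 1, black = 0) gives exactly the block structure seen in figure 3, with block boundaries at the inferred changepoints.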