Medical Systems Biology

Software

Download

The C++ sources of CANTATA can be downloaded here.

A precompiled Windows executable is available here.

A precompiled Mac OS X universal binary (PowerPC/Intel) is available here.

Compiling

Windows and Mac OS X users can skip this step by downloading the matching precompiled binary above and extracting it to the desired location.

We recommend compiling the source code with the GCC compiler, or with MinGW on Windows. On Mac OS, use the GCC compiler delivered with Xcode Tools. The code also compiles with Microsoft Visual C++.

For older versions of GCC (prior to 4.2) or for Visual C++, you may additionally require the Boost libraries.

If using GCC/MinGW, the code can be compiled using the included Makefile:

Extract the source code to the target directory <target>.
Open a shell (or a Windows command prompt) and change to the directory using cd <target>.
Compile the sources by typing make. If using GCC prior to 4.2 and Boost, you should supply the Boost include path using make -e INCLUDE=<path>.
There should now be a subdirectory bin containing the executable file cantata (or cantata.exe). This executable can be copied to a location in the search path.

Usage — A short tutorial

The following examples outline the usage of the program based on the mammalian cell cycle network [1]. The files used in this tutorial can be downloaded here. The ZIP archive contains three files:

cellcycle.txt: The original mammalian cell cycle network model (Fauré et al. [1]) in the BoolNet network file format. In short, each line of the file describes the dependencies of one gene, starting with the target gene, and followed by a separator sign (,) and the Boolean transition function. The transition function consists of gene names, combined by the Boolean operators AND (&) and OR (|). Genes or expressions can be negated by the ! sign.
cellcycle_truncated.txt: A "draft" version of the cell cycle model in which one important dependency has been deleted: The transition function for gene CycB has been modified from
```
CycB, (! Cdc20 & ! Cdh1)
```
to
```
CycB, (! Cdc20)
```
That is, CycB lacks the dependency on Cdh1, which changes the dynamic behaviour of the model.
cellcycle_rules.txt: The rule file specifying the desired dynamics of the network for reconstruction. In this case, it contains two rules specifying the steady-state attractor and the 7-state cycle:
```
# The steady-state attractor
Attractor:
Initial condition:
!CycD
State specifications:
!CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB

# The 7-state cycle
Attractor:
Initial condition:
CycD
State specifications:
CycD !Rb !E2F CycE CycA !p27 !Cdc20 !Cdh1 !UbcH10 !CycB
CycD !Rb !E2F !CycE CycA !p27 !Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE CycA !p27 Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE !CycA !p27 Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F !CycE !CycA !p27 !Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F CycE !CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
CycD !Rb E2F CycE CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
```
Each rule starts with the keywords Attractor: or Chain: specifying the type of the rule (attractor or time series).
The next section, Initial condition:, specifies which start states should yield the corresponding attractor (or chain). The condition is a space-separated list of gene values. If a gene name is preceded by !, the gene is inactive (0), otherwise it is active (1). In this case, all states with CycD=1 yield the 7-state attractor, and all states with CycD=0 yield the steady-state attractor.
The section State specifications: denotes the beginning of the state specification list. Each of the following lines describes the desired gene values for one attractor state, or for several successive attractor states if not all genes are specified. The format of a specification entry is the same as the initial condition.
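
The rule format described above can be read with a few lines of code. The following Python sketch is only an illustration and not CANTATA's own parser: the names Rule, parse_literals and parse_rules are made up for this example, and it handles only the elements shown in this tutorial (the two keywords, the two section headers, lines starting with # as comments, and !-prefixed gene names).

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One rule of the rule file (hypothetical helper type, not part of CANTATA)."""
    kind: str                                      # "Attractor" or "Chain"
    initial: dict = field(default_factory=dict)    # gene name -> 0/1
    states: list = field(default_factory=list)     # one dict per specification line


def parse_literals(line):
    """Turn '!CycD Rb' into {'CycD': 0, 'Rb': 1}."""
    values = {}
    for token in line.split():
        if token.startswith("!"):
            values[token[1:]] = 0
        else:
            values[token] = 1
    return values


def parse_rules(text):
    """Parse the subset of the rule file format shown in this tutorial."""
    rules, section = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                               # skip blank lines and comments
        if line in ("Attractor:", "Chain:"):
            rules.append(Rule(kind=line[:-1]))
            section = None
        elif line == "Initial condition:":
            section = "initial"
        elif line == "State specifications:":
            section = "states"
        elif section == "initial" and rules:
            rules[-1].initial.update(parse_literals(line))
        elif section == "states" and rules:
            rules[-1].states.append(parse_literals(line))
    return rules


# First rule of cellcycle_rules.txt, copied from the listing above.
EXAMPLE = """\
# The steady-state attractor
Attractor:
Initial condition:
!CycD
State specifications:
!CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB
"""

rules = parse_rules(EXAMPLE)
print(rules[0].kind, rules[0].initial, len(rules[0].states))
```

Each state specification becomes a dictionary, so a specification that leaves some genes out simply has fewer keys.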

In the following, we assume that these three files have been extracted to a directory, and that cantata is in your search path.

First, we check whether the network models match these expectations, starting with the original ("true") network model. Open a shell/command prompt, change to the directory where the network files are located, and type:

cantata --validate -n cellcycle.txt -r cellcycle_rules.txt

Here, --validate tells the program to validate a model specified in parameter -n according to a list of rules specified in parameter -r.

Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)

Input network file:           cellcycle.txt
Rule file:                    cellcycle_rules.txt
Random seed:                  1315472969
Max. number of start states:  100000
Max. number of transitions:   1000

Violations of the rule set:

Rule 1:
(no violations)
Rule 2:
(no violations)
Finished

As expected, the true network obeys both rules, i.e. it has the two specified attractors.
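
This result can be cross-checked by hand with a small simulation. The Python sketch below is an independent illustration, not CANTATA code: it assumes synchronous updates (all genes are updated at once), uses the transition functions transcribed from the "Input network" listing above into the target, function form, and translates the operators !, & and | to Python with a simple string replacement. The helper names load_network, step and attractor are made up for this example.

```python
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA", "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]

# Transcribed from the validation output above.
NETWORK = """\
CycD, CycD
Rb, ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F, ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE, (E2F & !Rb)
CycA, ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27, ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20, CycB
Cdh1, ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10, (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB, (!Cdc20 & !Cdh1)
"""


def load_network(text):
    """Map each target gene to a compiled Python expression (trusted input only)."""
    functions = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        target, expr = line.split(",", 1)
        py = expr.replace("!", " not ").replace("&", " and ").replace("|", " or ")
        functions[target.strip()] = compile(py.strip(), "<rule>", "eval")
    return functions


def step(functions, state):
    """Apply all transition functions at once (synchronous update)."""
    env = dict(zip(GENES, (bool(v) for v in state)))
    return tuple(int(bool(eval(functions[g], {"__builtins__": {}}, env))) for g in GENES)


def attractor(functions, state):
    """Follow the trajectory from `state` until a state repeats; return the cycle."""
    seen, path = {}, []
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(functions, state)
    return path[seen[state]:]


network = load_network(NETWORK)

# Steady state of rule 1 and first state of the 7-state cycle of rule 2,
# written in the gene order of GENES.
steady = (0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
cycle_start = (1, 0, 0, 1, 1, 0, 0, 0, 0, 0)

print(len(attractor(network, steady)))       # 1
print(len(attractor(network, cycle_start)))  # 7
```

The sketch only follows single trajectories. CANTATA additionally checks that the start states selected by each initial condition lead to the specified attractor.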

We now validate the perturbed network draft:

cantata --validate -n cellcycle_truncated.txt -r cellcycle_rules.txt -c 5

Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = !Cdc20

Input network file:           cellcycle_truncated.txt
Rule file:                    cellcycle_rules.txt
Random seed:                  1315484034
Max. number of start states:  100000
Max. number of transitions:   1000

Violations of the rule set:

Rule 1:

Attractor matching 1 (using alternative 1):
Attractor           	=>	Specifications      
0 0 0 0 0 0 1 1 1 0 	=>	0 1 0 0 0 1 0 1 0 0  (State spec. 1)
0 1 1 0 0 1 0 1 1 0 	=>	0 1 0 0 0 1 0 1 0 0  (State spec. 1)
0 1 0 0 0 1 0 1 0 1 	=>	0 1 0 0 0 1 0 1 0 0  (State spec. 1)
0 0 0 0 0 0 1 0 0 1 	=>	0 1 0 0 0 1 0 1 0 0  (State spec. 1)

Violations caused by this matching:
State specification 1: Rb != 1
State specification 1: E2F != 0
State specification 1: p27 != 1
State specification 1: Cdc20 != 0
State specification 1: Cdh1 != 1
State specification 1: UbcH10 != 0
State specification 1: CycB != 0

Start states yielding this matching:
	0 0 0 0 0 0 0 0 0 0 
	0 0 0 0 0 0 0 0 0 1 
	0 0 0 0 0 0 0 0 1 0 
	0 0 0 0 0 0 0 0 1 1 
	0 0 0 0 0 0 0 1 0 0 
	(... further 507 states ...)

Rule 2:

Attractor matching 1 (using alternative 1):
Attractor           	=>	Specifications      
1 0 0 0 0 0 1 1 1 0 	=>	1 0 0 0 0 0 1 1 1 0  (State spec. 4)
1 0 1 0 0 0 0 1 1 0 	=>	1 0 1 0 0 0 0 1 1 0  (State spec. 5)
1 0 1 1 0 0 0 1 0 1 	=>	1 0 1 1 0 0 0 1 0 0  (State spec. 6)
                    	=>	1 0 1 1 1 0 0 1 0 0  (State spec. 7)
                    	=>	1 0 0 1 1 0 0 0 0 0  (State spec. 1)
                    	=>	1 0 0 0 1 0 0 0 1 1  (State spec. 2)
1 0 0 1 1 0 1 0 0 1 	=>	1 0 0 0 1 0 1 0 1 1  (State spec. 3)

Violations caused by this matching:
State specification 3: CycE != 0
State specification 3: UbcH10 != 1
State specification 6: CycB != 0
Start states yielding this matching:
	1 0 0 0 0 0 0 0 0 0 
	1 0 0 0 0 0 0 0 0 1 
	1 0 0 0 0 0 0 0 1 0 
	1 0 0 0 0 0 0 0 1 1 
	1 0 0 0 0 0 0 1 0 0 
	(... further 507 states ...)
Finished

Here, the program first prints the attractors of the network and their optimal matchings with the rules. We can see that the perturbed network draft has two 4-state attractors that are matched with the 1-state attractor and the 7-state attractor of the original network. Below each matching, the violations are listed, and in this example there are many. For each violation, CANTATA prints the number of the state specification and the violating gene together with the value it was expected to take. Furthermore, the start states that lead to the matching and cause the violations are printed. By specifying -c 5, we tell the program to print only the first 5 of these start states.
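
To make the violation list easier to read, the following Python sketch recomputes the violations of Rule 1 from the matching printed above. It is our reading of the output and not CANTATA's matching algorithm: the matching itself (all four attractor states mapped to state specification 1) is taken as given, and the function name violations is made up for this example.

```python
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA", "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]

# The four attractor states of the truncated network, copied from the output above.
ATTRACTOR = [
    (0, 0, 0, 0, 0, 0, 1, 1, 1, 0),
    (0, 1, 1, 0, 0, 1, 0, 1, 1, 0),
    (0, 1, 0, 0, 0, 1, 0, 1, 0, 1),
    (0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
]

# State specification 1 of rule 1 (the steady state).
SPEC = (0, 1, 0, 0, 0, 1, 0, 1, 0, 0)


def violations(states, spec):
    """Return (gene, expected value) pairs violated by at least one matched state."""
    bad = set()
    for state in states:
        for gene, actual, wanted in zip(GENES, state, spec):
            if actual != wanted:
                bad.add((gene, wanted))
    # Report in the gene order used by the network file.
    return sorted(bad, key=lambda item: GENES.index(item[0]))


for gene, wanted in violations(ATTRACTOR, SPEC):
    print(f"State specification 1: {gene} != {wanted}")
```

The printed lines agree with the seven violations that CANTATA reports for Rule 1.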

Now, let us try to reconstruct the true model from the disrupted network draft using the CANTATA optimization algorithm. This algorithm is started using the main option --optimize:

cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -ni 1000

The parameter -o result.txt tells the program to write the results to a file result.txt. We set the number of iterations to 1000 (which is the default value) using -ni 1000. When the optimization process is complete, the file result.txt contains a header summarizing the algorithm's configuration, followed by a list of candidate networks with their three objective values, e.g.

Input network file:           cellcycle_truncated.txt
Rule file:                    cellcycle_rules.txt
Random seed:                  1316096208
Population size:              100
Number of offspring:          200
Fract. of injected nets:      0.1
Neg. every i-th offspring:    50
Number of generations:        1000
Number of restarts:           1
Initial mutations:            1
Epsilon:                      0.0005
Weights of topology scores:   0.25/0.25/0.5
Max. number of start states:  200
Max. number of transitions:   100

Best candidate networks: 

CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE | CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.098 Run: 1 Generation: 419

CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (!CycB & !CycD & p27))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((!Rb & !Cdc20 & !(Cdh1 & UbcH10) & E2F) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.108 Run: 1 Generation: 641

Finished

In the example printed here, the first fitness value of both resulting candidate network models is 0, which indicates that the networks match the rules perfectly. In this case, the second resulting network is equivalent to the true network, which means that the deleted dependency of CycB on Cdh1 was reconstructed and no further changes were applied. The first candidate also recovers this dependency, but changes an & to a | in the function for p27. Depending on the random initialization of the algorithm, you might get a different result when running the example.
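
Whether two transition functions are equivalent can be checked by comparing their truth tables. The Python sketch below does this for the Rb function of the second candidate and the p27 function of the first candidate. It is an illustration only: the helper names to_python and differing_inputs are made up, and the operator translation is a plain string replacement that should only be used on trusted input.

```python
import re
from itertools import product


def to_python(expr):
    """Translate !, & and | to Python's not, and, or."""
    return expr.replace("!", " not ").replace("&", " and ").replace("|", " or ").strip()


def differing_inputs(expr_a, expr_b):
    """List all variable assignments on which the two functions disagree."""
    names = sorted(set(re.findall(r"[A-Za-z_]\w*", expr_a + " " + expr_b)))
    code_a = compile(to_python(expr_a), "<a>", "eval")
    code_b = compile(to_python(expr_b), "<b>", "eval")
    diffs = []
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        out_a = bool(eval(code_a, {"__builtins__": {}}, env))
        out_b = bool(eval(code_b, {"__builtins__": {}}, env))
        if out_a != out_b:
            diffs.append(env)
    return diffs


# Rb: true network vs. second candidate (terms reordered only).
rb_true = "((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))"
rb_cand = "((!CycA & !CycB & !CycD & !CycE) | (!CycB & !CycD & p27))"

# p27: true network vs. first candidate (& replaced by |).
p27_true = "((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))"
p27_cand = "((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE | CycA) & !CycB & !CycD))"

print(len(differing_inputs(rb_true, rb_cand)))    # 0: equivalent
print(len(differing_inputs(p27_true, p27_cand)))  # 2: not equivalent
```

The two p27 functions differ only when p27 is active, CycD and CycB are inactive, and exactly one of CycE and CycA is active.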

The result file cannot be read directly by BoolNet, as it contains multiple candidate network models and additional annotation (the header and objective scores). If the candidate networks are to be analyzed in BoolNet, CANTATA can write them to separate network files. For example,

cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -on candidate_%d.txt -me 0

writes all candidates that match the rules perfectly to files candidate_0.txt, candidate_1.txt, candidate_2.txt, ..., i.e. the %d marker is replaced by a running number. The parameter -me sets a threshold for the first objective, i.e. only candidates whose score in the first objective is less than or equal to this value are written to files. As we set this error to 0 (which is the default value), only candidate network models that match the rule set perfectly are written to files.
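
The effect of -on and -me can be mimicked when post-processing result.txt yourself. The Python sketch below is a rough illustration under several assumptions: the result file is laid out as in the example above, the running number counts only the candidates that pass the threshold, and nothing is written to disk. The function names parse_candidates and select are made up, and the sample text is abbreviated to the CycB line of each candidate.

```python
def parse_candidates(text):
    """Collect the candidate networks listed after 'Best candidate networks:'."""
    candidates, current, started = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Best candidate networks"):
            started = True
        elif not started or not line or line == "Finished":
            continue
        elif line.startswith("Fitness:"):
            fields = line.split()
            # The three objective values follow the keyword "Fitness:".
            objectives = tuple(float(x) for x in fields[1:4])
            candidates.append({"functions": current, "objectives": objectives})
            current = []
        else:
            current.append(line)
    return candidates


def select(candidates, pattern="candidate_%d.txt", max_error=0.0):
    """Name the candidates whose first objective does not exceed max_error."""
    kept = [c for c in candidates if c["objectives"][0] <= max_error]
    return {pattern % i: c for i, c in enumerate(kept)}


# Abbreviated excerpt of the result file shown above (CycB lines only).
SAMPLE = """\
Best candidate networks:

CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.098 Run: 1 Generation: 419

CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.108 Run: 1 Generation: 641

Finished
"""

for name, candidate in select(parse_candidates(SAMPLE)).items():
    print(name, candidate["objectives"])
```

Both candidates of the example have a first objective of 0, so both pass the default threshold.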

This tutorial covers only parts of the options available in CANTATA. A full description of the command line options and the file formats is available here.

References

[1] Fauré, A., Naldi, A., Chaouiya, C., and Thieffry, D. (2006). Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics, 22(14), e124-e131.
