The C++ sources of CANTATA can be downloaded here.
A precompiled Windows executable is available here.
A precompiled Mac OS X universal binary (Power PC/Intel) is available here.
Windows and Mac OS X users can skip this step by downloading the above precompiled binary and extracting it to the desired location.
We recommend compiling the source code using the GCC compiler, or MinGW on Windows. On Mac OS, use the GCC compiler delivered with Xcode Tools. The code also compiles with Microsoft Visual C++.
For older versions of GCC (prior to 4.2) or for Visual C++, you may additionally require the Boost libraries.
If using GCC/MinGW, the code can be compiled using the included Makefile:
The following examples outline the usage of the program based on the mammalian cell cycle network [1]. The files used in this tutorial can be downloaded here. The ZIP archive contains three files:
CycB, (! Cdc20 & ! Cdh1)
to
CycB, (! Cdc20)
That is, CycB lacks the dependency on Cdh1, which changes the dynamic behaviour of the model.
# The steady-state attractor
Attractor:
Initial condition: !CycD
State specifications:
!CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB

Each rule starts with the keywords Attractor: or Chain: specifying the type of the rule (attractor or time series).

# The 7-state cycle
Attractor:
Initial condition: CycD
State specifications:
CycD !Rb !E2F CycE CycA !p27 !Cdc20 !Cdh1 !UbcH10 !CycB
CycD !Rb !E2F !CycE CycA !p27 !Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE CycA !p27 Cdc20 !Cdh1 UbcH10 CycB
CycD !Rb !E2F !CycE !CycA !p27 Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F !CycE !CycA !p27 !Cdc20 Cdh1 UbcH10 !CycB
CycD !Rb E2F CycE !CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
CycD !Rb E2F CycE CycA !p27 !Cdc20 Cdh1 !UbcH10 !CycB
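To relate the rule notation to the 0/1 rows that CANTATA prints later in this tutorial, the following minimal Python sketch converts a state specification into a bit row. The helper names parse_state and as_bits are hypothetical and not part of CANTATA; the gene order is taken from the network listing used in this tutorial.

```python
# Gene order as it appears in the cell cycle network file of this tutorial.
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA",
         "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]

def parse_state(spec):
    """Turn '!CycD Rb ...' into a dict gene -> 0/1 (a leading '!' means inactive)."""
    state = {}
    for token in spec.split():
        if token.startswith("!"):
            state[token[1:]] = 0
        else:
            state[token] = 1
    return state

def as_bits(state, genes=GENES):
    """Render a fully specified state in network order as a 0/1 row."""
    return " ".join(str(state[g]) for g in genes)

steady = parse_state("!CycD Rb !E2F !CycE !CycA p27 !Cdc20 Cdh1 !UbcH10 !CycB")
print(as_bits(steady))  # 0 1 0 0 0 1 0 1 0 0
```

The printed row is the encoding of the steady state that appears on the right-hand side of the attractor matchings shown further below.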
In the following, we assume that these three files have been extracted to a directory, and that cantata is in your search path.
First, we can check whether the network models match these expectations. We first verify the original ("true") network model. Open a shell/command prompt, change to the directory where the network files are located, and type:
cantata --validate -n cellcycle.txt -r cellcycle_rules.txt
Here, --validate tells the program to validate a model specified in parameter -n according to a list of rules specified in parameter -r.
Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)

Input network file: cellcycle.txt
Rule file: cellcycle_rules.txt
Random seed: 1315472969
Max. number of start states: 100000
Max. number of transitions: 1000

Violations of the rule set:

Rule 1: (no violations)
Rule 2: (no violations)
Finished
As expected, the true network obeys both rules, i.e. it has the two specified attractors.
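This result can be reproduced by hand for the two states named in the rules. The sketch below is an independent Python illustration, not CANTATA code: it transcribes the ten update rules of the true network and assumes synchronous updates (all genes are recomputed from the previous state at once). The function names step and attractor_from are hypothetical.

```python
# State order: CycD, Rb, E2F, CycE, CycA, p27, Cdc20, Cdh1, UbcH10, CycB

def step(s):
    """One synchronous update of the true cell cycle network (assumed scheme)."""
    CycD, Rb, E2F, CycE, CycA, p27, Cdc20, Cdh1, UbcH10, CycB = (bool(x) for x in s)
    nxt = (
        CycD,
        (not CycA and not CycB and not CycD and not CycE)
        or (p27 and not CycB and not CycD),
        (not Rb and not CycA and not CycB) or (p27 and not Rb and not CycB),
        E2F and not Rb,
        (E2F and not Rb and not Cdc20 and not (Cdh1 and UbcH10))
        or (CycA and not Rb and not Cdc20 and not (Cdh1 and UbcH10)),
        (not CycD and not CycE and not CycA and not CycB)
        or (p27 and not (CycE and CycA) and not CycB and not CycD),
        CycB,
        (not CycA and not CycB) or Cdc20 or (p27 and not CycB),
        not Cdh1 or (Cdh1 and UbcH10 and (Cdc20 or CycA or CycB)),
        not Cdc20 and not Cdh1,
    )
    return tuple(int(v) for v in nxt)

def attractor_from(state):
    """Iterate until a state repeats and return the cycle that was entered."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

steady = (0, 1, 0, 0, 0, 1, 0, 1, 0, 0)       # state specification of rule 1
cycle_start = (1, 0, 0, 1, 1, 0, 0, 0, 0, 0)  # first state specification of rule 2

print(attractor_from(steady) == [steady])  # True: the steady state maps to itself
print(len(attractor_from(cycle_start)))    # 7
```

Under this assumption, the steady state of rule 1 is a fixed point and the first state of rule 2 lies on a cycle of length 7, in agreement with the two "(no violations)" lines.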
We now validate the perturbed network draft:
cantata --validate -n cellcycle_truncated.txt -r cellcycle_rules.txt -c 5
Input network:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = !Cdc20

Input network file: cellcycle_truncated.txt
Rule file: cellcycle_rules.txt
Random seed: 1315484034
Max. number of start states: 100000
Max. number of transitions: 1000

Violations of the rule set:

Rule 1:
Attractor matching 1 (using alternative 1):
Attractor           => Specifications
0 0 0 0 0 0 1 1 1 0 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 1 1 0 0 1 0 1 1 0 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 1 0 0 0 1 0 1 0 1 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
0 0 0 0 0 0 1 0 0 1 => 0 1 0 0 0 1 0 1 0 0 (State spec. 1)
Violations caused by this matching:
State specification 1: Rb != 1
State specification 1: E2F != 0
State specification 1: p27 != 1
State specification 1: Cdc20 != 0
State specification 1: Cdh1 != 1
State specification 1: UbcH10 != 0
State specification 1: CycB != 0
Start states yielding this matching:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 1 0 0
(... further 507 states ...)

Rule 2:
Attractor matching 1 (using alternative 1):
Attractor           => Specifications
1 0 0 0 0 0 1 1 1 0 => 1 0 0 0 0 0 1 1 1 0 (State spec. 4)
1 0 1 0 0 0 0 1 1 0 => 1 0 1 0 0 0 0 1 1 0 (State spec. 5)
1 0 1 1 0 0 0 1 0 1 => 1 0 1 1 0 0 0 1 0 0 (State spec. 6)
                    => 1 0 1 1 1 0 0 1 0 0 (State spec. 7)
                    => 1 0 0 1 1 0 0 0 0 0 (State spec. 1)
                    => 1 0 0 0 1 0 0 0 1 1 (State spec. 2)
1 0 0 1 1 0 1 0 0 1 => 1 0 0 0 1 0 1 0 1 1 (State spec. 3)
Violations caused by this matching:
State specification 3: CycE != 0
State specification 3: UbcH10 != 1
State specification 6: CycB != 0
Start states yielding this matching:
1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1 0
1 0 0 0 0 0 0 0 1 1
1 0 0 0 0 0 0 1 0 0
(... further 507 states ...)
Finished
Here, the program first prints the attractors of the network and their optimal matchings with the rules. We can see that the perturbed network draft has two 4-state attractors that are matched with the 1-state attractor and the 7-state attractor of the original network. Below each matching, the violations are listed. In this example, there are many violations. CANTATA prints the number of the state specification and the violating gene for each violation. Furthermore, the start states that lead to the matching and cause the violations are printed. By specifying -c 5, we tell the program to print only the first 5 violating states.
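To see how the violation list relates to a matching, the rows of the Rule 1 matching can be compared gene by gene with the state specification. The following Python sketch is an illustration only: it does not compute the matching itself, and CANTATA's internal procedure may differ. The names violations and bits are hypothetical.

```python
GENES = ["CycD", "Rb", "E2F", "CycE", "CycA",
         "p27", "Cdc20", "Cdh1", "UbcH10", "CycB"]

def bits(row):
    """Parse a printed 0/1 row into a tuple of ints."""
    return tuple(int(x) for x in row.split())

def violations(matching):
    """matching: list of (spec_number, attractor_state, spec_state) triples.
    Returns the unique (spec_number, gene, expected_value) mismatches, sorted
    by specification number and gene position."""
    found = set()
    for spec_no, attr, spec in matching:
        for i, (actual, expected) in enumerate(zip(attr, spec)):
            if actual != expected:
                found.add((spec_no, i, expected))
    return [(n, GENES[i], e) for n, i, e in sorted(found)]

# The four attractor states matched with state specification 1 of Rule 1.
spec1 = bits("0 1 0 0 0 1 0 1 0 0")
rule1 = [(1, bits(r), spec1) for r in (
    "0 0 0 0 0 0 1 1 1 0",
    "0 1 1 0 0 1 0 1 1 0",
    "0 1 0 0 0 1 0 1 0 1",
    "0 0 0 0 0 0 1 0 0 1",
)]

for n, gene, expected in violations(rule1):
    print(f"State specification {n}: {gene} != {expected}")
```

For these four rows, the sketch lists the same seven genes as the Rule 1 output: Rb, E2F, p27, Cdc20, Cdh1, UbcH10 and CycB.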
Now, let us try to reconstruct the true model from the disrupted network draft using the CANTATA optimization algorithm. This algorithm is started using the main option --optimize:
cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -ni 1000
The parameter -o result.txt tells the program to write the results to a file result.txt. We set the number of iterations to 1000 (which is the default value) using -ni 1000. When the optimization process is complete, the file result.txt contains a header summarizing the algorithm's configuration, followed by a list of candidate networks with their three objective values, e.g.
Input network file: cellcycle_truncated.txt
Rule file: cellcycle_rules.txt
Random seed: 1316096208
Population size: 100
Number of offspring: 200
Fract. of injected nets: 0.1
Neg. every i-th offspring: 50
Number of generations: 1000
Number of restarts: 1
Initial mutations: 1
Epsilon: 0.0005
Weights of topology scores: 0.25/0.25/0.5
Max. number of start states: 200
Max. number of transitions: 100
Best candidate networks:
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (p27 & !CycB & !CycD))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((E2F & !Rb & !Cdc20 & !(Cdh1 & UbcH10)) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE | CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.098 Run: 1 Generation: 419
CycD = CycD
Rb = ((!CycA & !CycB & !CycD & !CycE) | (!CycB & !CycD & p27))
E2F = ((!Rb & !CycA & !CycB) | (p27 & !Rb & !CycB))
CycE = (E2F & !Rb)
CycA = ((!Rb & !Cdc20 & !(Cdh1 & UbcH10) & E2F) | (CycA & !Rb & !Cdc20 & !(Cdh1 & UbcH10)))
p27 = ((!CycD & !CycE & !CycA & !CycB) | (p27 & !(CycE & CycA) & !CycB & !CycD))
Cdc20 = CycB
Cdh1 = ((!CycA & !CycB) | Cdc20 | (p27 & !CycB))
UbcH10 = (!Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB)))
CycB = (!Cdc20 & !Cdh1)
Fitness: 0 0.24914 0.108 Run: 1 Generation: 641
Finished
In the example printed here, the first fitness value of both resulting candidate network models is 0, which indicates that the networks match the rules perfectly. In this case, the second resulting network is equivalent to the true network, which means that the deleted dependency of CycB on Cdh1 was reconstructed and no further changes were applied. The first candidate also recovers this dependency, but changes an & to a | in the function for p27. Depending on the random initialization of the algorithm, you might get a different result when running the example.
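Whether a candidate function is merely rewritten or actually changed can be checked with a truth table. The minimal Python sketch below, which is independent of CANTATA, compares the Rb function of the second candidate and the p27 function of the first candidate with the corresponding functions of the true network. All function names are hypothetical.

```python
from itertools import product

def rb_true(CycA, CycB, CycD, CycE, p27):
    return (not CycA and not CycB and not CycD and not CycE) or \
           (p27 and not CycB and not CycD)

def rb_candidate2(CycA, CycB, CycD, CycE, p27):
    # Same terms as rb_true, with the operands of the second clause reordered.
    return (not CycA and not CycB and not CycD and not CycE) or \
           (not CycB and not CycD and p27)

def p27_true(CycD, CycE, CycA, CycB, p27):
    return (not CycD and not CycE and not CycA and not CycB) or \
           (p27 and not (CycE and CycA) and not CycB and not CycD)

def p27_candidate1(CycD, CycE, CycA, CycB, p27):
    # Differs from p27_true only in '(CycE or CycA)' versus '(CycE and CycA)'.
    return (not CycD and not CycE and not CycA and not CycB) or \
           (p27 and not (CycE or CycA) and not CycB and not CycD)

def differing_inputs(f, g, n_inputs):
    """All input combinations on which the two Boolean functions disagree."""
    return [v for v in product([False, True], repeat=n_inputs)
            if bool(f(*v)) != bool(g(*v))]

print(len(differing_inputs(rb_true, rb_candidate2, 5)))    # 0
print(len(differing_inputs(p27_true, p27_candidate1, 5)))  # 2
```

The reordered Rb function agrees with the original on all 32 inputs, whereas the p27 function of the first candidate differs on the two inputs where p27 is active, CycD and CycB are inactive, and exactly one of CycE and CycA is active.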
The result file is not readable directly by BoolNet, as it contains multiple candidate network models and additional annotation (the header and objective scores). If the candidate networks should be analyzed in BoolNet, CANTATA can write them to separate network files. For example,
cantata --optimize -n cellcycle_truncated.txt -r cellcycle_rules.txt -o result.txt -on candidate_%d.txt -me 0
writes all candidates that match the rules perfectly to files candidate_0.txt, candidate_1.txt, candidate_2.txt, ..., i.e. the %d marker is replaced by a running number. The parameter -me sets a threshold for the first objective, i.e. only candidates whose score in the first objective is less than or equal to this value are written to files. As we set this error to 0 (which is the default value), only candidate network models that match the rule set perfectly are written to files.
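The threshold on the first objective can also be applied after the fact to a result file. The following Python sketch assumes that each candidate in the result file is followed by a line of the form shown in the example listing above ("Fitness: ... Run: ... Generation: ..."). The regular expression, the function names and the way the file names are numbered are assumptions for illustration, not a description of CANTATA's implementation.

```python
import re

# Pattern inferred from the example result listing; an assumption, not a spec.
FITNESS = re.compile(
    r"Fitness:\s+(\S+)\s+(\S+)\s+(\S+)\s+Run:\s+(\d+)\s+Generation:\s+(\d+)")

def read_fitness(text):
    """Return one (objectives, run, generation) tuple per 'Fitness:' line."""
    found = []
    for m in FITNESS.finditer(text):
        objectives = tuple(float(m.group(i)) for i in (1, 2, 3))
        found.append((objectives, int(m.group(4)), int(m.group(5))))
    return found

def select(candidates, max_error=0.0):
    """Keep candidates whose first objective is at most max_error."""
    return [c for c in candidates if c[0][0] <= max_error]

sample = """Fitness: 0 0.24914 0.098 Run: 1 Generation: 419
Fitness: 0 0.24914 0.108 Run: 1 Generation: 641
"""

kept = select(read_fitness(sample), max_error=0.0)
names = ["candidate_%d.txt" % i for i in range(len(kept))]
print(names)  # ['candidate_0.txt', 'candidate_1.txt']
```

In practice, the text would be read from result.txt instead of the sample string.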
This tutorial covers only parts of the options available in CANTATA. A full description of the command line options and the file formats is available here.
[1] Fauré, A., Naldi, A., Chaouiya, C., and Thieffry, D. (2006). Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics, 22(14), e124-e131.