Supplementing your model with other data¶
This tutorial will go over using supplementary data from outside sources, such
as the utilization of gene expression data to subset a model into “active”
reactions, the use of a template model to generate a draft model based on
reciprocal best hits, or the application of thermodynamic properties to further
constrain modeling results through the gimme
, psammotate
, and TMFA
.
Materials¶
The materials used in the part of the tutorial can be found in the tutorial-part-X
directory in the psamm-tutorial repository. The previously used E. coli model will
be reused here in order to demonstrate the use of the psammotate
and gimme
functions,
which will be located in the tutorial-part-6/
directory. There
will also be a tutorial-part-6/additional-data
directory that contain the required
materials for each of the following functions.
(psamm-env) $ cd <PATH>/tutorial-part-6/E_coli_yaml
Subsetting a Metabolic Models Using Expression Data with Gimme¶
This tutorial will go over using the gimme
function to subset a metabolic
model and create a new metabolic model based on gene expression data.
Constructing a Transcriptome File¶
The Gimme algorithm functionally subsets a model based on expression data provided
using the --transcriptome-file
argument. This file should contain two columns:
one with each gene within a model and one with a measure of the expression of the
gene, such as transcripts per million (TPM) or Reads Per Kilobase of transcript,
per Million mapped reads (RPKM). For the purposes of this tutorial, expression
data has been mocked up by randomly generating expression numbers for genes in
the E. coli model.
Basic use of the Gimme
command¶
To run Gimme, you must first specify the directory of the two column transcriptome file and specify a threshold value that defines at what threshold below which genes will be dropped from the new model. This produces a new model that drops any reactions coded by genes that do not meet this threshold. Any reactions associated with multiple genes must meet the threshold across all. Running the command as follows will print out a list of the internal reactions and a list of the external reaction that meet the thresholds.
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 100
As you increase the expression threshold, you will see fewer reactions retained within the model, which is shown below by using the word count function to count the number of lines returned by each iteration of gimme:
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 100 | wc -l
86
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 500 | wc -l
74
This application of the gimme algorithm sets the objective biomass to 100%. That
is to say that if a reaction is below the required threshold but is required to
maintain biomass, it is kept. You can lower the required biomass with the
--biomass-threshold
flag, which will subsequently drop increasing numbers of
reactions as they are no longer necessary to meet the biomass threshold. This is
shown below by using the word count function to count the number of lines returned
by each iteration of gimme:
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 100 --biomass-threshold 0.93757758084 | wc -l # Maximum for this model
86
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 100 --biomass-threshold 0 | wc -l # No Biomass Threshold
82
In addition to simply writing out a list of reactions that satisfy the expression
and biomass thresholds, by specifying --export-model
, you can redirect the
output to create an entirely new model.
(psamm-env) $ mkdir gimme_out
(psamm-env) $ psamm-model gimme --transcriptome-file additional-data/transcriptome.tsv --expression-threshold 100 --export-model ./gimme_out/
Generating a Draft Model From a Template Using Psammotate¶
This tutorial will go over using the psammotate
function to generate draft
models based on a reciprocal best hits file between the two models that
provides gene associations based on mapping the genes from a reference file
onto the genes of a draft model.
Materials¶
The materials used in this part of the tutorial can be found in the tutorial-part-7
directory in the psamm-tutorial repository. This directory contains a file called
gene_associations.tsv
which contains a two column reciprocal best hits mapping,
mapping the genes in the E. coli model model to mock genes from a mock organism
(the mock organism gene names are formatted as “imaginary{Integer}” and have been
randomly generated).
Format of the Reciprocal Best Hits File¶
The psammotate program requires a reciprocal best hits file. This is essentially a file that must have two columns (among other potential information): (1) a list of genes from the organism you are drafting a model for (2) genes from the reference organism that are mapped to (i.e. share a row with)
genes from the draft organism based on some annotation
(3) using ‘-’ as the no mapping indicator This will allow you to create a model based on the curations of the reference organism and the annotations of the draft organism based on the gene associations. These columns need not be in any particular location within a table, as you will specify the index of the columns for the target and template genes.
If you do not have a gene association for every gene, the genes from the template model are retained by default. these lines may be simply left blank.
Basic use of the psammotate
command¶
To run psammotate
, you must specify the file containing the gene mapping between the
template and the target model. Additionally, you must specify which columns contain
the genes from the template model and which contain the genes from the target,
or draft model, genes. This will by default generate a new reactions file called
homolo_reactions.yaml
in the current directory, that is formatted as a
psamm reactions file and contains the new gene mappings from the draft model.
(psamm-env) $ psamm-model psammotate --rbh additional-data/gene_associations.tsv --template 1 --target 2
The output file, homolo_reactions.yaml
contains all of the reactions that
were mapped with new gene annotations. Remember that if there is not gene
annotation in gene_associations.tsv
for a reference gene, it is kept by
default with the gene name of “None”. This can also be seen in the standard output:
ReactionID Original_Genes Translated_Genes In_final_model
ACALD b0351 or b1241 imaginary7180 or imaginary2425 True
ACALDt s0001 imaginary1481 True
ACKr b3115 or b2296 or b1849 imaginary7287 or imaginary956 or imaginary1755 True
ACONTa b0118 or b1276 imaginary4907 or imaginary2569 True
ACONTb b0118 or b1276 imaginary4907 or imaginary2569 True
ACt2r None None True
In this output, the first column is the reaction name, the second is the template
gene name, the third is the target gene name, and the last column indicates if the
gene was imported into homolo_reactions.yaml
(True) or dropped from the model
(False). If the reactions not mapped to should be dropped, use the –ignore-na
option (Note, we cannot overwrite homolo_reactions.yaml, so lets remove it first):
(psamm-env) $ rm homolo_reactions.yaml
(psamm-env) $ psamm-model psammotate --rbh additional-data/gene_associations.tsv --template 1 --target 2 --ignore-na
Note the difference in the output, where the reaction ACt2r is now false and has not been imported into the new draft model:
ReactionID Original_Genes Translated_Genes In_final_model
ACALD b0351 or b1241 imaginary7180 or imaginary2425 True
ACALDt s0001 imaginary1481 True
ACKr b3115 or b2296 or b1849 imaginary7287 or imaginary956 or imaginary1755 True
ACONTa b0118 or b1276 imaginary4907 or imaginary2569 True
ACONTb b0118 or b1276 imaginary4907 or imaginary2569 True
ACt2r None None False
Output options¶
There are several options for output file names/directories besides the default
as well. If you would prefer to not use homolo_reactions.yaml, you can specify your
own prefix using --output
, as shown below:
(psamm-env) $ psamm-model psammotate --rbh ../psammotate/gene_associations.tsv --template 1 --target 2 --output draft_reactions
Which will output the draft_reactions.yaml
file instead of the homolo_reactions.yaml
file.
Thermodynamics-Based Metabolic Flux Analysis¶
The tmfa
function in psamm is an implementation of the TMFA algorithm as
detailed in [Henry07]. This method incorporates additional thermodynamic
constraints into the flux balance framework, allowing for the simulation
of growth, while accounting for the thermodynamic feasibility of the
metabolic reactions. Like the other two methods in this part of the tutorial,
TMFA requires additional data to be prepared beforehand. For details on all
of these input files, see the command line interface section related to the
TMFA command TMFA (tmfa).
Note that TMFA is only compartible with the Gurobi and CPLEX LP solvers out of the four supported by PSAMM.
For this tutorial example tmfa
data has been provided based based on the
available data from another E. coli model in [Henry07]. Since multiple
files are required to run tmfa
, the command has been set up to use a
central config.yaml file. This file is then used to specify the relative
paths (from where you are running the program) to the various input files.
This config file is specified through providing the path to the file through
the --config
command line argument.
(psamm-env) $ psamm-model tmfa --config ./config.yaml ....
This option allows for tmfa
to be set up and run without having to specify
paths to multiple files on the command line every time.
Basic TMFA Input Options¶
The tmfa
command contains a few options that can be specified through the
command line to designate things like biomass thresholds and the temperature
that the simulation will be run at.
The first of these options is the --threshold
option. This can be used to
specify a value that the biomass flux will be fixed at during the tmfa
simulations. For example to run a tmfa
simulation where the biomass flux is
fixed at 0.5, you can use the following command:
(psamm-env) $ psamm-model tmfa --config ./config.yaml --threshold 0.1 simulation
The next option that can be specified is the temperature that will be used for the simulation. Since temperature is a component of the calculation of the gibbs free energy of reactions, this parameter can affect the thermodynamics in the model. The temperature is given in Celsius.
(psamm-env) $ psamm-model tmfa --config ./config.yaml --threshold 0.1 --temp 15 simulation
The next option is related to the use of the error estimates in for the gibbs
free energy of reaction values. For most prediction methods there will be some
uncertainty in the estimation of the gibbs free energy values. This uncertainty
can be incopertated into the tmfa
simulation directly through using the
--err
option.
(psamm-env) $ psamm-model tmfa --config ./config.yaml --err simulation
The last general option for the tmfa
command is the --hamilton
option.
This option allows the user to run tmfa
with a slightly modified version of the
algorithm that makes all reactions reversible, and only constraints the
reversibility based on thermodynamics. This method is further detailed in the
paper [Hamilton13]. To run the tmfa
command using this option you can use
the following command:
(psamm-env) $ psamm-model tmfa --config ./config.yaml --hamilton simulation
The TMFA command then contains two sub-commands that can be used for debugging,
util
, and for running simulations, simulation
. To access these sub-commands
you can run the tmfa
command like so:
(psamm-env) $ psamm-model tmfa --config ./config.yaml util ....
or
(psamm-env) $ psamm-model tmfa --config ./config.yaml simulation ....
TMFA util functions¶
The TMFA Utility functions can be accessed through the util
sub-command of
the tmfa
command. The command contains two utility functions, one is to
generate a template configuration file that can be used when setting up new
models to run tmfa
. The option --generate-config
can be used to generate a
template configuration file called example-config.yaml.
(psamm-env) $ psamm-model tmfa --config ./config.yaml util --generate-config
The other utility function that is provided is the --random-addition
function. This function can be used to randomly add thermodynamic constraints
to the reactions in the model, and test if the biomass falls below a set
threshold. This process can be used to test out the Gibbs free energy
constraints for a model that is not producing biomass, to see what thermodynamic
constraints might be causing problems.
(psamm-env) $ psamm-model tmfa --config ./config.yaml --threshold 0.1 util --random-addition
Running Growth Simulations with TMFA¶
tmfa
simulations can be run in two ways. By default the simulation will be run
and will produce Flux Variability-like results that provide upper and lower
bounds for the variables in the tmfa
problem. This type of simulation can be
run as follows to simulation growth at maximum biomass production with applying
thermodynamic constraints:
(psamm-env) $ psamm-model tmfa --config ./config.yaml simulation
This command will produce output like the following showing the variable type, name of the variable, the lower bound, and then upper bound:
Flux CS 0.18355092011356122 0.18355092011362384
DGR CS -42.528916818320724 -1.0000000010279564e-06
Zi CS 1.0 1.0
....
In this example the variables associated with the Citrate Synthase reaction (CS), are shown. This simulation shows that the model does use the citrate synthase reaction and that this reaction is thermodynamically feasible (as indicated by the negative gibbs free energy value). The ‘Flux’ range shows the possible upper and lower bound of the flux values for the reactions in the simulation, the ‘DGR’ range shows the possible range of gibbs free energy of reaction values for the reaction, and lastly the ‘Zi’ variable shows the binary constraint variable that is used to constrain reactions to be on or off based on the thermodynamics. For the ‘Zi’ variable a 1 indicates that the reaction can carry flux, while 0 indicates that it cannot.
Further down the results, after the reactions have been printed out, the compound concentrations will be printed out. Similarly they show the compound ID and the lower and upper bounds of the compound concentrations. The concentrations are printed as molar values. Due to the small size of the E. coli core model, most of the metabolites are largely unconstrained and able to vary between the lower and upper bounds. But a few of the central metabolites do end up being constrained. For example in this model, many of the metabolites in the central carbon metabolism are constrained to some extent.
CONC icit_c[c] 9.999e-06 0.007
CONC co2_c[c] 9.999e-06 9.999e-05
CONC 2pg_c[c] 9.999e-06 0.0127
Some additional tmfa
simulation options are provided in addition to the
default FVA-like option. The first of these options runs a single FBA-like
tmfa
simulation that just provides one solution to the problem without
simulating the variability of the variables. This type of simulation can be run
with the following command:
(psamm-env) $ psamm-model tmfa --config config.yaml simulation --single-solution fba
or
psamm-model tmfa --config config.yaml simulation --single-solution l1min
These commands will produce just single values for each reactions or compound instead of providing a range of values. These functions are useful for testing and debugging, but will miss some of the inherent variability in the simulation.
Flux NH4t_forward 0.9276730532906614
Flux NH4t_reverse 0.0
Flux O2t_forward 0.0
Flux O2t_reverse 0.0
Flux PDH 0.0
Flux PFK 9.830773843510082
Flux PFL 18.235468131213306
The last type of simulation function provided in the tmfa
command is
the randomsparse functions. These commands work in the same way as the
randomsparse
function in PSAMM and can be used to either do deletions
based on genes or based on reactions.
(psamm-model) $ psamm-model tmfa --config config.yaml simulation --randomsparse
or
(psamm-model) $ psamm-model tmfa --config config.yaml simulation --randomsparse_genes
Overall the tmfa
function can be used to explore a variety of metabolic
features and provide a way to further explore the relationships between
metabolic reactions through their thermodynamics.