This tutorial will go over how to utilize the curation functions in PSAMM to correct common errors and ensure that metabolic reconstructions are accurate representations of the metabolism of an organism.
For information on how to install PSAMM and the associated requirements, as well how to download the materials required for this tutorial you can reference the Installation and Materials section of the tutorial.
For this part of the tutorial we will be using a modified version of the E. coli core metabolic models that has been used in the other sections of the tutorial. This model has been modified to add in a new pathways for the utilization of mannitol as a carbon source. To access this model and the other files needed you will need to go into the tutorial-part-2 folder located in the psamm-tutorial folder.
(psamm-env) $ cd <PATH>/tutorial-part-2/
Once in this folder you should see two directories. One is the E_coli_yaml folder which contains the version of the model we will use. The other is called additional_files, which contains some files we will use during the tutorial.
Many types of errors can be introduced into metabolic models. Some errors can be introduced during manual editing of model files while others can result from inconsistent representations of the biology of the system. Various features in PSAMM are designed ot help identify and fix these problems to ensure that the reconstruction does not contain these kinds of errors.
Some errors cannot be easily identified without extensive manual inspection of the model data files. These PSAMM functions are designed to help identify these errors and make the correction process easier.
The most basic way to identify possible errors in a model will be through reading the warning messages printed out by PSAMM when any functions are run on a model. These warning messages can be an easy way to identify if something in the reconstruction is not set up the way that was intended. The following are examples of the types of warnings that PSAMM will provide and what kinds of errors they might indicate.
The first type of warning that PSAMM can provide is a waning that there is a compound that is in a reaction but is not defined in the compound information of the model. While PSAMM doesn’t necessarily know if this is an error, these warning can help identify compound ids in the reconstruction that may have typos in them or that need to be defined in the compounds data for the reconstruction. For example in the warning below it would appear that the compound id for ATP had been mistyped and included two extra t’s in it. These types of errors can make reactions in a model inconsistent and may lead to incorrect conclusions from the model if they are not corrected.
WARNING: The compound cpd_atttp was not defined in the list of compounds
The second type of warning will similarly help identify if there was an error introduced in one of the reconstruction’s reactions. This warning will indicate that there is a compound present in the reconstruction that has a compartment that is not defined elsewhere in the model. In the example below a compound was added in a reaction as being in the compartment ‘X’. Since this compartment was not used in the model the reaction involving this instance of the compound would become flux inconsistent.
WARNING: The compartment X was not defined in the list of compartments.
The third and fourth types of warnings can be useful in identify that the exchange file is set up correctly for the reconstruction. These two kinds of errors will help identify if there are compounds that are present in the extracellular compartment but do not have a corresponding exchange reaction in the boundary conditions. This can be problematic for some models that require certain sinks for overproduced compounds in the boundary. The other kind of warning will indicate if there are compounds in the exchange reactions that cannot be utilized by any reactions in the model. This could indicate that a transport reaction is missing from the model or that the compound could be removed from the exchange file.
WARNING: The compound cpd_chitob was in the extracellular compartment but not defined in the medium WARNING: The compound cpd_etoh was defined in the medium but is not in the extracellular compartment
The previous examples of warning messages produced by PSAMM can be helpful as a first step in identifying possible errors in a model but there are various other types of errors that may be present in models that specific PSAMM functions can help identify. The first kind of errors are ones related to the balancing of reactions in model. It is important that metabolic models be balanced in terms of elements, charge, and stoichiometry. PSAMM has three functions available to identify reactions that are not balanced in these properties which can help correct them and lead to more accurate and true representations of metabolism.
PSAMM’s masscheck tool can be used to check if the reactions in the model are stoichiometrically consistent and the compounds that are causing the imbalance. This can be useful when curating the model because it can assist in easily identify missing compounds in reactions. A common problem that can be identified using this tool is a loss of hydrogen atoms during a metabolic reaction. This can occur due to modeling choices or incomplete reaction equations but is generally easy to identify using masscheck.
To report on the compounds that are not balanced use the following masscheck command:
(psamm-env) $ psamm-model masscheck
This command will produce an output like the following:
... accoa_c 1.0 Acetyl-CoA acald_e 1.0 Acetaldehyde acald_c 1.0 Acetaldehyde h_e 0.0 H h_c 0.0 H INFO: Consistent compounds: 73/75
masscheck command will first try to assign a positive mass to all of
the compounds in the model while balancing the masses such that the left-hand
side and right-hand side add up in every model reaction. All the compound
masses are reported, and the compounds that have been assigned a zero value for
the mass are the ones causing imbalances.
In certain cases a metabolic model can contain compounds that represent electrons, photons, or some other artificial compound. These compounds can cause problems with the stoichiometric balance of a reaction because of their unique functions. In order to deal with this an additional property can be added to the compound entry that will designate it as a compound with zero mass. This designation will tell PSAMM to consider these compounds to have no mass during the stoichiometric checking which will prevent them from causing imbalances in the reactions. An example of how to add that property to a compound entry can be seen below:
- id: phot name: Photon zeromass: yes
To report on the specific reactions that may be causing the imbalance, the following command can be used:
(psamm-env) $ psamm-model masscheck --type=reaction ... FRUKIN 1.0 |Fructose[c]| + |ATP[c]| => |D-Fructose-6-phosphate[c]| + |ADP[c]| + |H[c]| INFO: Consistent reactions: 100/101
This check is performed similarly to the compound check. In addition, mass residual values are introduced for each metabolic reaction in the network. These mass residuals are then minimized and any reactions that result in a non-zero mass residual value after minimization are reported as being stoichiometrically inconsistent. A non-zero residual value after minimization tells you that the reaction in question may be unbalanced and missing some mass from it.
Sometimes the residue minimization problem may have multiple solutions. In these cases the residue value may be reallocated among a few connected reactions. In this example the unbalanced reaction is the MANNIDEH reaction:
MANNIDEH |manni[c]| + |nad[c]| => |fru[c]| + |nadh[c]|
In this reaction equation the right hand side is missing a proton. However minimization problem can result in the residue being placed on either the fru_c or the nadh_c compounds in an attempt to balance the reaction. Because nadh_c occurs in thirteen other reactions in the network, the program has already determined that that compound is stoichiometrically consistent. On the other hand fru_c only occurs one other time. Since this compound is less connected the minimization problem will assign the non-zero residual to this compound. This process results in the FRUKIN reaction which contains this compound as being identified as being stoichiometrically inconsistent.
In these cases you will need to manually check the reaction and then use
--checked option for the
masscheck command to force the non-zero
residual to be placed on a different reaction. This will rerun the consistency
check and force the residual to be placed on a different reaction. To do this
we would run the following command.
(psamm-env) $ psamm-model masscheck --type=reaction --checked FRUKIN
Now, the output should report the MANNIDEH reaction and it can be seen that the reaction equation of MANNIDEH is specified incorrectly. It appears that a hydrogen compound was left out of the reaction for MANNIDEH. This would be an easy problem to correct by simply adding in a hydrogen compound to correct the lost atom in the equation.
The stoichiometric consistency checking allows for the easy identification of stoichiometrically inconsistent compounds while providing a more targeted subset of reactions to check to fix the problem. This allows you to quickly identify problematic reactions rather than having to manually go through the whole reaction database in an attempt to find the problem.
In some cases there are reactions that are going to be inherently unbalanced
and might cause problems with using these methods. If you know that this is the
case for a specific reaction they can specify that the reaction be excluded
from the mass check so that the rest of the network can be analyzed. To do this
--exclude option can be used. For example if you wanted to exclude the
reaction FRUKIN from the mass check they could use the following command:
(psamm-env) $ psamm-model masscheck --exclude FRUKIN
This exclude option can be helpful in removing inherently unbalanced reactions
like macromolecule synthesis reations or incomplete reactions that would be
identified as being stoichiometrically inconsistent. It is also possible to
create a file that lists multiple reactions to exclude. Put each reaction
identifier on a separate line in the file and refer to the file be prefixing
the file name with a
(psamm-env) $ psamm-model masscheck --exclude @excluded_reactions.txt
Before we fix the model with the correction to the MANNIDEH reaction, let us first check the model for formula inconsistencies to show how this can also be used in conjunction with mass checking and other methods to correct model inconsistencies.
Formula Consistency Checking¶
Formula checking will check that each reaction in the model is balanced with respect to the chemical formulas of each compound. To check the model for formula consistencies run the formula check command:
(psamm-env) $ psamm-model formulacheck
The output should appear as follows:
INFO: Model: Ecoli_core_model INFO: Model Git version: 9812080 MANNIDEH C27H40N7O20P2 C27H39N7O20P2 H Biomass_Ecoli_core_w_GAM C1088.0232H1471.1810N446.7617O1236.7018P240.5298S3.7478 C1045.4677H1395.2089N441.3089O1189.0281P236.8511S3.7478 C42.5555H75.9721N5.4528O47.6737P3.6787 INFO: Unbalanced reactions: 2/80 INFO: Unchecked reactions due to missing formula: 0/80
In this case two reactions are identified in the model as being unbalanced. The biomass objective function, Biomass_Ecoli_core_w_GAM, and the reaction that was previously identified through masscheck as being unbalanced, MANNIDEH. In the case of the objective function this is imbalanced due to the formulation of the objective function. The reaction functions as a sink for the compounds required for growth and only outputs depleted energy compounds. This leads to it being inherently formula imbalanced but it is a necessary feature of the model. The other reaction is MANNIDEH. It can be seen that the total number of atoms on each side does not match up. PSAMM also outputs what atoms would be needed to balance the reaction on both sides. In this case there is a missing hydrogen atom on the right side of the equation. This can be easily rectified by adding in the missing hydrogen. To do this correction in this tutorial, you can copy a fixed version of the mannitol pathway from the additional files folder using the following command:
(psamm-env) $ cp ../additional_files/mannitol_pathway_v2.yaml mannitol_pathway.yaml
Once that problem with the new reaction is fixed the model will pass both the formula check and mass check.
Charge Consistency Checking¶
The charge consistency function is similar to the formula consistency function but instead of using the chemical formulas for the compounds, PSAMM will use the assigned charges that are designated in the compounds file and check that these charges are balanced on both sides of the reaction.
To run a charge consistency check on the model use the chargecheck command:
(psamm-env) $ psamm-model chargecheck
This E. coli SBML model does not contain charge information for the compounds. A sample output is provided below to show what the results would look like for a charge imbalanced model. The output from the charge check will display any reactions that are charge imbalanced and show what the imbalance is and then show the reaction equation. This can be used to quickly check for any missed inconsistencies and identify reactions and compounds that should be looked at more closely to confirm their correctness.
... rxn12510 1.0 |ATP[c]| + |Pantothenate[c]| => |4-phosphopantothenate[c]| + |H+[c]| + |ADP[c]| rxn12825 4.0 |hemeO[c]| + |H2O[c]| => |Heme[c]| + (4) |H+[c]| rxn13643 1.0 |ADP-glucose[c]| => |Glycogen[c]| + |H+[c]| + |ADP[c]| rxn13710 6.0 (5) |D-Glucose[c]| + (4) |ATP[c]| => |Glycogen[c]| + (4) |H+[c]| + (4) |Phosphate[c]| + (4) |H2O[c]| + |ADP[c]| INFO: Unbalanced reactions: 94/1093 INFO: Unchecked reactions due to missing charge: 0/1093
Flux Consistency Checking¶
The flux consistency checking function can be used to identify reactions that cannot carry flux in the model. This tool can be used as a curation tool as well as an analysis tool. In this tutorial it will be highlighted for the curation aspects and later its use in flux analysis will be demonstrated.
To run a flux consistency check on the model use the
(psamm-env) $ psamm-model fluxcheck --unrestricted
The unrestricted option with the command will tell PSAMM to remove any limits on the exchange reactions. This will tell you which reactions in the model can carry flux if the model is given all compounds in the media freely. This can be helpful for identifying which reactions may not be linked to other parts of the metabolism and can be helpful in identifying gaps in the model. In this case it can be seen that no reactions were identified as being inconsistent.
In some situations there are pathways that might be
modeled but not necessarily connected to the other aspects of metabolism.
A common occurrence of this is with vitamin biosynthesis pathways that are
not incorporated into the biomass in the model.
fluxcheck will identify
these as being flux inconsistent but the modeler will need to identify if this
is due to incomplete information on the pathways or if it is due to some
error in the formulation of the reactions.
PSAMM will tell you how many exchange reactions cannot be used as well as how many internal model reactions cannot carry flux. PSAMM will also list the reactions and the equations for the reactions to make curation of these reactions easier.
fluxcheck command was used with the –unrestricted option which
allowed the exchange reactions to all be active. This command can also be
used to see what reactions cannot carry flux when specific media are
supplied. To run this command on the network with the media that is
specified in the media file run the following command:
(psamm-env) $ psamm-model fluxcheck INFO: Model: Ecoli_core_model INFO: Model Git version: 9812080 INFO: Using flux bounds to determine consistency. ... EX_fru_e |D-Fructose[e]| <=> EX_fum_e |Fumarate[e]| <=> EX_glc_e |D-Glucose[e]| <=> EX_gln_L_e |L-Glutamine[e]| <=> EX_mal_L_e |L-Malate[e]| <=> FRUpts2 |D-Fructose[e]| + |Phosphoenolpyruvate[c]| => |D-Fructose-6-phosphate[c]| + |Pyruvate[c]| FUMt2_2 (2) |H[e]| + |Fumarate[e]| => (2) |H[c]| + |Fumarate[c]| GLCpts |Phosphoenolpyruvate[c]| + |D-Glucose[e]| => |Pyruvate[c]| + |D-Glucose-6-phosphate[c]| GLNabc |ATP[c]| + |L-Glutamine[e]| + |H2O[c]| => |L-Glutamine[c]| + |ADP[c]| + |H[c]| + |Phosphate[c]| MALt2_2 |L-Malate[e]| + (2) |H[e]| => |L-Malate[c]| + (2) |H[c]| INFO: Model has 5/80 inconsistent internal reactions (0 disabled by user) INFO: Model has 5/21 inconsistent exchange reactions (0 disabled by user)
In this case it can be seen that there are various exchange reactions
blocked as well as various internal reactions related to other carbon
metabolic pathways. The current model should only be supplying mannitol
as a carbon source and this would mean that these other carbon pathways
would be blocked in this condition. In this way, you can use the
command to see what reactions are specific to certain metabolic pathways and
In addition to inconsistencies found within individual reactions there can also be
global inconsistencies for the reactions within a metabolic network. These include
metabolites that can be produced but not consumed, ones that can be consumed by reactions
but are not produced, and reactions that cannot carry flux in a model. PSAMM includes various
functions for the identification of these features in a network including the functions
fluxcheck. Additionally the functions
fastgapfill can be used to help
fill these gaps that are present through the introduction of additional reactions into the network.
Gapcheck in PSAMM¶
gapcheck function in PSAMM can be used to identify dead end metabolites in a metabolic network.
These dead end metabolites are compounds in the metabolic model that can either be produced but not consumed
or ones that can be consumed but not produced. Reactions that contain these compounds cannot carry flux within
a model and are often the result of knowledge gaps in our understanding of metabolic networks.
gapcheck function allows the use of three methods for the identification of these dead end metabolites
within a metabolic network. These are the
prodcheck method is the most straightforward of these methods and can be used to identify any
compounds that cannot be produced in the metabolic network. It will iterate through the reactions in a network
and maximize each one. If the reaction can carry a flux then the metabolites involved in the reaction
are not considered to be blocked.
To use this function the following command can be run:
(psamm-env) $ psamm-model gapcheck --method prodcheck
The function will produce output like the following that lists out any metabolites in the model that cannot be produced in this condition:
fru[e] D-Fructose fum[e] Fumarate glc_D[e] D-Glucose gln_L[e] L-Glutamine mal_L[e] L-Malate INFO: Blocked compounds: 5
This result indicates that the following metabolites currently cannot be produced in the model. This only tells part of the story though, as this function was run with the defined media that was set for the model. As a result there are gaps identified like, ‘D-Glucose’, that will not be considered gaps in other conditions. To do a global check using this function on the model without restrictions on the media the following command can be used:
(psamm-env) $ psamm-model gapcheck --method prodcheck --unrestricted-exchange
The unrestricted tag in this function will temporarily set all of the exchange reaction bounds to be -1000 to 1000 allowing all nutrients to be either taken up or produced. Gap-checking in this condition will allow for the identification of gaps that are not media dependent and may instead be the result of incomplete pathways and knowledge gaps.
The second method implemented in the
gapcheck function is the
sinkcheck method. This method is similar to
prodcheck but is implemented in a way where the flux through each introduced sink for a compound is maximized.
This ensures that the metabolite can be produced in excess from the network for it to not be considered a dead end
(psamm-env) $ psamm-model gapcheck --method sinkcheck --unrestricted-exchange
The last method implemented in the
gapcheck function is the
gapfind method. This method is an implementation
of a previously published method to identify gaps in metabolic networks [Kumar07]. This method will use
a network based optimization to identify metabolites with no production pathways present.
(psamm-env) $ psamm-model gapcheck --method gapfind --unrestricted-exchange
These methods included in the
gapcheck function can be used to identify various kinds of
‘gaps’ in a metabolic model network. PSAMM also includes three functions for filling these gaps
through the addition of artificial reactions or reactions from a supplied database. The
completepath can be used to perform these gapfilling procedures
during the process of generating and curating a model.
gapfill function in PSAMM can be used to apply a GapFill algorithm based on [Kumar07] to a metabolic model
to search for and identify reactions that can be added into a model to unblock the production of a specific
compound or set of compounds. To provide an example of how to utilize this
gapfill function a version of
the E. coli core model has been provided in the tutorial-part-2/Gapfilling_Model/ directory. In this directory
is the E. coli core model with a small additional, incomplete pathway, added that contains the following reactions:
- id: rxn1 equation: succ_c[c] => a[c] - id: rxn3 equation: b[c] => c[c] + d[c]
This small additional pathway converts succinate to an artificial compound ‘a’. The other reaction can convert compound ‘b’ to ‘c’ and ‘d’. There is no reaction to convert ‘a’ to ‘b’ though, and this can be considered a metabolic gap. In an additional reaction database, but not included in the model itself, is an additional reaction:
- id: rxn2 equation: a[c] => b[c]
This reaction, if added would be capable of unblocking the production of ‘c’ or ‘d’, by allowing for the conversion of compound ‘a’ to ‘b’. In most cases when performing gap-filling on a model a larger database of non-model reactions could be used. For this test case the production of compound ‘d[c]’ could be unblocked by running the following command:
(psamm-env) psamm-model gapfill --compound d[c]
This would produce an output that first lists all of the reactions from the original metabolic model. Then lists the included gap-filling reactions with their associated penalty values. And lastly will list any reactions where the gap-filling result suggests that the flux bounds of the reaction be changed. A sample of the reaction is shown below:
.... TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c] rxn1 Model 0 Succinate[c] => a[c] rxn3 Model 0 b[c] => c[c] + d[c] rxn2 Add 1 a[c] => b[c]
Some additional options can be used to refine the gap-filling. The first of these options is
option that can be added to the command. If this option is used then the gap-filling will be performed with no
implicit sinks for compounds, meaning that all compounds produced need to be consumed by other reactions in the
metabolic model. By default, if this option is not used with the command, then implicit sinks are added for all
compounds in the model meaning that any compound that is produced in excess can be removed through the added sinks.
The other way to refine the gap-filling procedure is through defining specific penalty values for the addition of
reactions from different sources. Penalties can be set for specific reactions in a gap-filling database
through a tab separated file provided in the command using the
--penalty option. Additionally penalty values
for all database reactions can be set using the
--db-penalty option followed by a penalty value. Similarly
penalty values can be assigned to added transport reactions using the
--tp-penalty option and to added
exchange reactions using the
--ex-penalty option. An example of a command that applies these penalties
to a gap-filling simulation would be like follows:
(psamm-env) $ psamm-model gapfill --compound d[c] --ex-penalty 100 --tp-penalty 10 --db-penalty 1
gapfill function in PSAMM can be used through the model construction process to help identify potential
new reactions to add to a model and to explore how metabolic gaps effect the capabilities of a metabolic
fastgapfill function in PSAMM is different gap-filling method that uses the FastGapFill algorithm
to attempt to generate a gap-filled model that is entirely flux consistent [Thiele14]. The implementation
of this algorithm in PSAMM can be utilized for unblocking an entire metabolic model or for unblocking
specific reactions in a network. Often times unblocking all of the reactions in a model at the same time
will not produce the most meaningful and easy to understand results so only performing this function on a
subset of reactions is preferable. To do this the
--subset option can be used to provide a file that
contains a list of reactions to unblock. In this example that list would look like this:
This file can be provided to the command to unblock the small artificial pathway that was added to the E. coli core model:
(psamm-env) $ psamm-model fastgapfill --subset subset.tsv
In this case the output from this command will show the following:
.... TPI Model 0 Dihydroxyacetone-phosphate[c] <=> Glyceraldehyde-3-phosphate[c] rxn1 Model 0 Succinate[c] => a[c] rxn3 Model 0 b[c] => c[c] + d[c] EX_c[e] Add 1 c[e] <=> EX_d[e] Add 1 d[e] <=> EX_succ_c[e] Add 1 Succinate[e] <=> TP_c[c]_c[e] Add 1 c[c] <=> c[e] TP_d[c]_d[e] Add 1 d[c] <=> d[e] TP_succ_c[c]_succ_c[e] Add 1 Succinate[c] <=> Succinate[e] rxn2 Add 1 a[c] => b[c]
The output will first list the model reactions which are labeled with the ‘Model’ tag in the second column
of the output. PSAMM will list out any artificial exchange and transporters, as well as any gap reactions
included from the larger database. These will be labeled with the Add tag in the second column. When compared
gapfill results from the previous section it can be seen that the
fastgapfill result suggests
some artificial transporters and exchange reactions for certain compounds. This is due to this method
trying to find a flux consistent gap-filling solution.
Penalty values can be assigned for different types of reactions in the same way that they are in the
--ex-penalty for artificial exchange reactions,
--tp-penalty for artificial transporters,
--db-penalty for new database reactions, and penalties on specific reactions through a penalty file provided
psamm-model includes a search function that can be used to search the model
information for specific compounds or reactions. To do this the search function
can be used. This can be used for various search methods. For example to search
for the compound named fructose the following command can be used:
(psamm-env) $ psamm-model search compound --name 'Fructose' INFO: Model: Ecoli_core_model INFO: Model Git version: db22229 id: fru_c formula: C6H12O6 name: Fructose Defined in ./compounds.yaml:?:?
To do the same search but instead use the compound ID the following command can be used:
(psamm-env) $ psamm-model search compound --id 'fru_c'
These searches will result in a printout of the relevant information contained within the model about these compounds. In a similar way reactions can also be searched. For example to search for a reaction by a specific ID the following command can be used:
(psamm-env) $ psamm-model search reaction --id 'FRUKIN'
Or to search for all reactions that include a specific compound the following command can be used:
(psamm-env) $ psamm-model search reaction --compound 'manni[c]'
An additional searching function called
duplicatescheck is also included in PSAMM.
This function will search through a model and compare all of the reactions in the network
to each other. Any reactions that have all of the same metabolites consumed and produced
will then be reported. This can be a helpful function to use if there a multiple people
working on the construction of a model as it allows for an automated checking that
two individuals did not add the same reaction to the reconstruction. The
function can be run through the following command:
(psamm-env) $ psamm-model duplicatescheck
The additional tags
--compare-stoichiometry can be added to
the command to take into account the reaction directionality and metabolite stoichiometry
when comparing two different reactions.