Model file format

The primary model definition file is the model.yaml file. When creating a new model this file should be placed in a new directory. The following can be used as a template:

---
name: Escherichia coli test model
biomass: Biomass
extracellular: e

compartments:
  - id: e
    name: Extracellular
  - id: p
    name: Periplasm
    adjacent_to: [e, c]
  - id: c
    name: Cytosol
    adjacent_to: p

compounds:
  - include: ../path/to/ModelSEED_cpds.tsv
    format: modelseed

reactions:
  - include: reactions/reactions.tsv
  - include: reactions/biomass.yaml

exchange:
  - include: exchange.yaml

limits:
  - include: limits.yaml

model:
  - include: model_def.tsv

Biomass

The optional biomass key specifies the default reaction to use for various analyses (e.g. FBA and FVA).

Extracellular Compartment

The optional extracellular key specifies the default string for the extracellular compartment on compounds. If this option is not specified it will be assumed that the extracellular compartment is called e.

Default Compartment

The optional default_compartment key specifies the default compartment that is used if a compound in a reaction does not explicitly specify a compartment. For example, the reaction |x[e]| + |atp| => |x| + |adp| + |pi| does not specify a compartment on four of the compounds so those four would automatically be presumed to be in the default compartment (or c if no default compartment is specified).
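
The fallback described above can be illustrated with a minimal Python sketch. This is not PSAMM's own parser: the function name compounds_with_compartments and the regular expression are assumptions made for illustration, and the sketch only handles the simple |compound[compartment]| notation shown in this document.

```python
import re

# Matches |name| or |name[compartment]|. Hypothetical pattern for illustration only.
COMPOUND = re.compile(r"\|([^|\[\]]+)(?:\[(\w+)\])?\|")


def compounds_with_compartments(equation, default_compartment="c"):
    """Return (compound, compartment) pairs in order of appearance.

    Compounds without an explicit compartment get the default compartment.
    """
    return [
        (name, compartment or default_compartment)
        for name, compartment in COMPOUND.findall(equation)
    ]


pairs = compounds_with_compartments("|x[e]| + |atp| => |x| + |adp| + |pi|")
for name, compartment in pairs:
    print(name, compartment)
```

Only x[e] keeps its explicit compartment; the other four compounds are placed in c, or in whatever value is passed as default_compartment.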

Compartments

The compartments key is a list of compartment information for the model. Each compartment must have an id and can also have additional user-defined properties. The adjacent_to property defines the boundaries between compartments, and its value can be either a single compartment or a list of compartments. Adjacency is symmetric: it is sufficient to specify that p is adjacent to e, since it is then inferred that e is adjacent to p. Specifying both directions is optional.
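
As a rough illustration of how adjacency in both directions can be inferred from one-sided declarations, the following Python sketch builds a symmetric adjacency map from compartment entries like those in the template. The function name build_adjacency is hypothetical and the code is not taken from PSAMM.

```python
from collections import defaultdict


def build_adjacency(compartments):
    """Return a symmetric adjacency map from a list of compartment entries."""
    adjacency = defaultdict(set)
    for entry in compartments:
        cid = entry["id"]
        adjacency[cid]  # make sure isolated compartments also get an entry
        neighbors = entry.get("adjacent_to", [])
        if isinstance(neighbors, str):
            # A single compartment may be given instead of a list.
            neighbors = [neighbors]
        for other in neighbors:
            adjacency[cid].add(other)
            adjacency[other].add(cid)  # infer the reverse direction
    return {cid: sorted(others) for cid, others in adjacency.items()}


compartments = [
    {"id": "e", "name": "Extracellular"},
    {"id": "p", "name": "Periplasm", "adjacent_to": ["e", "c"]},
    {"id": "c", "name": "Cytosol", "adjacent_to": "p"},
]
adjacency = build_adjacency(compartments)
print(adjacency)
```

Here e never declares adjacent_to, yet it ends up adjacent to p because p declares e.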

Compounds

The optional compounds key is a list of compound information. For some of the model checks the compound information is required. This section can also include external files that contain compound information. If the file is a ModelSEED compound table, the format key must be set to modelseed. If the file is a YAML file, the file should have a .yaml extension. The following fragment is an example of a YAML formatted compound file:

- id: ac
  name: Acetate
  formula: C2H3O2
  charge: -1

- id: acac
  name: Acetoacetate
  # ...

The following compound properties are recognized:

Property   Type      Description
id         string    Compound ID (required)
name       string    Name of compound
formula    string    Compound formula (e.g. C6H12O6)
charge     integer   Formal charge
kegg       string    KEGG ID (reference to compound in KEGG database)
cas        string    CAS number

Reactions

The reactions key specifies a list of files that define the reactions in the model. The reaction files can be formatted as either tab-separated (.tsv) or YAML (.yaml) files. The TSV format is adequate for most reaction definitions, while particularly complex reactions (e.g. the biomass reaction) may be easier to specify in a YAML file.

The TSV format is a tab-separated table where each row contains the reaction ID in addition to other data columns. The header must specify the type of each column. The equation column is parsed as a ModelSEED reaction equation.

id      equation
ADE2t   |ade[e]| + |h[e]| <=> |ade[c]| + |h[c]|
ADK1    |amp| + |atp| <=> (2) |adp|
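
To show how the header row determines the meaning of each column, here is a small Python sketch that reads such a table with the standard csv module. It is only an illustration of the file layout, not PSAMM's reader. The function name read_reaction_table is hypothetical, and the equation strings are kept as plain text rather than parsed.

```python
import csv
import io


def read_reaction_table(text):
    """Read a tab-separated reaction table into a dict keyed by reaction ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["id"]: row for row in reader}


# Columns are separated by real tab characters in the file.
table_text = (
    "id\tequation\n"
    "ADE2t\t|ade[e]| + |h[e]| <=> |ade[c]| + |h[c]|\n"
    "ADK1\t|amp| + |atp| <=> (2) |adp|\n"
)
table = read_reaction_table(table_text)
print(table["ADK1"]["equation"])
```

Any additional columns named in the header would show up as further keys on each row.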

Any .yaml or .yml file in the reactions specification is parsed as a YAML reaction definition file. This format is particularly useful for very long reactions containing many different compounds (e.g. the biomass reaction). Because YAML is structured, it also allows additional annotations to be attached to each reaction. The following snippet is an example of a YAML reaction file:

# Biomass composition
- id: Biomass
  equation:
    reversible: no
    left:
      - id: cpd00032 # Oxaloacetate
        value: 1
      - id: cpd00022 # Acetyl-CoA
        value: 1
      - id: cpd00035 # L-Alanine
        value: 0.02
      # ...
    right:
      - id: Biomass
        value: 1
      # ...

Reactions in YAML files can also be defined using ModelSEED formatted reaction equations. The | is a special character in YAML, so the reaction equations have to be quoted with ' or, alternatively, written with > as a folded multiline string:

- id: ADE2t
  equation: >
    |ade[e]| + |h[e]| <=>
    |ade[c]| + |h[c]|
- id: ADK1
  equation: '|amp| + |atp| <=> (2) |adp|'

The following reaction properties are recognized:

Property   Type             Description
id         string           Reaction ID (required)
name       string           Name of reaction
equation   string or dict   Reaction equation formula
ec         string           EC number
genes      string           Gene association rule

The genes property can be used to specify which genes enable a reaction. Complex gene association rules can be used when a reaction is enabled by a group of genes or when multiple genes can independently enable a reaction:

- id: ADK1
  equation: '|amp| + |atp| <=> (2) |adp|'
  genes: gene_0001 or (gene_0002 and gene_0003)
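
To make the meaning of such a rule concrete, the following Python sketch evaluates a gene association rule against a set of genes that are present. It is a minimal illustration and not PSAMM's implementation. It assumes that rules use only gene names, the lowercase keywords and/or, and parentheses, and that the rule is well formed. The function name reaction_enabled is hypothetical.

```python
import re


def reaction_enabled(rule, present_genes):
    """Evaluate a gene association rule for a given set of present genes."""
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            rhs = parse_and()  # always parse the operand before combining
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return token in present_genes

    return parse_or()


rule = "gene_0001 or (gene_0002 and gene_0003)"
print(reaction_enabled(rule, {"gene_0002", "gene_0003"}))
print(reaction_enabled(rule, {"gene_0002"}))
```

With this rule, the reaction is enabled by gene_0001 alone or by gene_0002 and gene_0003 together, but not by gene_0002 alone.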

Exchange compounds

The exchange key provides a way of defining the compounds that can enter and exit the model system (the boundary conditions). This includes compounds that can enter the system (the medium) and compounds that are allowed to exit the system, like metabolic byproducts. In most cases, all compounds that occur in the extracellular space should also be defined in the exchange compounds (with lower limit of zero) so that they are allowed to leave the model system, and PSAMM will generate a warning if this is not the case for some compounds. Compounds that are allowed to be taken up (the medium) should in addition be specified with a negative lower limit indicating the maximum allowed uptake.

The following fragment is an example of the exchange.yaml file:

compartment: e  # default compartment
compounds:
  - id: ac      # Acetate
  - id: co2
  - id: o2
  - id: glcD    # D-Glucose with uptake limit of 10
    lower: -10
  # ...

When an exchange file is specified, the corresponding exchange reactions are automatically added. For example, if the compound o2 in compartment e is in the exchange file, the exchange reaction EX_o2_e is added to the model. The desired ID for the exchange reaction can be set explicitly using the reaction attribute.
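
The naming of the generated reactions can be sketched in Python as follows. The EX_<compound>_<compartment> pattern is an assumption extrapolated from the single EX_o2_e example above, the function name exchange_reaction_ids is hypothetical, and the ID EX_glucose is invented for illustration. The input dict mirrors the structure of the exchange.yaml fragment once loaded.

```python
def exchange_reaction_ids(exchange):
    """Map each exchange compound ID to the ID of its exchange reaction."""
    compartment = exchange["compartment"]
    ids = {}
    for compound in exchange["compounds"]:
        # Assumed default naming pattern, based on the EX_o2_e example.
        default_id = "EX_{}_{}".format(compound["id"], compartment)
        # An explicit "reaction" attribute overrides the generated ID.
        ids[compound["id"]] = compound.get("reaction", default_id)
    return ids


exchange = {
    "compartment": "e",
    "compounds": [
        {"id": "ac"},
        {"id": "o2"},
        {"id": "glcD", "lower": -10, "reaction": "EX_glucose"},  # hypothetical ID
    ],
}
ids = exchange_reaction_ids(exchange)
print(ids)
```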

The exchange set can also be specified using a TSV file, as the following fragment shows. The first column specifies the compound ID and the second column the compartment, while the third and fourth columns specify the lower and upper bounds, respectively. Either bound can be omitted or specified as - to use the default flux bound:

# Acetate exchange with default lower and upper bounds
ac      e
# D-Glucose with uptake limit of 10
glcD    e       -10
# CO2 exchange with production limit of 50 and default uptake limit
co2     e       -       50
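
The column conventions of this table can be illustrated with a short Python sketch that reads the fragment into a list of entries. This is not PSAMM's parser. The function names are hypothetical, None is used here to mean "use the default bound", and the columns are assumed to be separated by tab characters.

```python
def parse_bound(field):
    """Return a float bound, or None when the field is omitted or '-'."""
    return None if field in ("", "-") else float(field)


def parse_exchange_tsv(text):
    """Parse exchange TSV text into a list of dicts, skipping comments."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = [field.strip() for field in line.split("\t")]
        fields += [""] * (4 - len(fields))  # pad omitted trailing columns
        compound, compartment, lower, upper = fields[:4]
        entries.append({
            "compound": compound,
            "compartment": compartment,
            "lower": parse_bound(lower),
            "upper": parse_bound(upper),
        })
    return entries


exchange_text = (
    "# Acetate exchange with default lower and upper bounds\n"
    "ac\te\n"
    "# D-Glucose with uptake limit of 10\n"
    "glcD\te\t-10\n"
    "# CO2 exchange with production limit of 50 and default uptake limit\n"
    "co2\te\t-\t50\n"
)
entries = parse_exchange_tsv(exchange_text)
for entry in entries:
    print(entry)
```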

Multiple exchange files can be included from the main exchange.yaml file, and these will be combined to form the final set of exchange reactions used for the simulations.

Reaction flux limits

The optional limits property lists the files that are to be combined and applied as the reaction flux limits. This can be used to limit certain reactions in the model. The following fragment is an example of a limits file in the YAML format. The lower and upper keys specify the flux bounds, and both are optional. The fixed key is a shortcut that sets both lower and upper to its value:

- reaction: ADK1
  upper: 10
- reaction: ADE2t
  lower: -50
  upper: 50
- reaction: DHPTDNRN
  fixed: 0
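
The relationship between fixed, lower and upper can be sketched in Python as below. The function name flux_bounds is hypothetical, and None stands for a bound that is left at its default. The sketch does not reflect how PSAMM stores limits internally.

```python
def flux_bounds(entry):
    """Return the (lower, upper) pair described by one limits entry."""
    if "fixed" in entry:
        # "fixed" is a shortcut for setting both bounds to the same value.
        return entry["fixed"], entry["fixed"]
    return entry.get("lower"), entry.get("upper")


limits = [
    {"reaction": "ADK1", "upper": 10},
    {"reaction": "ADE2t", "lower": -50, "upper": 50},
    {"reaction": "DHPTDNRN", "fixed": 0},
]
bounds = {entry["reaction"]: flux_bounds(entry) for entry in limits}
print(bounds)
```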

The limits can also be specified using a TSV file, as shown in the following fragment:

# Make ADE2t irreversible by imposing a lower bound of 0
ADE2t    0
# Only allow limited flux on ADK1
ADK1     -10    10

Model Definition

The model property can be used to include a table file that specifies the subset of reactions used in the model. If no model definition file is given, all the reactions in the model are used. The following fragment is an example of a model definition file:

ACALD
ACALDt
ACKr
...