scadnano2contact.py

This script takes a scadnano JSON design file of a DNA origami and creates a contact map. The scaffold is assumed to be non-circular, i.e. a linear strand, in the scadnano design.

Note

The scaffold strand must be marked in the scadnano design before saving the scadnano .sc design file.

Usage:

python3 scadnano2contact.py origami_name

The above command reads scadnano file _assets/origami_name.sc and outputs a contact map file _assets/origami_name.csv.
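The naming convention above can be sketched as follows. This is only an illustration: the helper name `io_paths` and its `assets_dir` parameter are hypothetical and are not taken from the script itself.

```python
from pathlib import Path

def io_paths(origami_name, assets_dir="_assets"):
    """Return (input .sc path, output .csv path) for an origami name."""
    base = Path(assets_dir) / origami_name
    # Append the suffix to the full name instead of using with_suffix(),
    # so that names containing dots (e.g. "tile.v2") are not truncated.
    sc_path = base.with_name(base.name + ".sc")
    csv_path = base.with_name(base.name + ".csv")
    return sc_path, csv_path

sc_path, csv_path = io_paths("origami_name")
```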

The following scadnano origami design features are all supported:

  • Insertions, deletions and staple loopouts in designs

  • Staples with dangling ends that don’t basepair with the scaffold

  • Designs on any type of grid (square, hex, honeycomb, none)

  • Designs with or without the DNA sequence assigned

Note

  • It is advisable to specify scaffold and staple sequences before saving the .sc file. If sequences are not specified, all bases will be set to “N”; the contact map will still be derived.

  • scadnano supports circular scaffolds, but does not recommend using them in designs. Therefore, this conversion script always assumes that the scaffold strand is LINEAR. Once produced, the contact map can have the scaffold modified to circular via the contactutils.py script.

  • scadnano allows insertions to be placed at offsets that are crossovers, but this should be regarded as bad practice: it leads to unpredictable behaviour both in scadnano and in this algorithm. (And it also makes no sense.)

  • It is good practice to derive a contact map for an origami via both scadnano2contact.py and oxdna2contact.py routes, and verify that they match.
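To illustrate the first note above, here is a minimal sketch of falling back to “N” bases when a strand has no sequence assigned. The function name `sequence_or_placeholder` is hypothetical, and the strand length is assumed to have been computed separately from the strand's domains.

```python
def sequence_or_placeholder(strand, length):
    """Return the assigned sequence, or 'N' * length if there is none."""
    sequence = strand.get("sequence")  # "sequence" is an optional attribute
    return sequence if sequence else "N" * length

with_seq = sequence_or_placeholder({"sequence": "ACGTAC", "domains": []}, 6)
without_seq = sequence_or_placeholder({"domains": []}, 6)
```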

Challenge Tests

The scadnano2contact script handles the following pathological stress tests correctly.

TEST 1: This example is modified from an example in the scadnano web interface. It has deletions, insertions, staple dangles on both staple ends, a scaffold loopout, a staple loopout, and a 5Biosg staple 5’ modification, all in the same design!

scadnano interface example image

TEST 2: A staple begins as ssDNA at the 5’ end on a different row before hybridising with the scaffold:

Staple overshoots scaffold, different row image

TEST 3: A staple is drawn that does not contact the scaffold. This staple is left out of the origami contact map.

Staple does not bind with scaffold image
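One way to detect a staple like the one in TEST 3 is to check whether it shares any grid position with the scaffold. The sketch below is not the script's actual implementation: the function names are hypothetical, the strands are plain dictionaries in the scadnano format described in the technical note below, and strand direction is deliberately not checked.

```python
def occupied_positions(strand):
    """Set of (helix, offset) grid positions covered by a strand's domains."""
    positions = set()
    for domain in strand["domains"]:
        if "loopout" in domain:
            continue  # loopouts occupy no grid position
        deleted = set(domain.get("deletions", []))
        # "end" is exclusive, so range(start, end) covers the domain exactly
        for offset in range(domain["start"], domain["end"]):
            if offset not in deleted:
                positions.add((domain["helix"], offset))
    return positions

def staples_touching_scaffold(strands):
    """Keep only staples that share at least one position with the scaffold."""
    scaffold = next(s for s in strands if s.get("is_scaffold", False))
    scaffold_positions = occupied_positions(scaffold)
    return [s for s in strands
            if not s.get("is_scaffold", False)
            and occupied_positions(s) & scaffold_positions]

# Made-up miniature design: one scaffold and two staples
scaffold = {"is_scaffold": True,
            "domains": [{"helix": 0, "forward": True, "start": 0, "end": 16}]}
bound = {"domains": [{"helix": 0, "forward": False, "start": 4, "end": 12}]}
unbound = {"domains": [{"helix": 1, "forward": False, "start": 0, "end": 8}]}

kept = staples_touching_scaffold([scaffold, bound, unbound])
```

Here `kept` contains only the `bound` staple; the `unbound` staple sits on a helix with no scaffold and is dropped.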

Technical Note: scadnano File Format Requirements

The scadnano2contact parser relies on the scadnano JSON output file having certain minimal features.

Warning

The scadnano output file format may change with future releases of scadnano. Documented below are the features of the current (Fall 2021) output file format of scadnano that the parser relies on. If the output format changes, the new format should be mappable to these minimal features.

The design used in TEST 1 above has this scadnano JSON design file

The scadnano2contact parser uses/relies on these minimal features of the file:

  • Only the "strands" block is used in the JSON file (other blocks like "helices" and "modifications_in_design" are ignored).

  • Only the "sequence", "is_scaffold", and "domains" attributes of a strand are used. The first two are optional, but the "domains" attribute is required for each strand.

  • The scaffold is a strand like any other, but has "is_scaffold": true.

  • The domains of a strand are either (a) consecutive runs of bases that are on the same row (“helix”) of the design grid or (b) loopouts.

  • A loopout is a run of unpaired (single-stranded) bases within a strand. Loopouts have a simple domain specification, giving only the number of unpaired bases:

    {"loopout": 3}
    
  • Standard domains have a domain specification like:

    {"helix": 1, "forward": true, "start": 3, "end": 24, "deletions": [20], "insertions": [[14, 1]]}
    
  • The core attributes, always present, are:

    • "helix": 1 The row in the “main view” scadnano grid that the strand domain is on.

    • "forward": true The direction of the strand domain: true means the strand runs from 5’ to 3’ as it goes left to right across the grid.

    • "start": 3 The offset (column of the grid) where the strand domain starts.

    • "end": 24 The offset (column of the grid) immediately after where the strand domain ends. The actual offset where the strand domain ends is thus "end" - 1. In other words, scadnano stores each domain as a half-open range that includes "start" and excludes "end".

  • The further optional attributes, existing only on some strand domains are:

    • "deletions": [20] A list of the offsets in the strand domain that are deletions and do not exist as physical bases. At the point of a deletion, neither the staple base nor the scaffold base it would pair with exists. Deletions simply have the effect of making a strand domain physically shorter than the length it occupies in the schematic. They are used for helix twist correction.

    • "insertions": [[14, 1]] A list of two-element lists, detailing base insertions in the strand domain. The first number in each sublist is the offset of the insertion; the second number is the insertion length, i.e. the number of extra bases added at that offset. At the point of an insertion, both staple bases and complementary scaffold bases are added (insertions do not create bulge loops). Insertions simply have the effect of making a strand domain physically longer than the length it occupies in the schematic. They are used for helix twist correction.
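The minimal features listed above can be exercised with a short sketch that reads the "strands" block, picks out the scaffold, and computes the physical length of each strand. This is not the parser's actual code: the function names are hypothetical, the design is a made-up miniature rather than the TEST 1 file, and the insertion length is assumed to count the extra bases added at an offset.

```python
import json

def domain_length(domain):
    """Number of physical bases in a domain or loopout."""
    if "loopout" in domain:
        return domain["loopout"]
    span = domain["end"] - domain["start"]  # "end" is exclusive
    n_deleted = len(domain.get("deletions", []))
    # Assumption: each insertion [offset, length] adds `length` extra bases
    n_inserted = sum(length for _offset, length in domain.get("insertions", []))
    return span - n_deleted + n_inserted

def strand_length(strand):
    """Total number of physical bases in a strand."""
    return sum(domain_length(d) for d in strand["domains"])

# Made-up miniature design containing only the attributes the parser needs
design = json.loads("""
{
  "strands": [
    {"is_scaffold": true,
     "domains": [
       {"helix": 1, "forward": true, "start": 3, "end": 24,
        "deletions": [20], "insertions": [[14, 1]]}
     ]},
    {"domains": [
       {"helix": 1, "forward": false, "start": 10, "end": 24,
        "deletions": [20], "insertions": [[14, 1]]},
       {"loopout": 3},
       {"helix": 1, "forward": false, "start": 3, "end": 10}
     ]}
  ]
}
""")

strands = design["strands"]  # other top-level blocks are ignored
scaffolds = [s for s in strands if s.get("is_scaffold", False)]
staples = [s for s in strands if not s.get("is_scaffold", False)]

scaffold_length = strand_length(scaffolds[0])  # 21 offsets - 1 deletion + 1 insertion = 21
staple_length = strand_length(staples[0])      # 14 + 3 (loopout) + 7 = 24
```

The scaffold domain spans 21 offsets (3 to 23 inclusive); its one deletion and one single-base insertion cancel out, so its physical length is still 21 bases.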