contactutils.py

This script is a set of 21 utility functions for manipulating and querying origami contact map files.

In all cases below, the origami specified is origami_name and is assumed to have a contact map file existing at _assets/origami_name.csv.

1. Scaffold Operations

1.1 Change Scaffold Sequence

Warning

This operation re-writes the original contact map file supplied.

Change the scaffold sequence in a contact map and re-derive the complementary staple sequences for the new scaffold sequence. General syntax:

python3 contactutils.py scaffold origami_name <scaffold seq file> <scaffold material> <staples material>

For example:

python3 contactutils.py scaffold origami_name sequence.txt dna dna

will take contact map file _assets/origami_name.csv and apply the new scaffold sequence given in _assets/sequence.txt treating the scaffold as DNA. The complementary DNA staples will be derived.

Staple dangles or loopouts that do not contact the scaffold retain their current sequences.

File sequence.txt is simply a one-line text file listing the new scaffold sequence from 5’ to 3’.
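As a rough illustration of what re-deriving complementary staples involves, the sketch below computes the Watson-Crick reverse complement of a scaffold segment. The helper name `reverse_complement` and its `staple_material` argument are hypothetical; they mirror the dna/rna command-line options but are not taken from the script itself.

```python
# Hypothetical helper for illustration only; not part of contactutils.py.
DNA_PAIRS = {"A": "T", "T": "A", "U": "A", "C": "G", "G": "C"}
RNA_PAIRS = {"A": "U", "T": "A", "U": "A", "C": "G", "G": "C"}

def reverse_complement(scaffold_segment, staple_material="dna"):
    """Return the staple section (5' to 3') pairing with a scaffold segment (5' to 3')."""
    pairs = {"dna": DNA_PAIRS, "rna": RNA_PAIRS}[staple_material]
    # The staple runs antiparallel to the scaffold, so read the segment backwards.
    return "".join(pairs[base] for base in reversed(scaffold_segment.upper()))
```

Staple bases that do not contact the scaffold would simply be left untouched by such a step.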

1.2 Make Scaffold Circular

Warning

This operation re-writes the original contact map file supplied.

Convert the scaffold strand in the contact map to be circular:

python3 contactutils.py circ origami_name

1.3 Make Scaffold Linear

Warning

This operation re-writes the original contact map file supplied.

Convert a circular scaffold in the contact map to be linear:

python3 contactutils.py linear origami_name <base index>

A circular scaffold can be nicked between two bases in order to make it into a linear scaffold:

---->3' 5'---->

Specify the scaffold base index, in the current origami design, at which the new physical 3’ end of the scaffold is to be placed.

For example:

python3 contactutils.py linear origami_name 40

would place the physical 3’ end of the scaffold at scaffold base index 40 in the origami_name design. The scaffold break would be between bases 40 and 41, and base 41 would be the new physical 5’ end of the scaffold. All of the indexes in the origami contact map are then renumbered, so that base index 0 is where the new scaffold 5’ end is.

Note

A warning is created if a circular scaffold is nicked at a location that creates a single base pair domain. The contact map can still be created, but this contact map cannot be transformed into a domain-level graph.
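The renumbering that follows a nick can be sketched as simple modular arithmetic. The function below is a hypothetical illustration (0-based indexes assumed), not the script's own code.

```python
def reindex_after_nick(old_index, three_prime_index, scaffold_length):
    """Map an old scaffold base index to its index once the scaffold is nicked.

    The base directly after the new 3' end becomes the new 5' end (index 0).
    """
    new_five_prime = (three_prime_index + 1) % scaffold_length
    return (old_index - new_five_prime) % scaffold_length
```

With a 100-base scaffold nicked after base 40, old base 41 maps to index 0 and old base 40 maps to index 99.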

1.4 List Thermodynamically Valid Scaffold Rotation Offsets

Many origami designs can have the scaffold strand “rotated” through the structure, keeping the design invariant. This command displays all of the “rotation offsets” for the scaffold which are thermodynamically valid:

python3 contactutils.py thermo origami_name

“Thermodynamically valid” means that the rotation offset does not cause staple sections shorter than 8bp on the origami. The offsets returned can be safely used with the rotate command (below).
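One plausible reading of this test, for a linear scaffold, is sketched below. It assumes (this is not stated in the source) that moving the scaffold 3’ end to an offset introduces a break just after that base, splitting any hybridised staple section that spans it. The section representation and function names are hypothetical.

```python
MIN_SECTION_BP = 8  # threshold quoted in the text above

def is_valid_offset(sections, offset):
    """sections: list of (first, last) inclusive scaffold index ranges,
    one per hybridised staple section. The break is assumed to fall
    between scaffold bases `offset` and `offset + 1`."""
    for first, last in sections:
        if first <= offset < last:  # break falls strictly inside this section
            left = offset - first + 1
            right = last - offset
            if left < MIN_SECTION_BP or right < MIN_SECTION_BP:
                return False
    return True

def valid_offsets(sections, scaffold_length):
    """List every offset that leaves no split piece shorter than the threshold."""
    return [o for o in range(scaffold_length) if is_valid_offset(sections, o)]
```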

1.5 Rotate Scaffold

Warning

This operation re-writes the original contact map file supplied.

Move the scaffold strand around the origami structure, such that the scaffold 3’ end is at the rotation offset supplied:

python3 contactutils.py rotate origami_name <rotation offset> <scaffold material> <staples material>

For example:

python3 contactutils.py rotate origami_name 50 dna dna

rotates the 3’ end of the scaffold to base index 50 of the current origami design, where the scaffold and staples in the contact map are DNA.

Note

  • The rotation offset supplied can be any integer from 0 to (scaffold length - 1).

  • Circular scaffolds can be rotated to any rotation offset without problems.

  • Linear scaffolds cannot be rotated cleanly to every rotation offset. Some offsets cause a single base pair domain to appear in the contact map. This generates a warning that the contact map may have limited functionality, as it can no longer be transformed into a domain-level graph (the contact map is still created).

  • Rotating a linear scaffold origami necessarily changes all of the base numbers in the contact map (as the physical 3’ and 5’ scaffold ends have moved). Rotating a circular scaffold origami leaves base numbers the same, because only the scaffold sequence is shifted.

1.6 Delete All 1nt Scaffold Domains

Remove all scaffold domains that consist of a single unhybridised nucleotide:

python3 contactutils.py del1nt origami_name

The modified contact map is saved to _assets/origami_nameD.csv where D = scaffold nucleotides Deleted.

This operation is sometimes necessary in order to allow the contact map to be converted to a domain-level graph.

An error occurs if a scaffold domain consisting of a single hybridised nucleotide is found in the contact map.

1.7 Delete Staples to Resolve 1nt Scaffold Domains

Remove staples from the contact map such that no single unhybridised nucleotides on the scaffold exist:

python3 contactutils.py rs1nt origami_name

The modified contact map is saved to _assets/origami_nameR.csv where R = staples Removed.

This operation is sometimes necessary in order to allow the contact map to be converted to a domain-level graph.

An error occurs if a scaffold domain consisting of a single hybridised nucleotide is found in the contact map.

2. Staple Operations

2.1 Check Staple Sequences

Check that an origami contact map has staple sequences matching a supplied set of staple sequences:

python3 contactutils.py checkstaples origami_name staples.csv

If the contact map contains exactly those staple sequences in staples.csv, then a match is reported. The staples in the contact map don’t need to be in the same order as the staples in staples.csv. A match only requires that 1) the same number of staples exists and 2) the same sequences exist.
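An order-independent comparison of this kind can be sketched with a multiset. The function name is hypothetical, and treating repeated sequences as needing equal counts is an assumption about how duplicates are handled.

```python
from collections import Counter

def staples_match(contact_map_staples, supplied_staples):
    """True if both collections hold the same sequences, the same number of
    times each, regardless of order."""
    return Counter(contact_map_staples) == Counter(supplied_staples)
```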

2.2 Remove Staples

Warning

This operation re-writes the original contact map file supplied.

Remove a list of staples from the contact map:

python3 contactutils.py del origami_name 12,34,32,10

will delete the staples with ids 12, 34, 32 and 10 from the contact map file. Note that the staple list must contain no spaces.
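A probable reason for the no-spaces rule (an assumption, as the source does not state it) is that the shell splits the command line on spaces before the script receives it. The sketch below shows the effect using `shlex`; `parse_staple_ids` is a hypothetical helper.

```python
import shlex

def parse_staple_ids(argument):
    """Turn a comma-separated argument such as '12,34,32,10' into a list of ints."""
    return [int(token) for token in argument.split(",")]

# The compact list arrives as one argument; the spaced list is split into four.
compact = shlex.split("contactutils.py del origami_name 12,34,32,10")
spaced = shlex.split("contactutils.py del origami_name 12, 34, 32, 10")
```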

3. Sequence Operations

3.1 Change Sequences to RNA

Warning

This operation re-writes the original contact map file supplied.

Change scaffold and staple sequences in the contact map to RNA:

python3 contactutils.py rna origami_name

This operation replaces all T bases with U.
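In Python terms, the substitution amounts to a single string replacement. The helper below is a hypothetical illustration and assumes upper-case sequences.

```python
def to_rna(sequence):
    """Replace every T with U; all other letters are left as they are."""
    return sequence.replace("T", "U")
```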

3.2 Set “Lost” Unhybridised Unknown Bases to T

Warning

This operation re-writes the original contact map file supplied.

Set scaffold and staple bases which are lost to T:

python3 contactutils.py tlost origami_name

Lost bases are those that are not hybridised with anything, and which have strange letter assignments like ?, N etc. Note that unhybridised bases that have legal base letters (A, C, G, T, U) are not changed by this operation.
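The rule can be sketched as follows. The per-base `hybridised` flags are a hypothetical representation chosen for illustration, not the contact map's actual layout.

```python
LEGAL_BASES = set("ACGTU")

def fix_lost_bases(sequence, hybridised):
    """Set a base to T only if it is unhybridised AND not a legal base letter.

    sequence: string of base letters; hybridised: one bool per base.
    """
    return "".join(
        "T" if (not paired and base not in LEGAL_BASES) else base
        for base, paired in zip(sequence, hybridised)
    )
```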

3.3 Set All Bases to T

Warning

This operation re-writes the original contact map file supplied.

Set all scaffold and staple bases to T:

python3 contactutils.py tflush origami_name

4. Statistics Operations

4.1 Origami Statistics

Display stats about origami scaffold length, number of staples, and staple split distribution:

python3 contactutils.py stats origami_name

4.2 Scaffold Sequence Repeat Plot

Produce a plot showing how many times k-mers of different sizes are repeated in the origami scaffold sequence:

python3 contactutils.py repeats origami_name

A k-mer is simply a run of k bases on the scaffold.
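Counting repeated k-mers can be sketched as below. The function name is hypothetical, and the sketch treats the scaffold as linear (k-mers that wrap around a circular scaffold are not counted).

```python
from collections import Counter

def count_repeated_kmers(sequence, k):
    """Return {kmer: occurrences} for every k-mer appearing more than once."""
    kmers = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in kmers.items() if n > 1}
```

Running this for a range of k values gives the raw numbers that a repeat plot would display.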

5. Comparison Operations

5.1 Test Equivalence of Two Contact Maps

Test if two origami contact maps are equivalent:

python3 contactutils.py diff origami_name1 origami_name2

“Equivalence” means that both contact maps have:

  • The same scaffold type

  • The same listed 5’ to 3’ scaffold sequence

  • The same staple connectivity with the scaffold

  • The same staple sequences

Note that the ordering of staples (the id assigned to each staple) is allowed to differ.

If two contact maps are equivalent, they specify exactly the same origami design schematic (perhaps with staples numbered differently).
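The id-independent comparison can be sketched as below. The dictionary keys used here (`scaffold_type`, `scaffold_sequence`, `staples`, `sequence`, `scaffold_partners`) are hypothetical and do not reflect the real contact map file layout.

```python
from collections import Counter

def staple_multiset(contact_map):
    """Collect staples as (sequence, scaffold partners) pairs, ignoring ids."""
    return Counter(
        (staple["sequence"], tuple(staple["scaffold_partners"]))
        for staple in contact_map["staples"].values()
    )

def maps_equivalent(map_a, map_b):
    """True if scaffold type, scaffold sequence, and staples all agree."""
    return (
        map_a["scaffold_type"] == map_b["scaffold_type"]
        and map_a["scaffold_sequence"] == map_b["scaffold_sequence"]
        and staple_multiset(map_a) == staple_multiset(map_b)
    )
```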

5.2 Calculate Hamming Distance

Calculate the Hamming distance between an original origami contact map and a modified origami contact map:

python3 contactutils.py hamming origami_original origami2

The Hamming distance is the number of scaffold bases in origami2 which need to have their staple hybridisation partner changed, in order to match origami_original.

The minimum Hamming distance is 0, and the maximum is the scaffold length.

This operation only succeeds when:

  • Scaffolds in the two contact maps supplied are either both LINEAR, or both CIRCULAR.

  • Scaffolds have the same lengths and sequences
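The distance defined above can be sketched as a per-base comparison. The partner lists are a hypothetical representation: one entry per scaffold base, holding whatever identifies its staple partner, or None if the base is unpaired.

```python
def hamming_distance(partners_original, partners_modified):
    """Count scaffold bases whose hybridisation partner differs between two maps."""
    if len(partners_original) != len(partners_modified):
        raise ValueError("scaffolds must have the same length")
    return sum(a != b for a, b in zip(partners_original, partners_modified))
```

By construction the result lies between 0 and the scaffold length.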

6. Export Operations

6.1 Display as Base Pair Matrix

Display contact map as a base-pairing matrix:

python3 contactutils.py matrix origami_name

The matrix has scaffold bases as rows, and staple bases as columns. An element in the matrix is black if there is a base pair, white if not, and grey if the scaffold or staple base never pairs (i.e. stays single stranded) in the origami design. The y-axis just has the first and last scaffold bases marked. The first base (0) is at the scaffold 5’ end. The x-axis has the staple id marked on the first 5’ base of each staple. No sequences or other marks are shown due to excessive label density.

Warning

Only use this command with small (<1000nt scaffold) origamis. For larger origamis, the matrix takes an inordinate amount of time to produce, and the base pair matrix elements are also too small to see.

6.2 To FASTA File

Write all sequences in a contact map to a FASTA file. Usage:

python3 contactutils.py fasta origami_name

The FASTA file is created at _assets/origami_name_fasta.txt.
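For reference, FASTA is a plain-text format in which each sequence is preceded by a header line starting with ">". The sketch below builds such text; the record names used are hypothetical, as the source does not say how the script labels its records.

```python
def to_fasta(named_sequences):
    """Build FASTA text from a list of (name, sequence) pairs."""
    lines = []
    for name, sequence in named_sequences:
        lines.append(f">{name}")  # header line
        lines.append(sequence)    # sequence line
    return "\n".join(lines) + "\n"
```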

6.3 To REVNANO Input File

Convert contact map to a REVNANO sequence input file, keeping just sequence information:

python3 contactutils.py rev origami_name

The REVNANO input file is created at _assets/origami_name.rev.

7. Other Operations

7.1 Move Ambiguous Junctions

Warning

This operation re-writes the original contact map file supplied.

Move the end of an ambiguous hybridised staple section:

python3 contactutils.py move origami_name <scaffold base>,<number of bases to move>

For example:

python3 contactutils.py move origami_name 90,-2

will move scaffold base 90 “back” by 2 bases. The script checks if the proposed move is valid, and if so, consistently updates other affected strand lengths to accommodate the change. A warning is generated if the proposed move cannot be executed due to sequence mismatches.

Note that the list of numbers after the origami_name argument should contain no spaces.

7.2 Verify Contact Map Integrity

Perform basic integrity checks on a contact map:

python3 contactutils.py verify origami_name

The following three validity tests are performed and all must be passed:

  • Test 1: In the st part of the contact map, each scaffold base index only appears once. That is, a scaffold base is never hybridised to two staples simultaneously.

  • Test 2: The secondary sc part of the contact map is consistent with the primary st part, and can be fully derived from it.

  • Test 3: Where staples hybridise the scaffold, there is always a Watson-Crick complementary base pair (for DNA or RNA).
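Tests 1 and 3 can be sketched as below. The input is a hypothetical list of (scaffold index, scaffold base, staple base) triples, one per hybridised staple base; the real st and sc structures of the contact map are not reproduced here.

```python
from collections import Counter

WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def duplicated_scaffold_indexes(pairs):
    """Test 1: scaffold indexes that are hybridised more than once."""
    counts = Counter(index for index, _, _ in pairs)
    return sorted(index for index, n in counts.items() if n > 1)

def mismatched_pairs(pairs):
    """Test 3: pairs that are not Watson-Crick complementary."""
    return [p for p in pairs if (p[1], p[2]) not in WATSON_CRICK]
```

A contact map would pass both checks when each function returns an empty list.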

Also, warning messages are produced if potential anomalous (but not critical) situations arise, such as:

  • A homogeneous scaffold sequence is present in the contact map (all bases same letter)

  • The scaffold contains base letters outside of the DNA and RNA alphabet “A”, “C”, “G”, “T”, “U”

  • The staples contain base letters outside of the DNA and RNA alphabet “A”, “C”, “G”, “T”, “U”

  • Staples with very short sections which hybridise to the scaffold (<5nt) are present. This could signal an error with derivation of the contact map. Note that short sections which do not hybridise the scaffold are treated as staple loopouts or overhangs and do not cause a warning.

When creating contact maps, scadnano2contact.py and oxdna2contact.py use this function to double-check a derived contact map.