contact2dlgraph.py

Note

This is a helper script. It is not directly called by the user. Below are some technical notes about this script.

This script converts an origami contact map to an undirected domain-level graph of the origami connectivity. The graph is stored as a Python NetworkX object. The technical attributes and properties of the domain-level graph are described below.

Graph Nodes

Nodes of the graph correspond to individual scaffold bases. Each node is named by its i53 base index, which is simply the number of positions the base lies from the 5’ end of the scaffold. The scaffold 5’ base has i53 = 0; the scaffold 3’ base has i53 = (scaffold length - 1).

Note

A graph edge always connects two graph nodes with different i53 indexes. This means that 1nt domains cannot be accommodated on the graph: the smallest domain allowed is 2 nucleotides.
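The 2nt minimum follows from counting bases inclusively. As a minimal sketch, assuming a linear scaffold and using a hypothetical helper `domain_runlen` that is not part of contact2dlgraph.py, a domain edge between nodes with i53 indexes a < b carries b - a + 1 bases, so two distinct indexes always give at least 2 bases:

```python
def domain_runlen(source_i53: int, target_i53: int) -> int:
    """Number of bases on a domain edge of a linear scaffold (hypothetical helper).

    Both end nodes are themselves bases of the domain, so the count is
    inclusive of both endpoints.
    """
    if target_i53 == source_i53:
        # An edge needs two distinct nodes, so a 1nt domain has no edge.
        raise ValueError("a domain edge needs two distinct i53 indexes")
    if target_i53 < source_i53:
        raise ValueError("source must be the node nearer the scaffold 5' end")
    return target_i53 - source_i53 + 1


print(domain_runlen(10, 11))  # 2, the smallest allowed domain
```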

Graph nodes also have the following optional attributes:

  • "type" The type of the node, present if the node has a special status: "dangle" when the node is not part of the scaffold, but is rather connected to a staple dangle; "5prime" or "3prime" when the node is at the 5’ or 3’ end of the scaffold, respectively. Note that circular scaffold origamis with a staple section bridging the 5’ and 3’ ends do not have these nodes (see below).

  • "staple_id" The id of the staple that this node is part of (when that staple is hybridised). Nodes that are not ever part of staples do not have this attribute.
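The optional node attributes are only present when they apply. A minimal sketch of that convention for a linear scaffold, using a hypothetical helper `scaffold_node_attrs` (not from the script); dangle nodes are left out here:

```python
from typing import Optional


def scaffold_node_attrs(i53: int, scaffold_length: int,
                        staple_id: Optional[int] = None) -> dict:
    """Optional attributes for one node of a linear scaffold (hypothetical helper).

    Keys are added only when they apply, so ordinary unpaired bases get {}.
    """
    attrs = {}
    if i53 == 0:
        attrs["type"] = "5prime"
    elif i53 == scaffold_length - 1:
        attrs["type"] = "3prime"
    if staple_id is not None:
        # Only nodes that are part of a hybridised staple carry this key.
        attrs["staple_id"] = staple_id
    return attrs


print(scaffold_node_attrs(0, 100))                # {'type': '5prime'}
print(scaffold_node_attrs(42, 100, staple_id=7))  # {'staple_id': 7}
```

With NetworkX, such a dict could be attached through keyword attributes, e.g. `G.add_node(i53, **attrs)`. For a circular scaffold whose ends are bridged, the "5prime"/"3prime" branch would not apply.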

Graph Edge Types

There exist five edge types:

  1. domains (runs of bases which are either single stranded scaffold, or double stranded staple-scaffold helices)

  2. nicks (which can also be scaffold crossovers)

  3. staple crossovers

  4. staple loopouts

  5. staple end dangles

The majority of nucleic acid origami schematics (2D, 3D, 2D wireframe, 3D wireframe) can be reduced to these five edge types. A small fraction of origamis also have other features, such as:

  • intentional secondary structure on staple loopouts (or on single-stranded scaffold sections)

  • staple dangles which hybridise together to form e.g. split RNA aptamers

These minority features are not included in the domain-level graph.

Note

Single nucleotide staple dangling ends are permitted in the domain-level graph, since the i53 index of the dangling node is set to a distinct number. Also, single nucleotide loopout crossovers are permitted in the domain-level graph. In the latter case, the nodes at each end of the crossover have distinct i53 indexes.

Graph Edge Attributes

Edges of the domain-level graph are named after the pair of scaffold bases they connect, i.e. (i53 index, i53 index). All edges of the domain-level graph have the following core attributes:

  • "type" Either: "ss_domain", "ds_domain", "ghost_nick", "crossover", "loopout" or "dangle".

  • "scaffold_domain_id" The id of the domain on the scaffold that the edge represents, if the edge type is "ss_domain" or "ds_domain". Other edge types have this attribute set to -1.

  • "runlen" The number of bases on the edge (see Graph Edge Display Lengths below).

  • "schematiclen" The size at which the edge should ideally be drawn in the origami schematic (see Graph Edge Display Lengths below).

  • "source" The id of the edge node closest to the scaffold 5’, unless the edge is a "crossover", "loopout" or "dangle", in which case this is the id of the edge node closest to the staple 5’.

  • "target" The id of the edge node closest to the scaffold 3’, unless the edge is a "crossover", "loopout" or "dangle", in which case this is the id of the edge node closest to the staple 3’.
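A minimal sketch of a core-attribute record that enforces the rules stated above: only the listed "type" values are allowed, and "scaffold_domain_id" is -1 for every non-domain edge. The helper name `core_edge_attrs` is hypothetical, and "runlen" and "schematiclen" are simply passed in rather than derived:

```python
EDGE_TYPES = {"ss_domain", "ds_domain", "ghost_nick", "crossover", "loopout", "dangle"}
DOMAIN_TYPES = {"ss_domain", "ds_domain"}


def core_edge_attrs(edge_type: str, source: int, target: int, runlen: int,
                    schematiclen: int, scaffold_domain_id: int = -1) -> dict:
    """Build the core attribute dict for one edge (hypothetical helper)."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type!r}")
    is_domain = edge_type in DOMAIN_TYPES
    if is_domain and scaffold_domain_id < 0:
        raise ValueError("domain edges need a scaffold_domain_id >= 0")
    if not is_domain and scaffold_domain_id != -1:
        raise ValueError("non-domain edges use scaffold_domain_id = -1")
    return {
        "type": edge_type,
        "scaffold_domain_id": scaffold_domain_id,
        "runlen": runlen,
        "schematiclen": schematiclen,
        "source": source,  # nearer scaffold 5' (staple 5' for crossover/loopout/dangle)
        "target": target,  # nearer scaffold 3' (staple 3' for crossover/loopout/dangle)
    }


attrs = core_edge_attrs("ds_domain", 0, 15, runlen=16, schematiclen=15, scaffold_domain_id=0)
edge_name = (attrs["source"], attrs["target"])  # the edge is named (0, 15)
```

In a NetworkX graph, the same dict could be passed to `Graph.add_edge` as keyword attributes alongside the two node ids.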

Some edges also have the following optional attributes:

  • "scaffold_sequence53" The base sequence of the scaffold domain on this edge in direction 5’ scaffold to 3’ scaffold. For edge types "ss_domain" or "ds_domain".

  • "staple_sequence53" The base sequence of the staple section on this edge in direction 5’ staple to 3’ staple (i.e. 3’ scaffold to 5’ scaffold). For edge types "ds_domain", "loopout" or "dangle".

  • "base_i53" List of i53 scaffold indexes for all bases on this edge, in direction 5’ scaffold to 3’ scaffold. For edge types "ss_domain" or "ds_domain". Base indexes are stored to avoid complications with circular scaffolds when graph editing takes place.

  • "staple_id" The id of the staple that this edge is part of, for edge types "ds_domain", "crossover", "loopout" or "dangle".

  • "section_id" The staple section id of this edge, for edge types "ds_domain", "loopout" or "dangle". (Crossovers do not have a section id, since they are what separate staple sections).
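For a "ds_domain" edge, the optional sequence and index attributes are tied together. A minimal sketch, assuming DNA bases, perfect Watson-Crick pairing and a domain that does not wrap round a circular scaffold's loop point; the helper names are hypothetical. Because the staple is read 5’ to 3’, i.e. 3’ to 5’ along the scaffold, its sequence is the reverse complement of "scaffold_sequence53":

```python
from typing import List

# Watson-Crick complements for DNA (an assumption; RNA would need U).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def staple_sequence53(scaffold_sequence53: str) -> str:
    """Staple sequence of a fully paired ds_domain, read staple 5' to 3'."""
    # Complement each base, then reverse to read along the staple.
    return scaffold_sequence53.translate(COMPLEMENT)[::-1]


def base_i53(source: int, target: int) -> List[int]:
    """All scaffold indexes on a non-wrapping domain, scaffold 5' to 3'."""
    return list(range(source, target + 1))


print(staple_sequence53("AACGT"))  # ACGTT
print(base_i53(5, 9))              # [5, 6, 7, 8, 9]
```

This derivation applies only to "ds_domain" edges; "loopout" and "dangle" sections are unpaired, so their "staple_sequence53" cannot be derived from the scaffold.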

Graph Edge Display Lengths

A subtle point is that edge lengths in a drawn origami schematic are not equal to the number of bases an edge contains. Rather, edges have a schematic display size equal to the number of gaps they contain between bases, not the total number of bases (think of drawing a cadnano schematic on a square grid). Therefore:

  • "ss_domain" and "ds_domain" edges have "schematiclen" = "runlen" - 1. This is because the number of gaps on these edges is one less than the number of bases.

  • "dangle" edges have "schematiclen" = "runlen". This is because, e.g., a 1nt dangle is connected to the scaffold by 1 gap, a 2nt dangle by 2 gaps, and so on.

  • "loopout" edges have "schematiclen" = "runlen" + 1. This is because, e.g., a 1nt loopout has its single unattached base joined to the origami by two gaps.

  • "crossover" edges have "schematiclen" = 3. They consist of a single gap (0 bases), and so would be expected to have "schematiclen" = 1. However a length of 3 is chosen to emphasise crossovers in the schematic and to separate “rows” of the origami, as is traditionally done.

  • "ghost_nick" edges have "schematiclen" = 1 because they consist of a single gap (0 bases). Note that some "ghost_nick" edges are actually scaffold crossovers and should have "schematiclen" = 3 like staple crossovers, to emphasise them. However, this assignment can only be done when scaffold crossovers are detected at a later stage.
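The rules above reduce to a small lookup on the edge type. A minimal sketch with a hypothetical function name, covering only the initial assignment, so any "ghost_nick" later identified as a scaffold crossover would still need to be re-set to 3:

```python
def schematic_length(edge_type: str, runlen: int) -> int:
    """Initial "schematiclen" for an edge, counting gaps rather than bases."""
    if edge_type in ("ss_domain", "ds_domain"):
        return runlen - 1  # gaps between consecutive bases
    if edge_type == "dangle":
        return runlen      # one gap per dangling base, counted from the scaffold
    if edge_type == "loopout":
        return runlen + 1  # the loop is attached to the origami at both ends
    if edge_type == "crossover":
        return 3           # a single gap, drawn longer for emphasis
    if edge_type == "ghost_nick":
        return 1           # a single gap; scaffold crossovers are fixed up later
    raise ValueError(f"unknown edge type: {edge_type!r}")


print(schematic_length("loopout", 1))  # 2
```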

Scaffold End Domains

When the scaffold strand is linear (panels (a) and (b) below), there always exist distinct domains at each end of the scaffold, regardless of whether a staple hybridises across the 5’ and 3’ scaffold ends. There is always a 5’ node and a 3’ node in the origami graph.

When the scaffold is circular (panels (c) and (d) below), a subtle case arises in numbering the scaffold end domains (panel (d)).

A circular scaffold has no physical 5’ and 3’ ends, because there is no break in the scaffold. The scaffold ends are “virtual” and are designated as the 5’ and 3’ ends of the sequence as it is listed linearly in a text file. If two domains lie on either side of the loop point (panel (c)), whether both double stranded as pictured or one double stranded and one single stranded, then the domains at the virtual beginning and virtual end of the scaffold are distinct, and virtual 5’ and 3’ nodes exist in the origami graph.

However, when a staple spans the loop point, or when the loop point lies inside a large single stranded scaffold domain (panel (d)), the virtual beginning and virtual end of the circular scaffold belong to the same domain. In this case the virtual 5’ and 3’ nodes are deleted, so they do NOT exist in the origami graph.
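A minimal sketch of this rule, assuming a hypothetical per-base list of scaffold domain ids indexed by i53, in which a domain spanning the loop point carries the same id at both ends of the list. The helper name is also hypothetical:

```python
from typing import Sequence


def keeps_end_nodes(domain_id_per_base: Sequence[int], circular: bool) -> bool:
    """Whether the graph keeps 5' and 3' end nodes (hypothetical helper)."""
    if not circular:
        return True  # linear scaffolds always have distinct end domains
    # Circular: end nodes survive only if the first and last listed bases
    # belong to different domains; otherwise the two ends are one domain.
    return domain_id_per_base[0] != domain_id_per_base[-1]


print(keeps_end_nodes([0, 0, 1, 1, 2, 2], circular=True))  # True, like panel (c)
print(keeps_end_nodes([0, 0, 1, 1, 0, 0], circular=True))  # False, like panel (d)
```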

In a circular scaffold, bases are given i53 indexes corresponding to how the scaffold is written linearly in a text file. When the loop is joined, there is a discontinuity in the i53 indexes. This is not so with a linear scaffold where base i53 indexes always increase towards the 3’.
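This discontinuity is why "base_i53" stores every index explicitly. A minimal sketch, with a hypothetical helper name, listing the i53 indexes of a run of consecutive bases on a circular scaffold; the indexes wrap from (scaffold length - 1) back to 0:

```python
from typing import List


def wrapped_base_i53(start: int, runlen: int, scaffold_length: int) -> List[int]:
    """i53 indexes of runlen consecutive bases from start, wrapping at the loop point."""
    return [(start + k) % scaffold_length for k in range(runlen)]


print(wrapped_base_i53(7, 5, scaffold_length=10))  # [7, 8, 9, 0, 1]
```

The resulting list is not increasing, so a simple range from the first to the last index would give the wrong bases, whereas on a linear scaffold it always gives the right ones.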

Figure: End Domains for Linear and Circular Scaffolds (panels (a) to (d))

Dangling Node IDs

Unlike other nodes, dangling nodes (at the end of single stranded staple dangles) are not specific bases on the scaffold strand. Therefore, dangling nodes are given an id equal to a large number (e.g. 1000000) plus the i53 index of the scaffold node that they connect to.
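A minimal sketch of this ID scheme, taking 1000000 as the offset purely because it is the example given in the text (the script's actual constant may differ) and assuming the scaffold is shorter than the offset, so dangle IDs cannot collide with scaffold i53 indexes. Function names are hypothetical:

```python
DANGLE_OFFSET = 1_000_000  # example value from the text, not necessarily the script's


def dangle_node_id(scaffold_i53: int) -> int:
    """ID for the free end of a dangle attached at scaffold base scaffold_i53."""
    return DANGLE_OFFSET + scaffold_i53


def is_dangle_node(node_id: int) -> bool:
    # Valid only while every scaffold i53 index is below DANGLE_OFFSET.
    return node_id >= DANGLE_OFFSET


def attached_scaffold_i53(node_id: int) -> int:
    """Recover the scaffold base that a dangle node hangs from."""
    if not is_dangle_node(node_id):
        raise ValueError(f"{node_id} is not a dangle node")
    return node_id - DANGLE_OFFSET


print(dangle_node_id(42))                # 1000042
print(attached_scaffold_i53(1_000_042))  # 42
```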