Verifying a DNA strand polyhedra

 

For the sake of this problem, any desired shape composed of strands is essentially a graph. That is, we can forget about the angles between edges, the rigidity of structure, etc. What we care about is how many edges coincide at each point, and how these coincidences correspond to each other. All the important information is preserved in an abstract graph.

Given the graph of a desired shape, we can construct a kind of dual of the graph by turning each edge into two nodes, one for each "side" of the edge. Consider this example for the tetrahedron:

Each edge of the old graph has become two paired nodes in the new. Edge 1 becomes nodes 1 and 1'. Because the upper (red) side of edge 1 connects to edges 2 and 5, node 1' is connected to nodes 2 and 5 in the new graph. In general, what begins as edge A becomes nodes A and A', each representing a side of edge A. An edge exists between two nodes in the new graph if and only if the corresponding sides of edges in the original graph were directly connected. (This is not really a 'dual' in some sense in that if we took the dual of the dual, we would arrive at something very different than the original.)

This new graph has some interesting properties worth noting.

  • If the original graph has n nodes, then the new graph is made up of n different components. (Each component is connected within itself but not to the others.)
  • Each connected component is a clique with as many nodes as the degree of the old node it represents.

    If we have a directed edge-list in the original graph, say 1->2->3, then we can draw a corresponding directed pseud-path in the new graph: 1'->2, 2'->3. The path is no longer connected, but more importantly it is true to the representation we chose when creating it.

                                    - Verification -                                

    Suppose we have a list of directed edge-lists and we'd like to check that it gives us the correct graph. For our purposes, we'll assume that each edge occurs at most twice in the list and the given edge-lists include no more edges than those in the goal graph. Let's use the name G' to denote the dual (in the sense just described) of the goal graph.

    Suppose the goal graph has n nodes. Begin with an empty graph G of 2*n nodes labelled 1,1',2,2',... ,n,n'. For each edge-list, add the corresponding directed pseudo-path to G. Each time a new pseudo-path is started, it is important to begin choose between n and n' so that each node has at most one incoming and one outgoing edge. This ensures that all DNA matches are between exactly two substrings oriented in opposite directions. It will never be impossible to follow this rule since each edge occurs at most twice in the entire list.

    Once this process is complete, we have a directed graph G. Now forget about the directions and close all connected components. (Here by "close" I mean to connect any two nodes which are path-connected.) The given edge-lists will work if and only if this resulting graph G is equivalent to the dual graph G' of our goal graph.

                                    - Finding Solutions -                                

    The dual G' can also be used to help find a solution. We just need to form edge-lists until each clique is a connected component. Here is one possible procedure: start with the empty graph G. While there are any zero-degree nodes left, build a new edge list by starting at any sub-two-degree node and continue in a directed pseudo-path as long as possible.