SnapGene .dna File Format Specification

This document describes the binary format of SnapGene .dna files based on reverse engineering and analysis of files produced by SnapGene versions 5.x–7.x.

File Header

Every .dna file starts with a fixed 19-byte header:

| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0x09 (\t) | Magic byte |
| 1 | 4 | 0x0000000E (14) | Header length (big-endian uint32) |
| 5 | 8 | SnapGene | ASCII title |
| 13 | 2 | varies | type_of_sequence (big-endian uint16) |
| 15 | 2 | varies | export_version (big-endian uint16) |
| 17 | 2 | varies | import_version (big-endian uint16) |

type_of_sequence values:

| Value | Meaning |
|---|---|
| 1 | DNA |
| 2 | Protein |
| 7 | RNA |
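The header layout above can be read with a single `struct` call. This is a minimal sketch based only on the table; the function name `parse_header` and the returned dictionary keys are illustrative choices, not part of any existing library.

```python
import struct

# type_of_sequence values listed in the table above
SEQUENCE_TYPES = {1: "DNA", 2: "Protein", 7: "RNA"}


def parse_header(data: bytes) -> dict:
    """Parse the fixed 19-byte header at the start of a .dna file."""
    if len(data) < 19:
        raise ValueError("file too short for a SnapGene header")
    # ">" = big-endian, no padding: B magic, I header length,
    # 8s ASCII title, then three uint16 fields (1 + 4 + 8 + 6 = 19 bytes)
    magic, length, title, seq_type, export_v, import_v = struct.unpack(
        ">BI8sHHH", data[:19]
    )
    if magic != 0x09 or length != 14 or title != b"SnapGene":
        raise ValueError("not a SnapGene .dna file")
    return {
        "type_of_sequence": SEQUENCE_TYPES.get(seq_type, f"unknown ({seq_type})"),
        "export_version": export_v,
        "import_version": import_v,
    }
```

Unknown `type_of_sequence` values are reported rather than rejected, since the list above may not be exhaustive.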

TLV Block Format

After the header, the file contains a sequence of TLV (Type-Length-Value) blocks:

| Field | Size | Description |
|---|---|---|
| type | 1 byte | Block type ID (unsigned) |
| length | 4 bytes | Data length (big-endian uint32) |
| data | length bytes | Block payload |

Blocks may appear in any order. Some types can appear multiple times (e.g., block 11 for each history node, block 16 for each trace). A parser should skip unknown block types by advancing past their length bytes.
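A reader for this layout only needs to loop over type/length pairs. The sketch below assumes the whole file is already in memory and that blocks start right after the 19-byte header; `iter_blocks` is a hypothetical helper name.

```python
import struct
from typing import Iterator, Tuple


def iter_blocks(data: bytes, offset: int = 19) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_type, payload) pairs from a TLV stream."""
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        # 1-byte type followed by a big-endian uint32 length
        block_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated block payload")
        yield block_type, payload
        offset += length
```

Because the generator yields every block, a caller can skip unknown types simply by ignoring them, and can collect repeated types (such as block 11) into a list.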

Block Reference

Block 0 — DNA Sequence

Uncompressed DNA sequence with property flags.

| Offset | Size | Description |
|---|---|---|
| 0 | 1 | Property flags byte |
| 1 | N | ASCII sequence data |

Property flags (bitmask):

| Bit | Mask | Meaning |
|---|---|---|
| 0 | 0x01 | Circular topology (0=linear) |
| 1 | 0x02 | Double-stranded (0=single) |
| 2 | 0x04 | Dam methylated |
| 3 | 0x08 | Dcm methylated |
| 4 | 0x10 | EcoKI methylated |
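Decoding a block 0 payload amounts to masking the first byte and treating the rest as ASCII. The following is a small sketch of that; `parse_block0` and the dictionary keys are names chosen for illustration.

```python
def parse_block0(payload: bytes) -> dict:
    """Split a block 0 payload into property flags and sequence."""
    if not payload:
        raise ValueError("block 0 payload is empty")
    flags = payload[0]
    return {
        "circular": bool(flags & 0x01),
        "double_stranded": bool(flags & 0x02),
        "dam_methylated": bool(flags & 0x04),
        "dcm_methylated": bool(flags & 0x08),
        "ecoki_methylated": bool(flags & 0x10),
        # everything after the flags byte is the ASCII sequence
        "sequence": payload[1:].decode("ascii"),
    }
```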

Block 1 — Compressed DNA Sequence

2-bit encoded DNA used in history nodes for compact storage.

| Offset | Size | Description |
|---|---|---|
| 0 | 4 | compressed_length (big-endian uint32) — total bytes of remaining data |
| 4 | 4 | uncompressed_length (big-endian uint32) — number of bases |
| 8 | 14 | Metadata header (see below) |
| 22 | N | 2-bit GATC-encoded sequence data |

14-byte metadata header:

| Offset | Size | Description |
|---|---|---|
| 0 | 1 | format_version — typically 30 (0x1e) |
| 1 | 3 | Reserved (always 0x000000) |
| 4 | 1 | strandedness_flag — 1 = double-stranded |
| 5 | 3 | Reserved (always 0x000000) |
| 8 | 2 | property_flags (big-endian uint16) — 1 = default, 257 = extended |
| 10 | 2 | Reserved (always 0x0000) |
| 12 | 2 | header_seq_length (big-endian uint16) — matches uncompressed_length |

2-bit encoding (2 bits per base, 4 bases per byte, MSB first for full bytes):

| Bits | Base |
|---|---|
| 00 | G |
| 01 | A |
| 10 | T |
| 11 | C |

The packed sequence data occupies ceil(uncompressed_length * 2 / 8) bytes, i.e. ceil(uncompressed_length / 4).

When the final byte holds fewer than 4 bases, SnapGene right-aligns that partial tail in the low bits of the byte. For example, a 3-base tail uses bit pairs at offsets 4, 2, and 0.
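The decoding rules above, including the right-aligned tail, can be sketched as follows. The function takes only the packed sequence bytes (the data after the 22 bytes of length fields and metadata). It assumes that within a partial tail the first base sits at the highest used bit offset, which matches the "offsets 4, 2, and 0" example but is otherwise an assumption. `decode_2bit` is a hypothetical name.

```python
BASES = "GATC"  # index 0b00 = G, 0b01 = A, 0b10 = T, 0b11 = C


def decode_2bit(packed: bytes, n_bases: int) -> str:
    """Decode 2-bit GATC data into an ASCII base string."""
    needed = (n_bases + 3) // 4  # same as ceil(n_bases * 2 / 8)
    if len(packed) < needed:
        raise ValueError("packed data shorter than uncompressed_length implies")
    full_bytes, tail = divmod(n_bases, 4)
    out = []
    # Full bytes hold four bases, most significant bit pair first.
    for byte in packed[:full_bytes]:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    # A partial final byte is right-aligned: a 3-base tail uses shifts 4, 2, 0.
    if tail:
        byte = packed[full_bytes]
        for shift in range(2 * (tail - 1), -1, -2):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)
```

For example, the byte 0x1B is 00 01 10 11 in binary, so as a full byte it decodes to GATC, while as a 3-base tail only its low six bits are read and it decodes to ATC.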

Block 5 — Primers (XML)

XML block parsed by xmltodict. Top-level element: <Primers>.

xml
<Primers nextValidID="3">
  <HybridizationParams
    minContinuousMatchLen="10"
    allowMismatch="1"
    minMeltingTemperature="40"
    showAdditionalFivePrimeMatches="1"
    minimumFivePrimeAnnealing="15"
  />
  <Primer name="Forward" sequence="ATGCATGCATGC">
    <BindingSite
      location="100-112"
      boundStrand="0"
      annealedBases="ATGCATGCATGC"
      meltingTemperature="36"
    >
      <Component hybridizedRange="100-112" bases="ATGCATGCATGC"/>
    </BindingSite>
    <BindingSite simplified="1" location="100-112" boundStrand="0"
      annealedBases="ATGCATGCATGC" meltingTemperature="36">
      <Component hybridizedRange="100-112" bases="ATGCATGCATGC"/>
    </BindingSite>
  </Primer>
</Primers>

Each <Primer> can contain multiple <BindingSite> child elements. SnapGene generates both detailed and simplified (attribute simplified="1") versions. The boundStrand attribute is "0" for forward/top and "1" for reverse/bottom. The location uses 1-based coordinates.
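As an illustration of the structure described above, the sketch below extracts primers and their detailed binding sites using the standard library's `xml.etree.ElementTree` instead of xmltodict, so that it runs without third-party packages. `parse_primers` and the output keys are illustrative names, and coordinates are returned exactly as written in the file.

```python
import xml.etree.ElementTree as ET


def parse_primers(xml_text: str, keep_simplified: bool = False) -> list:
    """Return one dict per <Primer>, with its binding sites."""
    root = ET.fromstring(xml_text)
    primers = []
    for primer in root.findall("Primer"):
        sites = []
        for site in primer.findall("BindingSite"):
            # SnapGene writes a detailed and a simplified copy of each site;
            # by default only the detailed one is kept.
            if site.get("simplified") == "1" and not keep_simplified:
                continue
            first, last = (int(x) for x in site.get("location").split("-"))
            sites.append({
                "first": first,
                "last": last,
                "strand": "bottom" if site.get("boundStrand") == "1" else "top",
                "annealed_bases": site.get("annealedBases"),
            })
        primers.append({
            "name": primer.get("name"),
            "sequence": primer.get("sequence"),
            "sites": sites,
        })
    return primers
```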

Block 6 — Notes (XML)

File-level metadata. Top-level element: <Notes>.

xml
<Notes>
  <UUID>550e8400-e29b-41d4-a716-446655440000</UUID>
  <Type>Synthetic</Type>
  <Description>My plasmid</Description>
  <ConfirmedExperimentally>0</ConfirmedExperimentally>
  <CustomMapLabel>plasmid.dna</CustomMapLabel>
  <UseCustomMapLabel>1</UseCustomMapLabel>
  <SequenceClass>UNA</SequenceClass>
  <Created UTC="2024-01-15.12:00:00">2024-01-15.12:00:00</Created>
  <LastModified UTC="2024-06-01.15:30:00">2024-06-01.15:30:00</LastModified>
</Notes>

Block 7 — History Tree (LZMA-compressed XML)

The entire block payload is LZMA-compressed. After decompression, it contains XML with a recursive <Node> tree describing the cloning history. The root node represents the current file state and its children are the earlier states it was built from, so the tree grows backward in time.

xml
<HistoryTree>
  <Node name="Final.dna" type="DNA" seqLen="5000" strandedness="double"
        ID="2" circular="1" operation="insertFragment"
        upstreamModification="Unmodified" downstreamModification="Unmodified">
    <InputSummary manipulation="insert" val1="100" val2="4900"
                  name1="EcoRI" siteCount1="1" name2="BamHI" siteCount2="1"/>
    <Node name="Vector.dna" type="DNA" seqLen="4000" strandedness="double"
          ID="0" circular="1" resurrectable="1" operation="invalid" .../>
    <Node name="Insert.dna" type="DNA" seqLen="1000" strandedness="double"
          ID="1" circular="0" resurrectable="1" operation="invalid" .../>
  </Node>
</HistoryTree>

Node fields:

| Field | Type | Description |
|---|---|---|
| ID | string | Unique node identifier (integer as string) |
| name | string | Sequence filename |
| type | string | "DNA", "RNA", or "Protein" |
| seqLen | string | Sequence length (integer as string) |
| circular | string | "0" (linear) or "1" (circular) |
| strandedness | string | "single" or "double" |
| operation | string | Operation that created this state (see below) |
| upstreamStickiness | string | Upstream sticky end length |
| downstreamStickiness | string | Downstream sticky end length |
| upstreamModification | string | e.g., "Unmodified", "FivePrimePhosphorylated" |
| downstreamModification | string | e.g., "Unmodified" |
| resurrectable | string | "1" if the user can restore this state |

Attribute order convention: SnapGene expects attributes in this order: name, type, seqLen, strandedness, ID, circular, [resurrectable], operation.

Operation types:

| Operation | Category | Description |
|---|---|---|
| invalid | base | Original/imported file (leaf node) |
| makeDna | create | Created new DNA sequence |
| makeRna | create | Created new RNA sequence |
| makeProtein | create | Created new protein sequence |
| amplifyFragment | cloning | PCR amplification |
| insertFragment | cloning | Single fragment insertion |
| insertFragments | cloning | Multiple fragment insertion |
| replace | edit | Sequence edit/substitution |
| digest | cloning | Restriction digest |
| ligateFragments | cloning | Ligation of fragments |
| gatewayLRCloning | cloning | Gateway LR reaction |
| gatewayBPCloning | cloning | Gateway BP reaction |
| gibsonAssembly | cloning | Gibson assembly |
| goldenGateAssembly | cloning | Golden Gate assembly |
| restrictionCloning | cloning | Restriction cloning |
| taCloning | cloning | TA cloning |
| topoCloning | cloning | TOPO cloning |
| inFusionCloning | cloning | In-Fusion cloning |
| flip | edit | Reverse complement |
| newFileFromSelection | edit | Extract subsequence to new file |
| primerDirectedMutagenesis | edit | Site-directed mutagenesis |
| changeMethylation | metadata | Change methylation status |
| changePhosphorylation | metadata | Change phosphorylation status |
| changeStrandedness | metadata | Change single/double stranded |
| changeTopology | metadata | Change linear/circular topology |

Nested elements:

  • <InputSummary> — Required on every non-leaf node (SnapGene segfaults without it). Contains: manipulation, val1, val2, name1/siteCount1 (enzymes). An empty <InputSummary/> is valid.
  • <Oligo> — Primer used in PCR: name, sequence, phosphorylated (optional)
  • <Parameter> — Key-value pairs: name, val
  • <RegeneratedSite> — Restriction site regenerated by cloning
  • <HistoryColors> — Strand coloring for history map display
  • <Features> — Snapshot of features at this history state
  • <Primers> — Primer binding sites used in this operation
  • <Node> — Child nodes (xmltodict yields a single dict for one child and a list for several)
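Reading the history tree is a decompress-then-recurse operation. The sketch below assumes the payload is a stream that Python's `lzma.decompress` can auto-detect (.xz or legacy .lzma container); this document does not state which LZMA container SnapGene uses, so treat that as an assumption. `walk_history` is a hypothetical helper that flattens the tree into (depth, ID, name, operation) rows.

```python
import lzma
import xml.etree.ElementTree as ET


def walk_history(block7_payload: bytes) -> list:
    """Decompress block 7 and list its nodes depth-first."""
    # Assumption: the payload is an auto-detectable LZMA stream.
    xml_bytes = lzma.decompress(block7_payload)
    root = ET.fromstring(xml_bytes)
    rows = []

    def visit(node, depth):
        rows.append(
            (depth, int(node.get("ID")), node.get("name"), node.get("operation"))
        )
        # Child <Node> elements are the earlier states this one was built from.
        for child in node.findall("Node"):
            visit(child, depth + 1)

    for top in root.findall("Node"):
        visit(top, 0)
    return rows
```

Using ElementTree avoids the xmltodict single-dict-versus-list distinction, because `findall` always returns a list.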

Block 8 — Sequence Properties (XML)

Additional sequence properties. Top-level element: <AdditionalSequenceProperties>.

xml
<AdditionalSequenceProperties>
  <UpstreamStickiness>0</UpstreamStickiness>
  <DownstreamStickiness>0</DownstreamStickiness>
  <UpstreamModification>Unmodified</UpstreamModification>
  <DownstreamModification>Unmodified</DownstreamModification>
</AdditionalSequenceProperties>

Block 10 — Features (XML)

Annotation features with segment ranges, qualifiers, and strand mapping.

xml
<Features>
  <Feature name="GFP" type="CDS" directionality="1">
    <Segment range="101-820" color="#00ff00"/>
    <Q name="label"><V text="GFP"/></Q>
    <Q name="note"><V text="Green fluorescent protein"/></Q>
    <Q name="translation"><V text="MVSK..."/></Q>
  </Feature>
</Features>

Strand mapping (from directionality attribute):

| Value | Strand |
|---|---|
| 0 | . (none) |
| 1 | + (forward) |
| 2 | - (reverse) |
| 3 | = (both) |

Parsed format (after sgffp processing):

json
{
    "features": [
        {
            "name": "GFP",
            "type": "CDS",
            "strand": "+",
            "start": 100,
            "end": 820,
            "color": "#00ff00",
            "segments": [{"range": "101-820", "color": "#00ff00"}],
            "qualifiers": {"label": "GFP", "note": "Green fluorescent protein"}
        }
    ]
}

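start is 0-based (the XML range start "101" minus 1) and end keeps the 1-based XML value, so start and end together form a half-open interval: 100 to 820 covers 720 bases.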
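The mapping from the XML to the parsed structure can be approximated as below. This is a sketch of the conversion rules described here, not sgffp's actual implementation: it keeps every qualifier that has a text value, and takes start and end from the outermost segment bounds. `parse_features` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# directionality attribute -> strand symbol, as in the table above
STRANDS = {"0": ".", "1": "+", "2": "-", "3": "="}


def parse_features(xml_text: str) -> dict:
    """Convert a <Features> block into a list of feature dicts."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.findall("Feature"):
        segments = [
            {"range": seg.get("range"), "color": seg.get("color")}
            for seg in feat.findall("Segment")
        ]
        bounds = [
            tuple(int(x) for x in seg["range"].split("-")) for seg in segments
        ]
        qualifiers = {}
        for q in feat.findall("Q"):
            v = q.find("V")
            if v is not None and v.get("text") is not None:
                qualifiers[q.get("name")] = v.get("text")
        features.append({
            "name": feat.get("name"),
            "type": feat.get("type"),
            "strand": STRANDS.get(feat.get("directionality", "0"), "."),
            # 1-based inclusive XML range -> 0-based start, end unchanged
            "start": min(b[0] for b in bounds) - 1 if bounds else None,
            "end": max(b[1] for b in bounds) if bounds else None,
            "color": segments[0]["color"] if segments else None,
            "segments": segments,
            "qualifiers": qualifiers,
        })
    return {"features": features}
```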

Block 11 — History Nodes (Binary)

Each block 11 entry stores a sequence snapshot for one history state. Multiple block 11 entries exist (one per non-root tree node).

Binary layout:

| Offset | Size | Description |
|---|---|---|
| 0 | 4 | node_index (big-endian uint32) — links to tree node ID |
| 4 | 1 | sequence_type (see below) |
| 5+ | varies | Sequence data (type-dependent) |
| ... | varies | Nested TLV blocks (node content) |

sequence_type values:

| Value | Format | Description |
|---|---|---|
| 0 | Block 0 format | Uncompressed DNA (4B length + ASCII) |
| 1 | Block 1 format | Compressed DNA (recommended, SnapGene default) |
| 21 | Block 0 format | Protein sequence |
| 29 | | Modifier-only (no sequence data) |
| 32 | Block 0 format | RNA sequence |
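The fixed 5-byte prefix of a block 11 payload can be read as shown below. The sketch stops at the prefix and returns the remaining bytes untouched, because how to split them into sequence data and nested blocks depends on sequence_type. `parse_node_prefix` is a hypothetical name.

```python
import struct

# sequence_type values from the table above
NODE_SEQUENCE_TYPES = {
    0: "uncompressed DNA",
    1: "compressed DNA",
    21: "protein",
    29: "modifier only",
    32: "RNA",
}


def parse_node_prefix(payload: bytes) -> tuple:
    """Return (node_index, sequence_type description, remaining bytes)."""
    if len(payload) < 5:
        raise ValueError("block 11 payload shorter than its 5-byte prefix")
    # big-endian uint32 node_index followed by a 1-byte sequence_type
    node_index, seq_type = struct.unpack_from(">IB", payload, 0)
    label = NODE_SEQUENCE_TYPES.get(seq_type, f"unknown ({seq_type})")
    return node_index, label, payload[5:]
```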

Block 14 — Custom Enzyme Sets (XML)

User-defined enzyme groups. Top-level element: <CustomEnzymeSets>.

xml
<CustomEnzymeSets>
  <CustomEnzymeSet type="1" name="MCS Enzymes"
    enzymeNames="Acc65I AscI BamHI EcoRI HindIII KpnI NotI SmaI XmaI"/>
</CustomEnzymeSets>

Block 16 — Trace Container

Container wrapping a ZTR trace (block 18) with optional properties (block 8). Block 18 never appears as a standalone top-level block.

A file with multiple traces contains one block 16 entry per trace.

| Offset | Size | Description |
|---|---|---|
| 0 | 4 | Flags (big-endian uint32): 0 = forward, 1 = reverse |
| 4 | N | Nested TLV blocks (block 18, optionally block 8) |
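Unpacking a trace container combines the flags field with the same TLV loop used at file level. A minimal sketch, assuming each nested block type appears at most once per container; `parse_trace_container` and the result keys are illustrative names.

```python
import struct


def parse_trace_container(payload: bytes) -> dict:
    """Split a block 16 payload into flags and its nested blocks."""
    if len(payload) < 4:
        raise ValueError("block 16 payload shorter than its flags field")
    (flags,) = struct.unpack_from(">I", payload, 0)
    nested = {}
    offset = 4
    while offset < len(payload):
        if offset + 5 > len(payload):
            raise ValueError("truncated nested block header")
        block_type, length = struct.unpack_from(">BI", payload, offset)
        offset += 5
        body = payload[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated nested block payload")
        nested[block_type] = body
        offset += length
    return {
        "flags": flags,
        "reverse": flags == 1,
        "ztr": nested.get(18),        # ZTR trace data
        "properties": nested.get(8),  # optional XML properties
    }
```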

Block 17 — Alignable Sequences (XML)

Sequence alignment data. Top-level element: <AlignableSequences>.

xml
<AlignableSequences trimStringency="Medium">
  <Sequence name="ref_seq" sequence="ATGC..."/>
</AlignableSequences>

Block 18 — ZTR Trace Data

Only appears inside block 16 containers.

ZTR format (Staden Package) for Sanger sequencing chromatograms.

Header:

| Offset | Size | Description |
|---|---|---|
| 0 | 8 | Magic: \xaeZTR\r\n\x1a\n |
| 8 | 2 | Version (typically \x01\x02) |

Chunks follow the header:

| Field | Size | Description |
|---|---|---|
| type | 4 | ASCII chunk type |
| metadata_length | 4 | Metadata length (big-endian uint32) |
| metadata | N | Metadata bytes |
| data_length | 4 | Chunk data length (big-endian uint32) |
| data | N | Chunk data (may be compressed) |

Compression: First byte of chunk data: 0x00 = raw, 0x02 = zlib.

Chunk types:

| Type | Description | Data format |
|---|---|---|
| BASE | Base calls | Padding + ASCII bases |
| BPOS | Base-to-sample positions | Padding + big-endian uint32 per base |
| CNF4 | Confidence scores | 1 byte per base |
| SMP4 | Combined ACGT samples | Padding + big-endian uint16 (A,C,G,T sequential) |
| SAMP | Single channel samples | Metadata: channel letter. Data: big-endian uint16 |
| TEXT | Metadata key-value pairs | Null-terminated pairs |
| CLIP | Quality clip boundaries | Left uint32 + right uint32 |
| COMM | Comments | ASCII text |
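The chunk layout above can be walked with a simple loop, sketched below. The zlib branch relies on a detail from the Staden ZTR specification rather than from this document: a 4-byte little-endian uncompressed length follows the 0x02 byte, and the decompressed bytes start with their own format byte. Treat that as an assumption. `iter_ztr_chunks` and `decode_chunk_data` are hypothetical names, and other ZTR encodings are left unhandled.

```python
import struct
import zlib

ZTR_MAGIC = b"\xaeZTR\r\n\x1a\n"


def iter_ztr_chunks(data: bytes):
    """Yield (chunk_type, metadata, raw_chunk_data) from ZTR bytes."""
    if data[:8] != ZTR_MAGIC:
        raise ValueError("missing ZTR magic")
    offset = 10  # 8-byte magic + 2-byte version
    while offset + 8 <= len(data):
        chunk_type = data[offset:offset + 4].decode("ascii")
        (meta_len,) = struct.unpack_from(">I", data, offset + 4)
        offset += 8
        metadata = data[offset:offset + meta_len]
        offset += meta_len
        (data_len,) = struct.unpack_from(">I", data, offset)
        offset += 4
        body = data[offset:offset + data_len]
        offset += data_len
        yield chunk_type, metadata, body


def decode_chunk_data(body: bytes) -> bytes:
    """Strip the format byte, inflating zlib layers where present."""
    while body:
        fmt = body[0]
        if fmt == 0x00:  # raw: payload follows the format byte
            return body[1:]
        if fmt == 0x02:  # zlib (assumed layout: 0x02, uint32 LE length, stream)
            body = zlib.decompress(body[5:])
            continue
        raise NotImplementedError(f"unsupported ZTR format byte {fmt:#04x}")
    return b""
```

The decoded bytes still include any padding described in the chunk table, so a BASE chunk, for instance, needs its padding byte removed before the bases are read.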

Block 20 — Strand Colors (XML)

Per-strand color highlighting. Top-level element: <StrandColors>.

xml
<StrandColors>
  <TopStrand><ColorRange range="314..354" colors="magenta"/></TopStrand>
  <BottomStrand><ColorRange range="318..358" colors="magenta"/></BottomStrand>
</StrandColors>

Block 21 — Protein Sequence

Same format as Block 0. Sequence contains amino acid single-letter codes.

Block 28 — Enzyme Visibilities (XML)

Override enzyme visibility. Top-level element: <EnzymeVisibilities>.

xml
<EnzymeVisibilities vals=""/>

Block 29 — History Modifier (LZMA XML)

Metadata-only history changes. LZMA-compressed XML. Rarely present.

Block 30 — History Node Content (LZMA TLV)

LZMA-compressed nested TLV blocks found in the node content (node_info) that follows the sequence data in a block 11 entry. Contains the full file state (features, primers, notes, etc.) at that history point.

Block 32 — RNA Sequence

Same format as Block 0. Contains RNA sequence (ACGU bases).

Block 34 — RNA Structure Predictions (LZMA JSON)

LZMA-compressed JSON with RNA secondary structure prediction data.

json
{
  "bondedPairs": [[22, 30], [23, 29]],
  "parameters": {"noClosingGU": false, "noLP": false, "temperature": 37},
  "probabilityMatrix": [[0, 5, 0.077]],
  "revision": 1,
  "stats": {
    "ensembleDiversity": 213.88,
    "freeEnergyEnsemble": -97.31,
    "freeEnergyOptimal": -75.7,
    "freqOptimalEnsemble": 0.0
  },
  "suboptimalStructures": [{"bondedPairs": []}]
}

Blocks Not Parsed

Blocks 2, 3, and 13 are auto-generated by SnapGene and not parsed:

  • Block 2 — Enzyme cut position index (rare, superseded by block 3)
  • Block 3 — Restriction enzyme recognition map (473 built-in enzymes)
  • Block 13 — Enzyme display/filter settings (always 345 bytes)

Released under the MIT License.