
Internals

SCHEME Dict

The SCHEME dictionary in sgffp.parsers maps block type IDs to parser functions. This is the central dispatch table used by parse_blocks().

```python
from sgffp.parsers import SCHEME
```
| Block ID | Parser | Content |
| --- | --- | --- |
| 0 | `parse_sequence` | DNA sequence |
| 1 | `parse_compressed_dna` | 2-bit compressed DNA |
| 5 | `parse_xml` | Primers (XML) |
| 6 | `parse_xml` | Notes (XML) |
| 7 | `parse_lzma_xml` | History tree (LZMA XML) |
| 8 | `parse_xml` | Sequence properties (XML) |
| 10 | `parse_features` | Features (XML + qualifier extraction) |
| 11 | `parse_history_node` | History node (binary) |
| 14 | `parse_xml` | Custom enzyme sets (XML) |
| 16 | `parse_trace_container` | Trace container (flags + nested TLV) |
| 17 | `parse_xml` | Alignable sequences (XML) |
| 18 | `parse_ztr` | ZTR trace data (inside block 16) |
| 20 | `parse_xml` | Strand colors (XML) |
| 21 | `parse_sequence` | Protein sequence |
| 28 | `parse_xml` | Enzyme visibilities (XML) |
| 29 | `parse_lzma_xml` | History modifier (LZMA XML) |
| 30 | `parse_lzma_nested` | History node content (LZMA nested TLV) |
| 32 | `parse_sequence` | RNA sequence |
| 34 | `parse_lzma_json` | RNA structure predictions (LZMA JSON) |

Blocks 2, 3, and 13 have no entry in SCHEME and are skipped, because SnapGene regenerates them on import.
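The minimal sketch below uses a toy table to illustrate the dispatch pattern that SCHEME implements. The parsers, TOY_SCHEME, and dispatch() are hypothetical stand-ins, not sgffp code. The sketch only shows the lookup-and-skip behavior described above.

```python
from typing import Any, Callable, Dict, List, Tuple

Parser = Callable[[bytes], Any]

# Hypothetical parsers standing in for real sgffp functions.
def parse_text(data: bytes) -> Dict[str, Any]:
    return {"text": data.decode("ascii")}

def parse_length(data: bytes) -> Dict[str, Any]:
    return {"length": len(data)}

# Toy dispatch table: block type ID -> parser function.
TOY_SCHEME: Dict[int, Parser] = {0: parse_text, 6: parse_length}

def dispatch(raw_blocks: List[Tuple[int, bytes]], scheme: Dict[int, Parser]) -> Dict[int, List[Any]]:
    """Parse (block_id, data) pairs, grouping results by ID and skipping unknown IDs."""
    result: Dict[int, List[Any]] = {}
    for block_id, data in raw_blocks:
        parser = scheme.get(block_id)
        if parser is None:
            continue  # no entry in the table: the block is skipped
        result.setdefault(block_id, []).append(parser(data))
    return result

print(dispatch([(0, b"ACGT"), (2, b"\x00"), (6, b"xyz")], TOY_SCHEME))
# {0: [{'text': 'ACGT'}], 6: [{'length': 3}]}
```

Keeping the parsers in a dict means that supporting a new block type takes one table entry rather than another branch in an if/elif chain.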


Parser Functions

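The block parsers take data: bytes and return the parsed content, usually a dict. Those whose signature ends in | None return None on failure. parse_blocks() reads from a stream and octet_to_dna() returns bytes, so both are helpers with different signatures.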

parse_blocks(stream) → Dict[int, List[Any]]

Read TLV blocks from a binary stream and dispatch each to its SCHEME parser. Returns the top-level blocks dict used by SgffObject.
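The minimal sketch below shows what reading TLV blocks from a stream involves. It assumes the framing that SnapGene files are commonly described as using: a 1-byte type ID, a 4-byte big-endian payload length, then the payload. read_tlv_blocks() and make_block() are hypothetical names, and this is not sgffp's actual parse_blocks().

```python
import io
import struct
from typing import Any, BinaryIO, Callable, Dict, List

HEADER = struct.Struct(">BI")  # assumed: 1-byte type ID + 4-byte big-endian length (5 bytes)

def read_tlv_blocks(stream: BinaryIO, scheme: Dict[int, Callable[[bytes], Any]]) -> Dict[int, List[Any]]:
    """Read blocks until EOF, dispatching each payload through `scheme`."""
    blocks: Dict[int, List[Any]] = {}
    while True:
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated block header")
        block_id, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError(f"block {block_id}: expected {length} bytes, got {len(payload)}")
        parser = scheme.get(block_id)
        if parser is not None:  # unknown IDs are read past and dropped
            blocks.setdefault(block_id, []).append(parser(payload))
    return blocks

def make_block(block_id: int, payload: bytes) -> bytes:
    """Encode one block in the same assumed layout (handy for tests)."""
    return HEADER.pack(block_id, len(payload)) + payload

raw = make_block(0, b"\x00ACGT") + make_block(2, b"skip") + make_block(0, b"\x01GG")
print(read_tlv_blocks(io.BytesIO(raw), {0: lambda d: d[1:].decode("ascii")}))
# {0: ['ACGT', 'GG']}
```

Because the length prefix is read before the payload, blocks with no parser can be skipped without understanding their contents.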

parse_sequence(data) → Dict

Parse uncompressed sequence (blocks 0, 21, 32). Returns:

```python
{
    "sequence": str,
    "length": int,
    "topology": "linear" | "circular",
    "strandedness": "single" | "double",
    "dam_methylated": bool,
    "dcm_methylated": bool,
    "ecoki_methylated": bool,
}
```
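The topology, strandedness, and methylation fields suggest a packed flags byte. The sketch below assumes that byte 0 of the block is a bitfield (0x01 circular, 0x02 double-stranded, 0x04 Dam, 0x08 Dcm, 0x10 EcoKI) followed by ASCII residues. Both that layout and the name parse_sequence_sketch() are assumptions, so check them against sgffp's source before relying on them.

```python
from typing import Any, Dict

def parse_sequence_sketch(data: bytes) -> Dict[str, Any]:
    """Hypothetical decoder: byte 0 assumed to be a flags bitfield, bytes 1.. ASCII residues."""
    flags = data[0]
    sequence = data[1:].decode("ascii")
    return {
        "sequence": sequence,
        "length": len(sequence),
        "topology": "circular" if flags & 0x01 else "linear",    # assumed bit 0
        "strandedness": "double" if flags & 0x02 else "single",  # assumed bit 1
        "dam_methylated": bool(flags & 0x04),                    # assumed bit 2
        "dcm_methylated": bool(flags & 0x08),                    # assumed bit 3
        "ecoki_methylated": bool(flags & 0x10),                  # assumed bit 4
    }

print(parse_sequence_sketch(bytes([0x03]) + b"ATGC")["topology"])  # circular
```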

parse_compressed_dna(data) → Dict

Parse 2-bit GATC-encoded DNA (block 1). Returns sequence plus metadata header fields (format_version, strandedness_flag, property_flags, header_seq_length).

parse_xml(data) → Dict | None

Parse XML block via xmltodict with clean JSON keys (strips @ prefixes, converts #text to _text).
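The key cleanup can be written as a recursive pass over the nested dicts and lists that xmltodict returns. clean_keys() below is a hypothetical helper and not necessarily how sgffp implements it.

```python
from typing import Any

def clean_keys(node: Any) -> Any:
    """Recursively strip '@' from attribute keys and rename '#text' to '_text'."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "#text":
                new_key = "_text"
            elif key.startswith("@"):
                new_key = key[1:]
            else:
                new_key = key
            cleaned[new_key] = clean_keys(value)
        return cleaned
    if isinstance(node, list):
        return [clean_keys(item) for item in node]
    return node  # strings, None, etc. pass through unchanged

# Shape that xmltodict.parse() produces for <Primers nextValidID="3"><Primer name="p1">hi</Primer></Primers>
raw = {"Primers": {"@nextValidID": "3", "Primer": {"@name": "p1", "#text": "hi"}}}
print(clean_keys(raw))
# {'Primers': {'nextValidID': '3', 'Primer': {'name': 'p1', '_text': 'hi'}}}
```

xmltodict.parse() also accepts attr_prefix and cdata_key arguments (for example attr_prefix="" and cdata_key="_text"), which produce the same keys at parse time. With either approach, an attribute and a child element that share a name could collide once the @ is removed.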

parse_lzma_xml(data) → Dict | None

Decompress LZMA, then parse XML.

parse_lzma_json(data) → Any | None

Decompress LZMA, then parse JSON.

parse_lzma_nested(data) → Dict[int, List] | None

Decompress LZMA, then parse as nested TLV blocks.
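All three LZMA variants follow one pattern: decompress, then hand the bytes to an inner parser. The sketch below uses only the standard library's lzma and json modules. decompress_then() and parse_lzma_json_sketch() are hypothetical names, and the sketch assumes that the automatic format detection in lzma.decompress() covers the container SnapGene writes.

```python
import json
import lzma
from typing import Any, Callable, Optional

def decompress_then(data: bytes, parse: Callable[[bytes], Any]) -> Optional[Any]:
    """Decompress an LZMA payload, pass the bytes to `parse`, and return None on failure."""
    try:
        raw = lzma.decompress(data)  # FORMAT_AUTO: accepts .xz and legacy .lzma ("alone") streams
    except lzma.LZMAError:
        return None
    try:
        return parse(raw)
    except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
        return None

def parse_lzma_json_sketch(data: bytes) -> Optional[Any]:
    return decompress_then(data, lambda raw: json.loads(raw.decode("utf-8")))

print(parse_lzma_json_sketch(lzma.compress(b'{"a": 1}')))  # {'a': 1}
print(parse_lzma_json_sketch(b"not lzma"))                 # None
```

The XML and nested-TLV variants would pass an XML parser or a TLV reader as parse instead of json.loads.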

parse_features(data) → Dict | None

Parse feature XML with qualifier extraction and strand mapping. Returns:

```python
{
    "features": [{"name": str, "type": str, "strand": str, "start": int, "end": int, ...}],
    "wrapper_extras": {"nextValidID": str, ...},
}
```
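The sketch below uses xml.etree.ElementTree to show what qualifier extraction and strand mapping involve. It assumes SnapGene's feature XML shape: Feature elements with a directionality attribute, a Segment with range="start-end", and Q/V qualifier pairs. The strand labels "+", "-", and ".", the qualifiers key, and the name parse_features_sketch() are hypothetical. sgffp's actual keys and coordinate convention may differ.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Hypothetical labels; assumes directionality "1" = forward, "2" = reverse, anything else = none.
STRAND_MAP = {"1": "+", "2": "-"}

def parse_features_sketch(xml_text: str) -> Dict[str, Any]:
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.findall("Feature"):
        record: Dict[str, Any] = {
            "name": feat.get("name", ""),
            "type": feat.get("type", ""),
            "strand": STRAND_MAP.get(feat.get("directionality", ""), "."),
        }
        # Coordinates are kept exactly as written in the first Segment's range.
        segment = feat.find("Segment")
        rng = segment.get("range") if segment is not None else None
        if rng:
            start, end = rng.split("-")
            record["start"], record["end"] = int(start), int(end)
        # Assumed qualifier shape: <Q name="..."><V text="..."/></Q> (or an int attribute).
        qualifiers: Dict[str, Any] = {}
        for q in feat.findall("Q"):
            v = q.find("V")
            if v is not None:
                qualifiers[q.get("name", "")] = v.get("text", v.get("int"))
        record["qualifiers"] = qualifiers
        features.append(record)
    # Attributes of the wrapper element (e.g. nextValidID) are kept separately.
    return {"features": features, "wrapper_extras": dict(root.attrib)}

xml = ('<Features nextValidID="2"><Feature name="lacZ" type="CDS" directionality="2">'
       '<Segment range="10-50"/><Q name="note"><V text="example"/></Q></Feature></Features>')
print(parse_features_sketch(xml)["features"][0])
# {'name': 'lacZ', 'type': 'CDS', 'strand': '-', 'start': 10, 'end': 50, 'qualifiers': {'note': 'example'}}
```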

parse_ztr(data) → Dict | None

Parse ZTR-format Sanger sequencing trace. Handles chunks: BASE, BPOS, CNF4, SMP4, SAMP, TEXT, CLIP, COMM. Supports raw (format 0) and zlib (format 2) compression.
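The data section of each ZTR chunk begins with a format byte. The sketch below undoes that layer for the two formats listed above. It assumes the ZTR zlib layout of a format byte, a 4-byte little-endian uncompressed length, then the zlib stream. decode_ztr_chunk_data() is a hypothetical name, and other ZTR formats (such as run-length encoding) are deliberately rejected.

```python
import struct
import zlib

RAW, ZLIB = 0, 2  # ZTR data format codes supported above

def decode_ztr_chunk_data(data: bytes) -> bytes:
    """Sketch: strip the per-chunk compression layer from ZTR chunk data."""
    fmt = data[0]
    if fmt == RAW:
        return data[1:]  # raw: payload follows the format byte directly
    if fmt == ZLIB:
        # Assumed layout: format byte, 4-byte little-endian uncompressed length, zlib stream.
        (expected,) = struct.unpack("<I", data[1:5])
        out = zlib.decompress(data[5:])
        if len(out) != expected:
            raise ValueError(f"length mismatch: header says {expected}, got {len(out)}")
        return out
    raise ValueError(f"unsupported ZTR data format {fmt}")

print(decode_ztr_chunk_data(bytes([RAW]) + b"ACGT"))  # b'ACGT'
```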

parse_trace_container(data) → Dict

Parse block 16: 4-byte flags header + nested TLV blocks.

parse_history_node(data) → Dict

Parse block 11 binary: node_index, sequence_type, sequence data, nested TLV blocks.

octet_to_dna(raw_data, base_count) → bytes

Convert 2-bit encoded bytes to ASCII DNA. Encoding: G=00, A=01, T=10, C=11.
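The sketch below performs the 2-bit unpacking with the mapping above. The text does not say which end of each byte holds the first base, so this version assumes the most significant bit pair comes first. octet_to_dna_sketch() is a hypothetical name.

```python
CODE_TO_BASE = b"GATC"  # index = 2-bit code: G=00, A=01, T=10, C=11

def octet_to_dna_sketch(raw_data: bytes, base_count: int) -> bytes:
    """Unpack 4 bases per byte, most significant bit pair first (an assumption)."""
    out = bytearray()
    for byte in raw_data:
        for shift in (6, 4, 2, 0):
            if len(out) == base_count:
                return bytes(out)  # drop padding bases in the final byte
            out.append(CODE_TO_BASE[(byte >> shift) & 0b11])
    return bytes(out)  # may be shorter than base_count if raw_data is short

print(octet_to_dna_sketch(bytes([0b00011011]), 4))  # b'GATC'
```

Under this assumption, base_count trims the unused bit pairs in the last byte, so a 3-base sequence still occupies one full byte.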


SgffModel

Base class for all block-backed models.

```python
from sgffp.models.base import SgffModel
```
| Member | Description |
| --- | --- |
| `BLOCK_IDS` | Tuple of relevant block type IDs |
| `exists` → `bool` | True if any relevant blocks exist |

Protected Helpers

| Method | Description |
| --- | --- |
| `_get_block(block_id)` → `Any \| None` | Get first item from block |
| `_set_block(block_id, value)` | Set block value (replaces) |
| `_get_blocks(block_id)` → `List` | Get all items from block |
| `_set_blocks(block_id, values)` | Set all block values |
| `_remove_block(block_id)` → `bool` | Remove block entirely |
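The sketch below models how these helpers relate to the underlying storage, using a plain Dict[int, List[Any]] (the same shape parse_blocks() returns). BlockModelSketch and NotesModelSketch are hypothetical stand-ins, not sgffp classes. Treating exists as "any relevant ID is present as a key" is also an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

class BlockModelSketch:
    """Hypothetical stand-in for SgffModel, backed by a shared blocks dict."""

    BLOCK_IDS: Tuple[int, ...] = ()

    def __init__(self, blocks: Dict[int, List[Any]]) -> None:
        self._blocks = blocks  # shared with the owner, so writes are visible to it

    @property
    def exists(self) -> bool:
        return any(bid in self._blocks for bid in self.BLOCK_IDS)

    def _get_block(self, block_id: int) -> Optional[Any]:
        items = self._blocks.get(block_id)
        return items[0] if items else None

    def _set_block(self, block_id: int, value: Any) -> None:
        self._blocks[block_id] = [value]  # replaces any existing items

    def _get_blocks(self, block_id: int) -> List[Any]:
        return list(self._blocks.get(block_id, []))  # copy, so callers can't mutate storage

    def _set_blocks(self, block_id: int, values: List[Any]) -> None:
        self._blocks[block_id] = list(values)

    def _remove_block(self, block_id: int) -> bool:
        return self._blocks.pop(block_id, None) is not None

class NotesModelSketch(BlockModelSketch):
    BLOCK_IDS = (6,)  # block 6 holds notes, per the SCHEME table

blocks: Dict[int, List[Any]] = {}
notes = NotesModelSketch(blocks)
print(notes.exists)  # False
# Protected helpers are called directly here only for demonstration.
notes._set_block(6, {"Notes": {"Type": "Synthetic"}})
print(notes.exists, blocks)  # True {6: [{'Notes': {'Type': 'Synthetic'}}]}
```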

SgffListModel[T]

Generic base for list-backed models. Extends SgffModel.

```python
from sgffp.models.base import SgffListModel
```
| Member | Description |
| --- | --- |
| `items` → `List[T]` | Lazily loaded item list |
| `add(item)` | Append item and sync |
| `remove(idx)` → `bool` | Remove by index and sync |
| `clear()` | Remove all items and sync |
| `len(model)` | Item count |
| `model[idx]` | Item at index |
| `for item in model` | Iterate items |

Abstract Methods (subclass must implement)

| Method | Description |
| --- | --- |
| `_load()` → `List[T]` | Parse items from block storage |
| `_sync()` | Write items back to block storage |
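The sketch below illustrates the lazy-load and write-back cycle. It assumes that items are parsed once on first access and that every mutation calls _sync(). ListModelSketch and NameListSketch (with its block 99 and {"names": [...]} storage) are hypothetical, chosen only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")

class ListModelSketch(ABC, Generic[T]):
    """Hypothetical stand-in for SgffListModel: lazy load, mutate, write back."""

    def __init__(self, blocks: Dict[int, List[Any]]) -> None:
        self._blocks = blocks
        self._items: Optional[List[T]] = None  # not loaded yet

    @property
    def items(self) -> List[T]:
        if self._items is None:
            self._items = self._load()  # parse from block storage on first access
        return self._items

    def add(self, item: T) -> None:
        self.items.append(item)
        self._sync()

    def remove(self, idx: int) -> bool:
        if not 0 <= idx < len(self.items):
            return False  # out of range: nothing removed, nothing synced
        del self.items[idx]
        self._sync()
        return True

    def clear(self) -> None:
        self.items.clear()
        self._sync()

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> T:
        return self.items[idx]

    def __iter__(self) -> Iterator[T]:
        return iter(self.items)

    @abstractmethod
    def _load(self) -> List[T]: ...

    @abstractmethod
    def _sync(self) -> None: ...

class NameListSketch(ListModelSketch[str]):
    """Toy subclass: stores names in a hypothetical block 99 as {"names": [...]}."""

    BLOCK_ID = 99

    def _load(self) -> List[str]:
        stored = self._blocks.get(self.BLOCK_ID)
        return list(stored[0]["names"]) if stored else []

    def _sync(self) -> None:
        self._blocks[self.BLOCK_ID] = [{"names": list(self.items)}]

blocks = {99: [{"names": ["a", "b"]}]}
names = NameListSketch(blocks)
names.add("c")
print(len(names), blocks[99])  # 3 [{'names': ['a', 'b', 'c']}]
```

Syncing after every mutation keeps the blocks dict in step with the list, so serializing the blocks at any point reflects the current items.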

Subclasses

  • SgffFeatureList (block 10)
  • SgffPrimerList (block 5)
  • SgffAlignmentList (block 17)
  • SgffTraceList (block 16)

Released under the MIT License.