Reading & Writing

SgffReader

From a file path

python
from sgffp import SgffReader

sgff = SgffReader.from_file("plasmid.dna")

From bytes

python
with open("plasmid.dna", "rb") as f:
    data = f.read()

sgff = SgffReader.from_bytes(data)

From a BinaryIO stream

python
with open("plasmid.dna", "rb") as f:
    sgff = SgffReader(f).read()

From stdin

python
import sys
sgff = SgffReader(sys.stdin.buffer).read()

SgffWriter

To a file path

python
from sgffp import SgffWriter

SgffWriter.to_file(sgff, "output.dna")

To bytes

python
data = SgffWriter.to_bytes(sgff)

To a BinaryIO stream

python
from io import BytesIO

buf = BytesIO()
SgffWriter(buf).write(sgff)
raw = buf.getvalue()

Round-Trip Fidelity

sgffp aims for byte-level round-trip fidelity for most block types:

python
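from sgffp import SgffReader, SgffWriter

# Read the file inside a context manager so the handle is closed promptly
with open("plasmid.dna", "rb") as f:
    original = f.read()

sgff = SgffReader.from_bytes(original)
output = SgffWriter.to_bytes(sgff)

assert original == output  # passes for most files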
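
When the round-trip assertion fails, it helps to know where the two byte strings diverge. The following is a minimal sketch, not part of sgffp: `first_mismatch` and `roundtrip_report` are hypothetical helper names, and the parse and serialize steps are passed in as callables so the snippet runs without the library installed.

```python
from typing import Callable, Optional


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def roundtrip_report(
    original: bytes,
    parse: Callable[[bytes], object],
    serialize: Callable[[object], bytes],
) -> str:
    """Parse and re-serialize `original`, then describe how the output compares."""
    output = serialize(parse(original))
    offset = first_mismatch(original, output)
    if offset is None:
        return "identical"
    return f"differs at byte {offset} (sizes: {len(original)} -> {len(output)})"
```

With sgffp installed, the call would presumably look like `roundtrip_report(original, SgffReader.from_bytes, SgffWriter.to_bytes)`, using the two class methods shown earlier on this page.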

What preserves fidelity:

  • XML attribute ordering via the extras dict pattern
  • Raw qualifier lists (raw_qualifiers) on features
  • Wrapper-level extras (e.g., nextValidID on feature/primer blocks)
  • LZMA compression for history blocks (7, 29, 30)
  • ZTR binary format for trace blocks (18)
  • SnapGene XML conventions (element names uppercase, attribute names lowercase)

Known limitations:

  • Block 18 (ZTR trace) round-trips may differ at the byte level due to compression format differences
  • LZMA-compressed blocks (7, 29, 30, 34) may recompress to different bytes even though the decompressed content is identical
  • Blocks 2, 3, 13 are auto-generated by SnapGene and are neither parsed nor written
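
The LZMA limitation above reflects a general property of compression: the same payload can be encoded as different byte streams. The sketch below illustrates this with Python's standard `lzma` module only. The payload is a placeholder, and the two container formats are chosen for illustration; which format and settings sgffp or SnapGene actually use is not stated here. `same_content` is a hypothetical helper for comparing such blocks by decompressed content instead of raw bytes.

```python
import lzma

# Placeholder bytes, not a real SnapGene block.
payload = b"<HistoryTree/>" * 100

# Same payload, two container formats: legacy .lzma and .xz.
alone = lzma.compress(payload, format=lzma.FORMAT_ALONE)
xz = lzma.compress(payload, format=lzma.FORMAT_XZ)

# The compressed streams differ at the byte level...
assert alone != xz
# ...but both decompress to the original payload.
assert lzma.decompress(alone) == payload
assert lzma.decompress(xz) == payload


def same_content(a: bytes, b: bytes) -> bool:
    """Compare two LZMA streams by their decompressed content."""
    return lzma.decompress(a) == lzma.decompress(b)
```

A test suite that needs to tolerate this limitation could compare decompressed content for the affected blocks and keep the strict byte comparison for everything else.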

Block Filtering

You can filter blocks before writing to strip unnecessary data:

python
from sgffp import SgffObject, SgffWriter

# Keep only sequence and features
filtered = SgffObject(cookie=sgff.cookie)
for block_type in (0, 10):
    if block_type in sgff.blocks:
        filtered.blocks[block_type] = sgff.blocks[block_type]

SgffWriter.to_file(filtered, "minimal.dna")

Or use the CLI:

bash
sff filter input.dna -k 0,10 -o minimal.dna
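
The keep-list loop shown above can be factored into a small reusable function. This is a sketch under assumptions: it treats the block container as a plain dict keyed by block type (the examples on this page index `sgff.blocks` that way, but its exact type is not documented here), and `keep_blocks` is a hypothetical name, not an sgffp function.

```python
from typing import Any, Dict, Iterable


def keep_blocks(blocks: Dict[int, Any], keep: Iterable[int]) -> Dict[int, Any]:
    """Return a new dict holding only the block types listed in `keep`."""
    wanted = set(keep)
    return {
        block_type: value
        for block_type, value in blocks.items()
        if block_type in wanted
    }


# Stand-in data: keys are block type IDs, values are placeholders,
# not real sgffp block objects.
example = {0: "sequence", 10: "features", 7: "history", 18: "trace"}
minimal = keep_blocks(example, (0, 10))
```

Returning a new dict leaves the original object untouched, so the full file can still be written elsewhere.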

Released under the MIT License.