# Reading & Writing

## SgffReader

### From a file path

```python
from sgffp import SgffReader

sgff = SgffReader.from_file("plasmid.dna")
```

### From bytes
```python
with open("plasmid.dna", "rb") as f:
    data = f.read()

sgff = SgffReader.from_bytes(data)
```

### From a BinaryIO stream
```python
with open("plasmid.dna", "rb") as f:
    sgff = SgffReader(f).read()
```

### From stdin
```python
import sys

sgff = SgffReader(sys.stdin.buffer).read()
```

## SgffWriter
### To a file path

```python
from sgffp import SgffWriter

SgffWriter.to_file(sgff, "output.dna")
```

### To bytes
```python
data = SgffWriter.to_bytes(sgff)
```

### To a BinaryIO stream
```python
from io import BytesIO

buf = BytesIO()
SgffWriter(buf).write(sgff)
raw = buf.getvalue()
```

## Round-Trip Fidelity
sgffp aims for byte-level round-trip fidelity for most block types:
```python
with open("plasmid.dna", "rb") as f:
    original = f.read()

sgff = SgffReader.from_bytes(original)
output = SgffWriter.to_bytes(sgff)
assert original == output  # passes for most files
```

What preserves fidelity:

- XML attribute ordering via the `extras` dict pattern
- Raw qualifier lists (`raw_qualifiers`) on features
- Wrapper-level extras (e.g., `nextValidID` on feature/primer blocks)
- LZMA compression for history blocks (7, 29, 30)
- ZTR binary format for trace blocks (18)
- SnapGene XML conventions (element names uppercase, attribute names lowercase)
Known limitations:

- Block 18 (ZTR trace) round-trips may differ at the byte level due to compression format differences
- LZMA-compressed blocks (7, 29, 30, 34) may produce different compressed output with identical decompressed content
- Blocks 2, 3, 13 are auto-generated by SnapGene and are not parsed/written
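The LZMA limitation can be reproduced with the standard library alone: two encoders may emit different compressed bytes for the same payload, yet both decompress to identical content, which is why sgffp compares decompressed content rather than compressed bytes for these blocks. A minimal sketch (the payload below is a made-up stand-in, not real history-block content):

```python
import lzma

# Stand-in payload; real history blocks carry SnapGene XML.
payload = b"<HistoryTree>" + b"entry " * 100 + b"</HistoryTree>"

# Different presets encode different settings (e.g. dictionary size)
# into the container, so the compressed bytes typically differ...
a = lzma.compress(payload, preset=1)
b = lzma.compress(payload, preset=9)

# ...while the decompressed content is byte-identical.
assert lzma.decompress(a) == lzma.decompress(b) == payload
```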
## Block Filtering

You can filter blocks before writing to strip unnecessary data:
```python
from sgffp import SgffObject, SgffWriter

# Keep only the sequence and features blocks
filtered = SgffObject(cookie=sgff.cookie)
for block_type in (0, 10):
    if block_type in sgff.blocks:
        filtered.blocks[block_type] = sgff.blocks[block_type]

SgffWriter.to_file(filtered, "minimal.dna")
```

Or use the CLI:
```bash
sff filter input.dna -k 0,10 -o minimal.dna
```