molex
molex (molecular exchange) is a Rust library for parsing, analyzing, transforming, and serializing molecular structure data. It supports PDB, mmCIF, BinaryCIF, MRC/CCP4 density maps, and DCD trajectories.
Key concepts
- MoleculeEntity represents a single molecule: a protein chain, a DNA/RNA strand, a ligand, an ion, or a group of waters. Parsing a structure file produces a Vec<MoleculeEntity>.
- Atom holds a position, element, atom name, occupancy, and B-factor. Residue and chain context live on the entity that contains the atom.
- Coords is a binary serialization format used for FFI and IPC (e.g. iceoryx zero-copy between processes).
- Analysis includes covalent bond inference, DSSP hydrogen bond detection, disulfide bridges, and secondary structure classification.
- VoxelGrid and Density represent 3D volumetric data (electron density, cryo-EM maps).
Crate features
| Feature | Description |
|---|---|
| default | Core Rust library (no Python) |
| python | PyO3 bindings + AtomWorks interop |
API documentation
For the full Rust API reference, run:
cargo doc --open --document-private-items
Installation
As a Rust dependency
Add to your Cargo.toml:
[dependencies]
molex = "0.1"
To enable Python bindings (PyO3 + NumPy + AtomWorks interop):
[dependencies]
molex = { version = "0.1", features = ["python"] }
As a Python package
cd crates/molex
maturin develop --release --features python
python -c "import molex; print('OK')"
Dependencies
molex pulls in:
- glam – 3D math (Vec3, Mat4)
- pdbtbx – PDB format parsing
- ndarray – 3D arrays for density grids
- rmp – MessagePack for BinaryCIF decoding
- flate2 – gzip decompression
- thiserror – error types
With the python feature:
- pyo3 – Python FFI
- numpy – NumPy array interop
Quick Start
Parse a PDB file into entities
use molex::adapters::pdb::pdb_file_to_entities;
use std::path::Path;
let entities = pdb_file_to_entities(Path::new("1ubq.pdb"))?;
for e in &entities {
    println!("{}: {} atoms", e.label(), e.atom_count());
}
// Output:
// Protein A: 602 atoms
// Water (58 molecules): 58 atoms
Auto-detect format by extension
structure_file_to_entities dispatches on .pdb/.ent vs mmCIF:
use molex::adapters::pdb::structure_file_to_entities;
use std::path::Path;
let entities = structure_file_to_entities(Path::new("3nez.cif"))?;
Work with entities
use molex::{MoleculeEntity, MoleculeType};
// Filter to protein chains
let proteins: Vec<_> = entities.iter()
    .filter(|e| e.molecule_type() == MoleculeType::Protein)
    .collect();
// Access protein-specific data
for entity in &proteins {
    let protein = entity.as_protein().unwrap();
    let backbone = protein.to_backbone();
    println!(
        "Chain {}: {} residues, {} segments",
        protein.pdb_chain_id as char,
        protein.residues.len(),
        protein.segment_count(),
    );
}
Run DSSP secondary structure assignment
use molex::analysis::{detect_dssp, SSType};
let protein = entities[0].as_protein().unwrap();
let backbone = protein.to_backbone();
let (ss_types, hbonds) = detect_dssp(&backbone);
for (i, ss) in ss_types.iter().enumerate() {
    println!("Residue {}: {:?}", i, ss); // Helix, Sheet, or Coil
}
Serialize to COORDS binary (for FFI/IPC)
use molex::ops::codec::{serialize, merge_entities};
let coords = merge_entities(&entities);
let bytes = serialize(&coords)?;
// Send `bytes` over FFI, IPC, or network
Python usage
import molex
# PDB round-trip
coords_bytes = molex.pdb_to_coords(pdb_string)
pdb_back = molex.coords_to_pdb(coords_bytes)
# Entity-aware AtomArray conversion (for ML pipelines)
atom_array = molex.entities_to_atom_array(assembly_bytes)
assembly_bytes = molex.atom_array_to_entities(atom_array)
Architecture Overview
Module layout
molex/src/
├── adapters/ File format parsers (PDB, mmCIF, BinaryCIF, MRC, DCD, AtomWorks)
├── analysis/ Structural analysis (bonds, secondary structure, AABB)
├── element.rs Element enum (symbols, covalent radii, colors)
├── entity/ Entity system
│ ├── molecule/ MoleculeEntity enum + subtypes (protein, nucleic acid, small molecule, bulk)
│ └── surface/ Surface types (VoxelGrid, Density)
├── ops/ Operations
│ ├── codec/ Wire formats (COORDS01, ASSEM01), serialize/deserialize, split/merge
│ └── transform/ Kabsch alignment, CA extraction, backbone segments
├── ffi.rs C FFI bindings
├── python.rs PyO3 bindings
└── lib.rs Crate root, re-exports
Entity-first design
Entities (Vec<MoleculeEntity>) are the primary data model.
Adapters parse files into entities. The *_to_entities functions are the primary API. The *_to_coords functions parse to entities, then flatten to Coords via merge_entities.
Analysis operates on &[Atom], &[ResidueBackbone], or &[MoleculeEntity].
Coords is a flat, column-oriented wire format for FFI and IPC (parallel arrays of x/y/z, chain IDs, residue names, etc.).
Entity classification
When a file is parsed, atoms are grouped into entities by chain ID, residue name, and molecule type. The classify_residue function maps 3-letter residue codes to MoleculeType values using lookup tables:
| MoleculeType | Examples |
|---|---|
| Protein | Standard amino acids (ALA, GLY, …) |
| DNA | DA, DT, DC, DG |
| RNA | A, U, C, G |
| Ligand | ATP, NAG, … |
| Ion | ZN, MG, CA, FE, … |
| Water | HOH, WAT, DOD |
| Lipid | OLC, PLM, … |
| Cofactor | HEM, NAD, FAD, … |
| Solvent | GOL, EDO, PEG, … |
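To make the lookup idea concrete, here is a minimal, std-only sketch of a residue classifier. It is not molex's classify_residue: the enum is a local stand-in, only a handful of codes from the table are listed, and the fallback to Ligand for unknown names is an assumption of this sketch.

```rust
/// Local stand-in for molex's MoleculeType (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MoleculeType {
    Protein,
    DNA,
    RNA,
    Ligand,
    Ion,
    Water,
    Lipid,
    Cofactor,
    Solvent,
}

/// Map a residue name to a molecule type with a static lookup.
/// Unknown names fall back to Ligand (an assumption of this sketch).
pub fn classify_residue_sketch(name: &str) -> MoleculeType {
    match name.trim().to_ascii_uppercase().as_str() {
        "ALA" | "GLY" => MoleculeType::Protein,
        "DA" | "DT" | "DC" | "DG" => MoleculeType::DNA,
        "A" | "U" | "C" | "G" => MoleculeType::RNA,
        "ZN" | "MG" | "CA" | "FE" => MoleculeType::Ion,
        "HOH" | "WAT" | "DOD" => MoleculeType::Water,
        "OLC" | "PLM" => MoleculeType::Lipid,
        "NAD" | "FAD" => MoleculeType::Cofactor,
        "GOL" | "EDO" | "PEG" => MoleculeType::Solvent,
        _ => MoleculeType::Ligand,
    }
}
```

A match on string literals keeps the table readable; a real implementation with hundreds of codes would more likely use a sorted array or a hash map.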
Type hierarchy
MoleculeEntity (enum)
├── Protein(ProteinEntity) -- chain with residues, segment breaks
├── NucleicAcid(NAEntity) -- DNA/RNA chain with residues
├── SmallMolecule(SmallMoleculeEntity) -- single ligand, ion, cofactor, lipid
└── Bulk(BulkEntity) -- grouped water or solvent molecules
Entity trait -- id(), molecule_type(), atoms(), positions(), atom_count()
Polymer trait -- residues(), segment_breaks(), segment_count(), segment_range()
ProteinEntity and NAEntity implement both Entity and Polymer. SmallMoleculeEntity and BulkEntity implement Entity only.
Surface types
The entity::surface module provides volumetric data types:
- VoxelGrid – a generic 3D grid (ndarray::Array3<f32>) with crystallographic cell metadata. Handles fractional-to-Cartesian coordinate conversion.
- Density – wraps VoxelGrid with density-specific metadata. Constructed by the MRC adapter.
Binary formats
molex defines two compact binary formats for IPC:
- COORDS01 – flat atom array with element data (magic: COORDS01)
- ASSEM01 – entity-aware format preserving molecule type metadata per entity (magic: ASSEM01\0)
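A consumer that receives raw bytes can tell the two formats apart by their magic. The sketch below assumes the magic sits at the very start of the buffer; WireFormat and sniff_wire_format are hypothetical names, not part of molex.

```rust
/// Hypothetical tag for the two wire formats described above.
#[derive(Debug, PartialEq, Eq)]
pub enum WireFormat {
    Coords01,
    Assem01,
    Unknown,
}

/// Inspect the leading bytes of a buffer and report which magic it carries.
/// Assumes the magic is stored at offset 0 (an assumption of this sketch).
pub fn sniff_wire_format(bytes: &[u8]) -> WireFormat {
    if bytes.starts_with(b"COORDS01") {
        WireFormat::Coords01
    } else if bytes.starts_with(b"ASSEM01\0") {
        WireFormat::Assem01
    } else {
        WireFormat::Unknown
    }
}
```

Checking the magic first lets a caller choose between the flat and the entity-aware decoder without attempting a full parse.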
Data Flow
Overview
                      ┌──────────────┐
PDB / mmCIF / BCIF ──>│              ├──> Vec<MoleculeEntity>
MRC / CCP4         ──>│   Adapters   ├──> Density (SurfaceEntity)
DCD                ──>│              ├──> Vec<DcdFrame>
                      └──────┬───────┘
                             │
                             v
                 ┌───────────────────────┐
                 │       Entities        │
                 │                       │
                 │    MoleculeEntity     │
                 │    SurfaceEntity      │
                 └──┬────────┬────────┬──┘
                    │        │        │
                    v        v        v
            ┌──────────┐ ┌────────┐ ┌─────────┐
            │ Analysis │ │  Ops   │ │  Codec  │
            │          │ │        │ │         │
            │ dssp     │ │ kabsch │ │serialize│
            │ bonds    │ │ align  │ │serialize│
            │ disulfide│ │extract │ │_assembly│
            │ aabb     │ │        │ │         │
            └──────────┘ └────────┘ └────┬────┘
                                         │
                                         v
                                FFI / IPC / Python
Analysis, Transform (Ops), and Codec are independent of one another; use any combination, depending on what you need.
1. Parsing
Every structure adapter returns Vec<MoleculeEntity>:
let entities = pdb_file_to_entities(Path::new("1ubq.pdb"))?;
let entities = mmcif_file_to_entities(Path::new("3nez.cif"))?;
let entities = bcif_file_to_entities(Path::new("1ubq.bcif"))?;
Density and trajectory adapters return their own types:
let density = mrc_file_to_density(Path::new("emd_1234.map"))?;
let frames = dcd_file_to_frames(Path::new("trajectory.dcd"))?;
2. Entity splitting
split_into_entities groups atoms by:
- Chain ID + molecule type for polymers (one entity per chain)
- Chain ID + residue number for small molecules (one entity each)
- All waters into a single Bulk entity
- All solvents into a single Bulk entity
Each entity gets a unique EntityId.
3. Analysis
let (ss_types, hbonds) = detect_dssp(&backbone_residues);
let bonds = infer_bonds(&atoms, DEFAULT_TOLERANCE);
let disulfides = detect_disulfide_bonds(&atoms);
let aabb = entity.aabb();
4. Transforms
let (rotation, translation) = kabsch_alignment(&reference_ca, &target_ca);
transform_entities(&mut entities, &rotation, &translation);
let ca_positions = extract_ca_positions(&entities);
5. Serialization
For sending to C/C++/Python consumers:
// COORDS01 (flat atom array)
let bytes = serialize(&merge_entities(&entities))?;
// ASSEM01 (preserves entity types)
let bytes = serialize_assembly(&entities)?;
Entity System
The entity system lives in molex::entity (source: src/entity/).
MoleculeEntity
The central type. An enum with four variants:
pub enum MoleculeEntity {
    Protein(ProteinEntity),
    NucleicAcid(NAEntity),
    SmallMolecule(SmallMoleculeEntity),
    Bulk(BulkEntity),
}
Common methods available on all variants (delegated to the inner type):
| Method | Returns | Description |
|---|---|---|
| id() | EntityId | Unique identifier |
| molecule_type() | MoleculeType | Classification (Protein, DNA, Ligand, etc.) |
| atom_set() | &[Atom] | All atoms |
| atom_count() | usize | Number of atoms |
| positions() | Vec<Vec3> | All atom positions |
| label() | String | Human-readable label (e.g. “Protein A”, “Ligand (ATP)”) |
| aabb() | Option<Aabb> | Axis-aligned bounding box |
| to_coords() | Coords | Flatten to wire format |
| residue_count() | usize | Residues (polymer) or molecules (bulk) |
Variant-specific downcasting: as_protein(), as_nucleic_acid(), as_small_molecule(), as_bulk().
MoleculeType
pub enum MoleculeType {
Protein, DNA, RNA, Ligand, Ion, Water, Lipid, Cofactor, Solvent,
}
Residue names are mapped to molecule types by classify_residue() using built-in lookup tables.
Atom
pub struct Atom {
    pub position: Vec3,   // 3D position in angstroms
    pub occupancy: f32,   // 0.0 to 1.0
    pub b_factor: f32,    // temperature factor
    pub element: Element, // chemical element
    pub name: [u8; 4],    // PDB atom name, space-padded to 4 bytes (e.g. b"CA  ")
}
Residue name, residue number, and chain ID are stored on the entity/residue that owns the atom.
ProteinEntity
A single protein chain. Implements both Entity and Polymer traits.
pub struct ProteinEntity {
    pub id: EntityId,
    pub atoms: Vec<Atom>,
    pub residues: Vec<Residue>,     // name, number, atom_range
    pub segment_breaks: Vec<usize>, // backbone gap indices
    pub pdb_chain_id: u8,
}
Derived views (computed on each call, not cached):
- to_backbone() -> Vec<ResidueBackbone> – N, CA, C, O positions per residue
- to_protein_residues(is_hydrophobic, get_bonds) -> Vec<ProteinResidue> – full residue view with sidechain atoms and bond topology
- to_interleaved_segments() -> Vec<Vec<Vec3>> – N/CA/C positions per continuous segment (for spline rendering)
Segment breaks are computed automatically: a break is recorded wherever the C(i)->N(i+1) distance exceeds 2.0 angstroms.
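The distance test can be sketched in a few lines of std-only Rust. The BackboneNc struct and find_segment_breaks function are hypothetical, and recording the index of the first residue after the gap is an assumption; molex may store break indices differently.

```rust
/// Hypothetical per-residue view holding only the backbone N and C positions.
pub struct BackboneNc {
    pub n: [f32; 3],
    pub c: [f32; 3],
}

/// Euclidean distance between two points.
fn distance(a: [f32; 3], b: [f32; 3]) -> f32 {
    let d = [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
    (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
}

/// Return the index of every residue that starts a new segment, i.e. every
/// i + 1 for which the C(i) -> N(i+1) distance exceeds `cutoff`.
pub fn find_segment_breaks(residues: &[BackboneNc], cutoff: f32) -> Vec<usize> {
    residues
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| distance(pair[0].c, pair[1].n) > cutoff)
        .map(|(i, _)| i + 1)
        .collect()
}
```

A peptide bond is about 1.33 angstroms long, so a 2.0 angstrom cutoff leaves room for coordinate error while still catching missing residues.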
NAEntity
A single DNA or RNA chain. Same structure as ProteinEntity (atoms, residues, segment breaks, chain ID). Implements Entity and Polymer.
SmallMoleculeEntity
A single non-polymer molecule (ligand, ion, cofactor, lipid). Implements Entity only.
BulkEntity
A group of identical small molecules (water, solvent). All atoms stored in a single flat Vec<Atom>. Implements Entity only.
EntityId
An opaque identifier allocated by EntityIdAllocator. Entities within a structure have unique IDs. The allocator is a simple incrementing counter.
Entity and Polymer traits
pub trait Entity {
    fn id(&self) -> EntityId;
    fn molecule_type(&self) -> MoleculeType;
    fn atoms(&self) -> &[Atom];
    fn positions(&self) -> Vec<Vec3>; // default impl
    fn atom_count(&self) -> usize;    // default impl
}
pub trait Polymer: Entity {
    fn residues(&self) -> &[Residue];
    fn residue_count(&self) -> usize;                     // default impl
    fn segment_breaks(&self) -> &[usize];
    fn segment_count(&self) -> usize;                     // default impl
    fn segment_range(&self, idx: usize) -> Range<usize>;  // default impl
    fn segment_residues(&self, idx: usize) -> &[Residue]; // default impl
}
Surface types
The entity::surface module provides:
- VoxelGrid – 3D Array3<f32> grid with unit cell metadata (dimensions, angles, origin, sampling intervals). Provides voxel_size() and frac_to_cart_matrix() for coordinate conversion.
- Density – wraps VoxelGrid with density map metadata. Constructed by mrc_file_to_density() or mrc_to_density().
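For readers unfamiliar with fractional coordinates, the conversion can be sketched with the standard crystallographic orthogonalization matrix (a axis along x, b axis in the xy plane). This is a std-only illustration with hypothetical function names; the convention and types that molex's frac_to_cart_matrix() actually uses are not shown in this document.

```rust
/// Build a fractional-to-Cartesian matrix from cell lengths (angstroms) and
/// cell angles (degrees), using the common convention a || x, b in the xy plane.
pub fn frac_to_cart_sketch(cell: [f64; 3], angles_deg: [f64; 3]) -> [[f64; 3]; 3] {
    let [a, b, c] = cell;
    let [alpha, beta, gamma] = angles_deg.map(f64::to_radians);
    let (ca, cb, cg, sg) = (alpha.cos(), beta.cos(), gamma.cos(), gamma.sin());
    // Volume of the unit cell divided by a*b*c.
    let v = (1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg).sqrt();
    [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]
}

/// Multiply the matrix by a fractional coordinate to get a Cartesian position.
pub fn frac_to_cart(m: &[[f64; 3]; 3], f: [f64; 3]) -> [f64; 3] {
    [
        m[0][0] * f[0] + m[0][1] * f[1] + m[0][2] * f[2],
        m[1][0] * f[0] + m[1][1] * f[1] + m[1][2] * f[2],
        m[2][0] * f[0] + m[2][1] * f[1] + m[2][2] * f[2],
    ]
}
```

For an orthorhombic cell (all angles 90 degrees) the matrix is diagonal up to rounding, so the conversion reduces to scaling each fractional coordinate by the matching cell length.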
Adapters
Format adapters live in molex::adapters (source: src/adapters/). Each adapter’s primary API returns Vec<MoleculeEntity>. Coords-based variants are available for FFI/IPC consumers.
PDB (adapters::pdb)
Parses PDB files using the pdbtbx crate. Handles non-standard lines (GROMACS/MemProtMD output) via sanitization.
// Primary API
pdb_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
pdb_str_to_entities(pdb_str: &str) -> Result<Vec<MoleculeEntity>, CoordsError>
structure_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
// Derived Coords API
pdb_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
pdb_str_to_coords(pdb_str: &str) -> Result<Coords, CoordsError>
pdb_to_coords(pdb_str: &str) -> Result<Vec<u8>, CoordsError> // serialized bytes
// Coords -> PDB
coords_to_pdb(coords_bytes: &[u8]) -> Result<String, CoordsError>
structure_file_to_entities auto-detects PDB vs mmCIF by file extension (.pdb/.ent -> PDB, everything else -> mmCIF).
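The dispatch rule can be illustrated with std::path alone. StructureFormat and format_from_extension are hypothetical names, and the case-insensitive comparison is an assumption; the text above only states which extensions select the PDB parser.

```rust
use std::path::Path;

/// Hypothetical tag for the parser that would be selected.
#[derive(Debug, PartialEq, Eq)]
pub enum StructureFormat {
    Pdb,
    Mmcif,
}

/// Choose a parser from the file extension: .pdb and .ent select PDB,
/// anything else (including a missing extension) falls through to mmCIF.
pub fn format_from_extension(path: &Path) -> StructureFormat {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) if ext.eq_ignore_ascii_case("pdb") || ext.eq_ignore_ascii_case("ent") => {
            StructureFormat::Pdb
        }
        _ => StructureFormat::Mmcif,
    }
}
```

Because everything that is not .pdb or .ent goes to the mmCIF parser, a misnamed PDB file would surface as an mmCIF parse error rather than an unknown-format error.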
mmCIF (adapters::cif)
Custom DOM-based parser (no external crate). Parses CIF text into a DOM, then extracts atom sites via typed extractors.
mmcif_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
mmcif_str_to_entities(cif_str: &str) -> Result<Vec<MoleculeEntity>, CoordsError>
mmcif_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
mmcif_str_to_coords(cif_str: &str) -> Result<Coords, CoordsError>
BinaryCIF (adapters::bcif)
Decodes BinaryCIF (MessagePack-encoded CIF) with column-level codecs.
bcif_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
bcif_to_entities(data: &[u8]) -> Result<Vec<MoleculeEntity>, CoordsError>
bcif_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
bcif_to_coords(data: &[u8]) -> Result<Coords, CoordsError>
MRC/CCP4 density maps (adapters::mrc)
Parses MRC/CCP4 format electron density maps into Density (which wraps VoxelGrid).
mrc_file_to_density(path: &Path) -> Result<Density, DensityError>
mrc_to_density(data: &[u8]) -> Result<Density, DensityError>
DCD trajectories (adapters::dcd)
Reads DCD binary trajectory files (CHARMM/NAMD format).
dcd_file_to_frames(path: &Path) -> Result<Vec<DcdFrame>, CoordsError>
pub struct DcdHeader { /* timestep, n_atoms, etc. */ }
pub struct DcdFrame { pub x: Vec<f32>, pub y: Vec<f32>, pub z: Vec<f32> }
pub struct DcdReader<R> { /* streaming reader */ }
AtomWorks (adapters::atomworks, feature = python)
Bidirectional conversion between molex entities and Biotite AtomArray objects with AtomWorks annotations. Requires the python feature.
Entity-aware functions (preserve molecule type, entity ID, chain grouping):
entities_to_atom_array(assembly_bytes: Vec<u8>) -> PyResult<PyObject>
entities_to_atom_array_plus(assembly_bytes: Vec<u8>) -> PyResult<PyObject>
atom_array_to_entities(atom_array: PyObject) -> PyResult<Vec<u8>>
entities_to_atom_array_parsed(assembly_bytes: Vec<u8>, filename: &str) -> PyResult<PyObject>
parse_file_to_entities(path: &str) -> PyResult<Vec<u8>>
parse_file_full(path: &str) -> PyResult<PyObject>
Flat Coords functions:
coords_to_atom_array(py: Python, coords_bytes: Vec<u8>) -> PyResult<PyObject>
coords_to_atom_array_plus(py: Python, coords_bytes: Vec<u8>) -> PyResult<PyObject>
atom_array_to_coords(atom_array: PyObject) -> PyResult<Vec<u8>>
Analysis
Structural analysis lives in molex::analysis (source: src/analysis/). All analysis functions operate on entity-level types (&[Atom], &[ResidueBackbone]).
Secondary structure (analysis::ss)
DSSP-based secondary structure classification.
use molex::analysis::{detect_dssp, resolve_ss, SSType};
// Full DSSP: detect H-bonds, then classify
let (ss_types, hbonds) = detect_dssp(&backbone_residues);
// ss_types: Vec<SSType> -- one per residue (Helix, Sheet, or Coil)
// hbonds: Vec<HBond> -- backbone H-bond pairs that produced the assignment
// With optional override (e.g. from mmCIF annotation)
let ss = resolve_ss(Some(&override_ss), &backbone_residues);
// Falls back to DSSP if override is None
SSType is a Q3 classification:
pub enum SSType { Helix, Sheet, Coil }
Each variant has a .color() method returning an RGB [f32; 3] for rendering.
Short isolated segments (1-residue helix/sheet runs) are automatically merged to Coil by merge_short_segments.
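The clean-up pass can be pictured as a run-length scan. The sketch below uses a local SSType stand-in and a hypothetical merge_short_runs function with a min_len parameter; passing 2 reproduces the 1-residue rule described above, but the real merge_short_segments may differ in detail.

```rust
/// Local stand-in for molex's SSType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SSType {
    Helix,
    Sheet,
    Coil,
}

/// Rewrite every helix or sheet run shorter than `min_len` residues to Coil.
pub fn merge_short_runs(ss: &mut [SSType], min_len: usize) {
    let mut start = 0;
    while start < ss.len() {
        let kind = ss[start];
        // Find the end of the run of identical assignments.
        let mut end = start + 1;
        while end < ss.len() && ss[end] == kind {
            end += 1;
        }
        if kind != SSType::Coil && end - start < min_len {
            for s in &mut ss[start..end] {
                *s = SSType::Coil;
            }
        }
        start = end;
    }
}
```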
SS from string
analysis::ss::from_string parses secondary structure strings (e.g. "HHHCCCEEE") into Vec<SSType>.
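As an illustration of the string form, the following std-only sketch maps one character to one residue. The mapping (H to Helix, E to Sheet, anything else to Coil) is an assumption based on the usual DSSP letters and the example string; ss_from_string is a hypothetical name.

```rust
/// Local stand-in for molex's SSType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SSType {
    Helix,
    Sheet,
    Coil,
}

/// Parse a per-residue secondary structure string such as "HHHCCCEEE".
/// Assumed mapping: 'H' -> Helix, 'E' -> Sheet, everything else -> Coil.
pub fn ss_from_string(s: &str) -> Vec<SSType> {
    s.chars()
        .map(|ch| match ch {
            'H' => SSType::Helix,
            'E' => SSType::Sheet,
            _ => SSType::Coil,
        })
        .collect()
}
```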
Bond detection (analysis::bonds)
Covalent bonds
use molex::analysis::{infer_bonds, InferredBond, BondOrder, DEFAULT_TOLERANCE};
let bonds: Vec<InferredBond> = infer_bonds(atoms, tolerance);
// InferredBond { atom_a: usize, atom_b: usize, order: BondOrder }
// BondOrder: Single, Double, Triple, Aromatic
Distance-based inference using element covalent radii with a configurable tolerance (default: DEFAULT_TOLERANCE).
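The distance criterion can be sketched as follows: two atoms are bonded when their separation is at most the sum of their covalent radii plus the tolerance. This is a brute-force, std-only illustration with a hypothetical five-element enum and no bond-order assignment; the radii are standard single-bond covalent radii in angstroms, and the 0.3 tolerance used in the usage note is an arbitrary value, not molex's DEFAULT_TOLERANCE.

```rust
/// Hypothetical, minimal element set for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Elem {
    H,
    C,
    N,
    O,
    S,
}

/// Single-bond covalent radius in angstroms.
fn covalent_radius(e: Elem) -> f32 {
    match e {
        Elem::H => 0.31,
        Elem::C => 0.76,
        Elem::N => 0.71,
        Elem::O => 0.66,
        Elem::S => 1.05,
    }
}

/// Return index pairs (i, j), i < j, whose distance is within
/// radius(i) + radius(j) + tolerance. O(n^2); no bond orders are assigned.
pub fn infer_bonds_sketch(atoms: &[(Elem, [f32; 3])], tolerance: f32) -> Vec<(usize, usize)> {
    let mut bonds = Vec::new();
    for i in 0..atoms.len() {
        for j in (i + 1)..atoms.len() {
            let (ea, pa) = atoms[i];
            let (eb, pb) = atoms[j];
            let d2: f32 = (0..3).map(|k| (pa[k] - pb[k]).powi(2)).sum();
            let limit = covalent_radius(ea) + covalent_radius(eb) + tolerance;
            if d2 <= limit * limit {
                bonds.push((i, j));
            }
        }
    }
    bonds
}
```

For a water molecule with O-H distances near 0.96 angstroms and a tolerance of 0.3, this finds the two O-H bonds and rejects the H-H pair. A production implementation would typically replace the double loop with a spatial grid.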
Hydrogen bonds
use molex::analysis::{detect_hbonds, HBond};
let hbonds: Vec<HBond> = detect_hbonds(&backbone_residues);
// HBond { donor: usize, acceptor: usize } -- residue indices
DSSP-style backbone N-H…O=C hydrogen bond detection using electrostatic energy criteria.
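The electrostatic criterion from the original DSSP method (Kabsch and Sander) can be written down directly: E = 0.084 * 332 * (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol, with a hydrogen bond assigned when E is below -0.5 kcal/mol. The sketch below implements that published formula with hypothetical function names; whether molex uses exactly these constants and this cutoff is an assumption.

```rust
type Point = [f64; 3];

/// Euclidean distance between two points, in angstroms.
fn dist(a: Point, b: Point) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)).sqrt()
}

/// Kabsch-Sander electrostatic energy (kcal/mol) between a donor N-H group
/// and an acceptor C=O group.
pub fn hbond_energy(n: Point, h: Point, c: Point, o: Point) -> f64 {
    // Partial charges 0.42e and 0.20e, dimensional factor 332.
    let q1q2_f = 0.084 * 332.0;
    q1q2_f * (1.0 / dist(o, n) + 1.0 / dist(c, h) - 1.0 / dist(o, h) - 1.0 / dist(c, n))
}

/// DSSP assigns a hydrogen bond when the energy is below -0.5 kcal/mol.
pub fn is_hbond(n: Point, h: Point, c: Point, o: Point) -> bool {
    hbond_energy(n, h, c, o) < -0.5
}
```

Because the test is energetic rather than purely geometric, it favours linear N-H...O=C arrangements and falls off quickly with distance.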
Disulfide bonds
use molex::analysis::{detect_disulfide_bonds, DisulfideBond};
let disulfides: Vec<DisulfideBond> = detect_disulfide_bonds(atoms);
Detects CYS SG-SG bonds by distance.
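A distance-only check over cysteine SG atoms might look like the sketch below. The function name and the cutoff parameter are hypothetical: a typical S-S bond is about 2.05 angstroms long, and the 2.5 angstrom value in the usage note is an illustrative choice, not the threshold molex uses.

```rust
/// Return index pairs (i, j), i < j, of SG atoms closer than `cutoff` angstroms.
/// `sg_positions` is assumed to hold one SG position per cysteine residue.
pub fn detect_disulfides_sketch(sg_positions: &[[f32; 3]], cutoff: f32) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..sg_positions.len() {
        for j in (i + 1)..sg_positions.len() {
            let (a, b) = (sg_positions[i], sg_positions[j]);
            let d2: f32 = (0..3).map(|k| (a[k] - b[k]).powi(2)).sum();
            if d2 < cutoff * cutoff {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

Calling it with a cutoff of 2.5 on three SG atoms, two of which are 2.04 angstroms apart, reports only that one pair.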
Bounding box (analysis::aabb)
use molex::analysis::Aabb;
let aabb = Aabb::from_positions(&positions)?;
aabb.center(); // Vec3 -- geometric center
aabb.extents(); // Vec3 -- size along each axis
aabb.radius(); // f32 -- bounding sphere radius
let merged = aabb.union(&other_aabb);
let combined = Aabb::from_aabbs(&[aabb1, aabb2, aabb3])?;
Also available directly on entities: entity.aabb().
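To show what these quantities mean, here is a std-only bounding box sketch. AabbSketch is a hypothetical type; in particular, treating extents as the full size along each axis follows the comment above, while defining the radius as half the box diagonal is an assumption about how the bounding sphere is derived.

```rust
/// Hypothetical axis-aligned bounding box over plain [f32; 3] points.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AabbSketch {
    pub min: [f32; 3],
    pub max: [f32; 3],
}

impl AabbSketch {
    /// Returns None for an empty slice, since no box can be formed.
    pub fn from_positions(points: &[[f32; 3]]) -> Option<Self> {
        let first = *points.first()?;
        let mut aabb = AabbSketch { min: first, max: first };
        for p in &points[1..] {
            for k in 0..3 {
                aabb.min[k] = aabb.min[k].min(p[k]);
                aabb.max[k] = aabb.max[k].max(p[k]);
            }
        }
        Some(aabb)
    }

    /// Geometric center of the box.
    pub fn center(&self) -> [f32; 3] {
        [0, 1, 2].map(|k| 0.5 * (self.min[k] + self.max[k]))
    }

    /// Size along each axis.
    pub fn extents(&self) -> [f32; 3] {
        [0, 1, 2].map(|k| self.max[k] - self.min[k])
    }

    /// Radius of a sphere enclosing the box: half the diagonal (assumption).
    pub fn radius(&self) -> f32 {
        let e = self.extents();
        0.5 * (e[0] * e[0] + e[1] * e[1] + e[2] * e[2]).sqrt()
    }

    /// Smallest box containing both boxes.
    pub fn union(&self, other: &AabbSketch) -> AabbSketch {
        AabbSketch {
            min: [0, 1, 2].map(|k| self.min[k].min(other.min[k])),
            max: [0, 1, 2].map(|k| self.max[k].max(other.max[k])),
        }
    }
}
```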
Codec
Wire formats, serialization, and entity splitting/merging live in molex::ops::codec (source: src/ops/codec/).
Coords
The flat, column-oriented wire format:
pub struct Coords {
    pub num_atoms: usize,
    pub atoms: Vec<CoordsAtom>,     // x, y, z, occupancy, b_factor
    pub chain_ids: Vec<u8>,         // per-atom chain ID byte
    pub res_names: Vec<[u8; 3]>,    // per-atom residue name
    pub res_nums: Vec<i32>,         // per-atom residue number
    pub atom_names: Vec<[u8; 4]>,   // per-atom PDB atom name
    pub elements: Vec<Element>,     // per-atom element
}
Coords is a serialization/interop format for FFI and IPC.
Wire formats
COORDS01
Flat atom array binary format. Header magic: COORDS01. For backward compatibility, the reader also accepts the older COORDS00 layout, which omits element data.
use molex::ops::codec::{serialize, deserialize};
let bytes: Vec<u8> = serialize(&coords)?;
let coords: Coords = deserialize(&bytes)?;
ASSEM01
Entity-aware binary format that preserves molecule type metadata per entity. Header magic: ASSEM01\0. Use this when you need to round-trip entities without re-running residue classification.
use molex::ops::codec::{serialize_assembly, deserialize_assembly};
let bytes: Vec<u8> = serialize_assembly(&entities)?;
let entities: Vec<MoleculeEntity> = deserialize_assembly(&bytes)?;
Entity splitting and merging
use molex::ops::codec::{split_into_entities, merge_entities};
// Coords -> entities (classifies residues, groups by chain)
let entities: Vec<MoleculeEntity> = split_into_entities(&coords);
// Entities -> Coords (flattens back)
let coords: Coords = merge_entities(&entities);
split_into_entities groups atoms by:
- Chain ID + molecule type for polymers (one entity per chain)
- Chain ID + residue number for small molecules (one entity per molecule)
- All waters into a single Bulk entity
- All solvents into a single Bulk entity
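The grouping rules above amount to computing a key per atom and bucketing atoms by that key. The sketch below shows the idea with hypothetical Kind and GroupKey enums over (chain ID, residue number, kind) tuples; it does not reproduce molex's actual data structures or its ordering of entities.

```rust
use std::collections::BTreeMap;

/// Hypothetical coarse classification of an atom's residue.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Kind {
    Protein,
    NucleicAcid,
    SmallMolecule,
    Water,
    Solvent,
}

/// Hypothetical grouping key mirroring the four rules above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum GroupKey {
    Chain(u8, Kind),    // polymers: chain ID + molecule type
    Molecule(u8, i32),  // small molecules: chain ID + residue number
    AllWater,           // every water, regardless of chain
    AllSolvent,         // every solvent molecule, regardless of chain
}

/// Compute the grouping key for one atom.
pub fn group_key(chain: u8, res_num: i32, kind: Kind) -> GroupKey {
    match kind {
        Kind::Protein | Kind::NucleicAcid => GroupKey::Chain(chain, kind),
        Kind::SmallMolecule => GroupKey::Molecule(chain, res_num),
        Kind::Water => GroupKey::AllWater,
        Kind::Solvent => GroupKey::AllSolvent,
    }
}

/// Bucket atom indices by key; each bucket corresponds to one entity.
pub fn group_atoms(atoms: &[(u8, i32, Kind)]) -> BTreeMap<GroupKey, Vec<usize>> {
    let mut groups = BTreeMap::new();
    for (i, &(chain, res_num, kind)) in atoms.iter().enumerate() {
        groups
            .entry(group_key(chain, res_num, kind))
            .or_insert_with(Vec::new)
            .push(i);
    }
    groups
}
```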
Transforms (ops::transform)
Structural alignment and extraction utilities:
use molex::ops::transform::*;
// Kabsch alignment (minimize RMSD between point sets)
let (rotation, translation) = kabsch_alignment(&source, &target);
let (rotation, translation, scale) = kabsch_alignment_with_scale(&source, &target);
// Apply transform to entities
transform_entities(&mut entities, &rotation, &translation);
// Align entities to a reference structure
align_to_reference(&mut mobile, &reference);
// Extract alpha-carbon positions
let ca_positions: Vec<Vec3> = extract_ca_positions(&entities);
let ca_by_chain: Vec<Vec<Vec3>> = extract_ca_from_chains(&entities);
// Get continuous backbone segments
let segments: Vec<Vec<Vec3>> = extract_backbone_segments(&entities);
// Compute centroid
let center: Vec3 = centroid(&positions);
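What a rotation plus translation does to a point set, and what a centroid is, can be shown without any math crate. The sketch below uses plain arrays and hypothetical function names; it applies p' = R p + t and is not the molex implementation, which works on glam types. Returning Option from the centroid for empty input is a choice made for this sketch.

```rust
type Point3 = [f32; 3];

/// Arithmetic mean of the points, or None for an empty slice.
pub fn centroid_sketch(points: &[Point3]) -> Option<Point3> {
    if points.is_empty() {
        return None;
    }
    let n = points.len() as f32;
    let mut sum = [0.0f32; 3];
    for p in points {
        for k in 0..3 {
            sum[k] += p[k];
        }
    }
    Some([sum[0] / n, sum[1] / n, sum[2] / n])
}

/// Apply p' = R * p + t in place, with R given in row-major order.
pub fn apply_rigid_sketch(points: &mut [Point3], rot: &[[f32; 3]; 3], t: &Point3) {
    for p in points.iter_mut() {
        let q = *p;
        for k in 0..3 {
            p[k] = rot[k][0] * q[0] + rot[k][1] * q[1] + rot[k][2] * q[2] + t[k];
        }
    }
}
```

Kabsch alignment itself additionally needs a singular value decomposition of the covariance matrix of the two centered point sets, which is why it is left to the library.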
ChainIdMapper
Maps multi-character chain ID strings (e.g. “AA”, “AB” for structures with >26 chains) to unique u8 values for the Coords format:
let mut mapper = ChainIdMapper::new();
let id = mapper.get_or_assign("AA"); // assigns a unique byte
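One way to picture such a mapper is a hash map plus a counter. The sketch below is hypothetical: it hands out bytes sequentially from 0 and returns None once all 256 values are used, whereas the real ChainIdMapper returns a plain byte and may choose its values differently (for example, keeping single-character IDs as their ASCII code).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a chain ID mapper.
#[derive(Default)]
pub struct ChainIdMapperSketch {
    ids: HashMap<String, u8>,
    next: u16, // wider than u8 so exhaustion can be detected
}

impl ChainIdMapperSketch {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the byte already assigned to `chain`, or assign the next free one.
    /// Returns None when more than 256 distinct chain IDs have been seen.
    pub fn get_or_assign(&mut self, chain: &str) -> Option<u8> {
        if let Some(&id) = self.ids.get(chain) {
            return Some(id);
        }
        if self.next > u16::from(u8::MAX) {
            return None;
        }
        let id = self.next as u8;
        self.next += 1;
        self.ids.insert(chain.to_owned(), id);
        Some(id)
    }
}
```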
Error type
All codec operations return CoordsError:
pub enum CoordsError {
    InvalidFormat(String),
    PdbParseError(String),
    SerializationError(String),
}
C FFI
C-compatible bindings live in molex::ffi (source: src/ffi.rs). These expose COORDS conversion functions for consumption from C, C++, Swift, or any language with C FFI support.
Result type
typedef struct {
    const uint8_t *data; // output bytes, or NULL on error
    size_t len;          // data length
    size_t data_len;     // allocated capacity
    const char *error;   // error string, or NULL on success
} CoordsResult;
Functions
pdb_to_coords_bytes
Parse a PDB string into COORDS binary format.
CoordsResult pdb_to_coords_bytes(const char *pdb_ptr, size_t pdb_len);
coords_to_pdb
Convert COORDS binary to a PDB-format string. Returns a null-terminated C string. The caller must free the string with coords_free_string.
const char *coords_to_pdb(const uint8_t *coords_ptr, size_t coords_len, size_t *out_len);
coords_from_coords
Deserialize and re-serialize COORDS bytes (round-trip validation).
CoordsResult coords_from_coords(const uint8_t *coords_ptr, size_t coords_len);
coords_from_backbone
Build COORDS from backbone positions. Currently returns an error (not yet implemented).
CoordsResult coords_from_backbone(
    const float *positions,
    size_t num_res,
    const char *sequence,
    const int32_t *chain_breaks,
    size_t chain_break_count
);
Memory management
void coords_free_result(const CoordsResult *result);
void coords_free_string(const char *s);
All pointers returned by FFI functions must be freed using the corresponding free function. Do not use free() directly.
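The reason for this rule is that the memory is owned by Rust's allocator, not by the C runtime. The sketch below shows the general pattern on the Rust side using the standard library's CString; the two function names are hypothetical and the code illustrates the ownership handoff only, not the actual bodies of coords_to_pdb or coords_free_string.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hand a heap-allocated, null-terminated string to the caller.
/// Returns a null pointer if `text` contains an interior NUL byte.
pub fn sketch_make_string(text: &str) -> *mut c_char {
    match CString::new(text) {
        // into_raw leaks the allocation on purpose; the caller now owns it.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Reclaim a string previously returned by `sketch_make_string`.
///
/// # Safety
/// `ptr` must be null or a pointer obtained from `sketch_make_string`
/// that has not been freed yet.
pub unsafe fn sketch_free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // from_raw rebuilds the CString so Rust's allocator releases it.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

Passing such a pointer to C's free() would release it through a different allocator than the one that created it, which is undefined behavior.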
Python Bindings
Python bindings are available when molex is built with the python feature. The module is exposed as import molex via PyO3.
Installation
cd crates/molex
maturin develop --release --features python
Core COORDS functions (python.rs)
These operate on serialized COORDS bytes:
import molex
# Parse PDB string to COORDS bytes
coords_bytes = molex.pdb_to_coords(pdb_string)
# Parse mmCIF string to COORDS bytes
coords_bytes = molex.mmcif_to_coords(cif_string)
# Convert COORDS bytes to PDB string
pdb_string = molex.coords_to_pdb(coords_bytes)
# Round-trip validation
validated = molex.deserialize_coords_py(coords_bytes)
AtomWorks interop (adapters::atomworks)
Entity-aware conversions between molex’s ASSEM01 binary format and Biotite AtomArray objects with AtomWorks annotations (entity_id, mol_type, pn_unit_iid, chain_type).
molex -> AtomWorks
import molex
# Convert ASSEM01 bytes to AtomArray (basic annotations)
atom_array = molex.entities_to_atom_array(assembly_bytes)
# Convert with full bond list and chain type annotations
atom_array = molex.entities_to_atom_array_plus(assembly_bytes)
# Convert with AtomWorks cleaning pipeline (leaving group removal,
# charge correction, missing atom imputation)
atom_array = molex.entities_to_atom_array_parsed(assembly_bytes, "3nez.cif.gz")
AtomWorks -> molex
# Convert AtomArray back to ASSEM01 bytes (preserves entity metadata)
assembly_bytes = molex.atom_array_to_entities(atom_array)
File-based shortcuts
# Parse structure file directly to ASSEM01 bytes via AtomWorks
assembly_bytes = molex.parse_file_to_entities("3nez.cif.gz")
# Parse to AtomArray with full AtomWorks pipeline
atom_array = molex.parse_file_full("3nez.cif.gz")
Flat Coords functions
Flat Coords-based conversions are available for working with the COORDS01 format directly:
atom_array = molex.coords_to_atom_array(coords_bytes)
atom_array = molex.coords_to_atom_array_plus(coords_bytes)
coords_bytes = molex.atom_array_to_coords(atom_array)
These operate on the flat COORDS01 format and do not include entity metadata.