molex

molex (molecular exchange) is a Rust library for parsing, analyzing, transforming, and serializing molecular structure data. It supports PDB, mmCIF, BinaryCIF, MRC/CCP4 density maps, and DCD trajectories.

Key concepts

MoleculeEntity represents a single molecule: a protein chain, a DNA/RNA strand, a ligand, an ion, or a group of waters. Parsing a structure file produces a Vec<MoleculeEntity>.
Atom holds a position, element, atom name, occupancy, and B-factor. Residue and chain context live on the entity that contains the atom.
Coords is a binary serialization format used for FFI and IPC (e.g. iceoryx zero-copy between processes).
Analysis includes covalent bond inference, DSSP hydrogen bond detection, disulfide bridges, and secondary structure classification.
VoxelGrid and Density represent 3D volumetric data (electron density, cryo-EM maps).

Crate features

Feature	Description
`default`	Core Rust library (no Python)
`python`	PyO3 bindings + AtomWorks interop

API documentation

For the full Rust API reference, run:

cargo doc --open --document-private-items

Installation

As a Rust dependency

Add to your Cargo.toml:

[dependencies]
molex = "0.1"

To enable Python bindings (PyO3 + NumPy + AtomWorks interop):

[dependencies]
molex = { version = "0.1", features = ["python"] }

As a Python package

cd crates/molex
maturin develop --release --features python

python -c "import molex; print('OK')"

Dependencies

molex pulls in:

glam – 3D math (Vec3, Mat4)
pdbtbx – PDB format parsing
ndarray – 3D arrays for density grids
rmp – MessagePack for BinaryCIF decoding
flate2 – gzip decompression
thiserror – error types

With the python feature:

pyo3 – Python FFI
numpy – NumPy array interop

Quick Start

Parse a PDB file into entities

use molex::adapters::pdb::pdb_file_to_entities;
use std::path::Path;

let entities = pdb_file_to_entities(Path::new("1ubq.pdb"))?;
for e in &entities {
    println!("{}: {} atoms", e.label(), e.atom_count());
}
// Output:
//   Protein A: 660 atoms
//   Water (58 molecules): 58 atoms

Auto-detect format by extension

structure_file_to_entities dispatches on the file extension: .pdb and .ent are parsed as PDB, everything else as mmCIF:

use molex::adapters::pdb::structure_file_to_entities;
use std::path::Path;

let entities = structure_file_to_entities(Path::new("3nez.cif"))?;

Work with entities

use molex::{MoleculeEntity, MoleculeType};

// Filter to protein chains
let proteins: Vec<_> = entities.iter()
    .filter(|e| e.molecule_type() == MoleculeType::Protein)
    .collect();

// Access protein-specific data
for entity in &proteins {
    let protein = entity.as_protein().unwrap();
    let backbone = protein.to_backbone();
    println!(
        "Chain {}: {} residues, {} segments",
        protein.pdb_chain_id as char,
        protein.residues.len(),
        protein.segment_count(),
    );
}

Run DSSP secondary structure assignment

use molex::analysis::{detect_dssp, SSType};

let protein = entities[0].as_protein().unwrap();
let backbone = protein.to_backbone();
let (ss_types, hbonds) = detect_dssp(&backbone);
for (i, ss) in ss_types.iter().enumerate() {
    println!("Residue {}: {:?}", i, ss); // Helix, Sheet, or Coil
}

Serialize to COORDS binary (for FFI/IPC)

use molex::ops::codec::{serialize, merge_entities};

let coords = merge_entities(&entities);
let bytes = serialize(&coords)?;
// Send `bytes` over FFI, IPC, or network

Python usage

import molex

# PDB round-trip
coords_bytes = molex.pdb_to_coords(pdb_string)
pdb_back = molex.coords_to_pdb(coords_bytes)

# Entity-aware AtomArray conversion (for ML pipelines)
atom_array = molex.entities_to_atom_array(assembly_bytes)
assembly_bytes = molex.atom_array_to_entities(atom_array)

Architecture Overview

Module layout

molex/src/
├── adapters/         File format parsers (PDB, mmCIF, BinaryCIF, MRC, DCD, AtomWorks)
├── analysis/         Structural analysis (bonds, secondary structure, AABB)
├── element.rs        Element enum (symbols, covalent radii, colors)
├── entity/           Entity system
│   ├── molecule/     MoleculeEntity enum + subtypes (protein, nucleic acid, small molecule, bulk)
│   └── surface/      Surface types (VoxelGrid, Density)
├── ops/              Operations
│   ├── codec/        Wire formats (COORDS01, ASSEM01), serialize/deserialize, split/merge
│   └── transform/    Kabsch alignment, CA extraction, backbone segments
├── ffi.rs            C FFI bindings
├── python.rs         PyO3 bindings
└── lib.rs            Crate root, re-exports

Entity-first design

Entities (Vec<MoleculeEntity>) are the primary data model.

Adapters parse files into entities. The *_to_entities functions are the primary API. The *_to_coords functions parse to entities, then flatten to Coords via merge_entities.

Analysis operates on &[Atom], &[ResidueBackbone], or &[MoleculeEntity].

Coords is a flat, column-oriented wire format for FFI and IPC (parallel arrays of x/y/z, chain IDs, residue names, etc.).

Entity classification

When a file is parsed, atoms are grouped into entities by chain ID, residue name, and molecule type. The classify_residue function maps 3-letter residue codes to MoleculeType values using lookup tables:

MoleculeType	Examples
`Protein`	Standard amino acids (ALA, GLY, …)
`DNA`	DA, DT, DC, DG
`RNA`	A, U, C, G
`Ligand`	ATP, NAG, …
`Ion`	ZN, MG, CA, FE, …
`Water`	HOH, WAT, DOD
`Lipid`	OLC, PLM, …
`Cofactor`	HEM, NAD, FAD, …
`Solvent`	GOL, EDO, PEG, …
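The lookup described above can be sketched as a single match on the residue code. This is a minimal illustration, not molex's actual tables: `classify_residue_sketch` is a hypothetical name, only a handful of codes are listed, and treating unknown codes as `Ligand` is an assumption.

```rust
// Illustrative only: a tiny subset of residue codes, not molex's real tables.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MoleculeType {
    Protein, DNA, RNA, Ligand, Ion, Water, Lipid, Cofactor, Solvent,
}

/// Hypothetical stand-in for `classify_residue`: map a residue code to a type.
fn classify_residue_sketch(res_name: &str) -> MoleculeType {
    match res_name.trim().to_ascii_uppercase().as_str() {
        "ALA" | "GLY" | "LYS" | "SER" => MoleculeType::Protein,
        "DA" | "DT" | "DC" | "DG" => MoleculeType::DNA,
        "A" | "U" | "C" | "G" => MoleculeType::RNA,
        "ZN" | "MG" | "CA" | "FE" => MoleculeType::Ion,
        "HOH" | "WAT" | "DOD" => MoleculeType::Water,
        "OLC" | "PLM" => MoleculeType::Lipid,
        "HEM" | "NAD" | "FAD" => MoleculeType::Cofactor,
        "GOL" | "EDO" | "PEG" => MoleculeType::Solvent,
        // Assumption: anything unrecognized is treated as a generic ligand.
        _ => MoleculeType::Ligand,
    }
}
```

A match keeps each code in exactly one category, which is the property the classification relies on.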

Type hierarchy

MoleculeEntity (enum)
├── Protein(ProteinEntity)      -- chain with residues, segment breaks
├── NucleicAcid(NAEntity)       -- DNA/RNA chain with residues
├── SmallMolecule(SmallMoleculeEntity) -- single ligand, ion, cofactor, lipid
└── Bulk(BulkEntity)            -- grouped water or solvent molecules

Entity trait   -- id(), molecule_type(), atoms(), positions(), atom_count()
Polymer trait  -- residues(), segment_breaks(), segment_count(), segment_range()

ProteinEntity and NAEntity implement both Entity and Polymer. SmallMoleculeEntity and BulkEntity implement Entity only.

Surface types

The entity::surface module provides volumetric data types:

VoxelGrid – a generic 3D grid (ndarray::Array3<f32>) with crystallographic cell metadata. Handles fractional-to-Cartesian coordinate conversion.
Density – wraps VoxelGrid with density-specific metadata. Constructed by the MRC adapter.

Binary formats

molex defines two compact binary formats for IPC:

COORDS01 – flat atom array with element data (magic: COORDS01)
ASSEM01 – entity-aware format preserving molecule type metadata per entity (magic: ASSEM01\0)

Data Flow

Overview

                       ┌──────────────┐
 PDB / mmCIF / BCIF ──>│              ├──> Vec<MoleculeEntity>
 MRC / CCP4         ──>│   Adapters   ├──> Density (SurfaceEntity)
 DCD                ──>│              ├──> Vec<DcdFrame>
                       └──────┬───────┘
                              │
                              v
                  ┌───────────────────────┐
                  │       Entities        │
                  │                       │
                  │  MoleculeEntity       │
                  │  SurfaceEntity        │
                  └──┬────────┬────────┬──┘
                     │        │        │
                     v        v        v
              ┌──────────┐ ┌────────┐ ┌─────────┐
              │ Analysis │ │  Ops   │ │  Codec  │
              │          │ │        │ │         │
              │ dssp     │ │ kabsch │ │serialize│
              │ bonds    │ │ align  │ │serialize│
              │ disulfide│ │extract │ │_assembly│
              │ aabb     │ │        │ │         │
              └──────────┘ └────────┘ └────┬────┘
                                           │
                                           v
                                  FFI / IPC / Python

The Analysis, Ops (transform), and Codec stages are independent of one another; use whichever combination you need.

1. Parsing

Every structure adapter returns Vec<MoleculeEntity>:

let entities = pdb_file_to_entities(Path::new("1ubq.pdb"))?;
let entities = mmcif_file_to_entities(Path::new("3nez.cif"))?;
let entities = bcif_file_to_entities(Path::new("1ubq.bcif"))?;

Density and trajectory adapters return their own types:

let density = mrc_file_to_density(Path::new("emd_1234.map"))?;
let frames = dcd_file_to_frames(Path::new("trajectory.dcd"))?;

2. Entity splitting

split_into_entities groups atoms by:

Chain ID + molecule type for polymers (one entity per chain)
Chain ID + residue number for small molecules (one entity each)
All waters into a single Bulk entity
All solvents into a single Bulk entity

Each entity gets a unique EntityId.

3. Analysis

let (ss_types, hbonds) = detect_dssp(&backbone_residues);
let bonds = infer_bonds(&atoms, DEFAULT_TOLERANCE);
let disulfides = detect_disulfide_bonds(&atoms);
let aabb = entity.aabb();

4. Transforms

let (rotation, translation) = kabsch_alignment(&reference_ca, &target_ca);
transform_entities(&mut entities, rotation, translation);
let ca_positions = extract_ca_positions(&entities);

5. Serialization

For sending to C/C++/Python consumers:

// COORDS01 (flat atom array)
let bytes = serialize(&merge_entities(&entities))?;

// ASSEM01 (preserves entity types)
let bytes = serialize_assembly(&entities)?;

Entity System

The entity system lives in molex::entity (source: src/entity/).

MoleculeEntity

The central type. An enum with four variants:

pub enum MoleculeEntity {
    Protein(ProteinEntity),
    NucleicAcid(NAEntity),
    SmallMolecule(SmallMoleculeEntity),
    Bulk(BulkEntity),
}

Common methods available on all variants (delegated to the inner type):

Method	Returns	Description
`id()`	`EntityId`	Unique identifier
`molecule_type()`	`MoleculeType`	Classification (Protein, DNA, Ligand, etc.)
`atom_set()`	`&[Atom]`	All atoms
`atom_count()`	`usize`	Number of atoms
`positions()`	`Vec<Vec3>`	All atom positions
`label()`	`String`	Human-readable label (e.g. “Protein A”, “Ligand (ATP)”)
`aabb()`	`Option<Aabb>`	Axis-aligned bounding box
`to_coords()`	`Coords`	Flatten to wire format
`residue_count()`	`usize`	Residues (polymer) or molecules (bulk)

Variant-specific downcasting: as_protein(), as_nucleic_acid(), as_small_molecule(), as_bulk().

MoleculeType

pub enum MoleculeType {
    Protein, DNA, RNA, Ligand, Ion, Water, Lipid, Cofactor, Solvent,
}

Residue names are mapped to molecule types by classify_residue() using built-in lookup tables.

Atom

pub struct Atom {
    pub position: Vec3,    // 3D position in angstroms
    pub occupancy: f32,    // 0.0 to 1.0
    pub b_factor: f32,     // temperature factor
    pub element: Element,  // chemical element
    pub name: [u8; 4],     // PDB atom name (e.g. b"CA  ")
}

Residue name, residue number, and chain ID are stored on the entity/residue that owns the atom.

ProteinEntity

A single protein chain. Implements both Entity and Polymer traits.

pub struct ProteinEntity {
    pub id: EntityId,
    pub atoms: Vec<Atom>,
    pub residues: Vec<Residue>,       // name, number, atom_range
    pub segment_breaks: Vec<usize>,   // backbone gap indices
    pub pdb_chain_id: u8,
}

Derived views (computed on each call, not cached):

to_backbone() -> Vec<ResidueBackbone> – N, CA, C, O positions per residue
to_protein_residues(is_hydrophobic, get_bonds) -> Vec<ProteinResidue> – full residue view with sidechain atoms and bond topology
to_interleaved_segments() -> Vec<Vec<Vec3>> – N/CA/C positions per continuous segment (for spline rendering)

Segment breaks are computed automatically from C(i)->N(i+1) distance > 2.0 angstroms.
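The gap rule can be illustrated with a small standalone function. This is a sketch under assumptions: positions are plain `[f32; 3]` rather than `glam::Vec3`, `segment_breaks_sketch` is a hypothetical name, and recording a break as the index of the first residue after the gap is a guess at the convention.

```rust
/// Positions are plain arrays here; molex itself uses `glam::Vec3`.
type P3 = [f32; 3];

fn distance(a: P3, b: P3) -> f32 {
    let (dx, dy, dz) = (a[0] - b[0], a[1] - b[1], a[2] - b[2]);
    (dx * dx + dy * dy + dz * dz).sqrt()
}

/// Hypothetical helper: `c[i]` and `n[i]` are the backbone C and N positions
/// of residue `i`. Returns every index `i` where C(i-1) to N(i) exceeds 2.0 Å.
fn segment_breaks_sketch(c: &[P3], n: &[P3]) -> Vec<usize> {
    const MAX_PEPTIDE_BOND: f32 = 2.0; // threshold quoted in the text
    let len = c.len().min(n.len());
    (1..len)
        .filter(|&i| distance(c[i - 1], n[i]) > MAX_PEPTIDE_BOND)
        .collect()
}
```

A normal peptide bond is about 1.33 Å, so the 2.0 Å threshold leaves headroom for coordinate error while still catching missing residues.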

NAEntity

A single DNA or RNA chain. Same structure as ProteinEntity (atoms, residues, segment breaks, chain ID). Implements Entity and Polymer.

SmallMoleculeEntity

A single non-polymer molecule (ligand, ion, cofactor, lipid). Implements Entity only.

BulkEntity

A group of identical small molecules (water, solvent). All atoms stored in a single flat Vec<Atom>. Implements Entity only.

EntityId

An opaque identifier allocated by EntityIdAllocator. Entities within a structure have unique IDs. The allocator is a simple incrementing counter.
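An incrementing-counter allocator of this kind might look like the following. The field layout, the `u32` width, and the `allocate` method name are all assumptions made for illustration; the real `EntityId` is opaque.

```rust
/// Hypothetical mirror of the types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct EntityId(u32);

#[derive(Debug, Default)]
struct EntityIdAllocator {
    next: u32,
}

impl EntityIdAllocator {
    /// Hand out the current counter value, then advance it.
    fn allocate(&mut self) -> EntityId {
        let id = EntityId(self.next);
        self.next += 1;
        id
    }
}
```

Because IDs come from one counter, uniqueness holds only among entities allocated by the same allocator instance.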

Entity and Polymer traits

pub trait Entity {
    fn id(&self) -> EntityId;
    fn molecule_type(&self) -> MoleculeType;
    fn atoms(&self) -> &[Atom];
    fn positions(&self) -> Vec<Vec3>;     // default impl
    fn atom_count(&self) -> usize;        // default impl
}

pub trait Polymer: Entity {
    fn residues(&self) -> &[Residue];
    fn residue_count(&self) -> usize;     // default impl
    fn segment_breaks(&self) -> &[usize];
    fn segment_count(&self) -> usize;     // default impl
    fn segment_range(&self, idx: usize) -> Range<usize>;  // default impl
    fn segment_residues(&self, idx: usize) -> &[Residue];  // default impl
}

Surface types

The entity::surface module provides:

VoxelGrid – 3D Array3<f32> grid with unit cell metadata (dimensions, angles, origin, sampling intervals). Provides voxel_size() and frac_to_cart_matrix() for coordinate conversion.
Density – wraps VoxelGrid with density map metadata. Constructed by mrc_file_to_density() or mrc_to_density().
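For readers unfamiliar with fractional coordinates, the conversion that frac_to_cart_matrix() performs can be sketched with the standard crystallographic orthogonalization matrix. The function names below are hypothetical, `f64` is used for clarity even though the grid stores `f32`, and whether molex follows exactly this axis convention (a along x, b in the xy plane) is an assumption.

```rust
/// Build the row-major 3x3 fractional-to-Cartesian matrix for a unit cell
/// with edge lengths in angstroms and angles in degrees.
fn frac_to_cart_sketch(a: f64, b: f64, c: f64, alpha: f64, beta: f64, gamma: f64) -> [[f64; 3]; 3] {
    let ca = alpha.to_radians().cos();
    let cb = beta.to_radians().cos();
    let cg = gamma.to_radians().cos();
    let sg = gamma.to_radians().sin();
    // Unit-cell volume divided by a*b*c.
    let v = (1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg).sqrt();
    [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]
}

/// Multiply the matrix by a fractional coordinate to get angstroms.
fn frac_to_cart_point(m: &[[f64; 3]; 3], f: [f64; 3]) -> [f64; 3] {
    [
        m[0][0] * f[0] + m[0][1] * f[1] + m[0][2] * f[2],
        m[1][0] * f[0] + m[1][1] * f[1] + m[1][2] * f[2],
        m[2][0] * f[0] + m[2][1] * f[1] + m[2][2] * f[2],
    ]
}
```

For an orthorhombic cell (all angles 90°) the matrix reduces to a diagonal of the edge lengths, so the fractional point (0.5, 0.5, 0.5) lands at the cell center.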

Adapters

Format adapters live in molex::adapters (source: src/adapters/). Each structure adapter’s primary API returns Vec<MoleculeEntity>; the MRC and DCD adapters return Density and Vec<DcdFrame> instead. Coords-based variants are available for FFI/IPC consumers.

PDB (`adapters::pdb`)

Parses PDB files using the pdbtbx crate. Handles non-standard lines (GROMACS/MemProtMD output) via sanitization.

// Primary API
pdb_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
pdb_str_to_entities(pdb_str: &str) -> Result<Vec<MoleculeEntity>, CoordsError>
structure_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>

// Derived Coords API
pdb_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
pdb_str_to_coords(pdb_str: &str) -> Result<Coords, CoordsError>
pdb_to_coords(pdb_str: &str) -> Result<Vec<u8>, CoordsError>  // serialized bytes

// Coords -> PDB
coords_to_pdb(coords_bytes: &[u8]) -> Result<String, CoordsError>

structure_file_to_entities auto-detects PDB vs mmCIF by file extension (.pdb/.ent -> PDB, everything else -> mmCIF).
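The extension rule is simple enough to sketch with the standard library alone. `StructureFormat` and `detect_format_sketch` are hypothetical names, and the case-insensitive comparison is an assumption; the text only states the .pdb/.ent versus everything-else split.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum StructureFormat {
    Pdb,
    Mmcif,
}

/// Hypothetical helper mirroring the rule above: `.pdb` and `.ent` mean PDB,
/// any other extension (or none) falls through to mmCIF.
fn detect_format_sketch(path: &Path) -> StructureFormat {
    match path.extension().and_then(|e| e.to_str()) {
        Some(ext) if ext.eq_ignore_ascii_case("pdb") || ext.eq_ignore_ascii_case("ent") => {
            StructureFormat::Pdb
        }
        _ => StructureFormat::Mmcif,
    }
}
```

Note the consequence of the fall-through: a file with an unexpected extension is handed to the mmCIF parser rather than rejected up front.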

mmCIF (`adapters::cif`)

Custom DOM-based parser (no external crate). Parses CIF text into a DOM, then extracts atom sites via typed extractors.

mmcif_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
mmcif_str_to_entities(cif_str: &str) -> Result<Vec<MoleculeEntity>, CoordsError>

mmcif_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
mmcif_str_to_coords(cif_str: &str) -> Result<Coords, CoordsError>

BinaryCIF (`adapters::bcif`)

Decodes BinaryCIF (MessagePack-encoded CIF) with column-level codecs.

bcif_file_to_entities(path: &Path) -> Result<Vec<MoleculeEntity>, CoordsError>
bcif_to_entities(data: &[u8]) -> Result<Vec<MoleculeEntity>, CoordsError>

bcif_file_to_coords(path: &Path) -> Result<Coords, CoordsError>
bcif_to_coords(data: &[u8]) -> Result<Coords, CoordsError>

MRC/CCP4 density maps (`adapters::mrc`)

Parses MRC/CCP4 format electron density maps into Density (which wraps VoxelGrid).

mrc_file_to_density(path: &Path) -> Result<Density, DensityError>
mrc_to_density(data: &[u8]) -> Result<Density, DensityError>

DCD trajectories (`adapters::dcd`)

Reads DCD binary trajectory files (CHARMM/NAMD format).

dcd_file_to_frames(path: &Path) -> Result<Vec<DcdFrame>, CoordsError>

pub struct DcdHeader { /* timestep, n_atoms, etc. */ }
pub struct DcdFrame { pub x: Vec<f32>, pub y: Vec<f32>, pub z: Vec<f32> }
pub struct DcdReader<R> { /* streaming reader */ }

AtomWorks (`adapters::atomworks`, feature = `python`)

Bidirectional conversion between molex entities and Biotite AtomArray objects with AtomWorks annotations. Requires the python feature.

Entity-aware functions (preserve molecule type, entity ID, chain grouping):

entities_to_atom_array(assembly_bytes: Vec<u8>) -> PyResult<PyObject>
entities_to_atom_array_plus(assembly_bytes: Vec<u8>) -> PyResult<PyObject>
atom_array_to_entities(atom_array: PyObject) -> PyResult<Vec<u8>>
entities_to_atom_array_parsed(assembly_bytes: Vec<u8>, filename: &str) -> PyResult<PyObject>
parse_file_to_entities(path: &str) -> PyResult<Vec<u8>>
parse_file_full(path: &str) -> PyResult<PyObject>

Flat Coords functions:

coords_to_atom_array(py: Python, coords_bytes: Vec<u8>) -> PyResult<PyObject>
coords_to_atom_array_plus(py: Python, coords_bytes: Vec<u8>) -> PyResult<PyObject>
atom_array_to_coords(atom_array: PyObject) -> PyResult<Vec<u8>>

Analysis

Structural analysis lives in molex::analysis (source: src/analysis/). All analysis functions operate on entity-level types (&[Atom], &[ResidueBackbone]).

Secondary structure (`analysis::ss`)

DSSP-based secondary structure classification.

use molex::analysis::{detect_dssp, resolve_ss, SSType};

// Full DSSP: detect H-bonds, then classify
let (ss_types, hbonds) = detect_dssp(&backbone_residues);
// ss_types: Vec<SSType> -- one per residue (Helix, Sheet, or Coil)
// hbonds: Vec<HBond> -- backbone H-bond pairs that produced the assignment

// With optional override (e.g. from mmCIF annotation)
let ss = resolve_ss(Some(&override_ss), &backbone_residues);
// Falls back to DSSP if override is None

SSType is a Q3 classification:

pub enum SSType { Helix, Sheet, Coil }

Each variant has a .color() method returning an RGB [f32; 3] for rendering.

Short isolated segments (1-residue helix/sheet runs) are automatically reassigned to Coil by merge_short_segments.
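A cleanup pass of this kind could be written as a single scan over runs of equal assignments. The sketch below is an assumption about the approach, not the library's code: `merge_short_segments_sketch` and its `min_len` parameter are hypothetical, and `SSType` is redeclared so the block compiles on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SSType {
    Helix,
    Sheet,
    Coil,
}

/// Rewrite every Helix/Sheet run shorter than `min_len` residues to Coil.
/// Passing `min_len = 2` removes the 1-residue runs mentioned above.
fn merge_short_segments_sketch(ss: &mut [SSType], min_len: usize) {
    let mut start = 0;
    while start < ss.len() {
        // Find the end of the run of identical assignments starting at `start`.
        let mut end = start + 1;
        while end < ss.len() && ss[end] == ss[start] {
            end += 1;
        }
        if ss[start] != SSType::Coil && end - start < min_len {
            for s in &mut ss[start..end] {
                *s = SSType::Coil;
            }
        }
        start = end;
    }
}
```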

SS from string

analysis::ss::from_string parses secondary structure strings (e.g. "HHHCCCEEE") into Vec<SSType>.
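As a rough illustration of that parsing step, the sketch below maps each character to a Q3 state. The function name is hypothetical, and the character mapping (H to Helix, E to Sheet, everything else to Coil) is an assumption based on the usual DSSP letter codes, not a statement of what from_string accepts.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SSType {
    Helix,
    Sheet,
    Coil,
}

/// Hypothetical parser: one `SSType` per input character.
fn ss_from_string_sketch(s: &str) -> Vec<SSType> {
    s.chars()
        .map(|ch| match ch.to_ascii_uppercase() {
            'H' => SSType::Helix,
            'E' => SSType::Sheet,
            // Assumption: 'C', '-', and any other character mean Coil.
            _ => SSType::Coil,
        })
        .collect()
}
```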

Bond detection (`analysis::bonds`)

Covalent bonds

use molex::analysis::{infer_bonds, InferredBond, BondOrder, DEFAULT_TOLERANCE};

let bonds: Vec<InferredBond> = infer_bonds(atoms, tolerance);
// InferredBond { atom_a: usize, atom_b: usize, order: BondOrder }
// BondOrder: Single, Double, Triple, Aromatic

Bonds are inferred from interatomic distances using element covalent radii, with a configurable tolerance (default: DEFAULT_TOLERANCE).
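The usual form of this test treats two atoms as bonded when their distance is at most the sum of their covalent radii plus the tolerance. The sketch below assumes that criterion; `SketchAtom` and `infer_bonds_sketch` are hypothetical, bond order is ignored, and the all-pairs loop is chosen for brevity rather than speed.

```rust
type P3 = [f32; 3];

/// Hypothetical minimal atom: a position plus a covalent radius in angstroms.
struct SketchAtom {
    pos: P3,
    covalent_radius: f32,
}

fn dist(a: P3, b: P3) -> f32 {
    let (dx, dy, dz) = (a[0] - b[0], a[1] - b[1], a[2] - b[2]);
    (dx * dx + dy * dy + dz * dz).sqrt()
}

/// Return index pairs (i, j) with i < j whose distance is within the sum of
/// the two covalent radii plus `tolerance`.
fn infer_bonds_sketch(atoms: &[SketchAtom], tolerance: f32) -> Vec<(usize, usize)> {
    let mut bonds = Vec::new();
    for i in 0..atoms.len() {
        for j in (i + 1)..atoms.len() {
            let cutoff = atoms[i].covalent_radius + atoms[j].covalent_radius + tolerance;
            if dist(atoms[i].pos, atoms[j].pos) <= cutoff {
                bonds.push((i, j));
            }
        }
    }
    bonds
}
```

The quadratic loop is fine for a ligand but not for a whole structure; a production implementation would typically bucket atoms into a spatial grid first.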

Hydrogen bonds

use molex::analysis::detect_hbonds;

let hbonds: Vec<HBond> = detect_hbonds(&backbone_residues);
// HBond { donor: usize, acceptor: usize } -- residue indices

DSSP-style backbone N-H…O=C hydrogen bond detection using electrostatic energy criteria.
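The electrostatic criterion in the original DSSP method is the Kabsch-Sander energy, which the sketch below computes for one C=O acceptor and one N-H donor. The constants (partial charges 0.42 and 0.20, factor 332, cutoff -0.5 kcal/mol) are the published DSSP values; that molex uses exactly these numbers is an assumption, and the function names are hypothetical.

```rust
type P3 = [f32; 3];

fn dist(a: P3, b: P3) -> f32 {
    let (dx, dy, dz) = (a[0] - b[0], a[1] - b[1], a[2] - b[2]);
    (dx * dx + dy * dy + dz * dz).sqrt()
}

/// Kabsch-Sander electrostatic energy in kcal/mol between an acceptor
/// carbonyl (C, O) and a donor amide (N, H).
fn hbond_energy(c: P3, o: P3, n: P3, h: P3) -> f32 {
    const Q1_Q2: f32 = 0.42 * 0.20; // product of partial charges
    const F: f32 = 332.0; // dimensional factor
    Q1_Q2 * F * (1.0 / dist(o, n) + 1.0 / dist(c, h) - 1.0 / dist(o, h) - 1.0 / dist(c, n))
}

/// DSSP counts a hydrogen bond when the energy is below -0.5 kcal/mol.
fn is_hbond(energy: f32) -> bool {
    energy < -0.5
}
```

For an idealized collinear geometry with an O to N distance of 2.9 Å the energy comes out near -2.9 kcal/mol, well under the cutoff.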

Disulfide bonds

use molex::analysis::{detect_disulfide_bonds, DisulfideBond};

let disulfides: Vec<DisulfideBond> = detect_disulfide_bonds(atoms);

Detects CYS SG-SG bonds by distance.

Bounding box (`analysis::aabb`)

use molex::analysis::Aabb;

let aabb = Aabb::from_positions(&positions)?;
aabb.center();   // Vec3 -- geometric center
aabb.extents();  // Vec3 -- size along each axis
aabb.radius();   // f32 -- bounding sphere radius

let merged = aabb.union(&other_aabb);
let combined = Aabb::from_aabbs(&[aabb1, aabb2, aabb3])?;

Also available directly on entities: entity.aabb().
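To make the semantics of these methods concrete, here is a minimal bounding-box sketch over plain `[f32; 3]` positions. `AabbSketch` is a hypothetical stand-in: returning `Option` for empty input and defining `radius()` as half the box diagonal are assumptions about behavior the text does not spell out.

```rust
type P3 = [f32; 3];

#[derive(Debug, Clone, Copy, PartialEq)]
struct AabbSketch {
    min: P3,
    max: P3,
}

impl AabbSketch {
    /// `None` for an empty slice, since no box can be defined.
    fn from_positions(positions: &[P3]) -> Option<Self> {
        let (first, rest) = positions.split_first()?;
        let mut bb = AabbSketch { min: *first, max: *first };
        for p in rest {
            for k in 0..3 {
                bb.min[k] = bb.min[k].min(p[k]);
                bb.max[k] = bb.max[k].max(p[k]);
            }
        }
        Some(bb)
    }

    /// Midpoint of the box.
    fn center(&self) -> P3 {
        std::array::from_fn(|k| 0.5 * (self.min[k] + self.max[k]))
    }

    /// Size along each axis.
    fn extents(&self) -> P3 {
        std::array::from_fn(|k| self.max[k] - self.min[k])
    }

    /// Half the box diagonal: a sphere of this radius around `center()`
    /// encloses the whole box.
    fn radius(&self) -> f32 {
        let e = self.extents();
        0.5 * (e[0] * e[0] + e[1] * e[1] + e[2] * e[2]).sqrt()
    }

    /// Smallest box containing both inputs.
    fn union(&self, other: &Self) -> Self {
        AabbSketch {
            min: std::array::from_fn(|k| self.min[k].min(other.min[k])),
            max: std::array::from_fn(|k| self.max[k].max(other.max[k])),
        }
    }
}
```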

Codec

Wire formats, serialization, and entity splitting/merging live in molex::ops::codec (source: src/ops/codec/).

Coords

The flat, column-oriented wire format:

pub struct Coords {
    pub num_atoms: usize,
    pub atoms: Vec<CoordsAtom>,       // x, y, z, occupancy, b_factor
    pub chain_ids: Vec<u8>,           // per-atom chain ID byte
    pub res_names: Vec<[u8; 3]>,      // per-atom residue name
    pub res_nums: Vec<i32>,           // per-atom residue number
    pub atom_names: Vec<[u8; 4]>,     // per-atom PDB atom name
    pub elements: Vec<Element>,       // per-atom element
}

Coords is a serialization/interop format for FFI and IPC.

Wire formats

COORDS01

Flat atom array binary format. Header magic: COORDS01 (backward-compatible reader also handles COORDS00 which omits elements).

use molex::ops::codec::{serialize, deserialize};

let bytes: Vec<u8> = serialize(&coords)?;
let coords: Coords = deserialize(&bytes)?;

ASSEM01

Entity-aware binary format that preserves molecule type metadata per entity. Header magic: ASSEM01\0. Use this when you need to round-trip entities without re-running residue classification.

use molex::ops::codec::{serialize_assembly, deserialize_assembly};

let bytes: Vec<u8> = serialize_assembly(&entities)?;
let entities: Vec<MoleculeEntity> = deserialize_assembly(&bytes)?;
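A consumer that receives bytes of unknown origin can tell the formats apart by their magic. The sketch below assumes the magic occupies the first 8 bytes of the buffer; `WireFormat` and `sniff_wire_format` are hypothetical names, not part of molex.

```rust
#[derive(Debug, PartialEq, Eq)]
enum WireFormat {
    Coords00,
    Coords01,
    Assem01,
}

/// Inspect the first 8 bytes and report which wire format they announce.
/// Returns `None` for short buffers or an unknown magic.
fn sniff_wire_format(bytes: &[u8]) -> Option<WireFormat> {
    match bytes.get(..8)? {
        b"COORDS01" => Some(WireFormat::Coords01),
        b"COORDS00" => Some(WireFormat::Coords00), // legacy layout without elements
        b"ASSEM01\0" => Some(WireFormat::Assem01),
        _ => None,
    }
}
```

With a check like this, a caller can route a buffer to deserialize or deserialize_assembly without trying both.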

Entity splitting and merging

use molex::ops::codec::{split_into_entities, merge_entities};

// Coords -> entities (classifies residues, groups by chain)
let entities: Vec<MoleculeEntity> = split_into_entities(&coords);

// Entities -> Coords (flattens back)
let coords: Coords = merge_entities(&entities);

split_into_entities groups atoms by:

Chain ID + molecule type for polymers (one entity per chain)
Chain ID + residue number for small molecules (one entity per molecule)
All waters into a single Bulk entity
All solvents into a single Bulk entity
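The grouping rules above amount to computing a key per atom and collecting atoms that share a key. The following is a sketch of that idea only: `GroupKey` and `group_key` are hypothetical, and `MoleculeType` is redeclared so the block stands alone.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MoleculeType {
    Protein, DNA, RNA, Ligand, Ion, Water, Lipid, Cofactor, Solvent,
}

/// Hypothetical grouping key: atoms with equal keys land in the same entity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum GroupKey {
    Polymer { chain: u8, kind: MoleculeType },
    SmallMolecule { chain: u8, res_num: i32 },
    BulkWater,
    BulkSolvent,
}

fn group_key(chain: u8, res_num: i32, kind: MoleculeType) -> GroupKey {
    match kind {
        // One entity per polymer chain, regardless of residue number.
        MoleculeType::Protein | MoleculeType::DNA | MoleculeType::RNA => {
            GroupKey::Polymer { chain, kind }
        }
        // Waters and solvents are pooled across all chains.
        MoleculeType::Water => GroupKey::BulkWater,
        MoleculeType::Solvent => GroupKey::BulkSolvent,
        // Each remaining residue is its own small-molecule entity.
        MoleculeType::Ligand | MoleculeType::Ion | MoleculeType::Lipid | MoleculeType::Cofactor => {
            GroupKey::SmallMolecule { chain, res_num }
        }
    }
}
```

Since the key derives `Hash` and `Eq`, it can be used directly as a `HashMap` key when bucketing atoms.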

Transforms (`ops::transform`)

Structural alignment and extraction utilities:

use molex::ops::transform::*;

// Kabsch alignment (minimize RMSD between point sets)
let (rotation, translation) = kabsch_alignment(&source, &target);
let (rotation, translation, scale) = kabsch_alignment_with_scale(&source, &target);

// Apply transform to entities
transform_entities(&mut entities, &rotation, &translation);

// Align entities to a reference structure
align_to_reference(&mut mobile, &reference);

// Extract alpha-carbon positions
let ca_positions: Vec<Vec3> = extract_ca_positions(&entities);
let ca_by_chain: Vec<Vec<Vec3>> = extract_ca_from_chains(&entities);

// Get continuous backbone segments
let segments: Vec<Vec<Vec3>> = extract_backbone_segments(&entities);

// Compute centroid
let center: Vec3 = centroid(&positions);

ChainIdMapper

Maps multi-character chain ID strings (e.g. “AA”, “AB” for structures with >26 chains) to unique u8 values for the Coords format:

let mut mapper = ChainIdMapper::new();
let id = mapper.get_or_assign("AA"); // assigns a unique byte
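One plausible way to implement such a mapper is a hash map that assigns the next unused byte on first sight of a chain ID. The sketch below is an assumption about the approach: `ChainIdMapperSketch` is hypothetical, it numbers chains from 0, and it returns `Option<u8>` to make the 256-chain limit explicit, which may differ from the real signature.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct ChainIdMapperSketch {
    assigned: HashMap<String, u8>,
}

impl ChainIdMapperSketch {
    /// Return the byte already assigned to `chain`, or assign the next free
    /// one. Yields `None` once all 256 byte values are in use.
    fn get_or_assign(&mut self, chain: &str) -> Option<u8> {
        if let Some(&id) = self.assigned.get(chain) {
            return Some(id);
        }
        let len = self.assigned.len();
        if len > u8::MAX as usize {
            return None;
        }
        let id = len as u8;
        self.assigned.insert(chain.to_owned(), id);
        Some(id)
    }
}
```

Repeated lookups of the same chain string return the same byte, so the mapping stays stable across a parse.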

Error type

Fallible codec operations (serialize, deserialize, and their assembly variants) return Result<_, CoordsError>:

pub enum CoordsError {
    InvalidFormat(String),
    PdbParseError(String),
    SerializationError(String),
}

C FFI

C-compatible bindings live in molex::ffi (source: src/ffi.rs). These expose COORDS conversion functions for consumption from C, C++, Swift, or any language with C FFI support.

Result type

typedef struct {
    const uint8_t *data;    // output bytes, or NULL on error
    size_t len;             // data length
    size_t data_len;        // allocated capacity
    const char *error;      // error string, or NULL on success
} CoordsResult;

Functions

`pdb_to_coords_bytes`

Parse a PDB string into COORDS binary format.

CoordsResult pdb_to_coords_bytes(const char *pdb_ptr, size_t pdb_len);

`coords_to_pdb`

Convert COORDS binary to a PDB-format string. Returns a null-terminated C string. The caller must free the string with coords_free_string.

const char *coords_to_pdb(const uint8_t *coords_ptr, size_t coords_len, size_t *out_len);

`coords_from_coords`

Deserialize and re-serialize COORDS bytes (round-trip validation).

CoordsResult coords_from_coords(const uint8_t *coords_ptr, size_t coords_len);

`coords_from_backbone`

Build COORDS from backbone positions. Currently returns an error (not yet implemented).

CoordsResult coords_from_backbone(
    const float *positions,
    size_t num_res,
    const char *sequence,
    const int32_t *chain_breaks,
    size_t chain_break_count
);

Memory management

void coords_free_result(const CoordsResult *result);
void coords_free_string(const char *s);

All pointers returned by FFI functions must be freed using the corresponding free function. Do not use free() directly.

Python Bindings

Python bindings are available when molex is built with the python feature. The module is exposed as import molex via PyO3.

Installation

cd crates/molex
maturin develop --release --features python

Core COORDS functions (`python.rs`)

These operate on serialized COORDS bytes:

import molex

# Parse PDB string to COORDS bytes
coords_bytes = molex.pdb_to_coords(pdb_string)

# Parse mmCIF string to COORDS bytes
coords_bytes = molex.mmcif_to_coords(cif_string)

# Convert COORDS bytes to PDB string
pdb_string = molex.coords_to_pdb(coords_bytes)

# Round-trip validation
validated = molex.deserialize_coords_py(coords_bytes)

AtomWorks interop (`adapters::atomworks`)

Entity-aware conversions between molex’s ASSEM01 binary format and Biotite AtomArray objects with AtomWorks annotations (entity_id, mol_type, pn_unit_iid, chain_type).

molex -> AtomWorks

import molex

# Convert ASSEM01 bytes to AtomArray (basic annotations)
atom_array = molex.entities_to_atom_array(assembly_bytes)

# Convert with full bond list and chain type annotations
atom_array = molex.entities_to_atom_array_plus(assembly_bytes)

# Convert with AtomWorks cleaning pipeline (leaving group removal,
# charge correction, missing atom imputation)
atom_array = molex.entities_to_atom_array_parsed(assembly_bytes, "3nez.cif.gz")

AtomWorks -> molex

# Convert AtomArray back to ASSEM01 bytes (preserves entity metadata)
assembly_bytes = molex.atom_array_to_entities(atom_array)

File-based shortcuts

# Parse structure file directly to ASSEM01 bytes via AtomWorks
assembly_bytes = molex.parse_file_to_entities("3nez.cif.gz")

# Parse to AtomArray with full AtomWorks pipeline
atom_array = molex.parse_file_full("3nez.cif.gz")

Flat Coords functions

Flat Coords-based conversions are available for working with the COORDS01 format directly:

atom_array = molex.coords_to_atom_array(coords_bytes)
atom_array = molex.coords_to_atom_array_plus(coords_bytes)
coords_bytes = molex.atom_array_to_coords(atom_array)

These operate on the flat COORDS01 format and do not include entity metadata.
