pyvista-zstd#
Seamlessly compress VTK datasets using Zstandard.
Read VTK datasets up to 37x faster and write up to 14x faster, all while using 28% less space than VTK’s modern XML format.
Read/Write Speedup and Compression Ratios#
| File Type / Method | Write Speed | Compression Ratio | Notes |
|---|---|---|---|
| Legacy VTK (.vtk) | 465 MB/s | 0.88 | Significant overhead |
| VTK XML, none | 256 MB/s | 0.70 | Significant overhead |
| VTK XML, zlib | 105 MB/s | 2.52 | VTK default |
| VTK XML, lz4 | 401 MB/s | 1.47 | |
| VTK XML, lzma | 9.93 MB/s | 3.10 | |
| VTK HDF (.vtkhdf), lvl0 | 1733 MB/s | 0.93 | No compression |
| VTK HDF (.vtkhdf), lvl4 | 137 MB/s | 2.37 | Default compression |
| pyvista-zstd (.pv), lvl3 | 711 MB/s | 3.02 | Threads = 0 |
| pyvista-zstd (.pv), lvl3 | 1845 MB/s | 3.02 | Threads = 4 |
| pyvista-zstd (.pv), lvl22 | 15.8 MB/s | 3.79 | All threads (-1) |
Usage#
Install with:

```shell
pip install pyvista-zstd
```
Compatible with all VTK dataset types. Uses PyVista under the hood.

```python
import pyvista as pv

import pyvista_zstd

# create and write out
ds = pv.Sphere()
pyvista_zstd.write(ds, "dataset.pv")

# read in and show these are identical
ds_in = pyvista_zstd.read("dataset.pv")
assert ds == ds_in
```
Alternatively, using VTK directly:

```python
import vtk

import pyvista_zstd

# create a dataset using a VTK source
sphere_source = vtk.vtkSphereSource()
sphere_source.SetRadius(1.0)
sphere_source.SetThetaResolution(32)
sphere_source.SetPhiResolution(32)
sphere_source.Update()
vtk_ds = sphere_source.GetOutput()

# write out and read back
pyvista_zstd.write(vtk_ds, "sphere.pv")
ds_in = pyvista_zstd.read("sphere.pv")
```
PyVista Integration#
When pyvista-zstd is installed, it automatically registers with PyVista’s
reader registry. This means pv.read() handles .pv files
directly:
```python
import pyvista as pv

mesh = pv.read("dataset.pv")
```
No additional imports needed. This works via PyVista’s pyvista.readers
entry point group: the entry point is declared at install time and
discovered automatically by PyVista at runtime.
Rationale#
VTK’s XML writer is flexible and supports most datasets, but its compression runs on a single thread, supports only a subset of compression algorithms, and the XML format itself adds significant overhead.
To demonstrate this, the following example writes out a single file
without compression. This example requires pyvista>=0.47.0 for the
compression parameter.
```python
>>> import numpy as np
>>> import pyvista as pv
>>> ugrid = pv.ImageData(dimensions=(200, 200, 200)).to_tetrahedra()
>>> ugrid["pdata"] = np.random.random(ugrid.n_points)
>>> ugrid["cdata"] = np.random.random(ugrid.n_cells)
>>> nbytes = (
...     ugrid.points.nbytes
...     + ugrid.cell_connectivity.nbytes
...     + ugrid.offset.nbytes
...     + ugrid.celltypes.nbytes
...     + ugrid["pdata"].nbytes
...     + ugrid["cdata"].nbytes
... )
>>> print(f"Size in memory: {nbytes / 1024**2:.2f} MB")
Size in memory: 1993.89 MB
```
Save using the VTK XML format:

```python
>>> from pathlib import Path
>>> import time
>>> tmp_path = Path("/tmp/ds.vtu")
>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression=None)
>>> print(f"Written without compression in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Written without compression in 7.93 seconds
  File size: 2858.94 MB
  Compression Ratio: 0.6974239255525742
```
This amounts to around a 43% overhead from VTK’s XML writer alone. Using the default compression, we can get the file size down to 791 MB, but compressing takes 19 seconds.
```python
>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression='zlib')  # default
>>> print(f"Compressed in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed in 18.83 seconds
  File size: 791.05 MB
  Compression Ratio: 2.5205590295735663
```
Clearly there’s room for improvement here, as this amounts to a compression rate of only 105.89 MB/s.
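For reference, the quoted rate is simply the uncompressed in-memory size divided by wall time. A tiny helper (the function name is mine, not part of any library) makes the arithmetic explicit:

```python
def compression_rate_mb_s(nbytes: int, seconds: float) -> float:
    """Uncompressed megabytes processed per second."""
    return nbytes / 1024**2 / seconds


# The zlib run above: 1993.89 MB compressed in 18.83 s
print(f"{compression_rate_mb_s(1993.89 * 1024**2, 18.83):.2f} MB/s")  # 105.89 MB/s
```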
VTK Compression with Zstandard: pyvista-zstd#
This library, pyvista-zstd, writes out VTK datasets with minimal overhead
and uses Zstandard for
compression. Moreover, it’s been implemented with multi-threading
support for both read and write operations.
Let’s compress that file again but this time using pyvista-zstd:
```python
>>> import pyvista_zstd
>>> tmp_path = Path("/tmp/ds.pv")
>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed pyvista_zstd in 0.92 seconds
Threads: -1
  File size: 660.41 MB
  Compression Ratio: 3.019175309922273
```
This gives us a write performance of 2167 MB/s using the default number of threads and compression level, resulting in a 20x speedup in write performance versus VTK’s XML writer. This speedup is most noticeable for larger files:
Speedup versus VTK’s XML#
Even when disabling multi-threading we can still achieve excellent performance:
```python
>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path, n_threads=0)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed pyvista_zstd in 2.91 seconds
Threads: 0
  File size: 660.47 MB
  Compression Ratio: 3.0188911592355683
```
This amounts to a single-core compression rate of 685.18 MB/s, which is in agreement with Zstandard’s benchmarks.
Note that the benefit of threading drops off rapidly past 8 threads, though part of this is due to the performance versus efficiency cores of the CPU used for benchmarking (see below).
Read/Write Speed versus Number of Threads#
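To reproduce that falloff on your own hardware, a sweep along the lines of the following works. The writer is passed in as a callable so the sketch stays self-contained; `n_threads` is the parameter used throughout this page, while the helper itself is illustrative:

```python
import time


def sweep_threads(write, ds, path, thread_counts=(0, 1, 2, 4, 8, 16)):
    """Time one write per thread count; returns {n_threads: seconds}."""
    timings = {}
    for n in thread_counts:
        t0 = time.perf_counter()
        write(ds, path, n_threads=n)
        timings[n] = time.perf_counter() - t0
    return timings


# Usage: sweep_threads(pyvista_zstd.write, ugrid, "/tmp/ds.pv")
```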
Reading the dataset back in is also fast. Comparing with VTK’s XML reader using defaults (timings via IPython’s `%timeit`):

Read VTK XML:

```python
>>> %timeit pv.read("/tmp/ds.vtu")
6.22 s ± 9.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Read zstd:

```python
>>> %timeit pyvista_zstd.read("/tmp/ds.pv")
563 ms ± 7.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
This is an 11x speedup for this dataset versus VTK’s XML, and it’s still fast even with multi-threading disabled:
```python
>>> %timeit pyvista_zstd.read("/tmp/ds.pv", n_threads=0)
1.11 s ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
This amounts to 1796 MB/s for a single core, which is also in agreement with Zstandard’s benchmarks.
Additionally, you can control Zstandard’s compression level via the
`level` parameter. A quick benchmark for this dataset indicates the
default gives a reasonable performance versus size tradeoff:
Read/Write Speed versus Compression Level#
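A minimal sweep for reproducing that level benchmark might look like this. Again the writer is injected as a callable; `level` is the parameter named above, and everything else is illustrative:

```python
import os
import time


def sweep_levels(write, ds, path, levels=(1, 3, 9, 15, 22)):
    """Compress at each level; returns {level: (seconds, bytes_on_disk)}."""
    results = {}
    for lvl in levels:
        t0 = time.perf_counter()
        write(ds, path, level=lvl)
        results[lvl] = (time.perf_counter() - t0, os.path.getsize(path))
    return results


# Usage: sweep_levels(pyvista_zstd.write, ugrid, "/tmp/ds.pv")
```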
Note that both pyvista-zstd and VTK’s XML default compression give
relatively constant compression ratios for this dataset across varying
file sizes:
Compression Ratio versus VTK’s XML#
These benchmarks were performed on an i9-14900KF running Linux
kernel 6.12.41 using zstandard==0.24.0 from PyPI. Storage was a
2 TB Samsung 990 Pro (no LUKS encryption) mounted at /tmp.
Additional Information#
The benchmarks/ directory contains additional benchmarks using many
datasets, including all applicable datasets in pyvista.examples (see
the PyVista Dataset Gallery).