pyvista-zstd#
Seamlessly compress VTK datasets using Zstandard.
Read VTK datasets up to 37x faster and write up to 14x faster, all while using 28% less space than VTK’s modern XML format.
Read/Write Speedup and Compression Ratios#
| File Type / Method | Write Speed | Compression Ratio | Notes |
|---|---|---|---|
| Legacy VTK (.vtk) | 465 MB/s | 0.88 | Significant overhead |
| VTK XML, none | 256 MB/s | 0.70 | Significant overhead |
| VTK XML, zlib | 105 MB/s | 2.52 | VTK default |
| VTK XML, lz4 | 401 MB/s | 1.47 | |
| VTK XML, lzma | 9.93 MB/s | 3.10 | |
| VTK HDF (.vtkhdf), lvl0 | 1733 MB/s | 0.93 | No compression |
| VTK HDF (.vtkhdf), lvl4 | 137 MB/s | 2.37 | Default compression |
| pyvista-zstd (.pv), lvl3 | 711 MB/s | 3.02 | Threads = 0 |
| pyvista-zstd (.pv), lvl3 | 1845 MB/s | 3.02 | Threads = 4 |
| pyvista-zstd (.pv), lvl22 | 15.8 MB/s | 3.79 | All threads (-1) |
Usage#
Install with:

```shell
pip install pyvista-zstd
```
Compatible with all VTK dataset types. Uses PyVista under the hood.

```python
import pyvista as pv

import pyvista_zstd

# create and write out
ds = pv.Sphere()
pyvista_zstd.write(ds, "dataset.pv")

# read in and show these are identical
ds_in = pyvista_zstd.read("dataset.pv")
assert ds == ds_in
```
Alternatively, using VTK directly:

```python
import vtk

import pyvista_zstd

# create a dataset using a VTK source
sphere_source = vtk.vtkSphereSource()
sphere_source.SetRadius(1.0)
sphere_source.SetThetaResolution(32)
sphere_source.SetPhiResolution(32)
sphere_source.Update()
vtk_ds = sphere_source.GetOutput()

# write out and read back
pyvista_zstd.write(vtk_ds, "sphere.pv")
ds_in = pyvista_zstd.read("sphere.pv")
```
PyVista Integration#
When pyvista-zstd is installed, it automatically registers with PyVista’s
reader registry. This means pv.read() handles .pv files
directly:
```python
import pyvista as pv

mesh = pv.read("dataset.pv")
```
No additional imports needed. This works via PyVista’s pyvista.readers
entry point group: the entry point is declared at install time and
discovered automatically by PyVista at runtime.
Rationale#
VTK’s XML writer is flexible and supports most datasets, but its compression runs on a single thread, supports only a subset of compression algorithms, and the XML format itself adds significant overhead.
To demonstrate this, the following example writes out a single file
without compression. This example requires pyvista>=0.47.0 for the
compression parameter.
```python
>>> import numpy as np
>>> import pyvista as pv
>>> ugrid = pv.ImageData(dimensions=(200, 200, 200)).to_tetrahedra()
>>> ugrid["pdata"] = np.random.random(ugrid.n_points)
>>> ugrid["cdata"] = np.random.random(ugrid.n_cells)
>>> nbytes = (
...     ugrid.points.nbytes
...     + ugrid.cell_connectivity.nbytes
...     + ugrid.offset.nbytes
...     + ugrid.celltypes.nbytes
...     + ugrid["pdata"].nbytes
...     + ugrid["cdata"].nbytes
... )
>>> print(f"Size in memory: {nbytes / 1024**2:.2f} MB")
Size in memory: 1993.89 MB
```
Save using the VTK XML format:

```python
>>> from pathlib import Path
>>> import time
>>> tmp_path = Path("/tmp/ds.vtu")
>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression=None)
>>> print(f"Written without compression in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Written without compression in 7.93 seconds
  File size: 2858.94 MB
  Compression Ratio: 0.6974239255525742
```
This amounts to around a 43% overhead from VTK’s XML writer alone. Using the default compression, we can get the file size down to 791 MB, but compressing takes 19 seconds.
```python
>>> tstart = time.time()
>>> ugrid.save(tmp_path, compression='zlib')  # default
>>> print(f"Compressed in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed in 18.83 seconds
  File size: 791.05 MB
  Compression Ratio: 2.5205590295735663
```
Clearly there’s room for improvement here, as this amounts to a compression rate of only 105.89 MB/s.
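For reference, the quoted rate is simply the uncompressed in-memory size divided by wall time. A tiny helper (the function name is mine, not part of any library) makes the arithmetic explicit:

```python
def compression_rate_mb_s(nbytes: int, seconds: float) -> float:
    """Uncompressed megabytes processed per second."""
    return nbytes / 1024**2 / seconds


# The zlib run above: 1993.89 MB compressed in 18.83 s
print(f"{compression_rate_mb_s(1993.89 * 1024**2, 18.83):.2f} MB/s")  # 105.89 MB/s
```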
VTK Compression with Zstandard: pyvista-zstd#
This library, pyvista-zstd, writes out VTK datasets with minimal overhead
and uses Zstandard for
compression. Moreover, it’s been implemented with multi-threading
support for both read and write operations.
Let’s compress that file again but this time using pyvista-zstd:
```python
>>> import pyvista_zstd
>>> tmp_path = Path("/tmp/ds.pv")
>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed pyvista_zstd in 0.92 seconds
Threads: -1
  File size: 660.41 MB
  Compression Ratio: 3.019175309922273
```
This gives us a write performance of 2167 MB/s using the default number of threads and compression level, resulting in a 20x speedup in write performance versus VTK’s XML writer. This speedup is most noticeable for larger files:
Speedup versus VTK’s XML#
Even when disabling multi-threading we can still achieve excellent performance:
```python
>>> tstart = time.time()
>>> pyvista_zstd.write(ugrid, tmp_path, n_threads=0)
>>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
>>> nbytes_disk = tmp_path.stat().st_size
>>> print(f"  File size: {nbytes_disk / 1024**2:.2f} MB")
>>> print(f"  Compression Ratio: {nbytes / nbytes_disk}")
Compressed pyvista_zstd in 2.91 seconds
Threads: 0
  File size: 660.47 MB
  Compression Ratio: 3.0188911592355683
```
This amounts to a single-core compression rate of 685.18 MB/s, which is in agreement with Zstandard’s benchmarks.
Note that the benefit of threading drops off rapidly past 8 threads, though part of this is due to the performance versus efficiency cores of the CPU used for benchmarking (see below).
Read/Write Speed versus Number of Threads#
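To reproduce that falloff on your own hardware, a sweep along the lines of the following works. The writer is passed in as a callable so the sketch stays self-contained; `n_threads` is the parameter used throughout this page, while the helper itself is illustrative:

```python
import time


def sweep_threads(write, ds, path, thread_counts=(0, 1, 2, 4, 8, 16)):
    """Time one write per thread count; returns {n_threads: seconds}."""
    timings = {}
    for n in thread_counts:
        t0 = time.perf_counter()
        write(ds, path, n_threads=n)
        timings[n] = time.perf_counter() - t0
    return timings


# Usage: sweep_threads(pyvista_zstd.write, ugrid, "/tmp/ds.pv")
```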
Reading the dataset back in is also fast. Comparing with VTK’s XML reader using defaults (timings via IPython’s `%timeit`):

Read VTK XML:

```python
>>> %timeit pv.read("/tmp/ds.vtu")
6.22 s ± 9.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Read zstd:

```python
>>> %timeit pyvista_zstd.read("/tmp/ds.pv")
563 ms ± 7.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
This is an 11x speedup for this dataset versus VTK’s XML, and it’s still fast even with multi-threading disabled:
```python
>>> %timeit pyvista_zstd.read("/tmp/ds.pv", n_threads=0)
1.11 s ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
This amounts to 1796 MB/s for a single core, which is also in agreement with Zstandard’s benchmarks.
Additionally, you can control Zstandard’s compression level via the
`level` parameter. A quick benchmark for this dataset indicates the
default gives a reasonable performance versus size tradeoff:
Read/Write Speed versus Compression Level#
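A minimal sweep for reproducing that level benchmark might look like this. Again the writer is injected as a callable; `level` is the parameter named above, and everything else is illustrative:

```python
import os
import time


def sweep_levels(write, ds, path, levels=(1, 3, 9, 15, 22)):
    """Compress at each level; returns {level: (seconds, bytes_on_disk)}."""
    results = {}
    for lvl in levels:
        t0 = time.perf_counter()
        write(ds, path, level=lvl)
        results[lvl] = (time.perf_counter() - t0, os.path.getsize(path))
    return results


# Usage: sweep_levels(pyvista_zstd.write, ugrid, "/tmp/ds.pv")
```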
Note that both pyvista-zstd and VTK’s XML default compression give
relatively constant compression ratios for this dataset across varying
file sizes:
Compression Ratio versus VTK’s XML#
These benchmarks were performed on an i9-14900KF running Linux
kernel 6.12.41 using zstandard==0.24.0 from PyPI. Storage was a
2 TB Samsung 990 Pro (no LUKS encryption) mounted at /tmp.
Additional Information#
The benchmarks/ directory contains additional benchmarks using many
datasets, including all applicable datasets in pyvista.examples (see
the PyVista Dataset Gallery).