A multi-threaded zip/unzip library and CLI for Rust.
- Parallel compression -- files are compressed concurrently with rayon + flate2 (zlib-rs), then assembled into a valid ZIP archive
- Parallel extraction -- files are decompressed concurrently from mmap'd archives with zero-copy reads
- CRC32 on every file -- SIMD-accelerated (crc32fast), validated on every extraction
- Atomic archive writes -- compression writes to a tempfile, fsyncs, then renames; a crash mid-write never produces a corrupt archive
- Path traversal prevention -- rejects `..` attacks, absolute paths, and Windows drive letters before any extraction begins
- ZIP64 support -- automatic for >65,535 entries, >4 GB files, or >4 GB offsets
- Zstd compression -- Zstandard (method 93) as an alternative to DEFLATE, with full interop
- Incompressible data detection -- falls back to Stored when compression would inflate the data
- Windows long path support -- `\\?\` extended-length paths for paths exceeding MAX_PATH (260 chars)
- Adaptive memory management -- dynamically sizes the in-memory compression threshold based on available system RAM (up to a 400 MB budget), so small files stay in memory while large files stream through temp files
- Deterministic output -- archives are byte-identical across runs (entries sorted by path)
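The path traversal rule above can be sketched in a few lines. This is an illustrative standalone check, not ripzip's actual validation code; the function name and exact rule set are assumptions:

```rust
/// Illustrative check mirroring the rules described above: reject `..`
/// components, absolute paths, and Windows drive letters. It inspects the
/// raw string rather than `std::path`, so the same rules apply on every
/// platform (a drive letter parses as an ordinary component on Unix).
fn is_safe_archive_path(raw: &str) -> bool {
    let bytes = raw.as_bytes();
    // Absolute paths: Unix-rooted or backslash-rooted.
    if raw.starts_with('/') || raw.starts_with('\\') {
        return false;
    }
    // Windows drive letters ("C:...").
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return false;
    }
    // Any `..` component, under either separator.
    !raw.split(['/', '\\']).any(|component| component == "..")
}
```

Because validation runs before any file is created, a malicious archive is rejected as a whole rather than partially extracted.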
ripzip (parallel, rayon + flate2/zlib-rs) vs the zip crate (single-threaded, miniz_oxide). Both at DEFLATE compression level 1. Best of 5 runs, filesystem caches warm.
CPU: Intel Core i7-14700K (20 cores / 28 threads) -- Windows 11 -- NVMe SSD
Compression:

| Scenario | Files | Data | ripzip | zip crate | Speedup |
|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 2.40s (6 MB/s) | 6.3x |
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 2.20s (2.3 GB/s) | 4.5x |
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 2.29s (2.2 GB/s) | 10.7x |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 1.04s (967 MB/s) | 2.0x |
Extraction:

| Scenario | Files | Data | ripzip | zip crate | Speedup |
|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 27.47s (0.5 MB/s) | 33.73s (0.4 MB/s) | 1.2x |
| 500 x 10 MB log files | 500 | 5 GB | 1.13s (4.4 GB/s) | 3.68s (1.4 GB/s) | 3.3x |
| 100 x 50 MB binary blobs | 100 | 5 GB | 1.18s (4.2 GB/s) | 4.45s (1.1 GB/s) | 3.8x |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 4.24s (237 MB/s) | 6.20s (162 MB/s) | 1.5x |
Takeaway: ripzip compresses 2.0--10.7x faster and extracts 1.2--3.8x faster across all workloads. Speedup scales with individual file size -- the 5 GB binary blob corpus sees the biggest compression wins (10.7x) because all 28 threads are saturated with real DEFLATE work on large chunks. The 50k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.
Archive sizes are identical between the two -- same DEFLATE algorithm, same compression level.
| Scenario | Files | Data | Deflate | Zstd | Zstd speedup | Deflate archive | Zstd archive |
|---|---|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 1.10s (13 MB/s) | 0.3x | 10 MB | 10 MB |
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 213ms (23.5 GB/s) | 2.3x | 62 MB | 592 KB |
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 163ms (30.7 GB/s) | 1.3x | 64 MB | 495 KB |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 645ms (1.6 GB/s) | 0.8x | 36 MB | 24 MB |
Takeaway: Zstd achieves dramatically better compression ratios on large files (100x smaller archives for logs/blobs) while being comparable or faster for compression. On many small files, Deflate wins because Zstd's per-file initialization cost is higher. Extraction speeds are nearly identical -- both are I/O-bound at this level of parallelism.
```sh
cargo bench -p ripzip
```

Add to your `Cargo.toml`:

```toml
[dependencies]
ripzip = { path = "ripzip" }
```

```rust
use std::path::Path;
use ripzip::{CompressionMethod, NoProgress, compress_directory, extract_to_directory};

// Compress a directory
compress_directory(
    Path::new("my_project/"),
    Path::new("my_project.zip"),
    1,                          // compression level (1 = fastest, 9 = smallest)
    CompressionMethod::Deflate, // or CompressionMethod::Zstd
    &NoProgress,                // or implement ProgressReporter for progress bars
)?;

// Extract an archive
extract_to_directory(
    Path::new("my_project.zip"),
    Path::new("output/"),
    &NoProgress,
)?;
# Ok::<(), ripzip::RipzipError>(())
```

Implement the `ProgressReporter` trait for real-time progress updates:
```rust
use ripzip::ProgressReporter;

struct MyReporter;

impl ProgressReporter for MyReporter {
    fn start(&self, total_files: u64, total_bytes: u64) {
        println!("Processing {total_files} files ({total_bytes} bytes)");
    }

    fn progress(&self, bytes_delta: u64) {
        // Called from worker threads -- use atomics for aggregation.
        // bytes_delta is uncompressed bytes just processed.
    }

    fn finish(&self) {
        println!("Done!");
    }
}
```

Progress callbacks fire at chunk granularity (256 KB), so even single large files show smooth progress.
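For aggregating those per-chunk deltas across worker threads, a relaxed atomic counter is the usual pattern. A minimal sketch; the trait is re-declared locally (with an assumed `Sync` bound, since callbacks arrive from worker threads) so the example compiles standalone, and the rendering is illustrative:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// The trait shape shown above, re-declared so this sketch is self-contained.
// The `Sync` bound is an assumption based on the multi-threaded callbacks.
trait ProgressReporter: Sync {
    fn start(&self, total_files: u64, total_bytes: u64);
    fn progress(&self, bytes_delta: u64);
    fn finish(&self);
}

/// Aggregates chunk deltas from many worker threads. `fetch_add` with
/// relaxed ordering is enough here: only the running total matters, and the
/// atomic counter itself serializes the updates.
struct AtomicReporter {
    total_bytes: AtomicU64,
    done_bytes: AtomicU64,
}

impl ProgressReporter for AtomicReporter {
    fn start(&self, _total_files: u64, total_bytes: u64) {
        self.total_bytes.store(total_bytes, Ordering::Relaxed);
    }

    fn progress(&self, bytes_delta: u64) {
        let done = self.done_bytes.fetch_add(bytes_delta, Ordering::Relaxed) + bytes_delta;
        // Render however you like; stderr keeps stdout clean for piped output.
        eprint!("\r{done} / {} bytes", self.total_bytes.load(Ordering::Relaxed));
    }

    fn finish(&self) {
        eprintln!("\ndone");
    }
}
```

A crate like indicatif can replace the `eprint!` rendering without changing the aggregation.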
```sh
cargo install --path ripzip-cli
```

```
ripzip compress <DIR> -o <FILE> [--level 1-9] [--method deflate|zstd] [--quiet]
ripzip extract <ARCHIVE> [-o <DIR>] [--quiet]
ripzip list <ARCHIVE> [--verbose]
```

Aliases: `c`, `x`, `l`.
```
$ ripzip compress my_project/ -o my_project.zip --method zstd
  [00:00:00] [####################################] 142.3MB/142.3MB (1.8GB/s)
Created my_project.zip

$ ripzip extract my_project.zip -o output/
  [00:00:00] [####################################] 142.3MB/142.3MB (3.2GB/s)
Extracted to output/

$ ripzip list my_project.zip --verbose
  Compressed     Original  Method   Name
------------------------------------------------------------
        1234         5678  Deflate  src/main.rs
           0            0  Stored   assets/
210 files, 142300000 bytes uncompressed
```
- CRC32 on every file -- computed during compression, validated during extraction. Tampered or corrupt archives are rejected. On CRC mismatch during extraction, the corrupt output file is deleted.
- Atomic archive writes -- the archive is assembled into a tempfile, fsynced, then renamed. A crash or power loss mid-compression never produces a corrupt `.zip` file. (Extraction writes directly to the destination for performance -- the archive is the source of truth and can always be re-extracted.)
- Path traversal prevention -- all archive paths are validated before any extraction. Paths containing `..`, absolute paths, and Windows drive letters are rejected.
- ZIP64 -- automatically used when entry counts exceed 65,535, file sizes exceed 4 GB, or offsets exceed 4 GB.
- fsync before rename -- data is flushed to disk before the atomic rename, ensuring durability.
- Incompressible data detection -- if compression produces output larger than the input, the file is stored uncompressed.
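The tempfile + fsync + rename sequence can be sketched with the standard library alone. This is illustrative, not ripzip's internal code; a real implementation would pick a unique temp name and clean it up on error:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Stage into a sibling tempfile, flush it to disk, then rename into place.
/// Readers never observe a half-written archive: a crash leaves either the
/// old file or the complete new one.
fn atomic_write(dest: &Path, data: &[u8]) -> std::io::Result<()> {
    let tmp = dest.with_extension("tmp"); // illustrative; use a unique name in practice
    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_all()?; // fsync before rename, so data is durable when it becomes visible
    fs::rename(&tmp, dest)?; // atomic within a single filesystem
    Ok(())
}
```

The rename is only atomic when the tempfile lives on the same filesystem as the destination, which is why it is created as a sibling rather than in a system temp directory.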
COMPRESSION PIPELINE

```
walkdir ──> Vec<FileEntry> ──> rayon::par_iter ──> Vec<CompressedEntry>
                                      |
                     per-file: read + CRC32 + DEFLATE/Zstd
                     (adaptive threshold: in memory or via temp file)
                                      |
                                      v
                           sequential ZIP assembly
                  (local headers + data + central dir + EOCD)
                                      |
                                fsync + rename
```

EXTRACTION PIPELINE

```
open archive ──> mmap (< 2 GB) or per-thread file handles (>= 2 GB)
      |
parse EOCD ──> parse central directory ──> validate all paths
      |
create directories (sequential)
      |
rayon::par_iter (per file):
  zero-copy slice from mmap ──> DEFLATE/Zstd + CRC32 verify ──> write to destination
```
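The "parse EOCD" step works from the end of the file: the End of Central Directory record carries a fixed signature and may be followed by up to 65,535 bytes of archive comment, so readers scan backwards over that bounded window. A standalone sketch of the scan (not ripzip's parser):

```rust
/// Find the End of Central Directory record: scan backwards from the end of
/// the archive for its signature (0x06054b50, bytes "PK\x05\x06"). The fixed
/// EOCD is 22 bytes and may be followed by up to 65,535 bytes of comment,
/// which bounds the scan. Returns the record's byte offset.
fn find_eocd(data: &[u8]) -> Option<usize> {
    const SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
    const MIN_EOCD: usize = 22;
    if data.len() < MIN_EOCD {
        return None;
    }
    let floor = data.len().saturating_sub(MIN_EOCD + 65_535);
    (floor..=data.len() - MIN_EOCD)
        .rev()
        .find(|&i| data[i..i + 4] == SIG)
}
```

The EOCD in turn holds the central directory's size and offset, which is all the extractor needs before it can validate paths and fan out per-file work.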
```
ripzip-rs/
  ripzip/                  # Library crate
    src/
      lib.rs               # Public API
      error.rs             # RipzipError enum
      progress.rs          # ProgressReporter trait
      fs_utils.rs          # Path validation, directory walking, long path support
      compress/            # Compression pipeline
        mod.rs             #   Orchestrator
        parallel.rs        #   Per-file compression
        zip_writer.rs      #   ZIP format assembler
      extract/             # Extraction pipeline
        mod.rs             #   Orchestrator
        parallel.rs        #   Per-file extraction + CRC validation
        zip_reader.rs      #   EOCD + central directory parser
      zip_format/          # ZIP binary format
        mod.rs             #   Constants, helpers
        local_header.rs    #   Local file header
        central_dir.rs     #   Central directory entry
        eocd.rs            #   End of Central Directory
        zip64.rs           #   ZIP64 extensions
      crc.rs               # CRC32 helpers
    tests/
      integration/         # 73 integration tests across 13 categories
    benches/
      compare.rs           # ripzip vs zip crate benchmarks
  ripzip-cli/              # CLI binary (clap + indicatif)
```
117 tests: 35 unit tests + 82 integration tests (3 ZIP64 stress tests are #[ignore]).
```sh
cargo test
```

Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the zip crate (Deflate + Zstd), ZIP64, Windows long paths.
MIT