Dictionary compression can dramatically improve compression ratios for small files that share similar structures. This is especially effective for collections of JSON records, small HTML pages, or similar data.
Traditional compression algorithms rely on finding repetitive patterns within a single file. Small files often don’t have enough repetition to compress well. Dictionaries solve this by:
Pre-learning common patterns from sample data
Sharing patterns across files that have similar structures
Improving compression ratio by 2x or more for small files
Reducing per-frame header overhead by supplying precomputed entropy tables in the dictionary
Dictionaries are most effective for files under 100KB. The smaller the file, the greater the benefit.
You can also train dictionaries programmatically using the zdict.h API:
    #include <zdict.h>

    // Prepare samples (concatenated in a single buffer)
    void* samplesBuffer;      // All samples concatenated
    size_t* samplesSizes;     // Array of individual sample sizes
    unsigned nbSamples;       // Number of samples

    // Train the dictionary
    size_t dictSize = ZDICT_trainFromBuffer(
        dictBuffer,           // Output: dictionary buffer
        dictBufferCapacity,   // Size: typically ~100KB
        samplesBuffer,        // Input: all samples concatenated
        samplesSizes,         // Input: size of each sample
        nbSamples             // Input: number of samples
    );

    if (ZDICT_isError(dictSize)) {
        fprintf(stderr, "Dictionary training failed: %s\n",
                ZDICT_getErrorName(dictSize));
    }
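The training call takes the samples as one contiguous buffer plus a parallel array of sizes. A minimal sketch of building that layout from separately stored samples is shown here. `PackedSamples`, `packSamples()`, and `freePackedSamples()` are hypothetical helpers written for illustration and are not part of zstd.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container (not a zstd type): the flat layout that
 * ZDICT_trainFromBuffer() takes as input. */
typedef struct {
    void*    samplesBuffer;  /* all samples back to back, no separators */
    size_t*  samplesSizes;   /* samplesSizes[i] = length of sample i */
    unsigned nbSamples;
} PackedSamples;

/* Copies nbSamples separate buffers into one contiguous buffer and records
 * each sample's size. Returns 0 on success, -1 on allocation failure. */
static int packSamples(PackedSamples* out,
                       const char* const* samples,
                       const size_t* sizes,
                       unsigned nbSamples)
{
    size_t total = 0;
    unsigned i;
    char* cursor;

    assert(out != NULL && samples != NULL && sizes != NULL);

    for (i = 0; i < nbSamples; i++) total += sizes[i];

    /* Allocate at least one byte/element so an empty input is not
     * mistaken for an allocation failure. */
    out->samplesBuffer = malloc(total ? total : 1);
    out->samplesSizes  = malloc((nbSamples ? nbSamples : 1) * sizeof(size_t));
    out->nbSamples     = nbSamples;
    if (out->samplesBuffer == NULL || out->samplesSizes == NULL) {
        free(out->samplesBuffer);
        free(out->samplesSizes);
        out->samplesBuffer = NULL;
        out->samplesSizes  = NULL;
        return -1;
    }

    cursor = (char*)out->samplesBuffer;
    for (i = 0; i < nbSamples; i++) {
        memcpy(cursor, samples[i], sizes[i]);
        cursor += sizes[i];
        out->samplesSizes[i] = sizes[i];
    }
    return 0;
}

/* Releases the buffers owned by a PackedSamples and resets its fields. */
static void freePackedSamples(PackedSamples* p)
{
    free(p->samplesBuffer);
    free(p->samplesSizes);
    p->samplesBuffer = NULL;
    p->samplesSizes  = NULL;
    p->nbSamples     = 0;
}
```

After a successful `packSamples()` call, the three fields can be passed as the `samplesBuffer`, `samplesSizes`, and `nbSamples` arguments of `ZDICT_trainFromBuffer()`. The samples are stored without separators because the sizes array alone tells the trainer where each one ends.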
To decompress, use the same dictionary that was used for compression. In the function below, CHECK(), CHECK_ZSTD(), mallocAndLoadFile_orDie(), and malloc_orDie() are helper macros and functions from common.h in the zstd examples directory:

    static void decompress(const char* fname, const ZSTD_DDict* ddict)
    {
        size_t cSize;
        void* const cBuff = mallocAndLoadFile_orDie(fname, &cSize);
        unsigned long long const rSize = ZSTD_getFrameContentSize(cBuff, cSize);
        CHECK(rSize != ZSTD_CONTENTSIZE_ERROR, "%s: not compressed by zstd!", fname);
        CHECK(rSize != ZSTD_CONTENTSIZE_UNKNOWN, "%s: original size unknown!", fname);
        void* const rBuff = malloc_orDie((size_t)rSize);

        /* Check that the dictionary ID matches.
         * If a non-zstd dictionary is used, then both will be zero.
         * By default zstd always writes the dictionary ID into the frame.
         * Zstd will check if there is a dictionary ID mismatch as well.
         */
        unsigned const expectedDictID = ZSTD_getDictID_fromDDict(ddict);
        unsigned const actualDictID = ZSTD_getDictID_fromFrame(cBuff, cSize);
        CHECK(actualDictID == expectedDictID,
              "DictID mismatch: expected %u got %u",
              expectedDictID,
              actualDictID);

        /* Decompress using the dictionary.
         * If you need to control the decompression parameters, then use the
         * advanced API: ZSTD_DCtx_setParameter(), ZSTD_DCtx_refDDict(), and
         * ZSTD_decompressDCtx().
         */
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        CHECK(dctx != NULL, "ZSTD_createDCtx() failed!");
        size_t const dSize = ZSTD_decompress_usingDDict(dctx, rBuff, rSize, cBuff, cSize, ddict);
        CHECK_ZSTD(dSize);
        /* When zstd knows the content size, it will error if it doesn't match. */
        CHECK(dSize == rSize, "Impossible because zstd will check this condition!");

        printf("%25s : %6u -> %7u \n", fname, (unsigned)cSize, (unsigned)rSize);

        ZSTD_freeDCtx(dctx);
        free(rBuff);
        free(cBuff);
    }
You can use any buffer as a raw content dictionary without training:
    // Use any buffer as a dictionary
    // (sizeof gives the dictionary size only if myCustomDictionary is an array, not a pointer)
    void* rawDict = myCustomDictionary;
    size_t rawDictSize = sizeof(myCustomDictionary);

    // Compress with raw dictionary
    ZSTD_CCtx_loadDictionary(cctx, rawDict, rawDictSize);
    ZSTD_compress2(cctx, dst, dstSize, src, srcSize);
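Raw dictionaries don’t include entropy tables, so they usually compress less well than trained dictionaries. They also carry no dictionary ID, so the ID check cannot catch a mismatched dictionary at decompression time.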
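The difference between the two kinds of dictionary is visible in the first eight bytes of the buffer. Per the Zstandard format specification, a zstd-formatted dictionary starts with the little-endian magic number 0xEC30A437 followed by a 4-byte little-endian dictionary ID; a buffer without that magic number is handled as raw content. The sketch below illustrates the check. `dictIDFromBuffer()` and `readLE32()` are hypothetical names used for illustration; in real code the library function `ZSTD_getDictID_fromDict()` serves this purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number that opens a zstd-formatted (trained) dictionary,
 * stored little-endian according to the Zstandard format specification. */
#define DICT_MAGIC 0xEC30A437u

/* Reads a 32-bit little-endian value regardless of host byte order. */
static uint32_t readLE32(const unsigned char* p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper (not a zstd API): returns the dictionary ID stored in
 * a zstd-formatted dictionary, or 0 when the buffer does not start with the
 * magic number and would therefore be treated as raw content. */
static uint32_t dictIDFromBuffer(const void* dict, size_t dictSize)
{
    const unsigned char* bytes = (const unsigned char*)dict;
    assert(dict != NULL || dictSize == 0);

    if (dictSize < 8) return 0;                    /* too small for magic + ID */
    if (readLE32(bytes) != DICT_MAGIC) return 0;   /* no magic: raw content */
    return readLE32(bytes + 4);                    /* ID follows the magic */
}
```

A return value of 0 means no ID is available, which matches the comment in the decompression example that both IDs are zero when a non-zstd dictionary is used.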