Dictionary training analyzes sample data to create optimized dictionaries for compressing similar files. The resulting dictionary can dramatically improve compression ratios for small data.

ZDICT_trainFromBuffer()

Train a dictionary from an array of samples using the fast COVER algorithm.
size_t ZDICT_trainFromBuffer(
    void* dictBuffer,
    size_t dictBufferCapacity,
    const void* samplesBuffer,
    const size_t* samplesSizes,
    unsigned nbSamples
);
dictBuffer
void*
Output buffer where the trained dictionary will be stored.
dictBufferCapacity
size_t
Capacity of dictBuffer in bytes, which is also the maximum size of the trained dictionary. Recommended: ~100 KB.
samplesBuffer
const void*
Input buffer containing all samples concatenated together.
samplesSizes
const size_t*
Array containing the size of each sample, in order.
nbSamples
unsigned
Number of samples provided, which is also the number of entries in samplesSizes. Recommended: the samples should total ~100x the target dictionary size.
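
The samplesBuffer / samplesSizes layout described above can be sketched as follows. This is a minimal illustration, assuming the samples start out as separate in-memory buffers; `pack_samples` is a hypothetical helper and is not part of the zstd API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of zstd): copies nbSamples separate buffers
 * back to back into one contiguous buffer, in order. The existing sizes
 * array can then be passed unchanged as samplesSizes.
 * Returns a malloc'd buffer (caller frees) and stores the combined size in
 * *totalOut, or returns NULL if the total is zero or allocation fails. */
static void* pack_samples(const void* const* samples, const size_t* sizes,
                          unsigned nbSamples, size_t* totalOut)
{
    size_t total = 0;
    for (unsigned i = 0; i < nbSamples; i++) total += sizes[i];
    if (total == 0) return NULL;

    char* packed = malloc(total);
    if (packed == NULL) return NULL;

    size_t offset = 0;
    for (unsigned i = 0; i < nbSamples; i++) {
        memcpy(packed + offset, samples[i], sizes[i]);
        offset += sizes[i];
    }
    assert(offset == total);  /* every byte of every sample was copied */
    *totalOut = total;
    return packed;
}
```

The returned pointer would be passed as samplesBuffer, with the same `sizes` array as samplesSizes and the same count as nbSamples.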

Returns

Size of dictionary stored into dictBuffer (<= dictBufferCapacity), or an error code which can be tested with ZDICT_isError().

Notes

  • This function redirects to ZDICT_optimizeTrainFromBuffer_fastCover() with default parameters (d=8, steps=4, f=20, accel=1)
  • Memory usage is about 6 MB
  • Training will fail if there are not enough samples or if most samples are too small (8 bytes is the lower limit)
  • Recommended to provide a few thousand samples totaling ~100x the target dictionary size
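
The sizing guidance in these notes can be checked before training. The sketch below is only an illustration of those two rules of thumb (the 8-byte lower limit and the ~100x total size); `SampleStats` and `check_samples` are hypothetical names, and the library applies its own checks internally.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SAMPLE_SIZE 8          /* lower limit quoted in the notes */
#define SAMPLES_TO_DICT_RATIO 100  /* "~100x" guideline from the notes */

/* Hypothetical summary of a sample set (not a zstd type). */
typedef struct {
    size_t totalSize;     /* sum of all sample sizes, in bytes */
    unsigned nbTooSmall;  /* number of samples shorter than MIN_SAMPLE_SIZE */
    int meetsGuideline;   /* 1 if totalSize is at least 100x targetDictSize */
} SampleStats;

static SampleStats check_samples(const size_t* samplesSizes,
                                 unsigned nbSamples, size_t targetDictSize)
{
    SampleStats stats = { 0, 0, 0 };
    assert(samplesSizes != NULL || nbSamples == 0);

    for (unsigned i = 0; i < nbSamples; i++) {
        stats.totalSize += samplesSizes[i];
        if (samplesSizes[i] < MIN_SAMPLE_SIZE) stats.nbTooSmall++;
    }
    /* Divide instead of multiplying so a large target size cannot overflow. */
    stats.meetsGuideline =
        (stats.totalSize / SAMPLES_TO_DICT_RATIO >= targetDictSize);
    return stats;
}
```

A caller might warn when `nbTooSmall` is a large share of nbSamples or when `meetsGuideline` is 0, then decide whether to gather more samples or pick a smaller dictionary size.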

Example

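#include <stdio.h>
#include <stdlib.h>
#include <zdict.h>

#define DICT_SIZE (100 * 1024)  // 100 KB

// samplesBuffer holds all samples back to back; sampleSizes[i] is the size of sample i.
// Returns a malloc'd dictionary (caller frees) and its size in *dictSize, or NULL on failure.
void* trainDictionary(const void* samplesBuffer, const size_t* sampleSizes,
                      unsigned nbSamples, size_t* dictSize)
{
    void* dictBuffer = malloc(DICT_SIZE);
    if (dictBuffer == NULL) return NULL;

    *dictSize = ZDICT_trainFromBuffer(
        dictBuffer, DICT_SIZE,
        samplesBuffer, sampleSizes, nbSamples
    );

    if (ZDICT_isError(*dictSize)) {
        fprintf(stderr, "Dictionary training failed: %s\n",
                ZDICT_getErrorName(*dictSize));
        free(dictBuffer);  // do not leak the buffer on failure
        return NULL;
    }
    return dictBuffer;
}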

ZDICT_finalizeDictionary()

Convert raw dictionary content into a zstd dictionary by adding headers and entropy tables.
size_t ZDICT_finalizeDictionary(
    void* dstDictBuffer,
    size_t maxDictSize,
    const void* dictContent,
    size_t dictContentSize,
    const void* samplesBuffer,
    const size_t* samplesSizes,
    unsigned nbSamples,
    ZDICT_params_t parameters
);
dstDictBuffer
void*
Output buffer for the finalized dictionary. Can overlap with dictContent.
maxDictSize
size_t
Maximum size of the output dictionary. Must be >= max(dictContentSize, ZDICT_DICTSIZE_MIN).
dictContent
const void*
Raw dictionary content (can be from any source, not just zstd training).
dictContentSize
size_t
Size of the raw dictionary content.
samplesBuffer
const void*
Buffer containing concatenated samples for building entropy tables.
samplesSizes
const size_t*
Array of sizes for each sample.
nbSamples
unsigned
Number of samples provided.
parameters
ZDICT_params_t
Dictionary parameters:
  • compressionLevel: Optimize for specific compression level (0 = default)
  • notificationLevel: Log verbosity (0-4, where 0 = none)
  • dictID: Force specific dictionary ID (0 = auto-generate random ID)

Returns

Size of dictionary stored into dstDictBuffer (<= maxDictSize), or an error code which can be tested with ZDICT_isError().

Notes

  • Adds zstd header with magic number, dictionary ID, and entropy tables
  • Samples are used to construct statistics for the compression level specified
  • If header + content doesn’t fit in maxDictSize, content is truncated from the beginning
  • Most profitable content is presumed to be at the end of the dictionary
  • May fail if there are not enough samples, if the samples are incompressible, or if all samples are identical
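
The truncation rule in these notes (drop bytes from the beginning of the content, keep the end) can be illustrated with plain buffer arithmetic. This is a sketch under stated assumptions, not the zstd implementation: `layout_dictionary` is a hypothetical function, and the header is treated as an opaque byte string, whereas the real header is built from the samples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only. Writes hdr followed by content into dst (capacity
 * maxDictSize). If both do not fit, bytes are dropped from the START of
 * the content so that its tail survives.
 * Returns the number of bytes written, or 0 if the header alone does not fit.
 * dst may overlap content; hdr must not overlap dst. */
static size_t layout_dictionary(void* dst, size_t maxDictSize,
                                const void* hdr, size_t hdrSize,
                                const void* content, size_t contentSize)
{
    assert(dst != NULL && hdr != NULL && content != NULL);
    if (hdrSize > maxDictSize) return 0;

    size_t room = maxDictSize - hdrSize;                    /* space left for content */
    size_t kept = contentSize < room ? contentSize : room;  /* bytes of content retained */
    size_t skipped = contentSize - kept;                    /* bytes dropped from the front */

    /* Move the content first (memmove tolerates overlap), then write the header. */
    memmove((char*)dst + hdrSize, (const char*)content + skipped, kept);
    memcpy(dst, hdr, hdrSize);
    return hdrSize + kept;
}
```

With a 3-byte header, 10 bytes of content, and an 8-byte output buffer, only the last 5 content bytes are kept, which matches the presumption that the most valuable content sits at the end.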

Example

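#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zdict.h>

#define MAX_DICT_SIZE (110 * 1024)  // must be >= rawDictSize

// Convert raw content to a zstd dictionary.
// rawDict is the custom dictionary content; samples holds all samples back to back.
// Returns a malloc'd dictionary (caller frees) and its size in *dictSize, or NULL on failure.
void* finalizeDictionary(const void* rawDict, size_t rawDictSize,
                         const void* samples, const size_t* sampleSizes,
                         unsigned nbSamples, size_t* dictSize)
{
    ZDICT_params_t params;
    memset(&params, 0, sizeof(params));
    params.compressionLevel = 3;
    params.notificationLevel = 2;  // Show progress
    params.dictID = 0;             // Auto-generate

    void* dictBuffer = malloc(MAX_DICT_SIZE);
    if (dictBuffer == NULL) return NULL;

    *dictSize = ZDICT_finalizeDictionary(
        dictBuffer, MAX_DICT_SIZE,
        rawDict, rawDictSize,
        samples, sampleSizes, nbSamples,
        params
    );

    if (ZDICT_isError(*dictSize)) {
        fprintf(stderr, "Failed: %s\n", ZDICT_getErrorName(*dictSize));
        free(dictBuffer);  // do not leak the buffer on failure
        return NULL;
    }
    return dictBuffer;
}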

Helper Functions

ZDICT_getDictID()

Extract the dictionary ID from a dictionary buffer.
unsigned ZDICT_getDictID(const void* dictBuffer, size_t dictSize);
Returns the dictionary ID, or 0 if the buffer is not a valid zstd dictionary.
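
To show what "valid zstd dictionary" means for this lookup, here is a minimal sketch that reads the ID by hand. It assumes the layout in the Zstandard format specification: a 4-byte little-endian magic number (0xEC30A437) followed by a 4-byte little-endian dictionary ID. `sketch_get_dict_id` is a hypothetical name; real code should call ZDICT_getDictID().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DICT_MAGIC 0xEC30A437u  /* dictionary magic number (assumed from the format spec) */

/* Reads a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical re-implementation for illustration only.
 * Returns the dictionary ID, or 0 if the buffer is too short or does not
 * start with the dictionary magic number. */
static unsigned sketch_get_dict_id(const void* dictBuffer, size_t dictSize)
{
    const unsigned char* p = dictBuffer;
    assert(p != NULL || dictSize == 0);

    if (dictSize < 8) return 0;                /* magic (4) + ID (4) */
    if (read_le32(p) != DICT_MAGIC) return 0;  /* raw content, not a zstd dictionary */
    return read_le32(p + 4);
}
```

Raw content that has not been finalized has no header, so a lookup of this kind returns 0 for it.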

ZDICT_isError()

Test if a return value indicates an error.
unsigned ZDICT_isError(size_t errorCode);
Returns 1 if error, 0 otherwise.

ZDICT_getErrorName()

Get a human-readable error message.
const char* ZDICT_getErrorName(size_t errorCode);
Returns a string describing the error.