Sample

SAMPLE generates a randomly selected subset of records with a uniform probability distribution.

The `SAMPLE <Int> [ SEED <Int> ]` statement generates a randomly selected sample of records with a uniform probability distribution. Using SAMPLE in transformation scripts lets you examine a smaller set of records from your dataset.

You can optionally specify a SEED value to control which records are selected. Changing the seed produces a different sample, still drawn with a uniform probability distribution.

When you run SAMPLE with the same SEED, or omit the SEED each time so that the default seed is used, it returns the same sample of records.
When you run SAMPLE with a different SEED, it returns a different sample of records.

The syntax is:

`SAMPLE <Int> [ SEED <Int> ]`

Where the `<Int>` after SAMPLE is a positive integer that sets the size of the sample, and the SEED clause is optional. If you omit SEED, the default seed of 42 is used.
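The seed behavior described above can be illustrated with a short Python sketch. This is only an analogy using Python's standard `random` module, not the implementation behind SAMPLE. The helper name `sample`, the choice to sample without replacement, and the handling of a dataset smaller than the requested size are all assumptions made for illustration.

```python
import random

DEFAULT_SEED = 42  # default seed stated on this page


def sample(records, size, seed=DEFAULT_SEED):
    """Hypothetical helper: pick `size` records uniformly at random.

    Assumption: sampling is without replacement, and if fewer records
    exist than requested, all of them are returned.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # A private generator seeded explicitly makes the result repeatable
    # and leaves the global random state untouched.
    rng = random.Random(seed)
    k = min(size, len(records))
    return rng.sample(records, k)


records = list(range(10_000))

a = sample(records, 1000)           # analogous to: SAMPLE 1000;
b = sample(records, 1000)           # same (default) seed, same sample
c = sample(records, 1000, seed=32)  # analogous to: SAMPLE 1000 SEED 32;

print(a == b)  # same seed gives the same records
print(a == c)  # a different seed gives a different selection
```

The point of the sketch is that the seed fully determines the selection for a given input, which is why rerunning with an unchanged seed is repeatable and changing the seed is the way to get a fresh sample.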

For example, you can add these statements to your scripts:

`SAMPLE 1000;`

`SAMPLE 1000 SEED 32;`
