Sample
`SAMPLE` generates a randomly selected subset of records with a uniform probability distribution.
The `SAMPLE <Int> [SEED <Int>]` statement generates a randomly selected sample of records with a uniform probability distribution. Using `SAMPLE` in transformation scripts lets you examine a smaller set of records from your dataset.
You can optionally specify a `SEED` number to control which sample the statement generates. Whatever the seed, the records are still selected with a uniform probability distribution.
- When you run `SAMPLE` with the same `SEED`, or omit the seed, it returns the same sample of records.
- When you run `SAMPLE` with a new `SEED`, it returns a different sample of records.
The syntax is:
`SAMPLE <Int> [ SEED <Int> ]`
Where the first `<Int>` is a positive integer that sets the size of the sample, and the `SEED` clause is optional. If you omit the `SEED` value, the default seed of 42 is used.
For example, you can add these statements to your scripts:
SAMPLE 1000;
SAMPLE 1000 SEED 32;
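The seed behavior described above can be illustrated outside the scripting language. The following Python sketch is not how `SAMPLE` is implemented; it only mimics the documented semantics using the standard `random` module. The `sample` function, the `DEFAULT_SEED` constant, sampling without replacement, and the handling of a size larger than the dataset are all assumptions made for illustration.

```python
import random

# Assumption: mirrors the default seed of 42 stated on this page.
DEFAULT_SEED = 42


def sample(records, size, seed=DEFAULT_SEED):
    """Hypothetical stand-in for SAMPLE <size> [SEED <seed>].

    Draws `size` records uniformly at random, without replacement
    (an assumption; the page does not say how duplicates are handled).
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # A private generator makes the result depend only on the seed.
    rng = random.Random(seed)
    # Assumption: cap the size at the number of available records.
    return rng.sample(records, min(size, len(records)))


records = list(range(10_000))

same_a = sample(records, 1000, seed=32)     # like: SAMPLE 1000 SEED 32;
same_b = sample(records, 1000, seed=32)     # same seed again
default_a = sample(records, 1000)           # like: SAMPLE 1000;
default_b = sample(records, 1000, seed=42)  # explicit default seed
other = sample(records, 1000, seed=7)       # a new seed

print(same_a == same_b)        # same seed, same sample
print(default_a == default_b)  # omitted seed behaves like seed 42
print(other == same_a)         # new seed, different sample
```

In this sketch, reusing a seed reproduces the sample exactly, which is the property that makes a seeded sample useful for repeatable checks while developing a script.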