Samplers
Outlines offers different sequence sampling algorithms, and we will integrate more in the future. You can read this blog post for an overview of the different sampling algorithms.
Multinomial sampling
Outlines defaults to the multinomial sampler without top-p or top-k sampling, and temperature equal to 1. Not specifying a sampler is equivalent to:
from outlines import models, generate, samplers
model = models.transformers("mistralai/Mistral-7B-v0.1")
sampler = samplers.multinomial()
generator = generate.text(model, sampler)
answer = generator("What is 2+2?")
print(answer)
# 4
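To make the default behavior concrete, here is a minimal, library-independent sketch of what multinomial sampling with a temperature does to a vector of logits. The function names and the toy logits are illustrative assumptions and are not part of the Outlines API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_sample(logits, temperature=1.0, rng=random):
    """Draw one token id with probability given by softmax(logits)."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a 3-token vocabulary.
toy_logits = [2.0, 1.0, 0.1]
rng = random.Random(0)  # seeded so the run is repeatable
draws = [multinomial_sample(toy_logits, rng=rng) for _ in range(1000)]
print({token: draws.count(token) for token in range(len(toy_logits))})
```

With a temperature of 1 the logits are used unchanged; lower temperatures concentrate the probability mass on the largest logit, and higher ones flatten the distribution.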
You can ask the generator to take multiple samples by passing the number of samples when initializing the sampler:
from outlines import models, generate, samplers
model = models.transformers("mistralai/Mistral-7B-v0.1")
sampler = samplers.multinomial(3)
generator = generate.text(model, sampler)
answer = generator("What is 2+2?")
print(answer)
# [4, 4, 4]
If you ask for multiple samples for a batch of prompts, the returned array will be of shape (num_batches, num_samples), with one inner list of samples per prompt:
from outlines import models, generate, samplers
model = models.transformers("mistralai/Mistral-7B-v0.1")
sampler = samplers.multinomial(3)
generator = generate.text(model, sampler)
answer = generator(["What is 2+2?", "What is 3+3?"])
print(answer)
# [[4, 4, 4], [6, 6, 6]]
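As a small illustration of how to consume this nested result, the sketch below pairs each prompt with its own samples. The `answer` value is copied from the example output above rather than produced by running a model, and `samples_by_prompt` is a name introduced here for illustration.

```python
prompts = ["What is 2+2?", "What is 3+3?"]

# Nested output copied from the example above (no model is run here).
answer = [[4, 4, 4], [6, 6, 6]]

# Pair every prompt with its own list of samples.
samples_by_prompt = dict(zip(prompts, answer))
for prompt, samples in samples_by_prompt.items():
    print(prompt, "->", samples)
```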
Top-k sampling
You can ask Outlines to only consider the top-k logits at each step by specifying the value of the top_k keyword argument when initializing the sampler.
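In Outlines the argument is passed when building the sampler, for instance `samplers.multinomial(3, top_k=10)` (the values 3 and 10 are arbitrary). The filtering step itself can be sketched without any library as follows; `top_k_filter` is a hypothetical helper written for illustration, not an Outlines function.

```python
import math


def top_k_filter(logits, k):
    """Keep the k largest logits and mask every other position with -inf."""
    # Indices of the k highest-scoring tokens.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


# Hypothetical scores for a 4-token vocabulary.
toy_logits = [1.0, 3.0, 2.0, 0.5]
filtered = top_k_filter(toy_logits, k=2)
print(filtered)
```

Masked positions receive zero probability once a softmax is applied, so only the k retained tokens can be sampled.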
Top-p sampling
You can ask Outlines to only consider the smallest set of highest-probability tokens whose cumulative probability exceeds a threshold p. Specify the top_p keyword argument when initializing the sampler.
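In Outlines this is written, for instance, as `samplers.multinomial(3, top_p=0.95)` (the values 3 and 0.95 are arbitrary). The selection rule can be sketched in plain Python as below; `top_p_filter` is a hypothetical helper for illustration, and actual implementations may treat the token that crosses the threshold slightly differently.

```python
import math


def top_p_filter(probs, p):
    """Keep the smallest top set whose cumulative probability exceeds p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in ranked:
        kept.add(i)
        cumulative += probs[i]
        if cumulative > p:
            break  # threshold crossed: stop adding tokens
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# Hypothetical next-token distribution over a 4-token vocabulary.
toy_probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(toy_probs, p=0.7)
print(nucleus)
```

Unlike top-k, the number of retained tokens adapts to the distribution: a peaked distribution keeps few tokens, a flat one keeps many.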
Greedy sampler
You can also use the greedy sampler. For this you need to initialize the generator with the sampler:
from outlines import models, generate, samplers
model = models.transformers("mistralai/Mistral-7B-v0.1")
sampler = samplers.greedy()
generator = generate.text(model, sampler)
answer = generator("What is 2+2?")
print(answer)
# 4
You cannot ask for multiple samples with the greedy sampler since it is not clear what the result should be: greedy decoding is deterministic, so every sample would be identical.
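For reference, the greedy rule reduces to taking the argmax of the logits at each step. The sketch below uses a hypothetical `greedy_pick` helper and toy logits to show that repeated calls always return the same token.

```python
def greedy_pick(logits):
    """Return the index of the largest logit (ties resolve to the first one)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical scores for a 3-token vocabulary.
toy_logits = [0.1, 2.5, 1.0]
picks = [greedy_pick(toy_logits) for _ in range(3)]
print(picks)
```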
Beam Search
Outlines also comes with the Beam Search sampling algorithm:
from outlines import models, generate, samplers
model = models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
sampler = samplers.beam_search(beams=5)
generator = generate.text(model, sampler)
answer = generator("What is 2+2?")
print(answer)
# 4
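The idea behind beam search can be sketched on a toy model: at every step each partial sequence is extended with every possible token, and only the `beams` highest-scoring candidates are kept. Everything below (the `beam_search` function, `TOY_MODEL`, and its probabilities) is a hypothetical illustration and does not reflect how Outlines implements the sampler.

```python
import math


def beam_search(next_log_probs, start, beams, steps):
    """Keep the `beams` best partial sequences, ranked by cumulative log-probability."""
    candidates = [(0.0, [start])]  # (score, token sequence)
    for _ in range(steps):
        expanded = []
        for score, seq in candidates:
            for token, logp in next_log_probs(seq).items():
                expanded.append((score + logp, seq + [token]))
        # Prune to the highest-scoring candidates.
        expanded.sort(key=lambda c: c[0], reverse=True)
        candidates = expanded[:beams]
    return candidates


# Hypothetical toy model: the next-token distribution depends only on the last token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}


def toy_next_log_probs(seq):
    return {tok: math.log(p) for tok, p in TOY_MODEL[seq[-1]].items()}


best = beam_search(toy_next_log_probs, "<s>", beams=2, steps=2)
greedy = beam_search(toy_next_log_probs, "<s>", beams=1, steps=2)
print(best[0][1], math.exp(best[0][0]))
print(greedy[0][1], math.exp(greedy[0][0]))
```

In this toy setup a single beam commits to "a" first and ends with total probability 0.33, while two beams keep "b" alive and find the more likely sequence with probability 0.36.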
Compatibility
Only models from the transformers and exllamav2 libraries are compatible with Beam Search.