* Posts by sn3akylink

1 publicly visible post • joined 15 Jul 2024

Honey, I shrunk the LLM! A beginner's guide to quantization – and testing it

sn3akylink

Can quantized models run on GPU?

Hello, I have been playing around with quantized models on Google Colab paid-tier GPUs, but I see that they run much slower than their non-quantized versions. This article, along with others I've read, talks about quantizing models so they run on GPUs, but it has been hard for me to find content about quantizing models to save money and/or speed up inference. My goal is to be able to run a quantized model on a cheaper GPU, but I do not want that to come at the expense of inference cost. But maybe I have misunderstood what quantization offers.
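
To make the "cheaper GPU" part of the question concrete, here is a rough sketch of the weight-memory arithmetic that quantization changes. It is not taken from the article: the function name and the 7-billion-parameter model size are hypothetical, and the estimate covers weights only (no activations, KV cache, or per-group scale overhead). It says nothing about speed.

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB of weights")
```

Under these assumptions, halving the bits per weight halves the weight memory, which is what lets a model fit on a smaller card. Whether it also runs faster is a separate matter and likely depends on whether the GPU kernels compute at low precision directly or dequantize the weights on the fly.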