Can quantized models run on GPU?
Hello, I have been experimenting with quantized models on Google Colab's paid-tier GPUs, but I see that they run much slower than their non-quantized versions. This article, along with others I've read, talks about quantizing models so they can run on GPUs, but it has been hard for me to find content about quantizing models to save money and/or speed up inference. My goal is to run a quantized model on a cheaper GPU without sacrificing inference speed. But maybe I misunderstand what quantization offers.