Deploy Gemma 2 LLM with Text Generation Inference (TGI) on Google Cloud GPU
- 2024.11.24
- Google Cloud Platform

My Medium article with the code and a detailed guide:
https://medium.com/@agapie/deploy-gemma-2-llm-with-text-generation-inference-tgi-on-google-cloud-gpu-86093af9e9e2
TGI_DOCKER_URI = "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311"
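Once this container is serving a model (locally via Docker or behind a Vertex AI endpoint), TGI exposes a `/generate` REST route that can be called with only the Python standard library. A minimal sketch — the endpoint URL and the sampling parameters here are assumptions; adjust them to your deployment:

```python
# Hedged sketch of a client for TGI's /generate route (stdlib only).
# The endpoint URL and parameter values are illustrative assumptions.
import json
import urllib.request


def build_generate_request(prompt: str, max_new_tokens: int = 64,
                           temperature: float = 0.7) -> bytes:
    """Encode a TGI /generate request body as JSON bytes."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }
    return json.dumps(payload).encode("utf-8")


def generate(endpoint: str, prompt: str) -> str:
    """POST the prompt to a running TGI server and return the generated text."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/generate",
        data=build_generate_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # TGI's /generate route returns {"generated_text": "..."}.
        return json.loads(resp.read())["generated_text"]
```

Usage, assuming the container is listening on port 8080: `generate("http://localhost:8080", "Explain GPUs in one sentence.")`.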
Gemma 2:
https://ai.google.dev/gemma
https://huggingface.co/google/gemma-2-2b-it
https://arxiv.org/abs/2408.00118
Text Generation Inference (TGI):
https://huggingface.co/docs/text-generation-inference/en/index
https://huggingface.co/blog/martinigoyanes/llm-inference-at-scale-with-tgi