A comprehensive guide to running large language models on your PC with NVIDIA RTX graphics cards
7B models with heavy quantization
7B to 13B models with good quality
34B to 70B models with less quantization
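The tiers above can be sanity-checked with simple arithmetic: a quantized model needs roughly (parameters × bits per weight ÷ 8) of VRAM for its weights, plus some headroom for the KV cache and activations. Here is a minimal sketch; the function name and the flat 1.5 GB overhead allowance are illustrative assumptions, not exact figures.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_gb=1.5):
    """Rough VRAM estimate for a quantized model: weight storage plus a
    flat allowance (assumed here) for KV cache and activations."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(estimate_vram_gb(7, 4))    # 7B at 4-bit: ~5 GB
print(estimate_vram_gb(70, 4))   # 70B at 4-bit: ~36.5 GB
```

By this back-of-the-envelope math, a 7B model at 4-bit fits comfortably in 8 GB of VRAM, while a 70B model needs heavy quantization or CPU offloading on any single consumer card.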
After installation, verify everything works by opening Anaconda Prompt and running:
nvidia-smi
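If you want to read the GPU name and total VRAM programmatically rather than eyeballing the `nvidia-smi` table, you can query it in CSV form with `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and parse the result. The helper below is an illustrative sketch (the function name is mine, not part of any tool); the sample output line in the comment is hypothetical.

```python
import subprocess

def gpu_memory_mib(smi_output=None):
    """Return a list of (gpu_name, total_mib) tuples.

    If smi_output is None, query nvidia-smi directly; otherwise parse the
    given CSV string (useful for testing without a GPU)."""
    if smi_output is None:
        smi_output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            text=True)
    gpus = []
    for line in smi_output.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus

# Hypothetical sample line: "NVIDIA GeForce RTX 4090, 24564"
print(gpu_memory_mib("NVIDIA GeForce RTX 4090, 24564"))
```

This gives you the numbers to plug into the VRAM sizing guidance above.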
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
start_windows.bat # For Windows
start_linux.sh # For Linux
This will take 10-30 minutes as it downloads all dependencies.
Flexible, can run on CPU/GPU split
GPU-only, faster if model fits VRAM
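The choice between the two loader styles comes down to one question: does the whole model fit in VRAM? A minimal decision sketch, assuming the VRAM estimates from earlier (the function name is mine):

```python
def pick_loader(model_vram_gb, gpu_vram_gb):
    """Heuristic loader choice: GPU-only loaders need the entire model in
    VRAM; llama.cpp can offload a subset of layers and keep the rest in
    system RAM at the cost of speed."""
    if model_vram_gb <= gpu_vram_gb:
        return "GPU-only loader (fastest)"
    return "llama.cpp with CPU/GPU split"

print(pick_loader(5.0, 12))   # small model, 12 GB card
print(pick_loader(20.0, 12))  # model larger than VRAM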
python download-model.py TheBloke/Llama-2-7B-Chat-GGUF
Visit TheBloke's Hugging Face profile for a large catalog of pre-quantized GGUF models.
Use Q4_K_M for 7B/8B models; it offers a good balance between file size and output quality.
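To see why Q4_K_M hits a sweet spot, you can estimate file size from the effective bits per weight of each quantization level. The figures below are approximate values for llama.cpp K-quants (treat them as assumptions, and the helper name is mine):

```python
# Approximate effective bits per weight for common llama.cpp quant levels
BITS_PER_WEIGHT = {
    "Q2_K":   2.6,   # smallest, noticeable quality loss
    "Q4_K_M": 4.85,  # recommended balance
    "Q5_K_M": 5.7,   # higher quality, larger file
    "Q8_0":   8.5,   # near-lossless
}

def file_size_gb(params_billion, quant):
    """Estimated model file size in GB for a given quantization level."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

for q in BITS_PER_WEIGHT:
    print(f"7B at {q}: ~{file_size_gb(7, q):.1f} GB")
```

For a 7B model, Q4_K_M lands around 4 GB, small enough for 8 GB cards while preserving most of the Q8_0 quality.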
Lower truncation_length from 4096 to 2048 if VRAM is limited; a shorter context window shrinks the KV cache that must be held in memory.
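The VRAM saved by lowering the context length can be estimated from the KV cache formula: 2 tensors (K and V) per layer, each of shape context × hidden size, in fp16. This sketch assumes a Llama-2-7B-like shape with full multi-head attention (no grouped-query attention, which would reduce the cache further); the function name is illustrative.

```python
def kv_cache_gb(n_layers, hidden_size, context_len, bytes_per_elem=2):
    """Estimated fp16 KV-cache size: K and V tensors for every layer,
    each context_len x hidden_size elements."""
    return 2 * n_layers * context_len * hidden_size * bytes_per_elem / 1024**3

# Llama-2-7B-like shape: 32 layers, hidden size 4096
print(kv_cache_gb(32, 4096, 4096))  # 2.0 GB at 4096 context
print(kv_cache_gb(32, 4096, 2048))  # 1.0 GB at 2048 context
```

Under these assumptions, halving the context from 4096 to 2048 frees about 1 GB of VRAM, which is often the difference between fitting and not fitting on an 8 GB card.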
Use the n-gpu-layers slider to choose how many model layers are offloaded to the GPU, balancing VRAM usage against system RAM (and speed).
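A reasonable starting value for n-gpu-layers can be estimated by dividing the model file size evenly across its layers and filling whatever VRAM remains after a reserve for the KV cache. This is a rough heuristic, not the tool's own logic; the function name and the 1 GB default reserve are assumptions.

```python
def layers_on_gpu(free_vram_gb, model_size_gb, n_layers, reserve_gb=1.0):
    """Heuristic n-gpu-layers estimate: assume layers are roughly equal in
    size and keep reserve_gb of VRAM free for the KV cache and activations."""
    per_layer_gb = model_size_gb / n_layers
    fit = int((free_vram_gb - reserve_gb) / per_layer_gb)
    return max(0, min(fit, n_layers))

# 4 GB model with 32 layers on an 8 GB card: all layers fit
print(layers_on_gpu(8.0, 4.0, 32))
# Same model with only 3 GB free: partial offload
print(layers_on_gpu(3.0, 4.0, 32))
```

Start near the estimate, then nudge the slider down if you hit out-of-memory errors or up if VRAM usage stays low.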
If the setup is too complex, try these user-friendly alternatives:
Graphical interface for downloading and running GGUF models
Simple command-line tool for local LLM deployment