How to Run Your Own AI Models: Getting Started with LM Studio

In my last post, we talked about why you might want to build your own AI sandbox. Now let’s actually build one.

This isn’t difficult, but it does require downloading a few gigabytes of data and waiting for the installation to complete. Grab a coffee, and let’s get started.

What You Need Before You Start

Hardware Check:

Minimum: 8GB RAM, any modern processor (last 5 years)
Comfortable: 16GB RAM, dedicated GPU (optional but helpful)
Ideal: 32GB+ RAM, NVIDIA GPU with 8GB+ VRAM
Storage: 10-50GB free space (models vary in size)

Simple test: Can your computer handle video editing or modern gaming? Then it can probably run AI models.
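If you'd rather check the storage requirement than guess, here is a minimal sketch using only Python's standard library. The function names and the 10GB threshold are my own choices for illustration (the threshold is the low end of the range above); they are not part of LM Studio.

```python
import shutil


def free_disk_gb(path="."):
    """Return free disk space on the drive holding `path`, in GB (10**9 bytes)."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9


def has_room_for_models(path=".", needed_gb=10.0):
    """True if the drive holding `path` has at least `needed_gb` free."""
    return free_disk_gb(path) >= needed_gb


if __name__ == "__main__":
    free = free_disk_gb(".")
    print(f"Free space: {free:.1f} GB")
    # 10 GB is an assumed threshold, taken from the low end of the range above
    print("Enough for a starter model:", has_room_for_models(".", needed_gb=10.0))
```

Point `path` at whichever drive you plan to store models on. If you want to try several models, raise `needed_gb` accordingly.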

What this means in practice:

8GB RAM = smaller models (3B parameters), slower responses
16GB RAM = medium models (7-13B parameters), decent speed
32GB+ RAM = larger models (30B+ parameters), with room to spare for smaller ones
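Where do these pairings come from? A rough back-of-the-envelope sketch: a model's memory footprint is about its parameter count times the bits stored per parameter. The 4-bit default and the 1.2 overhead multiplier below are assumptions for illustration, not measured values; real usage depends on the specific model file and settings.

```python
def estimate_model_gb(params_billions, bits_per_weight=4.0, overhead=1.2):
    """Rough memory needed to hold a model, in GB.

    params_billions: parameter count in billions (3 for a 3B model)
    bits_per_weight: 16 for unquantized half precision, roughly 4-8 when compressed
    overhead: assumed multiplier for runtime buffers (a guess, not a measurement)
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    for size in (3, 7, 13, 30):
        print(f"{size}B at 4-bit: ~{estimate_model_gb(size):.1f} GB")
```

Under these assumptions, a 3B model needs about 1.8GB and a 30B model about 18GB. Remember that your operating system and other applications need memory too, which is why the RAM tiers above sit well above the raw model sizes.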

Step 1: Installing LM Studio

Why LM Studio? Right now, it’s the easiest way to get started. Think of it as iTunes for AI models. It handles all the technical complexity behind a clean interface.

The Download:

Go to lmstudio.ai
Download for your operating system (Windows/Mac/Linux)
Install like any other application
Launch it, and you’ll see a clean, modern interface

First impressions tour:

Left sidebar: Your downloaded models
Center: The chat interface (looks familiar, right?)
Top: Search and download new models
Bottom: Settings and system resources

Step 2: Downloading Your First Model

The paradox of choice: There are thousands of models available. For your first time, I’ll keep it simple.

Recommended starter model: Llama 3.2 3B Instruct (or current equivalent)

Why this one?

Small enough to run on most computers (roughly 2-4GB download, depending on the version you pick)
Fast responses
Good at following instructions
Won’t max out your system

How to download:

Click the search/download icon (top of LM Studio)
Type “Llama 3.2 3B”
Look for “Instruct” in the name
Find one that says “GGUF” format (this is optimized for local running)
Click download
Wait 5-15 minutes, depending on your connection
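Curious how long the wait will be on your connection? The arithmetic is simple enough to sketch. The function name is hypothetical, and real downloads are usually somewhat slower than the advertised line speed, so treat the result as a lower bound.

```python
def download_minutes(file_gb, speed_mbps):
    """Estimated download time in minutes.

    file_gb: file size in gigabytes (1 GB = 8000 megabits)
    speed_mbps: connection speed in megabits per second
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    seconds = file_gb * 8000 / speed_mbps
    return seconds / 60


if __name__ == "__main__":
    for speed in (25, 50, 100):
        print(f"3 GB at {speed} Mbps: ~{download_minutes(3, speed):.0f} min")
```

A 3GB file works out to about 16 minutes at 25 Mbps, 8 minutes at 50 Mbps, and 4 minutes at 100 Mbps, which lines up with the range above.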

While you wait, here’s what’s downloading: billions of mathematical parameters, compressed into a single file. This is the entire ‘brain’ of the AI, trained on massive amounts of text.
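If you want to peek at that single file once it lands, here is a small sketch that checks whether a file really is in GGUF format. It relies on the fact that GGUF files begin with the four ASCII bytes "GGUF" followed by a little-endian 32-bit version number. The function name is my own, and where LM Studio stores its downloads varies by system, so you'll need to locate the file yourself.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF.

    Reads only the first 8 bytes: the 4-byte magic "GGUF" and a
    little-endian unsigned 32-bit version number.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Call it with the path to your downloaded model file. A return value of None means the file is not GGUF, or the download was cut off before the header was written.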

Step 3: Loading and Chatting

Loading your model

“Loading” a downloaded model means reading it into memory so you can run it. Think of it like downloading a movie and pressing play, or downloading a program and launching it.

Once downloaded, it appears in your left sidebar
Click it
Click “Load Model” at the top
Watch the loading bar (your computer is reading the model into memory)
When it says “Model loaded,” you’re ready

Your first conversation

Start simple:

“Hello, who are you?”
“What can you help me with?”
“Write a haiku about sandboxes.”

What to notice

Response speed (slower than ChatGPT at first)
Your computer’s fan noise (it’s working!)
Battery drain (plug in your laptop if you can)
No internet required (you can turn off WiFi and it still works)

Understanding What You’re Seeing

The response generation. Watch the text appear word by word. Each word is the result of billions of calculations happening on your machine, right now.

The resource monitor. Be sure to pay attention to LM Studio’s built-in resource display. It will show the following:

RAM usage
GPU usage (if applicable)
Response speed (tokens per second)

What “tokens per second” means: Roughly 1 token = ¾ of a word. 20 tokens/second = pretty fast. 5 tokens/second = slow but functional.
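To turn those numbers into something you can feel, here is a quick sketch that converts tokens per second into waiting time for a reply. It uses the ¾-word rule of thumb from above, which is an approximation that varies by model and by text. The function name and the 300-word example are just for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: 1 token is about 3/4 of a word


def seconds_to_generate(words, tokens_per_second):
    """Estimated seconds to generate a reply of `words` words."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens = words / WORDS_PER_TOKEN
    return tokens / tokens_per_second


if __name__ == "__main__":
    for tps in (5, 20):
        print(f"300 words at {tps} tokens/s: ~{seconds_to_generate(300, tps):.0f} s")
```

A 300-word answer is about 400 tokens, so it takes roughly 80 seconds at 5 tokens/second and 20 seconds at 20 tokens/second.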

Common Issues and Quick Fixes

“It’s too slow.”

Try a smaller model (look for 1B or 2B parameter models)
Close other applications
Check if you accidentally downloaded a huge model
You can download and test various models to determine which one works best for your purposes.

“It won’t load.”

Check available RAM
Try restarting LM Studio
Ensure you have enough disk space

“The answers are weird.”

This is normal! Smaller models are less capable than ChatGPT
Try being more specific in your prompts
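Remember: this is a 3 billion parameter model. The models behind ChatGPT are far larger (GPT-3 had 175 billion parameters, and OpenAI has not published sizes for its newer models)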

“My computer sounds like a jet engine.”

This is expected during model loading and first responses
It should quiet down after the initial processing
You can adjust settings to reduce CPU/GPU usage

What You Just Did

Congratulations! You just downloaded an entire AI model, hosted it on your own computer, and used it to hold a conversation that never left your machine. Along the way, you learned what AI actually costs in terms of computing resources.

The difference: Every conversation with ChatGPT travels through the internet to OpenAI’s servers. Every conversation you just had stayed within your computer. Your sandbox. Your rules.

Next Steps

Try these experiments:

Turn off your internet and keep chatting (it still works!)
Ask it to write different things (code, poems, explanations)
Notice which tasks it handles well vs. poorly
Check your system resources during different tasks

In the next post: Now that you have a working local AI, let’s talk about choosing better models, measuring actual electricity costs, optimizing performance, and deciding when to use local AI vs. cloud services.

Your personal AI sandbox is now open for business. What will you build first?