This is the companion page for the Local LLMs workshop run at the College of Staten Island. Everything you need to follow along is on the workshop USB stick. This page covers the same material the deck does, plus the step-by-step USB walkthrough so you can repeat the setup later.
Documentation Index
Fetch the complete documentation index at: https://docs.nyc-ai.app/llms.txt
Use this file to discover all available pages before exploring further.
Who is running this
Ethan Castro
Hussam Ali
Why this matters
Six reasons we are running this workshop on this campus, in this room, in this decade.
| Lever | Why it matters |
|---|---|
| Empire AI | $500M+ committed by New York State. CUNY is one of seven founding institutions. This is happening with or without us. |
| CUNY HPCC | Literally in this building. Any CUNY undergrad doing research can request an account — it is also the on-ramp to Empire AI compute. |
| Career | Median total compensation for applied AI engineers runs $250-400K. |
| Data privacy | Every regulated industry — banking, healthcare, law, education — is converging on the same conclusion: data can’t leave the perimeter. |
| NYC | The Bay Area treats AI like a regional industry. NYC has the talent, the schools, and the capital. |
| CUNY | CUNY is the largest urban public university in the country. The next wave of AI builders should look like the city they come from. |
Cloud LLMs vs. local LLMs
Cloud LLMs run on a provider's servers: you send your prompts over the internet, the provider supplies the hardware, and pricing and data handling are on their terms. Local LLMs run on your own device: your data stays put, and once the model is downloaded there is no per-request cost.
Benefits of running models locally
- Privacy — your data stays with you
- Offline access — AI without internet
- Lower long-term cost
- Environmental awareness
- Customization
Open source vs. closed source
Open source / open weight
- Model weights can be downloaded.
- Can run locally on your own device or server.
- More privacy and control.
- Community can test, fine-tune, and build on top of it.
- Examples: Llama, Qwen, DeepSeek, Gemma, Mistral.
Closed source / proprietary
- Accessed through an app or API only.
- Weights are not public.
- Easier to use; less control.
- Data handling and cost depend on the provider.
- Examples: ChatGPT, Claude, Gemini.
Big models vs. small models
Large (30B – 2T+ parameters)
- Better general reasoning.
- Handles more complex tasks.
- Usually needs cloud GPUs or big servers.
- More expensive to run.
- Good for: coding, research, planning, agents.
Small (under 20B parameters)
- Faster and cheaper to run.
- Runs on laptops, phones, or modest local servers.
- Better when focused on one task.
- Easier to customize or fine-tune.
- Good for: tutoring, classification, privacy-sensitive tasks, simple assistants.
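A rough way to see why the 20B line matters for laptops: a back-of-envelope memory estimate, assuming 4-bit (Q4) quantization at roughly half a byte per parameter. This is a sketch of the rule of thumb, not an exact sizing:

```bash
# Approximate RAM needed to load a Q4-quantized model, ignoring context overhead:
#   memory_GB ≈ parameters_in_billions / 2
for params_b in 8 20 70; do
  echo "${params_b}B params at Q4 ≈ $((params_b / 2)) GB of RAM"
done
# 8B ≈ 4 GB (laptop-friendly), 20B ≈ 10 GB, 70B ≈ 35 GB (server territory)
```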
The current landscape
Two snapshots from Artificial Analysis Intelligence Index v4.0:
- Leading models by country. The current frontier is split between the United States (Anthropic, OpenAI, Google, Meta) and China (Kimi, MiMo, Qwen, DeepSeek, GLM, MiniMax), with single entries from France (Muse Spark), South Korea, and the UAE.
- Open weights vs. proprietary. Many of the frontier-quality models scoring in the 50-57 range are now open weight — Kimi K2.6, MiMo, Qwen 3.6, DeepSeek V4 Pro, GLM 5.1, MiniMax M2.7. The cost-of-entry to a strong local model has dropped dramatically.
Step-by-step setup
You need two free desktop apps and two open-weight models. Follow the steps below in order. If you’re at the workshop, the USB has everything preloaded — see the USB shortcut below.
Download and install Ollama
Grab the installer for your OS from ollama.com/download and run it.
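A quick sanity check that the install worked, from any terminal:

```bash
ollama --version   # prints a version string if Ollama is installed and on your PATH
```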
Download and install AnythingLLM
Download the desktop installer from anythingllm.com/desktop and run it.
Pull the two workshop models
gemma4:e2b is the main workshop model. qwen3.5:0.8b is the smaller alternate. Both will download from ollama.com/library in the background — total around 4-5 GB.
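From a terminal, the pulls are one command each (these are the same model tags used throughout this page):

```bash
ollama pull gemma4:e2b     # main workshop model
ollama pull qwen3.5:0.8b   # smaller alternate
ollama list                # confirm both show up once the downloads finish
```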
Open AnythingLLM and connect to Ollama
- LLM provider: Ollama
- Model: gemma4:e2b
- Click through the remaining onboarding screens (workspace name, telemetry choice, etc.).
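If AnythingLLM can't see Ollama, a quick check that the local server is running (Ollama listens on port 11434 by default):

```bash
# Should return JSON listing the models you pulled
curl http://localhost:11434/api/tags
```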
Workshop-day USB shortcut
If you’re in the room with us, you don’t need to download anything — the USB stick has all of the above preloaded. Just plug it in and run one file: START-MAC.command on macOS, or START-WINDOWS.bat on Windows. The macOS flow has three steps:
Open the USB and run START-MAC.command
Double-click START-MAC.command. If macOS blocks it: right-click → Open → confirm Open in the dialog.
Install AnythingLLM and Ollama
The installers/ folder will open. Drag each app icon into Applications. Switch back to the Terminal window and press Enter between installs.
Wait for the model to extract
The script unpacks models.tar into Ollama’s model store. AnythingLLM launches automatically when it’s done.
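If the extraction step ever fails, you can unpack the archive by hand. A minimal sketch, assuming the USB mounts at /Volumes/WORKSHOP (your volume name may differ) and Ollama's default store location:

```bash
# Ollama's default model store on macOS lives under ~/.ollama
mkdir -p ~/.ollama
tar -xf /Volumes/WORKSHOP/models.tar -C ~/.ollama
# Depending on how models.tar was built, you may need -C ~/.ollama/models instead
ollama list   # the two workshop models should now appear
```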
What is on the USB
| File / folder | What it does |
|---|---|
| 0-START-HERE.txt | Plain-English fallback instructions. Open this first if anything is unclear. |
| START-MAC.command | One file Mac users double-click — opens installers, extracts models, starts Ollama, launches AnythingLLM. |
| START-WINDOWS.bat | One file Windows users double-click — runs the PowerShell setup with the right execution policy. |
| installers/ | Offline installers for AnythingLLM (anythingllm.com/desktop) and Ollama (ollama.com/download) so the room doesn’t fight WiFi. |
| models.tar | Preloaded Ollama model store — gemma4:e2b and qwen3.5:0.8b so you don’t pull from the internet. |
| References to Search Through/ | Markdown files for the agent to search — gives you something real to investigate immediately. |
| WORKSHOP-GUIDE.html | The visual overview of everything on the stick. Open in any browser. |
Models on the USB
gemma4:e2b
- Quantization: Q4_K_M
- Loaded into RAM on demand
- The default choice for all workshop activities
qwen3.5:0.8b
- 0.8B parameters — runs comfortably on modest laptops
- Loaded only when selected
- Try it after you’ve used Gemma so the difference is obvious
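You can also talk to both models straight from a terminal, which makes the size difference easy to feel:

```bash
# Interactive chat with the main workshop model (Ctrl+D to exit)
ollama run gemma4:e2b

# One-shot prompt to the smaller alternate for comparison
ollama run qwen3.5:0.8b "Explain model quantization in two sentences."
```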
The agent activity — search local files
The headline hands-on moment of the workshop. Instead of explaining vector databases first, you point the agent at a folder and watch it search.
Open AnythingLLM with Gemma loaded
Open the prompt sheet
Open References to Search Through/FILE-SEARCH-AGENT-PROMPTS.md — this has copy-paste prompts you can run.
Point the agent at the references folder
Replace PATH_TO_THIS_FOLDER with the actual path on your machine, then send one of the prompts from the sheet; a purely illustrative example is sketched below.
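The sheet has the exact wording. Purely as an illustration (hypothetical phrasing, not the sheet's text), a first query in AnythingLLM, where @agent switches the chat into agent mode, might look like:

```text
@agent Search the files in PATH_TO_THIS_FOLDER and list every document
that mentions the CUNY HPCC, with a one-line summary of each.
```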
What’s in the references folder
- HPCC questions
- Empire AI questions
- Model questions
Live demo tasks
Two prompts we run live with the audience to show off multimodal behavior on a tiny local model:
Task 1
Task 2
Compatibility
| Device | Status | Note |
|---|---|---|
| Apple Silicon Mac (M1+) | Ready | Uses the bundled Apple Silicon AnythingLLM and Ollama installers. |
| Windows 10 / 11 x64 | Ready | Uses the bundled Windows installers and setup script. |
| Intel Mac | Partial | AnythingLLM Intel build is on the stick. The current Ollama DMG is Apple Silicon only, so Intel Mac users may need Ollama already installed. |
| All devices | Disk space | Use an exFAT-formatted USB, and leave at least 12-15 GB free on the laptop for model extraction. |
After the workshop
If you keep using local models, the natural next steps are:
- Run a larger model if your laptop has the RAM. Try gemma3:12b or qwen3:14b from ollama pull.
- Move to a real GPU via the CSI HPCC or, for bigger work, Empire AI.
- Wire local models into your IDE — Continue, Cursor, and Aider all support an Ollama endpoint (see the sketch after this list).
- Read the open-weight model cards on Hugging Face before downloading new models — licenses vary widely.
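On the IDE point above: those tools work by talking to Ollama's OpenAI-compatible endpoint on localhost. A minimal sketch of that endpoint, assuming you have already pulled gemma3:12b:

```bash
# Ollama serves an OpenAI-compatible API under /v1 on its default port
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:12b",
    "messages": [{"role": "user", "content": "Write a haiku about Staten Island."}]
  }'
```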