====== ai:llm ======

//Created 2026/01/26 02:01, last modified 2026/01/26 02:45 by phong2018.//
  * **Agent**: software that uses LLM + tools + feedback loop to complete tasks
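The agent definition above (LLM + tools + feedback loop) can be sketched as a small loop. Everything here (`fake_llm`, the `calc` tool, the `CALL`/`FINAL` protocol) is a hypothetical stand-in for illustration, not any real agent framework's API:

```python
# Minimal agent loop sketch: LLM + tools + feedback, with a stubbed "LLM".

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: decides whether to use a tool."""
    if "TOOL_RESULT" in prompt:
        return "FINAL: 4"          # model has what it needs -> answer
    return "CALL calc 2+2"         # otherwise ask for a tool

TOOLS = {"calc": lambda expr: str(eval(expr))}  # toy calculator tool

def run_agent(task: str, max_steps: int = 5) -> str:
    prompt = task
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("FINAL:"):           # done
            return reply.removeprefix("FINAL:").strip()
        _, tool, arg = reply.split(" ", 2)       # "CALL <tool> <arg>"
        result = TOOLS[tool](arg)
        prompt += f"\nTOOL_RESULT: {result}"     # feed result back (feedback loop)
    return "gave up"

print(run_agent("What is 2+2?"))  # -> 4
```

The feedback loop is the key part: the tool's output is appended to the prompt, so the next LLM call can react to it.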
===== LLM in one line =====
A practical mental model:

  * **1) Learned experience = Parameters / Weights**
    * This is the "knowledge compressed into numbers" learned during training (e.g., 7B/8B/70B parameters).
  * **2) What it can produce = Vocabulary**
    * The fixed set of tokens the model can output.
  * **3) Temporary memory = Context window**
    * The maximum number of tokens the model can see at once (prompt + history + retrieved docs + output).

**Note:** The **Tokenizer** is a separate component that converts text into token IDs (it is not the model's learned experience).

===== Glossary quick notes =====
  * **parameter** /pəˈræmətər/: tham số
  * **weight** /weɪt/: trọng số (a kind of parameter)
  * **tokenizer** /ˈtoʊkənaɪzər/: bộ tách token
  * **vocabulary** /vəˈkæbjəˌleri/: từ vựng (the token set)
  * **context window** /ˈkɑːntekst ˈwɪndoʊ/: cửa sổ ngữ cảnh
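The mental model above (vocabulary, context window, and a tokenizer separate from the weights) can be illustrated with a toy sketch. The word-level tokenizer here is a deliberate oversimplification (real LLMs use subword tokenizers such as BPE), and the tiny `VOCAB` is made up:

```python
# Toy illustration: vocabulary = fixed token set, tokenizer = text -> IDs,
# context window = how many tokens the model can see at once.
# (The learned weights themselves are omitted; they are just large arrays of numbers.)

VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]   # 2) vocabulary: fixed token set
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
CONTEXT_WINDOW = 4                                    # 3) max tokens visible at once

def tokenize(text: str) -> list[int]:
    """Tokenizer: converts text to token IDs (separate from the model's weights)."""
    return [TOKEN_ID.get(w, TOKEN_ID["<unk>"]) for w in text.lower().split()]

ids = tokenize("The cat sat on the mat")
print(ids)                    # [1, 2, 3, 4, 1, 5]
print(ids[-CONTEXT_WINDOW:])  # [3, 4, 1, 5]  <- only the last 4 tokens "fit"
```

Anything outside the window (here, the first two tokens) is simply invisible to the model, which is why long conversations eventually "forget" their beginning.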
===== 2. Does an LLM “understand” or only generate probabilistic text? =====
Technically, …
===== 10. Popular models and which ones can run locally =====
==== Quick mental model fields (apply to every LLM) ====
For each model below, capture:
  * **1) Learned experience = Parameters / Weights** (e.g., 7B/8B/70B)
  * **2) What it can produce = Vocabulary** (tokenizer + vocab size; fixed per model)
  * **3) Temporary memory = Context window** (max tokens visible at once)

==== Cloud-only (generally not downloadable) ====
=== GPT (OpenAI) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong general reasoning + tool ecosystem.

=== Claude (Anthropic) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong long-form writing and code assistance.

=== Gemini (Google) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong multimodal and large-context options (depending on version).

==== Open-weight / downloadable ====
=== LLaMA family (Meta) ===
  * **Parameters/Weights:** Multiple sizes (e.g., 7B/8B/70B).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by generation (older versions smaller; newer may be larger).
  * Local use: Best with quantized GGUF via llama.cpp / Ollama / LM Studio.

=== Mistral / Mixtral ===
  * **Parameters/Weights:** Mistral 7B; Mixtral uses Mixture-of-Experts (e.g., 8x7B).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Mistral 7B-class is popular for fast local inference.

=== Qwen ===
  * **Parameters/Weights:** Multiple sizes across releases.
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Often strong multilingual performance.

=== DeepSeek (especially strong for code variants) ===
  * **Parameters/Weights:** Multiple sizes across releases.
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Code-focused variants are widely used for dev tasks.

=== Phi (small, efficient) ===
  * **Parameters/Weights:** Small models (a few billion parameters).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by version.
  * Local use: Great for low-resource devices; fast inference.

==== Local runtimes on macOS ====
=== Ollama ===
  * Purpose: Simplest local runner (download + run models easily).
  * Works best with: Quantized GGUF models.

=== LM Studio ===
  * Purpose: GUI app to download, run, and chat with local models.
  * Works best with: Quantized GGUF models, easy model management.

=== llama.cpp ===
  * Purpose: High-performance local inference engine for GGUF models.
  * Works best with: Fine-grained control and optimization on CPU/Metal.
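A quick way to judge whether a quantized model fits on a given Mac is the standard rule of thumb: weight size ≈ parameter count × bits per weight ÷ 8. The sketch below applies that formula; it counts only the weights and ignores runtime overhead such as the KV cache:

```python
# Rule-of-thumb estimate of RAM/VRAM needed just for a model's weights:
# bytes ≈ params * bits_per_weight / 8 (KV cache and runtime overhead not included).

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(weight_size_gb(7, 16))  # 14.0 -> 7B model at fp16
print(weight_size_gb(7, 4))   # 3.5  -> same model, 4-bit quantized
print(weight_size_gb(70, 4))  # 35.0 -> 70B at 4-bit needs a high-memory Mac
```

This is why 4-bit quantization matters for local use: it cuts a 7B model from roughly 14 GB to roughly 3.5 GB of weights.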
==== Glossary (hard terms) ====
  * **parameter** /pəˈræmətər/: tham số
  * **weight** /weɪt/: trọng số
  * **vocabulary** /vəˈkæbjəˌleri/: từ vựng
  * **token** /ˈtoʊkən/: token
  * **context window** /ˈkɑːntekst ˈwɪndoʊ/: cửa sổ ngữ cảnh
  * **proprietary** /prəˈpraɪəteri/: độc quyền
  * **open-weight** /ˌoʊpən ˈweɪt/: mở trọng số (weights are published)
  * **quantized** /ˈkwɑːntaɪzd/: đã lượng tử hóa (reduced numeric precision)
  * **runtime** /ˈrʌntaɪm/: môi trường chạy
  * **variant** /ˈveriənt/: biến thể
  * **Mixture-of-Experts (MoE)** /ˈmɪkstʃər əv ˈekspɜːrts/: hỗn hợp chuyên gia
===== 11. Local model size estimates on Mac =====
