====== ai:llm ======

//Created 2026/01/26 02:01, last modified 2026/01/26 02:45 by phong2018.//
  * **Agent**: software that uses LLM + tools + feedback loop to complete tasks
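The agent definition above (LLM + tools + feedback loop) can be sketched as a small loop. Everything here (`fake_llm`, the `calc` tool, the `CALL`/`FINAL` protocol) is a hypothetical stand-in for illustration, not any real agent framework's API:

```python
# Minimal agent loop sketch: LLM + tools + feedback, with a stubbed "LLM".

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: decides whether to use a tool."""
    if "TOOL_RESULT" in prompt:
        return "FINAL: 4"          # model has what it needs -> answer
    return "CALL calc 2+2"         # otherwise ask for a tool

TOOLS = {"calc": lambda expr: str(eval(expr))}  # toy calculator tool

def run_agent(task: str, max_steps: int = 5) -> str:
    prompt = task
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("FINAL:"):           # done
            return reply.removeprefix("FINAL:").strip()
        _, tool, arg = reply.split(" ", 2)       # "CALL <tool> <arg>"
        result = TOOLS[tool](arg)
        prompt += f"\nTOOL_RESULT: {result}"     # feed result back (feedback loop)
    return "gave up"

print(run_agent("What is 2+2?"))  # -> 4
```

The feedback loop is the key part: the tool's output is appended to the prompt, so the next LLM call can react to it.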
===== LLM in one line =====
A practical mental model:

  * **1) Learned experience = Parameters / Weights**
    * This is the "knowledge compressed into numbers" learned during training (e.g., 7B/8B/70B parameters).
  * **2) What it can produce = Vocabulary**
    * The fixed set of tokens the model can output.
  * **3) Temporary memory = Context window**
    * The maximum number of tokens the model can see at once (prompt + history + retrieved docs + output).

**Note:** The **Tokenizer** is a separate component that converts text into token IDs (it is not the model's learned experience).

===== Glossary quick notes =====
  * **parameter** /pəˈræmətər/: tham số
  * **weight** /weɪt/: trọng số (a kind of parameter)
  * **tokenizer** /ˈtoʊkənaɪzər/: bộ tách token
  * **vocabulary** /vəˈkæbjəˌleri/: từ vựng (the token set)
  * **context window** /ˈkɑːntekst ˈwɪndoʊ/: cửa sổ ngữ cảnh
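The mental model above (vocabulary, context window, and a tokenizer separate from the weights) can be illustrated with a toy sketch. The word-level tokenizer here is a deliberate oversimplification (real LLMs use subword tokenizers such as BPE), and the tiny `VOCAB` is made up:

```python
# Toy illustration: vocabulary = fixed token set, tokenizer = text -> IDs,
# context window = how many tokens the model can see at once.
# (The learned weights themselves are omitted; they are just large arrays of numbers.)

VOCAB = ["<unk>", "the", "cat", "sat", "on", "mat"]   # 2) vocabulary: fixed token set
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
CONTEXT_WINDOW = 4                                    # 3) max tokens visible at once

def tokenize(text: str) -> list[int]:
    """Tokenizer: converts text to token IDs (separate from the model's weights)."""
    return [TOKEN_ID.get(w, TOKEN_ID["<unk>"]) for w in text.lower().split()]

ids = tokenize("The cat sat on the mat")
print(ids)                    # [1, 2, 3, 4, 1, 5]
print(ids[-CONTEXT_WINDOW:])  # [3, 4, 1, 5]  <- only the last 4 tokens "fit"
```

Anything outside the window (here, the first two tokens) is simply invisible to the model, which is why long conversations eventually "forget" their beginning.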
===== 2. Does an LLM “understand” or only generate probabilistic text? =====
Technically, …
===== 10. Popular models and which ones can run locally =====
==== Quick mental model fields (apply to every LLM) ====
For each model below, capture:
  * **1) Learned experience = Parameters / Weights** (e.g., 7B/8B/70B)
  * **2) What it can produce = Vocabulary** (tokenizer + vocab size; fixed per model)
  * **3) Temporary memory = Context window** (max tokens visible at once)

==== Cloud-only (generally not downloadable) ====
=== GPT (OpenAI) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong general reasoning + tool ecosystem.

=== Claude (Anthropic) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong long-form writing and code assistance.

=== Gemini (Google) ===
  * **Parameters/Weights:** Not publicly disclosed (proprietary).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by model tier/version.
  * Notes: Strong multimodal and large-context options (depending on version).

==== Open-weight / downloadable ====
=== LLaMA family (Meta) ===
  * **Parameters/Weights:** Multiple sizes (e.g., 7B/8B/70B).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by generation (older versions smaller; newer may be larger).
  * Local use: Best with quantized GGUF via llama.cpp / Ollama / LM Studio.

=== Mistral / Mixtral ===
  * **Parameters/Weights:** Mistral 7B; Mixtral uses Mixture-of-Experts (e.g., 8x7B).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Mistral 7B-class is popular for fast local inference.

=== Qwen ===
  * **Parameters/Weights:** Multiple sizes across releases.
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Often strong multilingual performance.

=== DeepSeek (especially strong for code variants) ===
  * **Parameters/Weights:** Multiple sizes across releases.
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Code-focused variants are widely used for dev tasks.

=== Phi (small, efficient) ===
  * **Parameters/Weights:** Small models (a few billion parameters).
  * **Vocabulary:** Fixed per model (tokenizer-specific).
  * **Context window:** Varies by version.
  * Local use: Great for low-resource devices; fast inference.

==== Local runtimes on macOS ====
=== Ollama ===
  * Purpose: Simplest local runner (download + run models easily).
  * Works best with: Quantized GGUF models.

=== LM Studio ===
  * Purpose: GUI app to download, run, and chat with local models.
  * Works best with: Quantized GGUF models, easy model management.

=== llama.cpp ===
  * Purpose: High-performance local inference engine for GGUF models.
  * Works best with: Fine-grained control and optimization on CPU/Metal.
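A quick way to judge whether a quantized model fits on a given Mac is the standard rule of thumb: weight size ≈ parameter count × bits per weight ÷ 8. The sketch below applies that formula; it counts only the weights and ignores runtime overhead such as the KV cache:

```python
# Rule-of-thumb estimate of RAM/VRAM needed just for a model's weights:
# bytes ≈ params * bits_per_weight / 8 (KV cache and runtime overhead not included).

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(weight_size_gb(7, 16))  # 14.0 -> 7B model at fp16
print(weight_size_gb(7, 4))   # 3.5  -> same model, 4-bit quantized
print(weight_size_gb(70, 4))  # 35.0 -> 70B at 4-bit needs a high-memory Mac
```

This is why 4-bit quantization matters for local use: it cuts a 7B model from roughly 14 GB to roughly 3.5 GB of weights.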
==== Glossary (hard terms) ====
  * **parameter** /pəˈræmətər/: tham số
  * **weight** /weɪt/: trọng số
  * **vocabulary** /vəˈkæbjəˌleri/: từ vựng
  * **token** /ˈtoʊkən/: token
  * **context window** /ˈkɑːntekst ˈwɪndoʊ/: cửa sổ ngữ cảnh
  * **proprietary** /prəˈpraɪəteri/: độc quyền
  * **open-weight** /ˌoʊpən ˈweɪt/: mở trọng số (weights are published)
  * **quantized** /ˈkwɑːntaɪzd/: đã lượng tử hóa (reduced numeric precision)
  * **runtime** /ˈrʌntaɪm/: môi trường chạy
  * **variant** /ˈveriənt/: biến thể
  * **Mixture-of-Experts (MoE)** /ˈmɪkstʃər əv ˈekspɜːrts/: hỗn hợp chuyên gia
===== 11. Local model size estimates on Mac =====
