ai:llm [2026/01/26 02:45] (current) – phong2018
  * **1) Learned experience = Parameters / Weights**
    * This is the “knowledge compressed into numbers” learned during training (e.g., 7B/8B/70B parameters).
  
  * **2) What it can produce = Vocabulary**
  
===== 10. Popular models and which ones can run locally =====
  
==== Quick mental model fields (apply to every LLM) ====
For each model below, capture:
  * **1) Learned experience = Parameters / Weights** (e.g., 7B/8B/70B)
  * **2) What it can produce = Vocabulary** (tokenizer + vocab size; fixed per model)
  * **3) Temporary memory = Context window** (max tokens visible at once).
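These three fields can be sketched as a small data structure, and a rough memory estimate falls out of field 1 (parameter count × bytes per parameter). A minimal sketch; the class, model name, and numbers below are illustrative, not taken from any real model:

```python
from dataclasses import dataclass

@dataclass
class LLMProfile:
    """The three 'mental model' fields for any LLM (illustrative sketch)."""
    name: str
    parameters_b: float    # 1) learned experience, in billions (e.g., 7, 8, 70)
    vocab_size: int        # 2) what it can produce: number of distinct tokens
    context_window: int    # 3) temporary memory: max tokens visible at once

    def approx_memory_gb(self, bytes_per_param: float = 2.0) -> float:
        """Rough RAM estimate: parameter count x bytes per parameter (2 = fp16)."""
        return self.parameters_b * bytes_per_param

# Hypothetical 7B-class model for illustration only.
m = LLMProfile("example-7b", parameters_b=7, vocab_size=32000, context_window=8192)
print(round(m.approx_memory_gb(), 1))     # fp16: ~14.0 GB
print(round(m.approx_memory_gb(0.5), 1))  # 4-bit quantized: ~3.5 GB
```

This is why a 7B model that needs ~14 GB at fp16 fits in ~4 GB once 4-bit quantized, which is the usual path to running it locally.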
  
==== Cloud-only (generally not downloadable) ====
=== GPT (OpenAI) ===
  * **Parameters/Weights:** Not publicly disclosed (varies by product/version).
  * **Vocabulary:** Proprietary tokenizer; vocab size not consistently published.
  * **Context window:** Varies by model tier/version (check product docs).
  * Notes: Strong general reasoning + tool ecosystem.

=== Claude (Anthropic) ===
  * **Parameters/Weights:** Not publicly disclosed.
  * **Vocabulary:** Proprietary tokenizer.
  * **Context window:** Varies by model tier/version (check product docs).
  * Notes: Strong long-form writing and code assistance.

=== Gemini (Google) ===
  * **Parameters/Weights:** Not publicly disclosed.
  * **Vocabulary:** Proprietary tokenizer.
  * **Context window:** Varies by model tier/version (check product docs).
  * Notes: Strong multimodal and large-context options (depending on version).

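The context window field is easiest to see as a truncation rule: only the most recent tokens fit into the model's temporary memory. A minimal sketch with a toy whitespace tokenizer (real models use subword tokenizers such as BPE, so actual token counts differ):

```python
def toy_tokenize(text: str) -> list[str]:
    # Toy whitespace tokenizer; real LLMs split text into subword tokens.
    return text.split()

def fit_to_context(history: list[str], context_window: int) -> list[str]:
    """Keep only the most recent messages whose tokens fit the window."""
    kept, used = [], 0
    for msg in reversed(history):          # walk newest -> oldest
        n = len(toy_tokenize(msg))
        if used + n > context_window:      # next message would overflow
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))            # restore chronological order

history = ["hello there", "how are you today", "fine thanks", "tell me about LLMs"]
print(fit_to_context(history, context_window=8))
# Only the newest messages survive; older ones fall out of "memory".
```

This is why long chats eventually "forget" their beginning: the oldest turns no longer fit in the window.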
==== Open-weight/open-source (often runnable locally) ====
=== LLaMA family (Meta) ===
  * **Parameters/Weights:** Common sizes include 7B/8B/13B/70B (depends on generation).
  * **Vocabulary:** Fixed per LLaMA generation (tokenizer + vocab size depend on version).
  * **Context window:** Varies by generation (older versions smaller; newer may be larger).
  * Local use: Best with quantized GGUF via llama.cpp / Ollama / LM Studio.

=== Mistral / Mixtral ===
  * **Parameters/Weights:** Mistral is commonly 7B-class; Mixtral uses MoE (Mixture-of-Experts) variants.
  * **Vocabulary:** Fixed per model release (tokenizer-specific).
  * **Context window:** Varies by release/version.
  * Local use: Mistral 7B-class is popular for fast local inference.

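The MoE idea behind Mixtral can be sketched in a few lines: a gate scores all experts, but only the top-k are actually evaluated, so most of the model's weights stay idle at each step. The "experts" below are trivial stand-in functions, not real sub-networks:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "experts": stand-ins for expert sub-networks inside a real MoE layer.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x / 2]

def moe_forward(x: float, gate_scores: list[float], top_k: int = 2) -> float:
    """Route the input to only the top_k experts; the rest are never run,
    which is why MoE models are cheap per step relative to their total size."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)       # renormalize over chosen experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Gate strongly prefers experts 0 and 1; experts 2 and 3 stay inactive.
print(moe_forward(10.0, gate_scores=[3.0, 2.0, -1.0, -1.0], top_k=2))
```

With four experts and top_k=2, only half the "model" runs per input, yet the gate can pick a different pair for each input.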
=== Qwen ===
  * **Parameters/Weights:** Multiple sizes (small -> large; common local picks: ~7B-class).
  * **Vocabulary:** Fixed per Qwen generation (tokenizer-specific).
  * **Context window:** Varies by release/version (some versions support larger contexts).
  * Local use: Often strong multilingual performance.

=== DeepSeek (especially strong for code variants) ===
  * **Parameters/Weights:** Multiple sizes; common local coder models are ~6–7B-class.
  * **Vocabulary:** Fixed per model/tokenizer version.
  * **Context window:** Varies by release/version.
  * Local use: Code-focused variants are widely used for dev tasks.

=== Phi (small, efficient) ===
  * **Parameters/Weights:** Small models (often ~2–4B-class depending on version).
  * **Vocabulary:** Fixed per Phi release/tokenizer.
  * **Context window:** Varies by version.
  * Local use: Great for low-resource devices; fast inference.

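"Quantized" in the local-use notes above means trading numeric precision for size. A toy absmax int8 scheme shows the idea; real GGUF quantization formats are more sophisticated (block-wise, multiple scales, 4-bit and lower):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 range [-127, 127] with one shared scale (absmax)."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

w = [0.12, -0.50, 0.33, 0.04]
q, scale = quantize_int8(w)
restored = dequantize(q, scale)
# Storage drops from 4 bytes (fp32) to 1 byte per weight; restored values are
# close but not identical -- the quality/size trade-off of quantization.
print(q)
print([round(v, 3) for v in restored])
```

Every weight comes back within one quantization step of its original value, which is why quantized models stay usable while shrinking 4x or more.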
==== Local runtimes on macOS ====
=== Ollama ===
  * Purpose: Simplest local runner (download + run models easily).
  * Works best with: Quantized GGUF models.

=== LM Studio ===
  * Purpose: GUI app to download, run, and chat with local models.
  * Works best with: Quantized GGUF models; easy model management.

=== llama.cpp ===
  * Purpose: High-performance local inference engine for GGUF models.
  * Works best with: Fine-grained control and optimization on CPU/Metal.

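Ollama also exposes a local HTTP API (port 11434 by default, endpoint /api/generate). The sketch below only builds the JSON request, so it runs without a server; the model name "llama3" is an example and would need to be pulled locally first (e.g., ollama pull llama3):

```python
import json

def generate_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

payload = generate_payload("llama3", "Explain context windows in one sentence.")
print(payload)

# To actually send it (commented out so the sketch runs without Ollama):
# import urllib.request
# req = urllib.request.Request("http://localhost:11434/api/generate",
#                              data=payload.encode(), method="POST")
# print(urllib.request.urlopen(req).read().decode())
```

LM Studio offers a similar local server mode, and llama.cpp can be driven directly from the command line; the payload shape above is specific to Ollama.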
==== Glossary (hard terms) ====
  * **parameter** /pəˈræmɪtər/: a learned, tunable value inside the model (a weight)
  * **weight** /weɪt/: a single learned numeric value; the model's knowledge is stored in its weights
  * **vocabulary** /vəˈkæbjəˌleri/: the fixed set of tokens the model can read and produce
  * **token** /ˈtoʊkən/: a small unit (piece) of text
  * **context window** /ˈkɑːntekst ˈwɪndoʊ/: the model's temporary memory (max tokens visible at once)
  * **proprietary** /prəˈpraɪəˌteri/: privately owned; not publicly released
  * **open-weight** /ˌoʊpən ˈweɪt/: the trained weights are published for download
  * **quantized** /ˈkwɑːntaɪzd/: compressed to lower numeric precision
  * **runtime** /ˈrʌnˌtaɪm/: the environment/program that runs a model
  * **variant** /ˈveriənt/: a variation/version of a model
  * **Mixture-of-Experts (MoE)** /ˈmɪkstʃər əv ˈekspɜːrts/: a “many experts” architecture in which only part of the model activates at each step
  
===== 11. Local model size estimates on Mac =====