Key Features of DeepSeek App:
• Easy login: E-mail / Google Account / Apple ID
• Cross-platform chat history sync
• Web search & Deep-Think mode
• File upload & text extraction
(2/3)
Important Notice:
• 100% FREE - no ads, no in-app purchases
• Download only from official channels to avoid being misled
• Search "DeepSeek" in your app store or visit our website for direct links
(3/3)
via Twitter @DeepSeek
β 100% FREE - No ads, no in-app purchases
π‘οΈ Download only from official channels to avoid being misled
π² Search "DeepSeek" in your app store or visit our website for direct links
π 3/3
via Twitter @DeepSeek
DeepSeek-R1 is here!
• Performance on par with OpenAI-o1
• Fully open-source model & technical report
• MIT licensed: distill & commercialize freely!
• Website & API are live now! Try DeepThink at http://chat.deepseek.com today!
(1/n)
via Twitter @DeepSeek
β‘ Performance on par with OpenAI-o1
π Fully open-source model & technical report
π MIT licensed: Distill & commercialize freely!
π Website & API are live now! Try DeepThink at http://chat.deepseek.com today!
π 1/n
via Twitter @DeepSeek
DeepSeek-R1: Technical Highlights
• Large-scale RL in post-training
• Significant performance boost with minimal labeled data
• Math, code, and reasoning tasks on par with OpenAI-o1
• More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
(4/n)
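The R1 report pairs this large-scale RL with rule-based rewards for verifiable tasks such as math. Below is a minimal, hypothetical sketch of such a rule-based reward check; the \boxed{} answer convention and the 0/1 reward are illustrative assumptions, not DeepSeek's exact implementation.

```python
import re

def extract_boxed_answer(completion: str) -> str | None:
    """Pull the final answer out of a \\boxed{...} span, if present (illustrative convention)."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return match.group(1).strip() if match else None

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical verifiable reward: 1.0 for an exact answer match, else 0.0.

    Real systems normalize answers (fractions, units, formatting) far more carefully.
    """
    predicted = extract_boxed_answer(completion)
    return 1.0 if predicted is not None and predicted == reference_answer.strip() else 0.0

# Usage: score a sampled completion against the ground-truth answer.
print(rule_based_reward("... so the result is \\boxed{42}", "42"))  # 1.0
```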
API Access & Pricing
• Use DeepSeek-R1 by setting model=deepseek-reasoner
• $0.14 / million input tokens (cache hit)
• $0.55 / million input tokens (cache miss)
• $2.19 / million output tokens
• API guide: https://api-docs.deepseek.com/guides/reasoning_model
(5/n)
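The DeepSeek API is OpenAI-compatible, so the pricing above is exercised through a standard chat-completions call with model=deepseek-reasoner. A minimal sketch following the API guide linked above (the base URL and the reasoning_content field come from that guide; the DEEPSEEK_API_KEY environment variable and the prompt are placeholders):

```python
import os
from openai import OpenAI  # the DeepSeek API is OpenAI-compatible

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # selects DeepSeek-R1
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)

# The reasoner returns its chain of thought separately from the final answer.
print(response.choices[0].message.reasoning_content)  # thinking trace
print(response.choices[0].message.content)            # final answer
```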
To prevent any potential harm, we reiterate that @deepseek_ai is our sole official account on Twitter/X.
Any accounts:
- representing us
- using identical avatars
- using similar names
are impersonations.
Please stay vigilant to avoid being misled!
Terminology Correction: DeepSeek-R1's code and models are released under the MIT License.
Excited to see everyone's enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience:
• No system prompt
• Temperature: 0.6
• Official prompts for search & file upload: bit.ly/4hyH8np
• Guidelines to mitigate the model bypassing its thinking step: bit.ly/4gJrhkF
The official DeepSeek deployment runs the same model as the open-source version, so you get the full DeepSeek-R1 experience!
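A minimal sketch of applying these recommendations to a self-hosted, OpenAI-compatible deployment of the open-source weights; the localhost URL and model name below are placeholders for whatever your own serving stack exposes, not an official endpoint:

```python
from openai import OpenAI

# Hypothetical local deployment of the open-source DeepSeek-R1 weights
# behind an OpenAI-compatible server (URL and model name are placeholders).
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="deepseek-r1",
    temperature=0.6,  # recommended sampling temperature
    messages=[
        # Recommended: no system prompt; put all instructions in the user turn.
        {"role": "user", "content": "Solve: if 3x + 5 = 20, what is x? Reason step by step."},
    ],
)
print(response.choices[0].message.content)
```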
Introducing NSA: a Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
With its optimized design for modern hardware, NSA speeds up inference and reduces pre-training costs without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
For more details, check out our paper: https://arxiv.org/abs/2502.11089
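NSA combines compression, selection, and sliding-window branches with learned gating; the numpy toy below only illustrates the first two ideas for a single query vector (mean-pooled "compressed" KV blocks plus top-k raw blocks), with a fixed average standing in for the learned gate. It is a conceptual illustration, not the NSA algorithm or its kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_sparse_attention(q, k, v, block=4, top_blocks=2):
    """Toy single-query sparse attention: coarse block compression + fine block selection."""
    T, d = k.shape
    n_blocks = T // block
    kb = k[: n_blocks * block].reshape(n_blocks, block, d)
    vb = v[: n_blocks * block].reshape(n_blocks, block, d)

    # Coarse branch: attend over mean-pooled (compressed) blocks.
    k_comp, v_comp = kb.mean(axis=1), vb.mean(axis=1)
    coarse_out = softmax(k_comp @ q / np.sqrt(d)) @ v_comp

    # Fine branch: pick the top-scoring blocks and attend over their raw tokens.
    block_scores = k_comp @ q
    chosen = np.argsort(block_scores)[-top_blocks:]
    k_sel = kb[chosen].reshape(-1, d)
    v_sel = vb[chosen].reshape(-1, d)
    fine_out = softmax(k_sel @ q / np.sqrt(d)) @ v_sel

    # NSA gates its branches with learned weights; a fixed average stands in here.
    return 0.5 * coarse_out + 0.5 * fine_out

rng = np.random.default_rng(0)
q = rng.normal(size=64)
k = rng.normal(size=(32, 64))
v = rng.normal(size=(32, 64))
print(toy_sparse_attention(q, k, v).shape)  # (64,)
```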
Day 0: Warming up for #OpenSourceWeek!
We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency.
These humble building blocks in our online service have been documented, deployed and battle-tested in production.
As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey.
Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
Day 1 of #OpenSourceWeek: FlashMLA
Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.
• BF16 support
• Paged KV cache (block size 64)
• 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800
Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
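FlashMLA itself is a CUDA kernel, but the "paged KV cache (block size 64)" idea can be shown in plain Python: logical token positions map through a block table into fixed-size physical pages, so variable-length sequences share one pool without fragmentation. The names and pool size below are illustrative, not FlashMLA's API.

```python
import numpy as np

BLOCK_SIZE = 64          # tokens per KV-cache page, as in the tweet
NUM_PAGES = 16           # size of the shared physical pool (illustrative)
HEAD_DIM = 8             # toy latent/KV dimension

# Physical pool of pages and a per-sequence block table (logical block -> physical page).
kv_pool = np.zeros((NUM_PAGES, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
free_pages = list(range(NUM_PAGES))
block_table: list[int] = []   # one growing sequence, for simplicity
seq_len = 0

def append_token(kv_vec: np.ndarray) -> None:
    """Append one token's KV entry, allocating a new page when the last one fills up."""
    global seq_len
    if seq_len % BLOCK_SIZE == 0:              # need a fresh page
        block_table.append(free_pages.pop(0))
    page = block_table[seq_len // BLOCK_SIZE]
    kv_pool[page, seq_len % BLOCK_SIZE] = kv_vec
    seq_len += 1

def gather_kv() -> np.ndarray:
    """Gather the sequence's KV entries in logical order (what a paged kernel reads)."""
    full_pages = np.concatenate([kv_pool[p] for p in block_table], axis=0)
    return full_pages[:seq_len]

for t in range(130):                           # spans three 64-token pages
    append_token(np.full(HEAD_DIM, float(t)))
print(gather_kv().shape, len(block_table))     # (130, 8) 3
```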
Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
• Efficient and optimized all-to-all communication
• Both intranode and internode support with NVLink and RDMA
• High-throughput kernels for training and inference prefilling
• Low-latency kernels for inference decoding
• Native FP8 dispatch support
• Flexible GPU resource control for computation-communication overlapping
GitHub: github.com/deepseek-ai/DeepEP
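DeepEP implements the dispatch/combine all-to-all that expert parallelism needs across GPUs. The single-process sketch below only mimics the data-movement pattern (group tokens by destination expert, apply the expert, restore the original token order); it is a conceptual illustration, not DeepEP's API, and the per-expert scaling stands in for a real expert MLP on its own rank.

```python
import numpy as np

def moe_dispatch_combine(tokens: np.ndarray, expert_ids: np.ndarray, num_experts: int) -> np.ndarray:
    """Mimic EP all-to-all: route each token to its expert, apply the expert, restore order."""
    outputs = np.empty_like(tokens)
    for expert in range(num_experts):
        idx = np.nonzero(expert_ids == expert)[0]      # tokens destined for this expert ("dispatch")
        if idx.size == 0:
            continue
        # Stand-in expert FFN: a per-expert scaling; real experts are MLPs on their own GPU.
        outputs[idx] = tokens[idx] * (expert + 1)       # "combine" back into original positions
    return outputs

rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 4))
expert_ids = rng.integers(0, 4, size=10)                # top-1 routing decisions (illustrative)
print(moe_dispatch_combine(tokens, expert_ids, num_experts=4).shape)  # (10, 4)
```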
Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
• Up to 1350+ FP8 TFLOPS on Hopper GPUs
• No heavy dependency, as clean as a tutorial
• Fully just-in-time compiled
• Core logic at ~300 lines, yet it outperforms expert-tuned kernels across most matrix sizes
• Supports dense layout and two MoE layouts
GitHub: https://github.com/deepseek-ai/DeepGEMM
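DeepGEMM's kernels run on Hopper tensor cores, but the core numerical idea of FP8 GEMM with fine-grained scaling can be emulated on CPU: give each block of the operands its own scale so it fits the e4m3 range, round crudely, multiply, and undo the scales. This is a numerical illustration under assumed block shapes, not DeepGEMM's actual scaling recipe or API.

```python
import numpy as np

FP8_E4M3_MAX = 448.0   # largest finite value of the e4m3 format

def fake_fp8(x: np.ndarray, mantissa_bits: int = 3) -> np.ndarray:
    """Crudely emulate e4m3 precision by rounding each value to ~3 mantissa bits."""
    out = np.zeros_like(x)
    nz = x != 0
    exp = np.floor(np.log2(np.abs(x[nz])))
    step = 2.0 ** (exp - mantissa_bits)
    out[nz] = np.round(x[nz] / step) * step
    return out

def quantize_blockwise(x: np.ndarray, block: int = 128) -> np.ndarray:
    """Per-(1 x block) scaling so each block fits the FP8 range, then fake-FP8 rounding."""
    m, n = x.shape
    assert n % block == 0, "illustration assumes the K dimension is a multiple of the block size"
    blocks = x.reshape(m, n // block, block)
    scales = np.abs(blocks).max(axis=2, keepdims=True) / FP8_E4M3_MAX + 1e-12
    deq = fake_fp8(blocks / scales) * scales      # quantize, then dequantize for emulation
    return deq.reshape(m, n)

def fp8_like_gemm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Emulated FP8 GEMM: block-scaled fake-FP8 operands, float accumulation."""
    return quantize_blockwise(a) @ quantize_blockwise(b.T).T

rng = np.random.default_rng(0)
a, b = rng.normal(size=(64, 256)), rng.normal(size=(256, 32))
rel_err = np.abs(fp8_like_gemm(a, b) - a @ b).max() / np.abs(a @ b).max()
print(f"max relative error vs float32 GEMM: {rel_err:.4f}")
```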
Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
• DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
https://github.com/deepseek-ai/DualPipe
• EPLB - an expert-parallel load balancer for V3/R1.
https://github.com/deepseek-ai/eplb
• Analysis data for computation-communication overlap in V3/R1.
https://github.com/deepseek-ai/profile-data
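EPLB's job is to keep expert-parallel GPUs evenly loaded by replicating hot experts and placing them carefully. A greedy toy version of that placement idea (heaviest remaining expert onto the currently least-loaded GPU) is sketched below; the actual EPLB algorithm, with its hierarchical and redundant-expert strategies, is more involved, and the function here is purely illustrative.

```python
import heapq

def greedy_expert_placement(expert_loads: list[float], num_gpus: int) -> list[list[int]]:
    """Toy placement: heaviest expert first onto the least-loaded GPU (not EPLB itself)."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]   # (accumulated load, gpu id)
    heapq.heapify(heap)
    placement: list[list[int]] = [[] for _ in range(num_gpus)]
    order = sorted(range(len(expert_loads)), key=lambda e: -expert_loads[e])
    for expert in order:
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (load + expert_loads[expert], gpu))
    return placement

# Usage: 8 experts with skewed token counts spread over 4 GPUs.
loads = [900, 850, 400, 300, 200, 150, 100, 100]
print(greedy_expert_placement(loads, num_gpus=4))
```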
Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
• 6.6 TiB/s aggregate read throughput in a 180-node cluster
• 3.66 TiB/min throughput on the GraySort benchmark in a 25-node cluster
• 40+ GiB/s peak throughput per client node for KVCache lookup
• Disaggregated architecture with strong consistency semantics
• Used for training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
3FS: github.com/deepseek-ai/3FS
Smallpond - data processing framework on 3FS: github.com/deepseek-ai/smallpond
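Back-of-envelope check on the headline figure: 6.6 TiB/s of aggregate read throughput across 180 nodes works out to roughly 37.5 GiB/s per node (6.6 × 1024 / 180), on the order of what a handful of NVMe SSDs plus RDMA NICs per node can sustain.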
Day 6 of #OpenSourceWeek: One More Thing - DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
• Cross-node EP-powered batch scaling
• Computation-communication overlap
• Load balancing
Statistics of DeepSeek's online service:
• 73.7k / 14.8k input/output tokens per second per H800 node
• Cost profit margin of 545%
We hope this week's insights offer value to the community and contribute to our shared AGI goals.
Deep dive: bit.ly/4ihZUiO
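Reading the 545% figure: taking "cost profit margin" as (revenue − cost) / cost, a 545% margin implies theoretical revenue of roughly 6.45× the serving cost; the linked deep dive walks through the underlying per-node numbers and caveats (e.g., actual revenue is lower because much of the traffic is free or off-peak).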
DeepSeek-V3-0324 is out now!
• Major boost in reasoning performance
• Stronger front-end development skills
• Smarter tool-use capabilities
• For non-complex reasoning tasks, we recommend using V3: just turn off "DeepThink"
• API usage remains unchanged
• Models are now released under the MIT License, just like DeepSeek-R1!
Open-source weights: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
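Since API usage is unchanged, the earlier deepseek-reasoner sketch carries over; to reach V3 without the thinking mode, the platform's documented model name is deepseek-chat. A minimal sketch (prompt is a placeholder):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",  # V3 without DeepThink; use "deepseek-reasoner" for thinking mode
    messages=[{"role": "user", "content": "Write a one-line CSS rule that centers a div."}],
)
print(response.choices[0].message.content)
```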