Re ✨ Key Features of DeepSeek App:

πŸ” Easy login: E-mail/Google Account/Apple ID
☁️ Cross-platform chat history sync
πŸ” Web search & Deep-Think mode
πŸ“„ File upload & text extraction

🌟 2/3

via Twitter @DeepSeek
Re ⚠️ Important Notice:

✅ 100% FREE - No ads, no in-app purchases
🛡️ Download only from official channels to avoid being misled
📲 Search "DeepSeek" in your app store or visit our website for direct links

🌟 3/3

via Twitter @DeepSeek
🚀 DeepSeek-R1 is here!

⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now! Try DeepThink at http://chat.deepseek.com today!

πŸ‹ 1/n

via Twitter @DeepSeek
Re πŸ› οΈ DeepSeek-R1: Technical Highlights

πŸ“ˆ Large-scale RL in post-training
πŸ† Significant performance boost with minimal labeled data
πŸ”’ Math, code, and reasoning tasks on par with OpenAI-o1
πŸ“„ More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

πŸ‹ 4/n

via Twitter @DeepSeek
Re 🌐 API Access & Pricing

βš™οΈ Use DeepSeek-R1 by setting model=deepseek-reasoner
πŸ’° $0.14 / million input tokens (cache hit)
πŸ’° $0.55 / million input tokens (cache miss)
πŸ’° $2.19 / million output tokens

πŸ“– API guide: https://api-docs.deepseek.com/guides/reasoning_model

πŸ‹ 5/n

via Twitter @DeepSeek
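The listed prices translate into a simple per-request cost formula. A minimal sketch, assuming linear per-token billing at the rates quoted above (the rates are the only figures taken from the post; the function itself is illustrative):

```python
# Toy cost estimator for deepseek-reasoner requests.
# Prices are USD per million tokens, as listed in the announcement.
PRICE_PER_M = {
    "input_cache_hit": 0.14,
    "input_cache_miss": 0.55,
    "output": 2.19,
}

def estimate_cost(input_tokens, output_tokens, cache_hit=False):
    """Return the estimated USD cost of one request."""
    in_rate = PRICE_PER_M["input_cache_hit" if cache_hit else "input_cache_miss"]
    return (input_tokens * in_rate + output_tokens * PRICE_PER_M["output"]) / 1_000_000
```

For example, a request consuming one million uncached input tokens and one million output tokens would cost about $2.74 at these rates.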
To prevent any potential harm, we reiterate that @deepseek_ai is our sole official account on Twitter/X.

Any accounts:
- representing us
- using identical avatars
- using similar names
are impersonations.

Please stay vigilant to avoid being misled!
📢 Terminology Correction: DeepSeek-R1's code and models are released under the MIT License.
🎉 Excited to see everyone's enthusiasm for deploying DeepSeek-R1! Here are our recommended settings for the best experience:

• No system prompt
• Temperature: 0.6
• Official prompts for search & file upload: bit.ly/4hyH8np
• Guidelines to prevent the model from bypassing its thinking step: bit.ly/4gJrhkF

The official DeepSeek deployment runs the same model as the open-source version, so you get the full DeepSeek-R1 experience! 🚀
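The recommended settings above can be baked into a request builder. A minimal sketch, assuming an OpenAI-style message schema (the schema is an assumption; only "no system prompt" and "temperature 0.6" come from the post):

```python
def build_request(user_prompt, history=()):
    """Build a chat request following the recommended R1 settings:
    no system prompt, temperature 0.6."""
    messages = list(history) + [{"role": "user", "content": user_prompt}]
    # Note: deliberately no {"role": "system", ...} entry.
    return {
        "model": "deepseek-reasoner",
        "messages": messages,
        "temperature": 0.6,
    }
```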
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs, without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.

📖 For more details, check out our paper here: https://arxiv.org/abs/2502.11089
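The two-stage idea behind the components listed above (compress blocks coarsely, then attend finely only to the best blocks) can be illustrated with a toy 1-D sketch. This is purely illustrative and is not the NSA kernel, which operates on real attention tensors on GPU:

```python
def sparse_select(keys, query, block_size=4, top_blocks=2):
    """Toy two-stage sparse selection:
    1) coarse-grained compression: mean-pool each block of keys,
    2) fine-grained selection: keep only the top-scoring blocks,
    so exact attention runs over a fraction of the tokens."""
    # 1) compress each block to a single summary value
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    summaries = [sum(b) / len(b) for b in blocks]
    # score each compressed block against the (scalar) query
    scores = [query * s for s in summaries]
    # 2) keep the top-k blocks, preserving their original order
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:top_blocks])
    # only tokens in the selected blocks would enter exact attention
    return [tok for i in selected for tok in blocks[i]]
```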
🚀 Day 0: Warming up for #OpenSourceWeek!

We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency.

These humble building blocks in our online service have been documented, deployed and battle-tested in production.

As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey.

Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.
🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡️ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800

🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA
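A paged KV cache like the one listed above stores keys/values in fixed-size pages and translates logical token positions through a page table. A toy Python model of that bookkeeping (illustrative only; FlashMLA implements this as a CUDA kernel over GPU memory):

```python
class PagedKVCache:
    """Toy page-table model of a paged KV cache with block size 64."""

    def __init__(self, block_size=64):
        self.block_size = block_size
        self.pages = []      # each page holds up to block_size KV entries
        self.length = 0

    def append(self, kv_entry):
        # allocate a fresh page whenever the last one is full
        if self.length % self.block_size == 0:
            self.pages.append([])
        self.pages[-1].append(kv_entry)
        self.length += 1

    def lookup(self, pos):
        """Translate a logical token position into (page_index, offset)."""
        return pos // self.block_size, pos % self.block_size
```

The payoff is that pages for variable-length sequences can be allocated on demand instead of reserving one contiguous buffer per sequence.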
🚀 Day 2 of #OpenSourceWeek: DeepEP

Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping

🔗 GitHub: github.com/deepseek-ai/DeepEP
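The dispatch half of MoE all-to-all groups each token by the rank hosting its routed expert, then ships each group to that rank. A toy sketch of just the grouping step (a hypothetical contiguous expert layout; DeepEP does the actual transfer over NVLink/RDMA):

```python
from collections import defaultdict

def dispatch(tokens, expert_of, experts_per_rank):
    """Group tokens by the destination rank of their routed expert.
    Assumes experts are laid out contiguously: rank r hosts experts
    [r * experts_per_rank, (r + 1) * experts_per_rank)."""
    out = defaultdict(list)
    for tok in tokens:
        rank = expert_of[tok] // experts_per_rank
        out[rank].append(tok)
    return dict(out)
```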
🚀 Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡️ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts

🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
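FP8 GEMM inputs are typically scaled so their largest magnitude fits the narrow e4m3 range before rounding. A toy per-tensor sketch of that scale/round step (illustrative only; DeepGEMM applies finer-grained scaling inside its CUDA kernels):

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3

def quantize_per_tensor(xs):
    """Pick a scale mapping max |x| to the e4m3 range, then round."""
    amax = max(abs(x) for x in xs)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round(x / scale) for x in xs], scale

def dequantize(q, scale):
    """Recover approximate real values from quantized ones."""
    return [v * scale for v in q]
```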
🚨 Off-Peak Discounts Alert!

Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily:

🔹 DeepSeek-V3 at 50% off
🔹 DeepSeek-R1 at a massive 75% off

Make the most of your resources and save more during these off-peak hours!
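The discount window crosses midnight UTC, so a time check needs two cases. A small sketch of the window test and the discounted rates (the window and percentages come from the post; the model-name keys are hypothetical labels for V3 and R1):

```python
def off_peak(hour, minute):
    """True inside the discount window 16:30-00:30 UTC (crosses midnight)."""
    m = hour * 60 + minute
    return m >= 16 * 60 + 30 or m < 30

def discounted_price(base, model):
    """Apply the advertised off-peak discounts: 50% off V3, 75% off R1."""
    discount = {"v3": 0.50, "r1": 0.75}[model]
    return base * (1 - discount)
```

At these rates, R1 output tokens drop from $2.19 to about $0.55 per million during the window.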
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies

✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 https://github.com/deepseek-ai/DualPipe

✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 https://github.com/deepseek-ai/eplb

📊 Analyze computation-communication overlap in V3/R1.
🔗 https://github.com/deepseek-ai/profile-data
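The core problem an expert-parallel load balancer solves is placing experts with uneven loads across GPUs so no GPU is overloaded. A greedy longest-processing-time sketch of just the placement idea (EPLB's actual algorithm also replicates hot experts, which this omits):

```python
import heapq

def balance(expert_loads, num_gpus):
    """Place the heaviest expert on the currently lightest GPU.
    expert_loads: {expert_id: load}; returns {expert_id: gpu_id}."""
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (total load, gpu)
    heapq.heapify(heap)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        total, gpu = heapq.heappop(heap)
        placement[expert] = gpu
        heapq.heappush(heap, (total + load, gpu))
    return placement
```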
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
⚡ 40+ GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1

📥 3FS → github.com/deepseek-ai/3FS
⛲ Smallpond - data processing framework on 3FS → github.com/deepseek-ai/smallpond
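As a quick sanity check on the aggregate figure: 6.6 TiB/s across 180 nodes works out to roughly 37.5 GiB/s per node, which is in the same range as the quoted 40+ GiB/s peak per client node:

```python
def per_node_read_gib_s(aggregate_tib_s, nodes):
    """Back-of-envelope per-node share of aggregate read throughput
    (1 TiB = 1024 GiB)."""
    return aggregate_tib_s * 1024 / nodes
```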
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview

Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing

Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k input/output tokens per second per H800 node
🚀 Cost profit margin 545%

💡 We hope this week's insights offer value to the community and contribute to our shared AGI goals.
📖 Deep Dive: bit.ly/4ihZUiO
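For readers unpacking the 545% figure: assuming "cost profit margin" means profit relative to cost, it implies revenue of roughly 6.45x cost:

```python
def cost_profit_margin(revenue, cost):
    """Profit relative to cost; 5.45 corresponds to the quoted 545%."""
    return (revenue - cost) / cost
```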
🚀 DeepSeek-V3-0324 is out now!

🔹 Major boost in reasoning performance
🔹 Stronger front-end development skills
🔹 Smarter tool-use capabilities

✅ For non-complex reasoning tasks, we recommend using V3 - just turn off "DeepThink"
🔌 API usage remains unchanged
📜 Models are now released under the MIT License, just like DeepSeek-R1!
🔗 Open-source weights: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324