
Researchers from DeepSeek and Tsinghua University have introduced GRM-SPCT, a technique aimed at improving AI reasoning. The methodology, referred to as DeepSeek-GRM, combines generative reward modeling (GRM) with Self-Principled Critique Tuning (SPCT) to enhance the reasoning capabilities of large language models (LLMs). The primary goal of this innovation is to










