Can OpenAI’s New o1 Models Transform STEM with Superior Reasoning?

OpenAI has recently unveiled a new family of large language models (LLMs), dubbed “o1,” which aims to deliver superior performance and accuracy in science, technology, engineering, and math (STEM) fields. This launch came as a surprise, as many anticipated the release of either “Strawberry” or GPT-5 instead. The new models, o1-preview and o1-mini, are initially available to ChatGPT Plus users and, through OpenAI’s paid API, to developers, who can integrate them into existing third-party applications or build new ones on top of them.

Enhanced Reasoning Capabilities

A key feature of the o1 models is their enhanced “reasoning” capabilities. According to Michelle Pokrass, OpenAI’s API Tech Lead, these models employ a sophisticated reasoning process that involves trying different strategies, recognizing mistakes, and engaging in comprehensive thinking. In tests, o1 models have demonstrated performance on par with PhD students on some of the most challenging benchmarks, particularly excelling in reasoning-related problems.

Current Limitations

The o1 models are currently text-based, meaning they handle text inputs and outputs exclusively and lack the multimodal capabilities of GPT-4o, which can process images and files. They also do not yet support web browsing, restricting their knowledge to data available up to their training cutoff date of October 2023. Additionally, the o1 models are slower than their predecessors, with response times sometimes exceeding a minute.

Early Feedback and Practical Applications

Despite these limitations, early feedback from developers who participated in the alpha testing phase revealed that the o1 models excel in tasks such as coding and drafting legal documents, making them promising candidates for applications that require deep reasoning. However, for applications demanding image inputs, function calling, or faster response times, GPT-4o remains the preferred choice.
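The trade-off described above can be sketched as a small routing helper. This is only an illustration of the article's guidance, not an OpenAI recommendation: the function name `choose_model` and its flags are hypothetical, and the fallback to GPT-4o for general requests is an assumption.

```python
def choose_model(needs_images=False, needs_function_calling=False,
                 latency_sensitive=False, reasoning_heavy=False):
    """Pick a model name following the trade-offs described in the article.

    Hypothetical helper: the flags and the default are illustrative assumptions.
    """
    # The article names GPT-4o as the preferred choice for image inputs,
    # function calling, or faster response times.
    if needs_images or needs_function_calling or latency_sensitive:
        return "gpt-4o"
    # Deep-reasoning work such as coding or legal drafting is where o1 is said to excel.
    if reasoning_heavy:
        return "o1-preview"
    # Assumption: fall back to the faster model for everything else.
    return "gpt-4o"


print(choose_model(reasoning_heavy=True))
print(choose_model(reasoning_heavy=True, needs_images=True))
```

Checking the hard constraints first means a request that needs both deep reasoning and image input is still routed to GPT-4o, since the o1 models cannot accept images at all.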

Pricing and Access

Pricing for the o1 models varies significantly. The main o1-preview model is the most expensive to date, costing $15 per 1 million input tokens and $60 per 1 million output tokens. Conversely, the o1-mini model is more affordable at $3 per 1 million input tokens and $12 per 1 million output tokens. The new models, capped at 20 requests per minute, are currently accessible to “Tier 5” users—those who have spent at least $1,000 through the API and made payments within the last 30 days. This pricing strategy and rate limit suggest a trial phase where OpenAI will likely adjust pricing based on usage feedback.
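To make these figures concrete, the following minimal sketch estimates the cost of a single request from the per-million-token prices quoted above, and derives the average spacing between requests implied by the 20-requests-per-minute cap. The function name `estimate_cost` and the token counts in the example are hypothetical, and the prices are the launch figures reported here, which may change.

```python
from decimal import Decimal

# USD per 1 million tokens, as quoted in this article at launch.
PRICES_PER_MILLION = {
    "o1-preview": {"input": Decimal("15"), "output": Decimal("60")},
    "o1-mini": {"input": Decimal("3"), "output": Decimal("12")},
}

# A cap of 20 requests per minute works out to one request every 3 seconds on average.
REQUESTS_PER_MINUTE = 20
SECONDS_BETWEEN_REQUESTS = 60 / REQUESTS_PER_MINUTE


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request (hypothetical helper)."""
    prices = PRICES_PER_MILLION[model]
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / Decimal(1_000_000)


# Illustrative token counts: 2,000 input tokens and 10,000 output tokens.
print(estimate_cost("o1-preview", 2_000, 10_000))  # 0.63
print(estimate_cost("o1-mini", 2_000, 10_000))     # 0.126
```

At these list prices, the same request costs five times as much on o1-preview as on o1-mini, since both the input and output rates differ by that factor.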

Notable Uses During Testing

Notable uses of the o1 models during testing include generating comprehensive action plans and white papers and optimizing organizational workflows. These models have also shown promise in infrastructure design, risk assessment, coding simple programs, filling out request-for-proposal (RFP) documents, and strategic engagement planning. For instance, some users have employed o1-preview to generate detailed white papers with citations from just a few prompts, balance a city’s power grid, and optimize staff schedules.

Future Opportunities and Challenges

While the o1 models present new opportunities, there are still areas where improvements are necessary. The slower response time and text-only capabilities are significant drawbacks for certain applications. However, the high performance in reasoning tasks makes them valuable for specific use cases, particularly in STEM-related fields.

How to Access the Models

Developers keen on experimenting with OpenAI’s latest offerings can access the o1-preview and o1-mini models through the public API, Microsoft Azure OpenAI Service, Azure AI Studio, and GitHub Models. OpenAI’s continuous development of both the o1 and GPT series ensures that there are numerous options for developers looking to build innovative applications.

In summary, OpenAI’s introduction of the o1 family marks a significant step in the evolution of reasoning-focused LLMs, particularly for STEM applications. While the models have some limitations in speed and input modalities, their advanced reasoning capabilities offer promising avenues for complex problem-solving tasks. As OpenAI continues to refine these models, developers can expect incremental improvements and adjustments in pricing and performance, heralding a new era of AI development.
