Can OpenAI’s New o1 Models Transform STEM with Superior Reasoning?

OpenAI has recently unveiled a new family of large language models (LLMs), dubbed “o1,” which aims to deliver superior performance and accuracy in science, technology, engineering, and math (STEM) fields. This launch came as a surprise, as many anticipated the release of either “Strawberry” or GPT-5 instead. The new models, o1-preview and o1-mini, are initially available to ChatGPT Plus users and developers through OpenAI’s paid API, enabling developers to integrate these models into existing third-party applications or create new ones on top of them.

Enhanced Reasoning Capabilities

A key feature of the o1 models is their enhanced “reasoning” capabilities. According to Michelle Pokrass, OpenAI’s API Tech Lead, these models employ a sophisticated reasoning process that involves trying different strategies, recognizing mistakes, and engaging in comprehensive thinking. In tests, o1 models have demonstrated performance on par with PhD students on some of the most challenging benchmarks, particularly excelling in reasoning-related problems.

Current Limitations

The o1 models are currently text-based, meaning they handle text inputs and outputs exclusively and lack the multimodal capabilities of GPT-4o, which can process images and files. They also do not yet support web browsing, restricting their knowledge to data available up to their training cutoff date of October 2023. Additionally, the o1 models are slower than their predecessors, with response times sometimes exceeding a minute.

Early Feedback and Practical Applications

Despite these limitations, early feedback from developers who participated in the alpha testing phase revealed that the o1 models excel in tasks such as coding and drafting legal documents, making them promising candidates for applications that require deep reasoning. However, for applications demanding image inputs, function calling, or faster response times, GPT-4o remains the preferred choice.

Pricing and Access

Pricing for the o1 models varies significantly. The main o1-preview model is the most expensive to date, costing $15 per 1 million input tokens and $60 per 1 million output tokens. Conversely, the o1-mini model is more affordable at $3 per 1 million input tokens and $12 per 1 million output tokens. The new models, capped at 20 requests per minute, are currently accessible to “Tier 5” users—those who have spent at least $1,000 through the API and made payments within the last 30 days. This pricing strategy and rate limit suggest a trial phase where OpenAI will likely adjust pricing based on usage feedback.

Notable Uses During Testing

Among the notable uses of the o1 models during testing include generating comprehensive action plans, white papers, and optimizing organizational workflows. These models have also shown promise in infrastructure design, risk assessment, coding simple programs, filling out requests-for-proposal (RFP) documents, and strategic engagement planning. For instance, some users have employed o1-preview to generate detailed white papers with citations from just a few prompts, balance a city’s power grid, and optimize staff schedules.

Future Opportunities and Challenges

While the o1 models present new opportunities, there are still areas where improvements are necessary. The slower response time and text-only capabilities are significant drawbacks for certain applications. However, the high performance in reasoning tasks makes them valuable for specific use cases, particularly in STEM-related fields.

How to Access the Models

Developers keen on experimenting with OpenAI’s latest offerings can access the o1-preview and o1-mini models through the public API, Microsoft Azure OpenAI Service, Azure AI Studio, and GitHub Models. OpenAI’s continuous development of both the o1 and GPT series ensures that there are numerous options for developers looking to build innovative applications.

In summary, OpenAI’s introduction of the o1 family marks a significant step in the evolution of reasoning-focused LLMs, particularly for STEM applications. While the models have some limitations in speed and input modalities, their advanced reasoning capabilities offer promising avenues for complex problem-solving tasks. As OpenAI continues to refine these models, developers can expect incremental improvements and adjustments in pricing and performance, heralding a new era of AI development.

Explore more

Your CRM Knows More Than Your Buyer Personas

The immense organizational effort poured into developing a new messaging framework often unfolds in a vacuum, completely disconnected from the verbatim customer insights already being collected across multiple internal departments. A marketing team can dedicate an entire quarter to surveys, audits, and strategic workshops, culminating in a set of polished buyer personas. Simultaneously, the customer success team’s internal communication channels

Embedded Finance Transforms SME Banking in Europe

The financial management of a small European business, once a fragmented process of logging into separate banking portals and filling out cumbersome loan applications, is undergoing a quiet but powerful revolution from within the very software used to run daily operations. This integration of financial services directly into non-financial business platforms is no longer a futuristic concept but a widespread

How Does Embedded Finance Reshape Client Wealth?

The financial health of an entrepreneur is often misunderstood, measured not by the promising numbers on a balance sheet but by the agonizingly long days between issuing an invoice and seeing the cash actually arrive in the bank. For countless small- and medium-sized enterprise (SME) owners, this gap represents the most immediate and significant threat to both their business stability

Tech Solves the Achilles Heel of B2B Attribution

A single B2B transaction often begins its life as a winding, intricate journey encompassing hundreds of digital interactions before culminating in a deal, yet for decades, marketing teams have awarded the entire victory to the final click of a mouse. This oversimplification has created a distorted reality where the true drivers of revenue remain invisible, hidden behind a metric that

Is the Modern Frontend Role a Trojan Horse?

The modern frontend developer job posting has quietly become a Trojan horse, smuggling in a full-stack engineer’s responsibilities under a familiar title and a less-than-commensurate salary. What used to be a clearly defined role centered on user interface and client-side logic has expanded at an astonishing pace, absorbing duties that once belonged squarely to backend and DevOps teams. This is