
First Impressions of OpenAI o1: How Multi-step Reasoning is a Game Changer

OpenAI unveiled its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to think before they respond. The models, codenamed ‘Strawberry,’ have generated a lot of interest. But are they worth the hype?

Perhaps, but with caveats.

Compared with GPT-4o, the o1 models show both improvements and drawbacks. o1 handles reasoning and complex questions better, but it costs four times as much to use as GPT-4o. It also lacks the multimodality, tool use, and speed that were the hallmarks of GPT-4o. Even OpenAI acknowledges that GPT-4o is still the better choice for most queries and that the o1 models are less well suited to simpler tasks.

‘It's progress, but not revolutionary,’ says Ravid Shwartz Ziv, a professor at New York University who studies AI. ‘There are improvements in some areas, but not all.’

Because o1 is designed to solve complex problems, it is a poor fit for everyday tasks. For deeper, more demanding queries, however, it can be a valuable aid.

What sets o1 apart on hard questions is that it breaks a complex problem into discrete steps and checks each step for correctness. Multi-step reasoning is not a new idea, but only now has the technology become available for mass use.

‘The AI community is excited,’ said Kian Katanforoosh, CEO of Workera and a Stanford faculty member. ‘If you combine reinforcement learning techniques with OpenAI's language models, you can build AI that reasons step by step and works through big questions.’

But o1 is expensive to use. Like other models, it is billed for input and output tokens, but it also generates additional ‘reasoning tokens’. These consume computational resources and are billed as output tokens, yet they remain hidden from the user. So although o1 is powerful on complex tasks, it can become costly even for simple queries.
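
To make the billing point concrete, here is a minimal Python sketch that estimates the cost of a single request. It assumes, as OpenAI's API documentation describes for its reasoning models, that hidden reasoning tokens are billed at the output-token rate. The per-million-token prices and the 2,000-token reasoning figure are hypothetical placeholders, not published numbers. `request_cost` is an illustrative helper and is not part of any OpenAI API.

```python
# Rough per-request cost estimate, assuming hidden reasoning tokens are
# billed at the output-token rate. All prices below are hypothetical.

def request_cost(input_tokens: int,
                 visible_output_tokens: int,
                 reasoning_tokens: int,
                 input_price_per_m: float,
                 output_price_per_m: float) -> float:
    """Return the dollar cost of one request (hypothetical helper).

    Reasoning tokens never appear in the response, but they are billed
    as output tokens, so they are added to the visible output count.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Hypothetical prices in USD per million tokens, not official figures.
INPUT_PRICE = 15.0
OUTPUT_PRICE = 60.0

# A short question: 50 tokens in, 200 visible tokens out, plus an
# assumed 2,000 hidden reasoning tokens.
with_reasoning = request_cost(50, 200, 2_000, INPUT_PRICE, OUTPUT_PRICE)
without_reasoning = request_cost(50, 200, 0, INPUT_PRICE, OUTPUT_PRICE)

print(f"with hidden reasoning:    ${with_reasoning:.4f}")
print(f"without hidden reasoning: ${without_reasoning:.4f}")
```

In this made-up scenario, the hidden reasoning tokens make the request roughly ten times more expensive than the visible answer alone would suggest. That is why even a short question can carry a noticeable bill.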

Practical Use

In real-world use, o1 does show its value. For example, I asked the model to help plan a family Thanksgiving dinner. After 12 seconds of ‘thinking’, it produced a detailed 750-word plan explaining that, with proper planning, two ovens would be enough to cook for 11 people. However, it also suggested considering renting a portable oven, which seemed redundant.

For simpler queries, however, o1 can overdo it. When asked where cedar trees grow, for example, it produced an 800-word answer packed with excessive detail, while GPT-4o answered more succinctly.

Are Expectations Justified?

Not entirely: o1 does not fully live up to the hype. OpenAI CEO Sam Altman has acknowledged that o1 is not AGI (artificial general intelligence), saying it is still flawed and limited and seems more impressive on first use than after you spend more time with it.

Nevertheless, for AI professionals o1 represents an important step forward, especially on complex problems where GPT-4o falls short.
