Cost-efficient reasoning is key to agentic workflows
At Ninja AI, we believe that cutting-edge AI should be both powerful and accessible, helping users boost productivity without breaking the bank. For the past two years we’ve been focused on building an agentic productivity system, continuously adding the latest AI advancements into Ninja AI to make it smarter, faster, and more capable.
Along the way we’ve introduced features that require sophisticated agentic workflows, such as Deep Research and Multi-Turn File Analysis. We also launched a beta version of a scheduling workflow, allowing Ninja to negotiate meeting times with multiple participants via email.
As we continuously refine these skills, we have recognized a critical need: enhancing Ninja's intelligence and decision-making. Reducing errors in high-risk tasks (e.g., modifying calendar events) and enabling more autonomous workflows (e.g., executing composite tasks that interact with APIs and people) both require our agents to make more accurate decisions and predictions across many different types of situations.
We've discovered that incorporating step-by-step thinking into our workflows significantly boosts their accuracy and ability to generalize. Step-by-step thinking is a process of planning, breaking tasks down, backtracking, verifying, and reflecting before executing tasks through intelligent function calling. Recent reasoning models have successfully applied step-by-step thinking to solve complex math, science, and coding problems. However, these models aren't suitable for our Ninja agentic workflows, for the following reasons:
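The shape of such a loop can be sketched very roughly as follows. This is a toy illustration, not Ninja's actual implementation: the `plan`, `verify`, and `run_agent` names, the fixed two-step plan, and the arithmetic tools are all hypothetical stand-ins for a model-driven planner and a real tool registry.

```python
# Toy sketch of a step-by-step-thinking agent loop: plan, execute each
# step via function calling, verify each intermediate result, and stop
# (where a real system would backtrack and re-plan) on failure.

def plan(task):
    """Break a composite task into ordered sub-steps.

    A real planner would be model-driven; here we return a fixed toy plan
    for the task "compute (2 + 3) * 4".
    """
    return [("add", (2, 3)), ("mul", ("prev", 4))]

# Hypothetical tool registry for function calling.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def verify(result):
    """Check an intermediate result before committing to it."""
    return isinstance(result, (int, float))

def run_agent(task):
    prev = None
    for name, args in plan(task):
        # Substitute the previous step's result where the plan refers to it.
        args = tuple(prev if a == "prev" else a for a in args)
        result = TOOLS[name](*args)
        if not verify(result):
            # Backtracking/reflection would happen here in a real system.
            raise RuntimeError(f"verification failed at step {name!r}")
        prev = result
    return prev

print(run_agent("compute (2 + 3) * 4"))  # -> 20
```

The key property this sketch illustrates is that verification sits between planning and execution, so a bad intermediate result is caught before the agent acts on it.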
First, most current reasoning models are very expensive. For example, a single complex agentic task using OpenAI's O1 API could cost anywhere from $0.75 to $2.25¹. A per-task price at that level is economically unsustainable for us as a business, and would be equally unviable for customers if we passed the cost on to them.
¹ Assuming each agentic task requires an estimated 5,000 to 10,000 input tokens and 10,000 to 30,000 output tokens.
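For reference, the per-task arithmetic behind an estimate like this is straightforward. The token counts below come from the footnote above; the per-million-token prices are placeholders to be replaced with the provider's current API rates, and the resulting range will shift with whatever rates you plug in.

```python
# Back-of-the-envelope per-task cost from token counts and per-million-token
# prices. Prices here are placeholders, not quoted API rates.

def task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one agentic task."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Footnote's token estimates, with hypothetical $15/M input, $60/M output:
low = task_cost(5_000, 10_000, 15, 60)    # lightest task in the estimate
high = task_cost(10_000, 30_000, 15, 60)  # heaviest task in the estimate
print(f"${low:.2f} - ${high:.2f} per task")
```

Because output tokens typically cost several times more than input tokens, the long reasoning traces these models emit dominate the per-task cost.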
Second, the more affordable reasoning models lack the features needed to power agentic workflows. For example, DeepSeek R1 is a free reasoning model, but it is limited. Because of its size, R1 requires Nvidia H200 GPUs (or better) and still suffers from high latency and low throughput, making it difficult to use in a real-time, task-oriented chat system; the H200 requirement also makes it expensive to run. Additionally, R1 struggles with general-capability and software engineering tasks, limitations acknowledged in the final section of the R1 paper.
Furthermore, existing reasoning models lack customization options. At Ninja, we aspire to build the most advanced agentic system for productivity, so we need the ability to fine-tune models to better suit our needs. This is not possible when accessing current reasoning models via API, nor practical with existing large open-source reasoning models (such as the 671B-parameter R1).
Given these drawbacks, we decided to design our own reasoning system, SuperAgent-R 2.0, to enable a sustainable agentic system that's fast, affordable, and fine-tunable for our customers.