In the last week, DeepSeek’s R1 model has been making headlines for outperforming many of the world’s leading foundation models in reasoning, math, and other benchmarks — even though it was orders of magnitude less costly to build.
One of the strategies employed by DeepSeek to reduce costs was to use far less human data than is typical for training foundation models. With a combination of synthetic chain of thought data and a hybrid reinforcement learning and supervised fine-tuning approach, the AI lab was able to achieve desired levels of model performance.
This has prompted many AI leaders to question the significance of human data in training AI. After all, if DeepSeek was able to use so much less human data to attain such impressive results, it must mean that human data isn’t as important as we thought — right?
With Human Data, Quantity Does Not Signal Impact
Often, research labs hypothesize that achieving their model performance goals will require hundreds of thousands of data rows. These requests then go out to their chosen human data providers, who return after weeks or months with terabytes of training data — and an invoice to match.
As DeepSeek has clearly demonstrated, however, there are smarter, more strategic ways to tackle model performance goals than brute-forcing it with huge volumes of human data. The first step is to conduct a thorough series of model evaluations to dig into performance issues and identify the specific situations that cause failure states. These findings can then be leveraged to build a precise, highly targeted training dataset that directly addresses and corrects these failure states.
How Precision (Human) Data Achieves Results with Less
Over the past three years, Invisible Technologies has helped train 80% of the world’s leading foundation models. One AI lab approached the Invisible team with an urgent issue: their foundation model wasn’t meeting some of their safety benchmarks, delaying their deployment timeline. They estimated that the issue would require 100,000 rows of human data to address.
When Invisible’s data strategy team conducted their model evaluations, however, their findings revealed that generally, the model was safe — it just had an identity problem. If a user asked the model to give them potentially harmful information outright, like recommending stocks to purchase, it refused to answer. But if a user first asked the model to assume the identity of a person that would be able to provide such information, such as a financial advisor, the model often bypassed its safety protocols and provided the requested answers.
Armed with a more comprehensive understanding of the model’s failure states, Invisible’s team compiled a strategic, targeted dataset to address them. This dataset comprised only 4,000 data rows — 96% smaller than the dataset originally hypothesized by the AI lab. With this dataset, the model’s safety features showed a 97% improvement.
In the end, the AI lab was able to launch their model faster while meeting the required performance benchmarks, and with far less human data than they originally anticipated.
The Bottom Line: Achieving Massive Model Improvement with Less Human Data Isn’t New
With the advent of DeepSeek’s R1, achieving model performance goals with less human data is no longer just advantageous to competitive AI labs and enterprises — it’s essential. If your team is ready to leverage precision data to launch AI projects faster and at cost, contact the Invisible team for a demo.