
AI Training Data Problem Sparks Debate as Gary Marcus Says ‘I Warned Them 2 Years Ago’

Artificial intelligence has become one of the most exciting technologies of our time. From writing emails and generating images to helping businesses automate work and assisting students with research, AI is changing daily life faster than most people imagined.

But behind the excitement, a new debate is growing inside the tech world.

It is a debate not about how powerful AI has become, but about whether it may be running into a serious limitation.

The latest discussion began after Oracle co-founder Larry Ellison highlighted what he believes is one of AI’s biggest hidden weaknesses. According to Ellison, many of today’s leading AI models are trained on nearly the same public internet data, which makes them increasingly similar to one another.

His remarks quickly caught the attention of American scientist and AI researcher Gary Marcus.

Marcus responded with a pointed reminder:

“I warned them two years ago.”

That single statement has reignited a larger conversation about the future of artificial intelligence—and whether the AI industry ignored early warnings while racing toward bigger and more expensive models.

The Question Nobody Wanted to Ask

For years, the AI industry has operated on a simple belief.

More data.

More computing power.

Bigger models.

This formula helped create the AI tools people use today. Companies spent billions building massive systems trained on enormous collections of internet content—articles, websites, books, code, forums, and public discussions.

The results were impressive.

AI could suddenly write essays, answer questions, create artwork, and even simulate conversations that felt surprisingly human.

To many people, it looked like technology had entered a new age.

But while investors celebrated and companies competed to release stronger models, some researchers were asking uncomfortable questions.

What happens when everyone trains on the same information?

And what happens when the internet—the very fuel feeding these systems—starts reaching its limits?

Those concerns are now becoming harder to ignore.

What Larry Ellison Actually Said

Larry Ellison, co-founder and CTO of Oracle, recently argued that the biggest AI companies may be facing a problem hidden beneath the hype.

His point was simple.

Most major AI systems—including models developed by OpenAI, Google, Meta, and others—depend heavily on publicly available internet data.

If everyone learns from the same material, the models begin to resemble one another. The result, Ellison suggested, is that AI risks becoming a commodity rather than a truly differentiated technology.

In practical terms, this means companies may struggle to build unique advantages if their AI systems are all studying the same digital library.

Imagine several students preparing for the same exam using exactly the same textbook.

They may answer questions differently.

They may use different words.

But eventually, their knowledge starts overlapping.

That, according to Ellison, is what is happening in AI.

He believes the future value of artificial intelligence may depend less on public web data and more on private, proprietary information held by businesses and institutions.

Gary Marcus Says the Warning Came Earlier

For Gary Marcus, Ellison’s remarks sounded familiar.

Marcus, a cognitive scientist and professor emeritus at New York University known for questioning mainstream AI assumptions, responded by saying he had warned the industry about this exact issue years ago.

His criticism goes beyond a single technical concern.

Marcus has long argued that relying only on larger datasets and more computing power may not be enough to create truly intelligent systems. In earlier research and public commentary, he suggested AI needed deeper reasoning and broader approaches rather than endless scaling.

So when he said, “I warned them two years ago,” it was not merely frustration.

It reflected a broader disagreement over how Silicon Valley has pursued AI.

Marcus believes the industry may have underestimated the risks of building systems that increasingly depend on similar training data and similar architectures.

His warning also touched on economics.

If models become too similar, companies may enter aggressive price competition where technological differences shrink and profits become harder to protect.

Why Data Matters So Much

To understand this debate, it helps to understand one basic truth:

AI learns from data.

Data is not just fuel.

It is experience.

Human beings learn through life—through mistakes, emotions, memories, and observation.

AI learns through examples.

If those examples become repetitive, limited, or overly similar, each new one teaches the model less, and progress can slow.

That is why training data matters so deeply.

Public internet content helped create today’s AI revolution. But the internet is not infinite.

Researchers and technology leaders have increasingly discussed whether high-quality human-created data may be reaching limits. Even figures like Elon Musk have argued that much of the easily available training data has already been consumed.

This does not mean AI progress suddenly stops.

But it may mean the path forward becomes more complicated.

The Growing Debate Around Private Data

If public data is becoming less useful, where does AI go next?

This is where the conversation becomes sensitive.

Ellison argues that private enterprise data may hold the next major advantage. Business records, organizational knowledge, and secure databases could help AI systems become more specialized and valuable.

From a business perspective, the idea makes sense.

A hospital has medical data.

A bank has financial patterns.

A logistics company has supply-chain knowledge.

AI trained responsibly on such information could produce highly tailored results.

But another question immediately appears:

What about privacy?

This is where excitement and anxiety collide.

People already worry about how much personal information technology companies collect. The thought of AI relying increasingly on private or proprietary data creates understandable concern about consent, security, and ownership.

The future of AI may therefore involve not only engineering questions—but ethical ones too.

A Future Built on AI-Generated Data?

Another possibility is synthetic data.

This means using AI to create training material for other AI systems.

At first glance, this sounds efficient.

If human data becomes scarce, machines generate more.

Problem solved.

But researchers have warned that this approach may create new problems. Studies of “model collapse” suggest that repeatedly training AI on AI-generated content can degrade quality, causing systems to lose diversity and nuance over time.

Think of it like making copies of copies of a photograph.

At first, the image looks fine.

After enough duplication, details fade.

That fear has become part of the modern AI debate.
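
The copy-of-a-copy idea can be made concrete with a toy simulation. The sketch below is not taken from any of the studies mentioned. It assumes a made-up vocabulary of 1,000 "tokens" with a long-tailed frequency distribution, and it treats "training" as nothing more than counting how often each token appeared in the previous generation's output. The function name simulate_collapse and all of its parameters are illustrative only.

```python
import random
from collections import Counter


def simulate_collapse(vocab_size=1000, sample_size=500, generations=10, seed=0):
    """Return the number of distinct tokens that survive after each generation.

    Toy model only: each generation samples from the previous generation's
    token frequencies, then re-estimates the frequencies from that sample.
    """
    rng = random.Random(seed)
    tokens = list(range(vocab_size))
    # Generation 0: a long-tailed "human" distribution (weight 1/rank).
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    support_sizes = [vocab_size]

    for _ in range(generations):
        # "Generate" synthetic data from the current model.
        sample = rng.choices(tokens, weights=weights, k=sample_size)
        # "Train" the next model by counting what was generated.
        counts = Counter(sample)
        tokens = list(counts.keys())
        weights = [counts[t] for t in tokens]
        support_sizes.append(len(tokens))

    return support_sizes


if __name__ == "__main__":
    # Prints the count of surviving distinct tokens, generation by generation.
    print(simulate_collapse())
```

In this sketch, a token that is never sampled has no weight in the next generation and can never return, so the number of distinct tokens can only hold steady or fall. Real model collapse is more subtle than this, but the direction is the same: rare material is the first to disappear.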

The Human Story Behind the Technology

It is easy to see this discussion as a battle between billionaires, researchers, and technology companies.

But there is a more human side.

Millions of people now depend on AI in some form.

Students use it to learn.

Employees use it at work.

Small businesses use it to compete.

Writers, designers, programmers, and entrepreneurs increasingly build parts of their lives around these tools.

That is why questions about AI’s future matter.

People are not simply watching a technological race.

They are watching something that may shape jobs, education, creativity, and everyday decision-making.

The exchange between Larry Ellison and Gary Marcus reflects something deeper than a dispute over credit.

It reflects uncertainty.

Are today’s AI systems standing at the beginning of limitless progress?

Or are they approaching challenges that require an entirely new direction?

Final Thoughts

For now, AI is not disappearing.

Innovation continues.

Companies are still investing heavily and building increasingly capable systems.

But the recent debate has exposed a reality often hidden beneath headlines.

Artificial intelligence may be powerful—but it is not magical.

It depends on data, design, and human choices.

Larry Ellison believes AI risks becoming too similar because everyone learns from the same digital world. Gary Marcus says that warning should not surprise anyone because he raised it years ago.

Whether they turn out to be right may shape the next chapter of AI.

And perhaps that is the real story here.

Not whether AI has become intelligent enough—

But whether humanity is wise enough to guide what comes next.

Saturday, May 30, 2026