Coin Forge Hub

AI Developers Shift to Synthetic Data Amidst Decline of Original Content

By admin, Mar. 13, 2025

As AI models consume the internet’s free content, a looming crisis is emerging: What happens when there’s nothing left to train on?

A recent Copyleaks report revealed that DeepSeek, a Chinese AI model, often produces responses nearly identical to ChatGPT’s, raising concerns that it was trained on OpenAI outputs.

That’s led some to suspect the era of “low-hanging fruit” in AI development may be over.

In December, Google CEO Sundar Pichai acknowledged this reality, warning that AI developers are rapidly exhausting the supply of freely available, high-quality training data.

“In the current generation of LLM models, roughly a few companies have converged at the top, but I think we’re all working on our next versions too,” Pichai said at The New York Times’ annual DealBook Summit. “I think the progress is going to get harder.”

With the supply of high-quality training data dwindling, many AI researchers are turning to synthetic data generated by other AI.

Synthetic data isn’t new. It dates back to the late 1960s and has long been used in statistics and machine learning, where algorithms and simulations generate artificial datasets that mimic real-world information. But its growing role in AI development raises fresh concerns, particularly as AI systems are integrated into decentralized technologies.

Bootstrapping AI

“Synthetic data has been around in statistics forever. It’s called bootstrapping,” Professor of Software Engineering at MIT Muriel Médard told Decrypt in an interview at ETH Denver 2025. “You start with actual data and think, ‘I want more but don’t want to pay for it. I’ll make it up based on what I have.’”
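
The bootstrapping idea described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of anyone quoted in the article: the `observed` values are made up, and the function names `bootstrap_samples` and `bootstrap_mean_interval` are hypothetical. Each "new" dataset is drawn with replacement from the data already in hand, and the spread of the resampled means gives a rough interval for the true mean.

```python
import random
import statistics


def bootstrap_samples(data, n_resamples, seed=0):
    """Draw n_resamples datasets, each sampled with replacement from data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choices(data, k=len(data)) for _ in range(n_resamples)]


def bootstrap_mean_interval(data, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval for the mean, estimated from the resampled datasets."""
    means = sorted(
        statistics.fmean(sample)
        for sample in bootstrap_samples(data, n_resamples, seed)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements standing in for the "actual data" (an assumption).
observed = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7]

resamples = bootstrap_samples(observed, n_resamples=5)
low, high = bootstrap_mean_interval(observed)
print(f"approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

Note that every resampled value already appears in `observed`: bootstrapping reuses existing data rather than adding new information, which is one reason biases in the original data carry over.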

Médard, a co-founder of decentralized memory infrastructure platform Optimum, said the main challenge in training AI models isn’t the lack of data but rather its accessibility.

“You either search for more or fake it with what you have,” she said. “Accessing data—especially on-chain, where retrieval and updates are crucial—adds another layer of complexity.”

As AI developers face mounting privacy restrictions and limited access to real-world datasets, synthetic data is becoming a crucial alternative for model training.

“As privacy restrictions and general content policies are backed with more and more protection, utilizing synthetic data will become a necessity, both out of ease of access and fear of legal recourse,” Senior Solutions Architect at Druid AI Nick Sanchez told Decrypt.

“Currently, it’s not a perfect solution, as synthetic data can contain the same biases you would find in real-world data, but its role in handling consent, copyright, and privacy issues will only grow over time,” he added.

Risks and rewards

As the use of synthetic data grows, so do concerns about its potential for manipulation and misuse.

“Synthetic data itself might be used to insert false information into the training set, intentionally misleading the AI models,” Sanchez said. “This is particularly concerning when applying it to sensitive applications like fraud detection, where bad actors could use the synthetic data to train models that overlook certain fraudulent patterns.”

Blockchain technology could help mitigate the risks of synthetic data, Médard explained, emphasizing that the goal is to make data tamper-proof rather than unchangeable.

“When updating data, you don’t do it willy-nilly—you change a bit and observe,” she said. “When people talk about immutability, they really mean durability, but the full framework matters.”
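
One way to picture data that can be updated yet resists silent tampering is a hash-linked log, the basic structure behind a blockchain. The sketch below is an assumption-laden illustration, not a description of Optimum's system or of any particular chain: the record fields and the function names `entry_hash`, `append_entry`, and `verify` are hypothetical. An update is recorded as a new entry rather than an overwrite, and any edit to an earlier entry breaks the chain of hashes.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Record a change as a new entry linked to the previous one."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    )


def verify(log):
    """Return True only if every entry still matches its recorded hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"record": 1, "label": "legitimate"})
append_entry(log, {"record": 1, "label": "fraud"})  # an update, kept as history

tampered = copy.deepcopy(log)
tampered[0]["payload"]["label"] = "fraud"  # silent edit to an old entry

print(verify(log))       # the untouched log passes the check
print(verify(tampered))  # the edited copy fails it
```

The design choice here is that the log stays changeable (new entries can always be appended) while past entries remain checkable, which matches the distinction drawn above between tamper resistance and strict unchangeability.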

Edited by Sebastian Sinclair
