State of Generative Media

Generative Media in 2025: An Industry at Its Inflection Point

At a Glance

Generative AI is delivering real results—65% of organizations report ROI within 12 months. Yet the gap between personal adoption (89%) and organizational deployment (57%) points to a missing piece: a unified team workspace for generative workflows. Individual tools abound, but what organizations lack is a single environment where teams can access every model, experiment freely, track costs and usage, control permissions, and govern the entire generative process. Closing this workspace gap is what separates companies still experimenting from those scaling generative media into production.

  • 65% of organizations report ROI within 12 months
  • 89% personal adoption of gen AI tools
  • 57% organizational deployment
  • 88% of orgs deployed AI in at least one function

Introduction

Generative media has crossed a threshold. What began as experimental technology confined to research labs now powers production workflows across advertising, entertainment, gaming, and e-commerce. By the end of 2025, 88% of organizations had deployed AI in at least one business function [5], and the gap between personal experimentation and enterprise adoption—89% versus 57%—signals a market still absorbing what has become possible [3].

This report examines the state of generative media across five modalities—image, video, audio, 3D, and emerging world models—and evaluates how industries are integrating these capabilities into real production pipelines.

Adoption Landscape

The adoption story of 2025 is defined by a split: individuals have moved far ahead of the organizations they work for. Personal adoption of generative media tools reached 89%, while organizational deployment lagged at 57% [3]. The gap is even more pronounced in video—62% of individuals use video generation tools personally, but only 32% of organizations have incorporated them into workflows [3].
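
The size of the split can be made concrete with a little arithmetic. The sketch below uses only the survey percentages quoted above [3]; the dictionary layout and the function name `adoption_gap` are illustrative choices, not part of the survey.

```python
# Survey percentages quoted in the text above [3]; structure and names are illustrative.
ADOPTION = {
    "all generative media": {"personal": 89, "organizational": 57},
    "video generation": {"personal": 62, "organizational": 32},
}


def adoption_gap(personal: int, organizational: int) -> tuple[int, float]:
    """Return the gap in percentage points and the organizational-to-personal ratio."""
    return personal - organizational, organizational / personal


for name, figures in ADOPTION.items():
    points, ratio = adoption_gap(figures["personal"], figures["organizational"])
    print(f"{name}: {points}-point gap, organizations at {ratio:.0%} of personal adoption")
```

Under these figures the overall gap is 32 percentage points and the video gap is 30, but in relative terms video lags further: organizational use is roughly 64% of personal use overall and roughly 52% for video.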

This isn't a technology problem. It's an organizational one. Among companies that have scaled generative media successfully [1]:

  • 43% redesigned workflows and production pipelines
  • 33% invested in staff training and upskilling
  • 30% allocated dedicated budgets for media generation infrastructure

The preference for buying over building has accelerated sharply—76% of enterprises now purchase AI solutions compared to just 47% in 2024 [17]. The companies seeing returns are the ones that treated adoption as a structural transformation, not a tool swap. And the investment is paying off [1]:

  • Already profitable: 34%
  • Expecting returns within 12 months: 31%
  • Total achieving ROI within 12 months: 65%

Across the broader landscape, 74% of companies report that their generative AI initiatives meet or exceed ROI expectations [5].

The Model Explosion

2025 saw an unprecedented acceleration in model releases across every generative modality. Video generation alone produced eight major releases in ten months. Image models evolved from single-shot generation to sophisticated editing and style transfer. Audio crossed the threshold of human-indistinguishable speech. 3D went from minutes-long generation to sub-three-second asset creation.

The pace has been relentless: major video model releases landed every one to two months. A pattern has also emerged: task-specific models consistently outperform general-purpose "omni" models [15]. In total, approximately 1,000 new models were released across all modalities.

Image Generation

Image generation entered 2025 as the most mature generative modality and only extended its lead. Competition intensified across every major AI lab—Black Forest Labs, OpenAI, ByteDance, Alibaba, and Google DeepMind all shipped significant updates, driving rapid improvements in quality, speed, and controllability [3].

Key releases

  • Aug 2024: Flux.1 Dev (Black Forest Labs), open-source quality benchmark
  • Mar 2025: GPT Image 1 (OpenAI), integrated image generation in ChatGPT
  • Mid 2025: Seedream 4.0/4.5 (ByteDance), competitive open-weight architecture
  • Aug 2025: Nano Banana v1 (Google DeepMind), lightweight high-quality generation
  • Aug 2025: Qwen Image Edit (Alibaba), instruction-based image editing
  • Nov 2025: Nano Banana Pro (Google DeepMind), enhanced quality and performance

In terms of adoption, Google Gemini leads at 74%, followed by OpenAI and Black Forest Labs (FLUX) [3]. Image generation also has the highest production deployment rate of any modality—44% of organizations run it in production workflows [3]. The maturity gap between image and other modalities is narrowing, but image remains the entry point for most enterprises exploring generative media.

Open-source models have been a defining force. When code and weights are available, teams can test, iterate, and deploy without vendor lock-in. The barriers to self-hosting drop considerably compared to closed alternatives [16].

Video Generation

Video was the breakout modality of 2025. Eight major models shipped between December 2024 and December 2025, each pushing boundaries in different directions [10]:

Key releases

  • Dec 2024: Veo 2 (Google DeepMind), physically accurate video generation
  • Feb 2025: PixVerse v4 (PixVerse), accessible consumer-grade generation
  • Apr 2025: Kling 2.0 (Kuaishou), first-frame-to-last-frame control
  • May 2025: Veo 3 (Google DeepMind), native synchronized audio + video
  • Jun 2025: MiniMax Hailuo 02 (MiniMax), top benchmark performance
  • Jun 2025: Seedance 1.0 (ByteDance), competitive technical approach
  • Sep 2025: Sora 2 (OpenAI), multi-shot generation with audio
  • Dec 2025: Wan 2.6 (Alibaba), 15-second 1080p video with synchronized audio

Google leads video model adoption at 69%, with Kling and Hailuo as the primary alternatives [3]. Despite this release velocity, an adoption gap persists: 62% of individuals use video generation personally, but only 32% of organizations have incorporated it into their workflows [3]. Video in production sits at 39%, close behind image's 44%, but it is still constrained by longer generation times, higher compute costs, and the consistency demands of professional workflows [3].

Audio & Speech

Audio generation matured rapidly in 2025, with text-to-speech crossing a critical perceptual threshold.

Text-to-Speech

ElevenLabs' Turbo v2.5 set the production standard at 250–300ms latency. MiniMax's Speech-02 (May 2025) achieved 99% human voice similarity across 32 languages, effectively closing the gap between synthetic and recorded speech [3]. On the open-source side, Kokoro TTS performed well, demonstrating that production-quality voice synthesis is achievable with open models [3].

Music & Sound

ElevenLabs launched Eleven Music in August 2025—the first major AI music model trained entirely on licensed content [3]. This addressed the licensing concerns that had constrained commercial adoption.

Speech-to-Text

Speech-to-text remained dominated by Whisper v3 and ElevenLabs STT, with improvements focused on latency reduction and multilingual accuracy [3].

3D Generation

3D generation compressed what used to take days or weeks into minutes or seconds. The trajectory through 2025 was steep:

  • Jan 2025: Hunyuan 3D 2.0 (Tencent), high-quality image-to-3D
  • Apr 2025: HyperRodin Gen 1.5 (Deemos), 4 billion parameter architecture
  • Jul 2025: Meshy v5 (Meshy), text-to-3D improvements
  • Sep 2025: Tripo 3.0 (Tripo), 3M+ creators, 700+ enterprises [13]
  • Oct 2025: Meshy v6 preview (Meshy), recognized in a16z game dev survey [11]
  • Dec 2025: TRELLIS 2 (Microsoft), high-res assets in under 3 seconds

Production-ready models now span multiple input methods: TripoSR and TRELLIS handle image-to-3D conversion [9], Meshy 6 generates from text prompts, and SAM 3D from Meta offers another image-to-3D path [3].

However, meaningful limitations remain. Generated meshes still require topology cleanup for animation workflows. Geometric accuracy deteriorates on intricate mechanical assemblies. Hard-surface modeling—a staple of industrial and product design—continues to demand significant manual refinement [3]. These constraints keep 3D generation firmly in the "augmentation" category for professional pipelines rather than full replacement.

World Models: The Convergence Layer

World models are not an incremental improvement—they are a category shift. By fusing video generation's temporal reasoning with 3D modeling's spatial awareness into real-time interactive systems, they collapse the boundaries between watching and inhabiting generated content [2].

  • Dec 2024: Genie 2 (Google DeepMind), playable 3D worlds from a single image [2]
  • Nov 2025: Marble (World Labs), first commercial world model product [3]

Genie 2 proved the concept was viable: a single image in, a navigable 3D environment out, with keyboard and mouse control maintaining coherence for 10–20 seconds and up to a minute in some cases [2]. Marble made it commercial—generating persistent, downloadable 3D environments from text, images, videos, or panoramas, with output as Gaussian splats, meshes, or video, and direct integration into Unity, Unreal Engine, and VR headsets [3].

The downstream applications are transformative. Autonomous vehicle companies can train on photorealistic simulated cities. Game developers can prototype playable environments from a sketch. Architects can walk through spaces that exist only as floor plans [6]. Text-to-game is the inevitable next step after text-to-video—making generated output interactive rather than passive—and the gap is closing fast [6].

Industry Verticals

Adoption varies dramatically by sector, shaped by each industry's risk tolerance, regulatory environment, and creative workflows [3]:

  • Advertising (56%): campaign visuals, banner ads, social media graphics
  • Entertainment & Media (43%): storyboarding, pre-visualization, VFX, promotional content
  • Creative Software (31%): integrated design platforms, editing tools
  • Education (30%): interactive videos, animated explainers
  • Retail & E-Commerce (19%): product photography, catalog imagery

Advertising & Marketing

Marketing leads all verticals in generative AI adoption at 75%, up from 61% in 2024 [12]. But adoption doesn't mean transformation—80% of marketers use AI on less than half their work, and only 30% have achieved full integration across the campaign lifecycle [12].

The biggest obstacle is legal, not technical: 94% of agencies cited IP ownership and liability as their primary implementation challenge [7]. Even so, 72% of marketers identified generative AI as the most important trend for H2 2025 [14]. Scaling adoption will require programmatic generation at campaign volume, enforceable brand consistency, and audit trails for legal compliance.
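
One way to picture the audit-trail requirement is a tamper-evident log of generation requests, in which each record stores a hash of the record before it. The sketch below is a minimal illustration under stated assumptions, not a description of any vendor's system: the `GenerationRecord` schema, its field names, and the helpers `append_record` and `verify_chain` are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class GenerationRecord:
    # Hypothetical schema; a real deployment would define its own fields.
    campaign_id: str
    model: str
    prompt: str
    requested_by: str
    created_at: str  # ISO 8601 timestamp supplied by the caller
    prev_hash: str   # digest of the previous record, "" for the first one

    def digest(self) -> str:
        """SHA-256 over the record's fields, serialized deterministically."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_record(log: list, **fields) -> GenerationRecord:
    """Append a record that points at the digest of the current last record."""
    prev_hash = log[-1].digest() if log else ""
    record = GenerationRecord(prev_hash=prev_hash, **fields)
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Check that every record references the digest of its predecessor."""
    expected = ""
    for record in log:
        if record.prev_hash != expected:
            return False
        expected = record.digest()
    return True


log: list = []
append_record(log, campaign_id="spring-sale", model="example-image-model",
              prompt="Hero banner, product on white background",
              requested_by="designer@example.com", created_at="2025-06-01T09:00:00Z")
append_record(log, campaign_id="spring-sale", model="example-image-model",
              prompt="Square crop for social feed",
              requested_by="designer@example.com", created_at="2025-06-01T09:05:00Z")
print(verify_chain(log))  # True

# Editing an earlier record after the fact breaks the chain.
log[0] = replace(log[0], prompt="edited after the fact")
print(verify_chain(log))  # False
```

A chain like this only detects changes to records that have a successor; the digest of the latest record would need to be stored somewhere independent for the final entry to be protected as well.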

Entertainment

Film and television studios show high awareness but cautious spending. While 68% of media companies report AI adoption, major studios allocate less than 3% of production budgets to generative AI [4]. Instead, they're shifting approximately 7% of operational spending toward AI-enabled tools for contracts, permitting, and production planning [4].

E-Commerce

E-commerce presents a unique constraint: generated content must be indistinguishable from reality. Model creativity cannot interfere with product fidelity—images and videos must faithfully represent every product. This makes e-commerce one of the more demanding verticals for generative media, requiring pixel-level accuracy alongside creative flexibility.

Education

Education remains the sector with the most untapped potential. The bottleneck lies in creating high-quality content at scale that is optimally tailored to each learner [6]. Current generative models struggle with the consistency, controllability, and factual accuracy that educational content demands. Curriculum coherence and cultural sensitivity add further constraints [3]. Education could become one of the largest generative media markets—driven by the need for personalized learning at massive scale [15].

What Comes Next

Three forces will shape generative media through 2026 and beyond [3]:

Multimodal Convergence

World models are collapsing the boundaries between video, 3D, and interactive media. The trajectory from text-to-image to text-to-video to text-to-interactive-environment is compressing faster than most predicted [6][2].

Democratization of Creative Capability

The tools are approaching a point where professional-quality output no longer requires professional-grade expertise. The shift isn't about machines replacing creators; it's about removing barriers for those who lack access to VFX labs and production infrastructure.

The strategic implication is clear: expertise will shift from execution to orchestration. As capability becomes abundant, taste becomes the scarce resource.

Sources

  • [1] Deloitte (2024). State of Generative AI in the Enterprise 2024. deloitte.com
  • [2] Google DeepMind Blog (2024). deepmind.google
  • [3] Artificial Analysis & fal (2025). State of Generative Media Survey Report 2025. artificialanalysis.ai
  • [4] Deloitte (2025). Technology, Media & Telecom Predictions 2025. deloitte.com
  • [5] McKinsey & Company (2025). The state of AI in 2025. mckinsey.com
  • [6] Training Data Podcast (2025). Gorkem Yurtseven, Burkay Gur, Batuhan T. and Sonya Huang. youtube.com
  • [7] IAB (2025). State of Data 2025. iab.com
  • [8] Aream & Co. (2025). The State of AI in Gaming Survey. areamandco.com
  • [9] Tochilkin, D., et al. (2024). TripoSR: Fast 3D Object Reconstruction from a Single Image. arXiv:2403.02151. arxiv.org
  • [10] Variety (2025). Video Generation Model Evaluation in 2025. variety.com
  • [11] a16z Games (2024). State of AI in Gaming. gamedevreports.substack.com
  • [12] 4As & Forrester (2025). The State of Generative AI Inside US Marketing Agencies. aaaa.org
  • [13] Business Wire (2025). businesswire.com
  • [14] Mediaocean (2025). 2025 H2 Market Report. mediaocean.com
  • [15] Generative Media Conference (October 24, 2025). Gorkem Yurtseven keynote.
  • [16] Generative Media Conference (October 24, 2025). Jennifer Li (a16z).
  • [17] Menlo Ventures (2025). 2025: The State of Generative AI in the Enterprise. menlovc.com