State of Generative Media

Generative Media in 2025: An Industry at Its Inflection Point

At a Glance

Generative AI is delivering real results—65% of organizations report ROI within 12 months. Yet the gap between personal adoption (89%) and organizational deployment (57%) points to a missing piece: a unified team workspace for generative workflows. Individual tools abound, but what organizations lack is a single environment where teams can access every model, experiment freely, track costs and usage, control permissions, and govern the entire generative process. Closing this workspace gap is what separates companies still experimenting from those scaling generative media into production.

65% of organizations report ROI within 12 months
89% personal adoption of gen AI tools
57% organizational deployment
88% of orgs deployed AI in at least one function

Introduction

Generative media has crossed a threshold. What began as experimental technology confined to research labs now powers production workflows across advertising, entertainment, gaming, and e-commerce. By the end of 2025, 88% of organizations had deployed AI in at least one business function [5], and the gap between personal experimentation and enterprise adoption—89% versus 57%—signals a market still absorbing what has become possible [3].

This report examines the state of generative media across five modalities—image, video, audio, 3D, and emerging world models—and evaluates how industries are integrating these capabilities into real production pipelines.

Adoption Landscape

The adoption story of 2025 is defined by a split: individuals have moved far ahead of the organizations they work for. Personal adoption of generative media tools reached 89%, while organizational deployment lagged at 57% [3]. The gap is even more pronounced in video—62% of individuals use video generation tools personally, but only 32% of organizations have incorporated them into workflows [3].
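The two gaps can be compared in absolute and relative terms. The sketch below is illustrative only: it uses the four percentages quoted above, and the function and variable names are made up for this example rather than taken from any survey tooling.

```python
# Adoption figures quoted in the paragraph above, in percent (source [3]).
adoption = {
    "all generative media": {"personal": 89, "organizational": 57},
    "video generation": {"personal": 62, "organizational": 32},
}


def adoption_gap(personal: int, organizational: int) -> tuple[int, float]:
    """Return (gap in percentage points, organizational share of personal adoption)."""
    gap_points = personal - organizational
    ratio = organizational / personal
    return gap_points, ratio


for name, figures in adoption.items():
    gap, ratio = adoption_gap(**figures)
    print(f"{name}: {gap} point gap, organizations at {ratio:.0%} of personal adoption")
```

Measured in points, the two gaps are close (32 and 30). The video gap is wider in relative terms: organizational use of video is about 52% of personal use, against about 64% for generative media overall.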

This isn't a technology problem. It's an organizational one. Among companies that have scaled generative media successfully [1]:

43% redesigned workflows and production pipelines
33% invested in staff training and upskilling
30% allocated dedicated budgets for media generation infrastructure

The preference for buying over building has accelerated sharply—76% of enterprises now purchase AI solutions compared to just 47% in 2024 [17]. The companies seeing returns are the ones that treated adoption as a structural transformation, not a tool swap. And the investment is paying off [1]:

Already profitable: 34%
Expecting returns within 12 months: 31%
Total achieving ROI within 12 months: 65%

Across the broader landscape, 74% of companies report that their generative AI initiatives meet or exceed ROI expectations [5].

The Model Explosion

2025 saw an unprecedented acceleration in model releases across every generative modality. Video generation alone produced eight major releases in ten months. Image models evolved from single-shot generation to sophisticated editing and style transfer. Audio crossed the threshold of human-indistinguishable speech. 3D went from minutes-long generation to sub-three-second asset creation.

The pace has been relentless, with major video model releases landing every one to two months. A pattern has also emerged: task-specific models consistently outperform general-purpose "omni" models [15]. In total, approximately 1,000 new models were released across all modalities.

Image Generation

Image generation entered 2025 as the most mature generative modality and only extended its lead. Competition intensified across every major AI lab—Black Forest Labs, OpenAI, ByteDance, Alibaba, and Google DeepMind all shipped significant updates, driving rapid improvements in quality, speed, and controllability [3].

Key releases

Date | Model | Developer | Highlight
Aug 2024 | Flux.1 Dev | Black Forest Labs | Open-source quality benchmark
Mar 2025 | GPT Image 1 | OpenAI | Integrated image generation in ChatGPT
Mid 2025 | Seedream 4.0/4.5 | ByteDance | Competitive open-weight architecture
Aug 2025 | Nano Banana v1 | Google DeepMind | Lightweight high-quality generation
Aug 2025 | Qwen Image Edit | Alibaba | Instruction-based image editing
Nov 2025 | Nano Banana Pro | Google DeepMind | Enhanced quality and performance

In terms of adoption, Google Gemini leads at 74%, followed by OpenAI and Black Forest Labs (FLUX) [3]. Image generation also has the highest production deployment rate of any modality—44% of organizations run it in production workflows [3]. The maturity gap between image and other modalities is narrowing, but image remains the entry point for most enterprises exploring generative media.

Open-source models have been a defining force. When code and weights are available, teams can test, iterate, and deploy without vendor lock-in. The barriers to self-hosting drop considerably compared to closed alternatives [16].

Video Generation

Video was the breakout modality of 2025. Eight major models shipped between December 2024 and December 2025, each pushing boundaries in different directions [10]:

Key releases

Date | Model | Developer | Highlight
Dec 2024 | Veo 2 | Google DeepMind | Physically accurate video generation
Feb 2025 | PixVerse v4 | PixVerse | Accessible consumer-grade generation
Apr 2025 | Kling 2.0 | Kuaishou | First-frame-to-last-frame control
May 2025 | Veo 3 | Google DeepMind | Native synchronized audio + video
Jun 2025 | MiniMax Hailuo 02 | MiniMax | Top benchmark performance
Jun 2025 | Seedance 1.0 | ByteDance | Competitive technical approach
Sep 2025 | Sora 2 | OpenAI | Multi-shot generation with audio
Dec 2025 | Wan 2.6 | Alibaba | 15-second 1080p video with synchronized audio
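As a rough check on the one-to-two-month release cadence described earlier in this report, the spacing between the releases listed above can be computed directly. This is a minimal sketch: only release months are given, so each date is pinned to the first of the month as an assumption, and the helper name is illustrative.

```python
from datetime import date

# Release months from the key releases list above.
# Assumption: the day is set to 1 because only the month is known.
releases = [
    ("Veo 2", date(2024, 12, 1)),
    ("PixVerse v4", date(2025, 2, 1)),
    ("Kling 2.0", date(2025, 4, 1)),
    ("Veo 3", date(2025, 5, 1)),
    ("MiniMax Hailuo 02", date(2025, 6, 1)),
    ("Seedance 1.0", date(2025, 6, 1)),
    ("Sora 2", date(2025, 9, 1)),
    ("Wan 2.6", date(2025, 12, 1)),
]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from one release month to the next."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Gap in months between each consecutive pair of releases.
gaps = [months_between(a[1], b[1]) for a, b in zip(releases, releases[1:])]
mean_gap = sum(gaps) / len(gaps)

print(gaps)                # [2, 2, 1, 1, 0, 3, 3]
print(round(mean_gap, 2))  # 1.71
```

On these dates the average spacing is a little under two months, although individual gaps range from zero (two releases in June 2025) to three months.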

Google leads video model adoption at 69%, with Kling and Hailuo as the primary alternatives [3]. Despite this velocity, an adoption gap persists: 62% of individuals use video generation personally, but only 32% of organizations have incorporated it into their workflows [3]. Video's production deployment rate sits at 39%, close behind image's 44%, but it remains constrained by longer generation times, higher compute costs, and the consistency demands of professional workflows [3].

Audio & Speech

Audio generation matured rapidly in 2025, with text-to-speech crossing a critical perceptual threshold.

Text-to-Speech

ElevenLabs' Turbo v2.5 set the production standard at 250–300ms latency. MiniMax's Speech-02 (May 2025) achieved 99% human voice similarity across 32 languages, effectively closing the gap between synthetic and recorded speech [3]. On the open-source side, Kokoro TTS performed well, demonstrating that production-quality voice synthesis is achievable with open models [3].

Music & Sound

ElevenLabs launched Eleven Music in August 2025—the first major AI music model trained entirely on licensed content [3]. This addressed the licensing concerns that had constrained commercial adoption.

Speech-to-Text

Speech-to-text remained dominated by Whisper v3 and ElevenLabs STT, with improvements focused on latency reduction and multilingual accuracy [3].

3D Generation

3D generation compressed what used to take days or weeks into minutes or seconds. The trajectory through 2025 was steep:

Date | Model | Developer | Highlight
Jan 2025 | Hunyuan 3D 2.0 | Tencent | High-quality image-to-3D
Apr 2025 | HyperRodin Gen 1.5 | Deemos | 4 billion parameter architecture
Jul 2025 | Meshy v5 | Meshy | Text-to-3D improvements
Sep 2025 | Tripo 3.0 | Tripo | 3M+ creators, 700+ enterprises [13]
Oct 2025 | Meshy v6 preview | Meshy | Recognized in a16z game dev survey [11]
Dec 2025 | TRELLIS 2 | Microsoft | High-res assets in under 3 seconds

Production-ready models now span multiple input methods: TripoSR and TRELLIS handle image-to-3D conversion [9], Meshy v6 generates from text prompts, and SAM 3D from Meta offers another image-to-3D path [3].

However, meaningful limitations remain. Generated meshes still require topology cleanup for animation workflows. Geometric accuracy deteriorates on intricate mechanical assemblies. Hard-surface modeling—a staple of industrial and product design—continues to demand significant manual refinement [3]. These constraints keep 3D generation firmly in the "augmentation" category for professional pipelines rather than full replacement.

World Models: The Convergence Layer

World models are not an incremental improvement—they are a category shift. By fusing video generation's temporal reasoning with 3D modeling's spatial awareness into real-time interactive systems, they collapse the boundaries between watching and inhabiting generated content [2].

Date | Model | Developer | Highlight
Dec 2024 | Genie 2 | Google DeepMind | Playable 3D worlds from a single image [2]
Nov 2025 | Marble | World Labs | First commercial world model product [3]

Genie 2 proved the concept was viable: a single image in, a navigable 3D environment out, with keyboard and mouse control maintaining coherence for 10–20 seconds and up to a minute in some cases [2]. Marble made it commercial—generating persistent, downloadable 3D environments from text, images, videos, or panoramas, with output as Gaussian splats, meshes, or video, and direct integration into Unity, Unreal Engine, and VR headsets [3].

The downstream applications are transformative. Autonomous vehicle companies can train on photorealistic simulated cities. Game developers can prototype playable environments from a sketch. Architects can walk through spaces that exist only as floor plans [6]. Text-to-game is the inevitable next step after text-to-video—making generated output interactive rather than passive—and the gap is closing fast [6].

Industry Verticals

Adoption varies dramatically by sector, shaped by each industry's risk tolerance, regulatory environment, and creative workflows [3]:

Sector | Adoption | Typical uses
Advertising | 56% | Campaign visuals, banner ads, social media graphics
Entertainment & Media | 43% | Storyboarding, pre-visualization, VFX, promotional content
Creative Software | 31% | Integrated design platforms, editing tools
Education | 30% | Interactive videos, animated explainers
Retail & E-Commerce | 19% | Product photography, catalog imagery

Advertising & Marketing

Marketing leads all verticals in generative AI adoption at 75%, up from 61% in 2024 [12]. But adoption doesn't mean transformation—80% of marketers use AI on less than half their work, and only 30% have achieved full integration across the campaign lifecycle [12].

The biggest obstacle is legal, not technical: 94% of agencies cited IP ownership and liability as their primary implementation challenge [7]. Even so, 72% of marketers identified generative AI as the most important trend for H2 2025 [14]. Scaling adoption will require programmatic generation at campaign volume, enforceable brand consistency, and audit trails for legal compliance.

Entertainment

Film and television studios show high awareness but cautious spending. While 68% of media companies report AI adoption, major studios allocate less than 3% of production budgets to generative AI [4]. Instead, they're shifting approximately 7% of operational spending toward AI-enabled tools for contracts, permitting, and production planning [4].

E-Commerce

E-commerce presents a unique constraint: generated content must be indistinguishable from reality. Model creativity cannot interfere with product fidelity—images and videos must faithfully represent every product. This makes e-commerce one of the more demanding verticals for generative media, requiring pixel-level accuracy alongside creative flexibility.

Education

Education remains the sector with the most untapped potential. The bottleneck lies in creating high-quality content at scale that is optimally tailored to each learner [6]. Current generative models struggle with the consistency, controllability, and factual accuracy that educational content demands. Curriculum coherence and cultural sensitivity add further constraints [3]. Education could become one of the largest generative media markets—driven by the need for personalized learning at massive scale [15].

What Comes Next

Three forces will shape generative media through 2026 and beyond [3]:

Multimodal Convergence

World models are collapsing the boundaries between video, 3D, and interactive media. The trajectory from text-to-image to text-to-video to text-to-interactive-environment is compressing faster than most predicted [6][2].

Democratization of Creative Capability

The tools are approaching a point where professional-quality output no longer requires professional-grade expertise. The shift isn't about machines replacing creators—it's about removing barriers for those who lack access to VFX labs and production infrastructure.

The strategic implication is clear: expertise will shift from execution to orchestration. As capability becomes abundant, taste becomes the scarce resource.

Sources

[1] Deloitte (2024). State of Generative AI in the Enterprise 2024. deloitte.com
[2] Google DeepMind Blog (2024). deepmind.google
[3] Artificial Analysis & fal (2025). State of Generative Media Survey Report 2025. artificialanalysis.ai
[4] Deloitte (2025). Technology, Media & Telecom Predictions 2025. deloitte.com
[5] McKinsey & Company (2025). The state of AI in 2025. mckinsey.com
[6] Training Data Podcast (2025). Gorkem Yurtseven, Burkay Gur, Batuhan T. and Sonya Huang. youtube.com
[7] IAB (2025). State of Data 2025. iab.com
[8] Aream & Co. (2025). The State of AI in Gaming Survey. areamandco.com
[9] Tochilkin, D., et al. (2024). TripoSR: Fast 3D Object Reconstruction from a Single Image. arXiv:2403.02151. arxiv.org
[10] Variety (2025). Video Generation Model Evaluation in 2025. variety.com
[11] a16z Games (2024). State of AI in Gaming. gamedevreports.substack.com
[12] 4As & Forrester (2025). The State of Generative AI Inside US Marketing Agencies. aaaa.org
[13] Business Wire (2025). businesswire.com
[14] Mediaocean (2025). 2025 H2 Market Report. mediaocean.com
[15] Generative Media Conference (October 24, 2025). Gorkem Yurtseven keynote.
[16] Generative Media Conference (October 24, 2025). Jennifer Li (a16z).
[17] Menlo Ventures (2025). 2025: The State of Generative AI in the Enterprise. menlovc.com
