
State of Generative Media
Generative Media in 2025: An Industry at Its Inflection Point
At a Glance
Generative AI is delivering real results—65% of organizations report ROI within 12 months. Yet the gap between personal adoption (89%) and organizational deployment (57%) points to a missing piece: a unified team workspace for generative workflows. Individual tools abound, but what organizations lack is a single environment where teams can access every model, experiment freely, track costs and usage, control permissions, and govern the entire generative process. Closing this workspace gap is what separates companies still experimenting from those scaling generative media into production.
Introduction
Generative media has crossed a threshold. What began as experimental technology confined to research labs now powers production workflows across advertising, entertainment, gaming, and e-commerce. By the end of 2025, 88% of organizations had deployed AI in at least one business function [5], and the gap between personal experimentation and enterprise adoption—89% versus 57%—signals a market still absorbing what has become possible [3].
This report examines the state of generative media across five modalities—image, video, audio, 3D, and emerging world models—and evaluates how industries are integrating these capabilities into real production pipelines.
Adoption Landscape
The adoption story of 2025 is defined by a split: individuals have moved far ahead of the organizations they work for. Personal adoption of generative media tools reached 89%, while organizational deployment lagged at 57% [3]. The gap is even more pronounced in video—62% of individuals use video generation tools personally, but only 32% of organizations have incorporated them into workflows [3].
This isn't a technology problem. It's an organizational one. Among companies that have scaled generative media successfully [1]:
- 43% redesigned workflows and production pipelines
- 33% invested in staff training and upskilling
- 30% allocated dedicated budgets for media generation infrastructure
The preference for buying over building has accelerated sharply: 76% of enterprises now purchase AI solutions, compared to just 47% in 2024 [17]. The companies seeing returns are the ones that treated adoption as a structural transformation, not a tool swap. And the investment is paying off [1].
Across the broader landscape, 74% of companies report that their generative AI initiatives meet or exceed ROI expectations [5].
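The adoption figures quoted in this section can be compared with a few lines of arithmetic. The sketch below uses only percentages stated above; the dictionary layout and function names are illustrative assumptions, not part of any cited survey's methodology. It computes each gap two ways: as a difference in percentage points, and as the ratio of organizational to personal adoption.

```python
# Percentages quoted in this section; the structure and names are illustrative.
ADOPTION = {
    "all generative media": {"personal": 89, "organizational": 57},
    "video generation": {"personal": 62, "organizational": 32},
}


def gap_points(personal, organizational):
    """Absolute gap in percentage points."""
    return personal - organizational


def org_to_personal_ratio(personal, organizational):
    """Share of personal adoption that is matched at the organizational level."""
    return organizational / personal


gaps = {name: gap_points(**row) for name, row in ADOPTION.items()}
ratios = {name: org_to_personal_ratio(**row) for name, row in ADOPTION.items()}

# Buy-versus-build shift: 47% in 2024 to 76% now.
buy_shift_points = 76 - 47

for name in ADOPTION:
    print(f"{name}: gap of {gaps[name]} points, ratio {ratios[name]:.2f}")
print(f"share of enterprises buying AI solutions: +{buy_shift_points} points")
```

Read this way, the video gap is slightly smaller in percentage points (30 versus 32) but larger in relative terms, because organizational use reaches a smaller fraction of personal use.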
The Model Explosion
2025 saw an unprecedented acceleration in model releases across every generative modality. Video generation alone produced eight major releases in ten months. Image models evolved from single-shot generation to sophisticated editing and style transfer. Audio crossed the threshold of human-indistinguishable speech. 3D went from minutes-long generation to sub-three-second asset creation.
The pace has been relentless: major video model releases landed every 1-2 months. And a pattern has emerged: task-specific models consistently outperform general-purpose "omni" models [15]. In total, approximately 1,000 new models were released across all modalities.
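The cadence claim can be checked against the release count quoted above. This is a minimal sketch using only the two figures from this section (eight major video releases in ten months); the variable names are illustrative.

```python
# Figures from this section: eight major video releases over ten months.
major_video_releases = 8
window_months = 10

# Average spacing between releases, in months.
months_per_release = window_months / major_video_releases

# The text describes releases landing every 1-2 months.
within_stated_cadence = 1 <= months_per_release <= 2

print(f"average spacing: {months_per_release:.2f} months per release")
print(f"consistent with a 1-2 month cadence: {within_stated_cadence}")
```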
Image Generation
Image generation entered 2025 as the most mature generative modality and only extended its lead. Competition intensified across every major AI lab—Black Forest Labs, OpenAI, ByteDance, Alibaba, and Google DeepMind all shipped significant updates, driving rapid improvements in quality, speed, and controllability [3].
In terms of adoption, Google Gemini leads at 74%, followed by OpenAI and Black Forest Labs (FLUX) [3]. Image generation also has the highest production deployment rate of any modality—44% of organizations run it in production workflows [3]. The maturity gap between image and other modalities is narrowing, but image remains the entry point for most enterprises exploring generative media.
Open-source models have been a defining force. When code and weights are available, teams can test, iterate, and deploy without vendor lock-in. The barriers to self-hosting drop considerably compared to closed alternatives [16].
Video Generation
Video was the breakout modality of 2025. Eight major models shipped during the year, each pushing boundaries in different directions [10].
Google leads video model adoption at 69%, with Kling and Hailuo as the primary alternatives [3]. Despite the pace of releases, an adoption gap persists: 62% of individuals use video generation personally, but only 32% of organizations have incorporated it into their workflows [3]. Video's production deployment rate sits at 39%, close behind image's 44%, but it is still constrained by longer generation times, higher compute costs, and the consistency demands of professional workflows [3].
Audio & Speech
Audio generation matured rapidly in 2025, with text-to-speech crossing a critical perceptual threshold.
Text-to-Speech
ElevenLabs' Turbo v2.5 set the production standard at 250–300 ms latency. MiniMax's Speech-02 (May 2025) achieved 99% human voice similarity across 32 languages, effectively closing the gap between synthetic and recorded speech [3]. On the open-source side, Kokoro TTS performed well, demonstrating that production-quality voice synthesis is achievable with open models [3].
Music & Sound
ElevenLabs launched Eleven Music in August 2025—the first major AI music model trained entirely on licensed content [3]. This addressed the licensing concerns that had constrained commercial adoption.
Speech-to-Text
Speech-to-text remained dominated by Whisper v3 and ElevenLabs STT, with improvements focused on latency reduction and multilingual accuracy [3].
3D Generation
3D generation compressed what used to take days or weeks into minutes or seconds. The trajectory through 2025 was steep.
Production-ready models now span multiple input methods: TripoSR and TRELLIS handle image-to-3D conversion [9], Meshy 6 generates from text prompts, and SAM 3D from Meta offers another image-to-3D path [3].
However, meaningful limitations remain. Generated meshes still require topology cleanup for animation workflows. Geometric accuracy deteriorates on intricate mechanical assemblies. Hard-surface modeling—a staple of industrial and product design—continues to demand significant manual refinement [3]. These constraints keep 3D generation firmly in the "augmentation" category for professional pipelines rather than full replacement.
World Models: The Convergence Layer
World models are not an incremental improvement—they are a category shift. By fusing video generation's temporal reasoning with 3D modeling's spatial awareness into real-time interactive systems, they collapse the boundaries between watching and inhabiting generated content [2].
Genie 2 proved the concept was viable: a single image in, a navigable 3D environment out, with keyboard and mouse control maintaining coherence for 10–20 seconds and up to a minute in some cases [2]. Marble made it commercial—generating persistent, downloadable 3D environments from text, images, videos, or panoramas, with output as Gaussian splats, meshes, or video, and direct integration into Unity, Unreal Engine, and VR headsets [3].
The downstream applications are transformative. Autonomous vehicle companies can train on photorealistic simulated cities. Game developers can prototype playable environments from a sketch. Architects can walk through spaces that exist only as floor plans [6]. Text-to-game is the inevitable next step after text-to-video—making generated output interactive rather than passive—and the gap is closing fast [6].
Industry Verticals
Adoption varies dramatically by sector, shaped by each industry's risk tolerance, regulatory environment, and creative workflows [3].
Advertising & Marketing
Marketing leads all verticals in generative AI adoption at 75%, up from 61% in 2024 [12]. But adoption doesn't mean transformation—80% of marketers use AI on less than half their work, and only 30% have achieved full integration across the campaign lifecycle [12].
The biggest obstacle is legal, not technical: 94% of agencies cited IP ownership and liability as their primary implementation challenge [7]. Even so, 72% of marketers identified generative AI as the most important trend for H2 2025 [14]. Scaling adoption will require programmatic generation at campaign volume, enforceable brand consistency, and audit trails for legal compliance.
Entertainment
Film and television studios show high awareness but cautious spending. While 68% of media companies report AI adoption, major studios allocate less than 3% of production budgets to generative AI [4]. Instead, they're shifting approximately 7% of operational spending toward AI-enabled tools for contracts, permitting, and production planning [4].
E-Commerce
E-commerce presents a unique constraint: generated content must be indistinguishable from reality. Model creativity cannot interfere with product fidelity—images and videos must faithfully represent every product. This makes e-commerce one of the more demanding verticals for generative media, requiring pixel-level accuracy alongside creative flexibility.
Education
Education remains the sector with the most untapped potential. The bottleneck lies in creating high-quality content at scale that is optimally tailored to each learner [6]. Current generative models struggle with the consistency, controllability, and factual accuracy that educational content demands. Curriculum coherence and cultural sensitivity add further constraints [3]. Education could become one of the largest generative media markets—driven by the need for personalized learning at massive scale [15].
What Comes Next
Three forces will shape generative media through 2026 and beyond [3].
The strategic implication is clear: expertise will shift from execution to orchestration. As capability becomes abundant, taste becomes the scarce resource.
Sources
- [1] Deloitte (2024). State of Generative AI in the Enterprise 2024. deloitte.com
- [2] Google DeepMind Blog (2024). deepmind.google
- [3] Artificial Analysis & fal (2025). State of Generative Media Survey Report 2025. artificialanalysis.ai
- [4] Deloitte (2025). Technology, Media & Telecom Predictions 2025. deloitte.com
- [5] McKinsey & Company (2025). The state of AI in 2025. mckinsey.com
- [6] Training Data Podcast (2025). Gorkem Yurtseven, Burkay Gur, Batuhan T. and Sonya Huang. youtube.com
- [7] IAB (2025). State of Data 2025. iab.com
- [8] Aream & Co. (2025). The State of AI in Gaming Survey. areamandco.com
- [9] Tochilkin, D., et al. (2024). TripoSR: Fast 3D Object Reconstruction from a Single Image. arXiv:2403.02151. arxiv.org
- [10] Variety (2025). Video Generation Model Evaluation in 2025. variety.com
- [11] a16z Games (2024). State of AI in Gaming. gamedevreports.substack.com
- [12] 4As & Forrester (2025). The State of Generative AI Inside US Marketing Agencies. aaaa.org
- [13] Business Wire (2025). businesswire.com
- [14] Mediaocean (2025). 2025 H2 Market Report. mediaocean.com
- [15] Generative Media Conference (October 24, 2025). Gorkem Yurtseven keynote.
- [16] Generative Media Conference (October 24, 2025). Jennifer Li (a16z).
- [17] Menlo Ventures (2025). 2025: The State of Generative AI in the Enterprise. menlovc.com