New video reasoning model extracts timestamped, structured metadata from up to two hours of video in a single API call - no indexing, no preprocessing - with segmentation quality superior to leading general-purpose models
SAN FRANCISCO, April 20, 2026 /PRNewswire-PRWeb/ -- TwelveLabs, the video intelligence company, today announced the general availability of Pegasus 1.5, its latest video reasoning model. Pegasus 1.5 introduces Time Based Metadata Extraction (TBM), a capability that lets users define a custom JSON schema and receive timestamped, structured metadata from video content up to two hours long - with no ingestion pipeline, no preprocessing, and no indexing step.
The release marks a shift in how organizations work with video data. Rather than relying on manual tagging, transcription, or frame-level analysis, Pegasus 1.5 reasons across entire videos to produce structured outputs that map directly to production workflows. Media companies, sports broadcasters, and content platforms can now convert raw footage into query-ready metadata in a single API call - transforming video from a storage cost into a queryable, monetizable data asset.
What Pegasus 1.5 Delivers
Pegasus 1.5 processes video as a multidimensional volume - not a sequence of frames. Its Time Based Metadata Extraction capability accepts a user-defined JSON schema and returns structured, timestamped results across the full duration of a video, up to two hours.
Key capabilities include:
- Time Based Metadata Extraction (TBM): Define a JSON schema describing the data you need - editorial segments, sports plays, brand appearances, speaker changes, shot types. Pegasus 1.5 returns timestamped, structured metadata that maps to your schema, discovering every instance across your entire video. No indexing, no preprocessing, no manual tagging.
- On-the-fly processing: Point the model at any video - a URL, an uploaded file, or a base64 string - and get results. No ingestion pipeline. No waiting for an index to build. A developer can go from zero to structured video metadata in a single API call within minutes of getting an API key.
- Multimodal Prompting: Provide a reference image of any entity - a person, product, or logo - and define what you are looking for. Pegasus 1.5 finds every moment that entity appears on screen, timestamped and structured. A sports broadcaster can locate every dunk from a specific player across a full season; a brand can measure product screen time across thousands of hours of content.
- Long-form video support: Process videos up to two hours in a single pass, maintaining context and continuity across the full duration. Feature-length films, full-length sports broadcasts, multi-hour archive tapes, and conference recordings - analyzed in a single request.
- Production-grade structured output: Deliver reliable, schema-compliant JSON at scale. In internal benchmarks on news content, leading general-purpose models exhibited high structured JSON failure rates. Pegasus 1.5 maintained consistent output fidelity across complex, multi-definition schemas.
- Superior segmentation quality: Produce more accurate temporal boundaries and segment-level metadata. On aggregate segmentation quality benchmarks, Pegasus 1.5 outperforms Gemini 3 Pro by 13.1%, with boundary accuracy within approximately 350 milliseconds.
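The TBM workflow described above - define a schema, point the model at a video, receive timestamped records back - can be sketched in a few lines. The field names and the client-side validation below are illustrative assumptions, not the published TwelveLabs API; consult the official API documentation for actual request and response shapes.

```python
# Illustrative sketch of a Time Based Metadata Extraction (TBM) workflow.
# The schema is user-defined; the validator shows how schema-conformant,
# timestamped output could be checked client-side. All field names here
# are hypothetical, not the official TwelveLabs API shape.

# A user-defined schema: every detected segment must carry these fields.
TBM_SCHEMA = {
    "segment_type": str,   # e.g. "editorial", "play", "brand_appearance"
    "start_sec": float,    # segment start, seconds from video start
    "end_sec": float,      # segment end, seconds from video start
    "label": str,          # free-text description of the segment
}

def validate_segments(segments):
    """Check that each segment matches TBM_SCHEMA and that its
    timestamps are well-formed (non-negative, start before end)."""
    for seg in segments:
        for field, ftype in TBM_SCHEMA.items():
            if field not in seg:
                raise ValueError(f"missing field {field!r} in {seg}")
            if not isinstance(seg[field], ftype):
                raise TypeError(f"{field!r} should be {ftype.__name__}")
        if not (0 <= seg["start_sec"] < seg["end_sec"]):
            raise ValueError(f"bad timestamps in {seg}")
    return True

# The kind of structured, timestamped output TBM is described as returning:
sample_output = [
    {"segment_type": "play", "start_sec": 12.4, "end_sec": 19.8,
     "label": "fast-break dunk"},
    {"segment_type": "brand_appearance", "start_sec": 102.0,
     "end_sec": 108.5, "label": "courtside logo visible"},
]

validate_segments(sample_output)  # raises if output is not schema-conformant
```

Because the output is plain, schema-conformant JSON, it can feed search indexes, editorial tools, or downstream agents without any video-specific tooling.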
Performance and Benchmarks
In head-to-head evaluations against leading general-purpose models, Pegasus 1.5 demonstrates measurable advantages in the two dimensions that matter most for production video workflows: segmentation quality and structured output reliability.
On aggregate segmentation quality - the accuracy of temporal boundaries and segment-level metadata - Pegasus 1.5 outperforms Gemini 3 Pro by 13.1%. On structured JSON output reliability, particularly on complex content types like news programming, general-purpose models showed significantly higher failure rates in producing valid, schema-conformant JSON. Pegasus 1.5 was purpose-built for this workload and delivers consistent results where general-purpose models break down.
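Boundary accuracy of the kind cited above is commonly measured as the mean absolute offset between predicted and reference segment boundaries. A minimal sketch of that metric follows; the numeric values are made-up examples for illustration, not TwelveLabs benchmark data.

```python
# Sketch of a simple boundary-accuracy metric: mean absolute error (in
# seconds) between predicted and ground-truth segment boundaries.
# The sample values below are illustrative, not benchmark results.

def mean_boundary_error(predicted, reference):
    """Mean absolute offset between matched boundary timestamps, in seconds.
    Assumes the two lists are aligned (same cuts, same order)."""
    if len(predicted) != len(reference):
        raise ValueError("boundary lists must be aligned and equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Ground-truth cut points (seconds) and a model's predicted cut points:
reference = [10.0, 42.5, 88.0, 131.2]
predicted = [10.3, 42.2, 88.4, 130.9]

err = mean_boundary_error(predicted, reference)
print(f"mean boundary error: {err * 1000:.0f} ms")  # offsets of ~0.3 s each
```

In practice, benchmark suites also match predicted segments to reference segments before computing offsets; this sketch assumes that alignment has already been done.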
"Most AI models treat video as a sequence of images with audio attached. That approach works for demos, but it falls apart in production," said Jae Lee, CEO and co-founder of TwelveLabs. "Pegasus 1.5 treats video as what it actually is - a rich, multidimensional data source. TBM gives enterprises a way to define exactly what they need from their video and get it back as structured, time-coded data. No preprocessing. No manual steps. No waiting for an index to build. That is the difference between a research prototype and production infrastructure. And once video data is structured, it becomes a first-class input for any AI agent or automated system - as easy to parse as text."
Availability
Pegasus 1.5 is available today through the TwelveLabs API and is being demonstrated live at NAB Show 2026 in Las Vegas. Visit twelvelabs.io or contact the sales team for enterprise pricing and integration support.
About TwelveLabs
TwelveLabs builds video foundation models that transform how machines understand video. Its multimodal AI platform enables enterprises to search, analyze, and generate insights from video at scale. Based in San Francisco and Seoul, TwelveLabs serves customers across media, security, advertising, and enterprise software. Learn more at twelvelabs.io.
Media Contact
Amber Moore, Moore Communications, 1 5039439381, [email protected]
SOURCE Twelve Labs
