Grok Imagine 1.5 Shows the Real Pricing Shape of API Video

xAI lists Grok Imagine 1.5 Preview with image input pricing, resolution-based per-second output pricing, and a 60 RPM limit. That matters more than another demo clip.

Summary

xAI released grok-imagine-video-1.5-preview on June 3, 2026. The news page frames it as an image-to-video model available through the xAI API in preview; the model documentation is more concrete still, listing the model name, alias, pricing, rate limit, and regions. My read is that the pricing entry is one of the main signals in the release. A video model that only shows samples remains a creative-tool story. A video model priced per output second in API docs starts to face the questions real engineering teams ask: what does a call cost, where is the limit, and can this sit inside a budgeted workflow?

The official model page is more specific than a single per-second figure: image input costs $0.01, output video is priced by resolution and second, with 480p at $0.08/second and 720p at $0.14/second. It also lists rate limits at 60 requests per minute and availability in us-east-1, eu-west-1, and us-west-2. These numbers are less glamorous than a polished clip, but they are more useful for builders. Generative video is often marketed as a moment of inspiration; production use collides with batch cost, retries, source-image hosting, queueing, review, and procurement. By placing Grok Imagine 1.5 inside the API and pricing system, xAI brings those constraints into the first evaluation conversation.
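To make those numbers concrete, here is a minimal sketch of the per-clip arithmetic using the prices quoted above. It assumes the image input fee is charged once per request and that only successful output seconds are billed; neither assumption is confirmed by the official pages, and the function name is illustrative.

```python
from decimal import Decimal

# Prices as quoted in this article from the model page (preview pricing, may change).
IMAGE_INPUT_USD = Decimal("0.01")
OUTPUT_USD_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}


def clip_cost(duration_s: int, resolution: str) -> Decimal:
    """Estimated cost of one generation: one input image plus output seconds.

    Assumption: the image fee applies once per request.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    rate = OUTPUT_USD_PER_SECOND[resolution]  # KeyError for unlisted resolutions
    return IMAGE_INPUT_USD + rate * duration_s


print(clip_cost(10, "720p"))  # 1.41
print(clip_cost(10, "480p"))  # 0.81
```

Under these assumptions, a 10-second clip costs about $1.41 at 720p and $0.81 at 480p, before any retries.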

This piece does not argue that Grok Imagine 1.5 is better than Sora, Veo, or any other video model. xAI has not published a reproducible cross-model quality benchmark in the official material. The more durable question is what happens when video generation gets a per-second output meter. That unit pushes teams away from asking whether a clip looks impressive in isolation and toward asking whether that clip deserves to exist inside a product flow.

What happened

The official release page confirms the core product shape. grok-imagine-video-1.5-preview turns a single still image into video. You provide a starting frame and a prompt that describes the motion; the model animates the scene, including camera moves, atmosphere, and physics, while staying faithful to the source image. xAI says clips can be generated at up to 720p. The prompt can describe the camera move, pacing, and sound design; the code example sets duration=10, resolution="720p", and retrieves the generated result from response.url.

The model documentation adds the operational facts. The model’s alias is grok-imagine-video-1.5-2026-05-30. Its modality is image -> video. Its pricing is listed as $0.01 per input image, with output at $0.08 per second for 480p and $0.14 per second for 720p. Its request limit is 60 requests per minute. Its available regions are us-east-1, eu-west-1, and us-west-2. Those fields matter because they show xAI treating the model as an API product entry, not only as a launch-page capability.

The cautious reading is important. The model page lists an image input price and per-second output prices; it does not, in that same compact page, specify every production cost component, such as whether failed attempts are billed, how long results are retained, how queue priority works, or whether the price holds after preview. The release-page example contains duration=10, but the page does not publish a duration ceiling. A responsible evaluation should treat the visible prices as the current baseline, then measure the missing operational costs in a real prototype.

Why it matters

The first reason it matters is budget granularity. Text models are metered by tokens. Image models are usually evaluated by the image. If video is metered by output seconds, generation length becomes a direct product-control lever. Builders will start asking whether a user action should trigger a video at all, whether a still image is enough, whether only high-value events should generate clips, and whether maximum duration belongs in business logic. Pricing units reshape product design; that effect lasts longer than the launch sample.
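One way to put maximum duration into business logic is to derive it from a per-clip budget. The sketch below is illustrative: the function name, the per-clip budget, and the default product cap of 10 seconds are assumptions of mine (the release page only shows duration=10 in an example and publishes no ceiling).

```python
from decimal import Decimal

# Prices as quoted in this article (preview pricing, may change).
IMAGE_INPUT_USD = Decimal("0.01")
OUTPUT_USD_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}


def allowed_duration(requested_s: int, per_clip_budget_usd: Decimal,
                     resolution: str, product_cap_s: int = 10) -> int:
    """Clamp a requested duration to what the per-clip budget can pay for.

    Returns 0 when the budget cannot cover even one second, which the
    caller can treat as "fall back to a still image".
    """
    rate = OUTPUT_USD_PER_SECOND[resolution]
    affordable_s = int((per_clip_budget_usd - IMAGE_INPUT_USD) // rate)
    return max(0, min(requested_s, product_cap_s, affordable_s))


print(allowed_duration(10, Decimal("1.00"), "720p"))  # 7
print(allowed_duration(10, Decimal("1.00"), "480p"))  # 10
print(allowed_duration(10, Decimal("0.05"), "480p"))  # 0
```

With a $1.00 budget, the same request gets 7 seconds at 720p or the full 10 seconds at 480p, which is exactly the kind of trade-off a per-second meter forces into product code.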

The second reason is that bulk production becomes harder to hand-wave. Teams often discuss generative video as if “more assets” were the default win. Once it is an API, bulk means queues, retries, caching, moderation, failed attempts, and budget alarms. A 60 RPM limit is not a negative signal. It is a useful boundary for a preview-stage service. Clear limits let engineering teams plan; vague promises of unlimited scale create worse decisions.
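A clear limit is easy to plan around in code. Below is a minimal client-side sketch of a sliding-window limiter sized to 60 requests per 60 seconds. How xAI actually counts its window is not documented in the sources here, so treat this as a conservative local guard rather than a mirror of the server's behavior; the class and method names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window_s` seconds (client-side only)."""

    def __init__(self, limit: int = 60, window_s: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self._stamps = deque()      # timestamps of calls still inside the window

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window_s - (now - self._stamps[0])

    def try_acquire(self) -> bool:
        """Record a call if the window has room; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self._stamps.append(self.clock())
        return True
```

A queue worker would call try_acquire() before each request and sleep for wait_time() when it returns False, so bursts are smoothed locally instead of being rejected by the API.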

The third reason is procurement language. Enterprises and SaaS teams struggle to buy a delightful creative button. They can buy a backend service with a model name, price, limit, region, console, docs, SDK, and API key. Grok Imagine 1.5 enters the same developer system where xAI already sells other model capabilities. That moves the internal conversation from “will users enjoy this effect” toward “can this be governed as a production pipeline step.” For commercial adoption, that is more important than social distribution of sample clips.

Builder impact

If you are evaluating Grok Imagine 1.5, start with the cost model rather than the highlight reel. A practical evaluation sheet needs at least four columns: trigger condition, target duration, retry policy, and cacheability. The official per-second price lets that sheet become concrete. The official 60 RPM limit means bursts have to be queued, so you cannot assume every request runs the moment it is triggered. This exercise will kill weak use cases quickly and reveal the few workflows where generated video has real value.
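The evaluation sheet can be sketched as a small data structure. The four columns come from the paragraph above; the extra fields (resolution, triggers per month, a numeric cache hit rate) and the worst-case formula are my own assumptions. In particular, it assumes every attempt is billed in full, which the official pages do not confirm.

```python
from dataclasses import dataclass
from decimal import Decimal

# Prices as quoted in this article (preview pricing, may change).
IMAGE_INPUT_USD = Decimal("0.01")
OUTPUT_USD_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}


@dataclass(frozen=True)
class UseCaseRow:
    trigger_condition: str     # what user or system event starts a generation
    target_duration_s: int     # clip length requested per attempt
    resolution: str            # "480p" or "720p"
    max_attempts: int          # retry policy: cap on attempts per trigger
    cache_hit_rate: Decimal    # cacheability: share of triggers served from cache
    triggers_per_month: int    # hypothetical traffic estimate

    def worst_case_monthly_usd(self) -> Decimal:
        """Upper bound: every uncached trigger uses all attempts, all billed."""
        per_attempt = (IMAGE_INPUT_USD
                       + OUTPUT_USD_PER_SECOND[self.resolution] * self.target_duration_s)
        uncached = self.triggers_per_month * (1 - self.cache_hit_rate)
        return uncached * self.max_attempts * per_attempt


hero = UseCaseRow("new product page published", 6, "720p", 3, Decimal("0.5"), 200)
print(hero.worst_case_monthly_usd())  # 255.000
```

A row like this makes weak use cases visible quickly: if the worst-case figure is larger than the value the trigger could plausibly create, the use case fails before any prompt is written.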

The API shape encourages three early patterns. The first is high-value, low-frequency asset generation: product hero motion, campaign visuals, or key scenes in education and games. The second is a semi-automated creative workflow where a human selects or approves still frames before the system animates them. The third is a capped batch job where only approved assets move into video generation. My judgment is that the first two are likely to be healthier early uses, because they tolerate human selection and explicit budget control.

The image_url parameter deserves its own architectural attention. It means the source image must already be hosted somewhere the model can access. Video generation is a later step in the chain, not the whole chain. You need object storage, permission rules, expiring URLs, a task queue, result URL persistence, and a decision about whether failed attempts reuse the same source image. With that plumbing, Grok Imagine 1.5 can behave like a backend capability. Without it, it becomes another expensive generate button.
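Part of that plumbing can be sketched without touching the API at all. The example below derives a stable job key from the source image's content hash and the generation parameters, so retries reuse the same source and finished results can be looked up instead of regenerated. Everything here (job_key, VideoJob, JobStore, the status names) is hypothetical scaffolding, and the in-memory dictionary stands in for a real database.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def job_key(image_sha256: str, prompt: str, duration_s: int, resolution: str) -> str:
    """Stable key for one generation request.

    Uses the image content hash, not the image URL, because expiring
    URLs change between attempts while the image itself does not.
    """
    payload = json.dumps(
        {"image": image_sha256, "prompt": prompt,
         "duration": duration_s, "resolution": resolution},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class VideoJob:
    key: str
    image_url: str                    # current (possibly expiring) URL for the source image
    status: str = "queued"            # queued -> running -> done | failed
    attempts: int = 0
    result_url: Optional[str] = None  # persisted copy of the generated output reference


class JobStore:
    """In-memory stand-in for a persistent job table."""

    def __init__(self):
        self._jobs = {}

    def get_or_create(self, key: str, image_url: str):
        """Return (job, created); an existing job is reused rather than duplicated."""
        job = self._jobs.get(key)
        created = job is None
        if created:
            job = VideoJob(key=key, image_url=image_url)
            self._jobs[key] = job
        return job, created
```

Keying on content rather than URL is a design choice: it lets an accidental double trigger or a retry land on the same job record, which is where the attempt count and result URL live.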

What to ignore

Ignore the illusion created by a single sample clip. Video demos are naturally selected for the best image, the smoothest prompt, and the camera motion that hides defects. API cost accumulates across every failed attempt, prompt revision, accidental trigger, and retry. A programmable video model should be judged with cost, throughput, and workflow fit on the first screen; visual polish belongs later in the evaluation.
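The gap between a sample clip and real spend can be measured from your own logs. This sketch computes the effective cost per accepted clip from a list of attempts; the log format is hypothetical, and it assumes rejected attempts are billed like accepted ones, which the official pages do not state.

```python
from decimal import Decimal

# Prices as quoted in this article (preview pricing, may change).
IMAGE_INPUT_USD = Decimal("0.01")
OUTPUT_USD_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}


def cost_per_accepted_clip(attempts):
    """attempts: iterable of (duration_s, resolution, accepted) tuples.

    Returns total estimated spend divided by the number of accepted clips,
    or None when nothing was accepted.
    """
    total = Decimal("0")
    accepted = 0
    for duration_s, resolution, ok in attempts:
        total += IMAGE_INPUT_USD + OUTPUT_USD_PER_SECOND[resolution] * duration_s
        if ok:
            accepted += 1
    if accepted == 0:
        return None
    return total / accepted


log = [(10, "720p", False), (10, "720p", False), (10, "720p", True)]
print(cost_per_accepted_clip(log))  # 4.23
```

If two of three attempts are discarded, the 10-second 720p clip that looked like $1.41 is effectively a $4.23 asset under these assumptions.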

Also ignore broad claims that the price is simply cheap or expensive. A rate of $0.08 or $0.14 per output second does not make the model suitable for every workflow, and it does not rule it out for production. The real question is whether video creates more value than the call costs in your product, and whether you can cap waste. For low-conversion bulk content, the cost boundary will appear quickly. For high-value product demos or brand assets, the same price may be clear enough to manage.

Finally, ignore commitments that are not in the official sources. The release page says prompts can describe sound design; that is not the same as a full audio-production contract. The model page lists visible input and output prices, but it does not turn the entire production bill into a long-term SLA. The rational preview posture is to integrate narrowly, record real failure and rework rates, then decide whether the usage deserves expansion.

Technical takeaway

The clearest technical-commercial shape is this: grok-imagine-video-1.5-preview is image -> video, accessed through the xAI API, using image_url as the input image reference and response.url as the generated output reference. The release page says clips can reach up to 720p. The model page lists image input at $0.01, 480p output at $0.08/second, 720p output at $0.14/second, rate limits at 60 RPM, and availability in three regions.

My bottom line: Grok Imagine 1.5 moves video generation into a system-design evaluation. For xAI, that is a pragmatic distribution choice. For builders, it is a warning and an opportunity: budget, limits, and workflow design now matter as much as visual quality.

Sources

  1. Grok Imagine 1.5 Preview / official
  2. Grok Imagine Video 1.5 Preview model docs / official
  3. Grok Imagine Video 1.5 Pricing on ImagineArt and xAI / blog