Evolvier

E-commerce creative production

GenIm: Engineering a Generative AI Creative Studio for E-Commerce

GenIm is Evolvier's own generative AI creative studio — it turns flat product photos into studio-grade lifestyle imagery and HD video. This case study covers how we architected, trained, and shipped it.

Evolvier (first-party product) · Global

The problem

Challenge

E-commerce teams need lifestyle creative for every SKU, market, and season, but physical shoots cannot keep pace and generic image models distort the product itself. The engineering problem was generating photorealistic scenes around an exact, unaltered product — at production scale, with predictable rendering cost.

Key metrics

Turnaround from flat photo to finished asset (to be confirmed)

Rendering pipeline uptime (to be confirmed)

Cost per finished asset vs. studio shoot (to be confirmed)

Context

Evolvier builds and operates its own products alongside client work — that is the Product DNA pillar, and this case study sits squarely inside it. GenIm is our generative AI creative studio: it takes the flat product photography an e-commerce team already has and renders high-resolution lifestyle imagery and HD video from it, adapted for different markets and demographics. There is no anonymized client here. We were the product owner, the architecture team, and the operations crew, and the platform described below is the one we run in production today.

That ownership shapes how to read what follows. Every architecture call had a direct cost we paid ourselves — in GPU bills, in queue backlogs, in support load when something rendered badly. The decisions documented here are the ones that survived contact with production.

The challenge

Lifestyle creative is what makes a product page convert: the jacket on a person, the lamp in a room. Producing it traditionally means studio bookings, model contracts, and reshoots — weeks of scheduling per campaign, multiplied by every market and season a catalog serves. Past a certain SKU count, shoot logistics quietly decide what gets marketed.

The obvious shortcut, prompting a general-purpose image model, fails in a way every merchandiser recognizes: the model treats the product as a suggestion. Logos warp, stitching melts, fabric drapes like plastic, and color shifts enough to trigger returns. For commerce, an attractive image of approximately your product is worse than no image, because the customer who buys from it receives something else.

That gave us three hard constraints. The product must pass through generation unaltered. Turnaround must be minutes, not weeks, or the tool does not change anyone's merchandising calendar. And cost per asset must stay predictable as catalogs and regions multiply. Each constraint shaped a different layer of the architecture.

Architecture & decisions

Decoupling the studio from the render pipeline

GPU inference is slow, expensive, and spiky: everything a web application must never block on. So GenIm has been decoupled since day one: the studio interface (Next.js and TypeScript) only ever writes generation jobs to a queue, and containerized GPU workers (Docker on Kubernetes) consume them asynchronously. Worker capacity scales with queue depth rather than with studio traffic, which keeps the two cost curves independent: a burst of catalog uploads never degrades the UI, and a failed render retries in the pipeline without involving the user's session. The entire environment is defined in Terraform, with the same infrastructure-as-code and automated-pipeline discipline we apply to client cloud work: reproducible environments, health checks, and deploys that do not require a maintenance window.
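The queue-first split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GenIm's implementation: `JobQueue`, `RenderJob`, `desired_workers`, the retry limit, and the jobs-per-worker ratio are all hypothetical names and values, and the in-memory deque stands in for whatever durable queue runs in production.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class RenderJob:
    job_id: str
    payload: dict
    attempts: int = 0


class JobQueue:
    """In-memory stand-in for a durable job queue (hypothetical)."""

    def __init__(self, max_attempts: int = 3):
        self.pending = deque()
        self.done = {}   # job_id -> render result
        self.dead = []   # jobs that exhausted their retries
        self.max_attempts = max_attempts

    def enqueue(self, job: RenderJob) -> None:
        # The studio's only responsibility: record the job and return.
        self.pending.append(job)

    def work_one(self, render) -> None:
        # A worker pulls one job; a failure is retried inside the pipeline,
        # so the user's session is never involved.
        job = self.pending.popleft()
        job.attempts += 1
        try:
            self.done[job.job_id] = render(job.payload)
        except Exception:
            if job.attempts < self.max_attempts:
                self.pending.append(job)
            else:
                self.dead.append(job)


def desired_workers(queue_depth: int, jobs_per_worker: int = 4,
                    max_workers: int = 20) -> int:
    # Capacity follows queue depth, not studio traffic (assumed ratio and cap).
    return min(max_workers, math.ceil(queue_depth / jobs_per_worker))


# Usage: a render that fails once, then succeeds on the retry.
calls = {"n": 0}


def flaky_render(payload: dict) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated GPU failure")
    return f"rendered:{payload['sku']}"


queue = JobQueue()
queue.enqueue(RenderJob("job-1", {"sku": "JKT-001"}))
while queue.pending:
    queue.work_one(flaky_render)
```

In this sketch the studio side only ever calls `enqueue`, while `desired_workers` is the kind of signal an autoscaler could read. Neither function knows anything about web traffic, which is the point of the split.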

Fidelity as a model constraint, not a post-filter

The defining product decision was to custom-train diffusion models rather than wrap a general-purpose API. The reason is that product-geometry fidelity cannot be patched on after generation: no filter can un-melt a logo. The constraint has to live in the model itself, so training explicitly targets the commerce failure modes — fabric lines, stitching, logo placement, and color depth are treated as fixed geometry while the scene is generated around them. Demographic and spatial parameters (model demographics, background, lighting) stay controllable per region from the same source photo. This is where most of the engineering time went, and it is the reason GenIm is a trained product rather than a prompt template. It is also the foundation of our AI development practice: production AI is a data-and-constraints problem long before it is a prompting problem.
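To make the per-region control concrete, here is a small Python sketch of how one source photo might fan out into regional variants. It illustrates the parameterization only, not the model: `SceneParams`, `REGION_PRESETS`, the preset values, and the file path are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SceneParams:
    model_demographic: str
    background: str
    lighting: str


# Hypothetical regional presets; real values would come from merchandising.
REGION_PRESETS = {
    "region-a": SceneParams("adults 25-34", "compact city apartment", "soft daylight"),
    "region-b": SceneParams("adults 35-44", "coastal boardwalk", "golden hour"),
}


def build_variant_jobs(source_photo: str, regions: list[str]) -> list[dict]:
    """One job per region; only the scene parameters differ."""
    jobs = []
    for region in regions:
        jobs.append({
            "source_photo": source_photo,  # same product reference for every region
            "region": region,
            "scene": asdict(REGION_PRESETS[region]),
        })
    return jobs


jobs = build_variant_jobs("catalog/jkt-001/flat.png", ["region-a", "region-b"])
```

Every job carries the same `source_photo`, so a regional variant is a change to `scene` and nothing else.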

Data isolation for customer catalogs

Catalog import means holding other companies' product data and imagery, so tenant isolation was a launch requirement, not a roadmap item. Catalogs and generated assets are scoped per account, encrypted at rest with AES-256 and in transit with TLS 1.3, and sit behind token-based authentication, the same security baseline we hold across Evolvier platforms. The render workers themselves never receive database credentials: each job carries a scoped payload containing only the assets and parameters that job needs. A misbehaving or compromised worker has nothing else to read, which keeps the blast radius of the most exposed component as small as the job it is running.
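A scoped payload of this kind could be assembled roughly as follows. This is a sketch under assumptions rather than the production code: `Asset`, `build_scoped_payload`, `CrossTenantAccess`, and the example URLs are hypothetical, and a real system would likely hand the worker short-lived signed URLs instead of plain ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_id: str
    account_id: str
    url: str  # assumed to be a short-lived, per-asset URL in a real system


class CrossTenantAccess(Exception):
    """Raised when a job references an asset owned by another account."""


def build_scoped_payload(account_id: str, asset_ids: list[str],
                         params: dict, asset_index: dict) -> dict:
    # Runs on the trusted side of the queue, where the database is reachable.
    assets = []
    for asset_id in asset_ids:
        asset = asset_index[asset_id]
        if asset.account_id != account_id:
            raise CrossTenantAccess(asset_id)
        assets.append({"asset_id": asset.asset_id, "url": asset.url})
    # The worker receives this dict and nothing else: no credentials, no catalog.
    return {"account_id": account_id, "assets": assets, "params": dict(params)}


index = {
    "a1": Asset("a1", "acct-1", "https://example.invalid/a1.png"),
    "b9": Asset("b9", "acct-2", "https://example.invalid/b9.png"),
}
payload = build_scoped_payload("acct-1", ["a1"], {"lighting": "soft"}, index)

try:
    build_scoped_payload("acct-1", ["b9"], {}, index)
    blocked = False
except CrossTenantAccess:
    blocked = True
```

The ownership check happens before the job is enqueued, so a worker never sees a reference it could use to reach another tenant's data.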

Integration patterns: catalog in, channels out

We kept GenIm's integration surface deliberately narrow. On the way in, import adapters accept product catalogs and existing pack shots and normalize them into a consistent internal format, so the rendering pipeline works from one contract regardless of source. On the way out, finished work renders to channel-ready formats — high-resolution stills for product pages, HD video clips sized for social channels — and job completion is reported asynchronously, consistent with the queue-first design. Narrow interfaces here were a scaling decision: every new input source or output channel is an adapter, not a pipeline change.
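The adapter idea on the import side can be illustrated with a short Python sketch. The record shape, adapter names, and source field names (`SKU`, `Name`, `Image`, `id`, `images`) are assumptions made for the example, not GenIm's actual contract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical internal contract the rendering pipeline works from."""
    sku: str
    title: str
    image_url: str


class ImportAdapter(Protocol):
    def normalize(self, raw: dict) -> ProductRecord: ...


class CsvRowAdapter:
    # Assumes a spreadsheet export with SKU / Name / Image columns.
    def normalize(self, raw: dict) -> ProductRecord:
        return ProductRecord(
            sku=raw["SKU"].strip(),
            title=raw["Name"].strip(),
            image_url=raw["Image"].strip(),
        )


class JsonFeedAdapter:
    # Assumes a JSON feed with an id, a title, and a list of images.
    def normalize(self, raw: dict) -> ProductRecord:
        return ProductRecord(
            sku=raw["id"],
            title=raw["title"],
            image_url=raw["images"][0]["src"],
        )


# Adding a source means registering an adapter; the pipeline is untouched.
ADAPTERS: dict[str, ImportAdapter] = {
    "csv": CsvRowAdapter(),
    "json_feed": JsonFeedAdapter(),
}


def import_product(source: str, raw: dict) -> ProductRecord:
    return ADAPTERS[source].normalize(raw)


from_csv = import_product(
    "csv", {"SKU": " JKT-001 ", "Name": "Field Jacket", "Image": "https://example.invalid/jkt.png"}
)
from_feed = import_product(
    "json_feed",
    {"id": "JKT-001", "title": "Field Jacket", "images": [{"src": "https://example.invalid/jkt.png"}]},
)
```

Both sources produce the same `ProductRecord`, so everything downstream of `import_product` sees a single contract regardless of where the catalog came from.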

Build highlights

  • Custom-trained diffusion models with geometry preservation as a first-class training objective, not a post-processing step.
  • Demographic and spatial control — model demographics, background environment, and lighting — parameterized per region from a single source photograph.
  • Automated HD video output as an additional rendering pass over already-generated assets, so motion creative stops being a separate production line.
  • Queue-backed GPU rendering with infrastructure as code, so capacity, cost, and deploys are reproducible rather than artisanal.
  • Shipped end to end by the same senior squad model we sell as product engineering — architecture, build, and operations under one team.

Results

The numbers below are tracked in production and published once verified — we do not publish figures we cannot stand behind, including for our own products.

Turnaround from flat photo to finished asset (to be confirmed)
Rendering pipeline uptime (to be confirmed)
Cost per finished asset vs. studio shoot (to be confirmed)

The qualitative result is the one the architecture was aimed at: lifestyle creative stopped being rationed. Every SKU can ship with scene imagery instead of a flat photo on white, regional variants are a parameter change rather than a reshoot, and video is no longer reserved for the hero products that earned a videographer's day rate.

Build something similar

If you are weighing a generative AI build — custom-trained models, GPU pipelines, tenant-isolated data, predictable rendering economics — this is tuition we have already paid on our own platform. The same team that built GenIm runs our AI development and product engineering engagements, and the conversation starts with architecture, not a sales deck.

Stack

Python, Node.js, TypeScript, Next.js, AWS, Docker, Kubernetes (EKS/GKE), Terraform


Build something similar

The decisions in this case study transfer. Talk through your system with the engineers who made them — a senior engineer replies within one business day.

Prefer email? support@evolvier.com