Having a single AWS account for scaling up implies shared quotas, notably for inference.
Batch Inference at 300 Pixels
With quotas shared across teams (requests and tokens per minute), processing 2 million images on demand would have saturated the system.
That was not the only challenge. The PoC, conducted with about a hundred images, validated sequential processing. But at scale, a synchronous and unitary invocation “was almost an antipattern,” explains Lévi Bernadine, ML engineer at Decathlon Digital. Even when parallelizing, with the shared quotas, bottlenecks were created: queues, timeouts, rejections…
We also needed to ensure state management to prevent one error from crashing an entire batch. And control costs: an input image consumed tokens proportional to its size.
Under these conditions, Decathlon opted for asynchronous processing, with batch inference. This background processing does not block the system resources of other teams, halves the cost compared to on-demand processing, and ships native error handling. In exchange, it required managing the lack of guaranteed execution time for jobs.
Images were resized to 300 pixels in height or width (aspect ratio preserved), with JPEG compression at quality 85. The results were on average 96% smaller than the original files.
One Prompt, Two Variants
Decathlon experimented with distinct prompts for descriptions and keywords. Noting that results were not very coherent, they ultimately opted for a single prompt, which also had the advantage of consuming only one API request.
This prompt has two variants, assigned according to the nature of the image. On one side, product packshots. On the other, sports practice photos (contextual images: the aim is to describe the action, mood, emotion…).
Daily Preprocessing, Hourly Processing
The preprocessing workflow runs once per day. “This aligns with the period when our data lake tables are updated,” Lévi Bernadine said at the AWS Summit Paris. Its main components:
- Identify, in the DAM (Digital Asset Management), assets not yet fed into the generator or updated since
- Download them in original quality via the Decathlon CDN
- Prepare prompts for Claude (Anthropic) and Nova (Amazon) models
- Resize and compress
- Store the images, as well as the input files (JSONL containing the compressed image encoded in base64 and the tailored prompt)
- Status tracking (DynamoDB registry)
The processing workflow runs in parallel, every hour. It checks the registry for images ready to be processed (status “staging”). They are grouped into batches of 500 to 2,000 images. The bundle is passed to Bedrock – with an IAM bridge – and the Batch API pushes the results into S3 (descriptions + keywords in English and French). Airflow checks status every half hour (48-hour timeout). When a job finishes, post-processing starts and the DAM is updated.
€3,200 in LLM Costs… for €1.2M in Savings?
This serverless system can process 25,000 images per day. In terms of performance, they report validation rates “up to 93% across different evaluators and tool categories.”
Lévi Bernadine mentions another use case in exploration: identifying the presence of recognizable mannequins. He also notes the possibility of A/B testing images and descriptions in collaboration with the e-commerce teams.
In batch at 300px with Nova Pro, it cost €3,229 to process the 2 million images. More precisely, €2,125 in input (2.31 billion tokens, of which 85% represent the image pixels) and €1,104 in output (300 million tokens). With Claude Sonnet 4.5, the bill would have been about €18,000. On-demand and at full resolution (1200px), it would have required €30,000 with Nova Pro and €160,000 with Claude Sonnet.