AI Computer Vision for Retail Shelf Monitoring in Pharma

OUR CLIENT

A Global Pharma Retail Network Replaced Manual Shelf Audits with Automated Product Detection at Scale

Our client operates within the pharmaceutical sector, managing a network of pharmacies subject to strict commercial agreements governing how products must be displayed, positioned, and rotated on shelves. These planogram commitments form a core part of their trade and supplier relationships, and non-compliance carries direct financial and reputational risk.

The assortment is predominantly over-the-counter (OTC), so shelf visibility and product placement are direct commercial levers rather than operational housekeeping. Shelf photographs are taken across multiple locations and capture a wide diversity of lighting conditions, camera angles, and product configurations. The assortment also spans a large number of SKU classes, and the portfolio changes continuously as new products and variants are introduced. Any viable solution therefore had to be accurate, scalable, and maintainable without a full engineering cycle for every product change.

The engagement also required comprehensive technical documentation to support knowledge transfer and long-term maintenance within the client’s internal teams, ensuring the solution could be owned and extended independently after project completion.

BUSINESS CHALLENGE

Scale, Speed, and Spatial Intelligence: Why Standard Approaches Fall Short

The project surfaced a set of tightly interconnected challenges. Individually, each would have been manageable. Together, they demanded a system built for long-term production operation, not a one-off experiment.

  • Scalable object detection in a variable environment.
    Shelf images taken across many locations vary significantly in lighting, perspective, and product arrangement. The model needed to perform consistently well outside a controlled training set, across real-world conditions.
  • A continuously evolving SKU portfolio.
    New products and packaging variants are introduced regularly. A system requiring full model retraining for every assortment update would not be operationally sustainable at the pace the business requires.
  • Spatial rule verification, not just product detection.
    Knowing which products are present on a shelf is necessary but not sufficient. The system needed to evaluate the spatial layout itself, assessing orientation, quantity, and display density against configurable planogram rules at a product-position level.
  • The cost of manual inspection at scale.
    Without automation, maintaining shelf compliance across hundreds of locations placed growing strain on operational teams. The objective was to reduce routine inspection effort and surface deviations early, before they translate into commercial impact.
  • A credible path from proof of concept to production.
    The solution required a coherent, documented pipeline from data collection and annotation through to continuous training and inference in a clustered cloud environment, enabling the client to own and evolve it independently over time.

OUR SOLUTION & APPROACH

End-to-End AI Computer Vision Pipeline for Retail Shelf Monitoring

Our approach was structured across four interconnected workstreams, each addressing a specific layer of the technical challenge while integrating into a unified, maintainable production pipeline.

1. Data Pipeline and Annotation Framework

Annotations were exported from Label Studio into a standardised format aligned with object detection training requirements. Data from multiple sources was merged, filtered, and quality-checked, with superclass grouping applied at brand level to ensure label consistency and enable group-level performance metrics. For example, variant-level SKUs across the same product line, such as a 10-tablet and 20-tablet pack variant, were mapped to a shared parent class – enabling both granular detection and aggregated brand compliance scoring.
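The superclass grouping can be illustrated with a small Python sketch. The SKU names, the `SUPERCLASS` mapping, and the helper functions below are hypothetical, since the source does not describe the exported label schema. The sketch maps variant labels to a parent class and computes recall per parent class from per-item detection outcomes, which is one way to obtain group-level metrics.

```python
from collections import defaultdict

# Hypothetical variant-to-parent mapping; real SKU names are not given in the source.
SUPERCLASS = {
    "painrelief_10tab": "painrelief",
    "painrelief_20tab": "painrelief",
    "coldspray_15ml": "coldspray",
    "coldspray_30ml": "coldspray",
}

def to_superclass(label):
    """Map a variant-level SKU label to its parent class; unknown labels map to themselves."""
    return SUPERCLASS.get(label, label)

def group_recall(matches):
    """Recall per superclass.

    matches: list of (ground_truth_label, detected) pairs, where `detected` is True
    if that ground-truth item was found by the detector.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for label, detected in matches:
        parent = to_superclass(label)
        totals[parent] += 1
        hits[parent] += int(detected)
    return {parent: hits[parent] / totals[parent] for parent in totals}
```

For example, one found 10-tablet pack, one missed 20-tablet pack, and one found 30 ml spray give a recall of 0.5 for `painrelief` and 1.0 for `coldspray`.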

Synthetic training images were generated to supplement limited real-world samples, using inpainting, alpha compositing, JPEG quality randomisation, and class-balance control. Controlled test sets were maintained separately to enable reliable performance benchmarking across training iterations.
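Two of the synthetic-data steps are easy to show in isolation: alpha compositing of a product crop onto a shelf background, and class-balance control when choosing which class to synthesise next. This is a minimal pure-Python sketch on tiny pixel grids, assuming images are lists of RGB tuples. A real pipeline would use an image library, and inpainting and the actual JPEG re-encoding are not shown. All function names are illustrative.

```python
import random

def alpha_composite(background, crop, alpha, x0, y0):
    """Blend `crop` onto a copy of `background` with its top-left corner at (x0, y0).

    background, crop: 2-D lists (rows) of RGB tuples; the crop is assumed to fit.
    alpha: 2-D list of floats in [0, 1] with the crop's shape (1.0 = fully opaque).
    Returns the new image and the pasted box as (x_min, y_min, x_max, y_max),
    which can serve as the bounding-box annotation for the synthetic image.
    """
    out = [list(row) for row in background]  # copy rows so the input is untouched
    h, w = len(crop), len(crop[0])
    for dy in range(h):
        for dx in range(w):
            a = alpha[dy][dx]
            fg, bg = crop[dy][dx], out[y0 + dy][x0 + dx]
            # "Over" blend per channel: a * foreground + (1 - a) * background.
            out[y0 + dy][x0 + dx] = tuple(round(a * f + (1 - a) * b) for f, b in zip(fg, bg))
    return out, (x0, y0, x0 + w, y0 + h)

def pick_class_to_synthesise(counts, rng):
    """Class-balance control: pick uniformly among the currently rarest classes."""
    lowest = min(counts.values())
    return rng.choice(sorted(c for c, n in counts.items() if n == lowest))

def random_jpeg_quality(rng, low=30, high=95):
    """Sample a JPEG quality level; re-encoding at this quality needs an image library."""
    return rng.randint(low, high)
```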

2. Model Architecture and Training

The detection model is built on RT-DETR v2, a transformer-based, end-to-end object detector that models global image context through attention and does not rely on the two-stage region proposal pipeline used by older detectors such as Faster R-CNN. The configuration uses a hybrid backbone combining ResNet-50 with Vision Transformer components and a 1024×1024 input resolution, which supports detection of small, densely packed products.
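The source states only the 1024×1024 input size, not how shelf photos are resized to it. Assuming aspect-preserving letterbox resizing (an assumption, not a documented detail of this project), the arithmetic looks like this, including mapping predicted boxes back to the original photo:

```python
def letterbox_params(width, height, target=1024):
    """Scale factor and padding that fit a width x height image into a target x target square
    without distorting its aspect ratio (hypothetical preprocessing step)."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) // 2  # padding on the left (and right)
    pad_y = (target - new_h) // 2  # padding on the top (and bottom)
    return scale, pad_x, pad_y

def box_to_original(box, scale, pad_x, pad_y):
    """Map an (x_min, y_min, x_max, y_max) box from model-input pixels back to the original photo."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

For a 4032×3024 photo the scale is 1024/4032 ≈ 0.254, so the image becomes 1024×768 with 128 px of padding above and below. A pack that is 80 px wide in the original is only about 20 px wide at model input, which is why a lower input resolution would make small products harder to detect.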

Hyperparameter optimisation was handled with Optuna, using TPE (Tree-structured Parzen Estimator) sampling and early pruning of unpromising trials across learning rate, regularisation, and scheduler parameters, which avoids the cost of an exhaustive grid search. Augmentations were tailored to pharmacy-specific conditions: geometric transforms, photometric variation, JPEG degradation, occlusion simulation, and shadow effects.
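A few of the listed augmentations can be sketched in plain Python on a grayscale image stored as a list of rows. The function names, parameter ranges, and the band-shaped shadow model are assumptions chosen for illustration. Geometric transforms and JPEG degradation are omitted because they need an image library.

```python
import random

def photometric(image, rng, max_gain=0.3, max_bias=20):
    """Random contrast (gain) and brightness (bias) change; values are clipped to 0-255."""
    gain = 1 + rng.uniform(-max_gain, max_gain)
    bias = rng.uniform(-max_bias, max_bias)
    return [[min(255, max(0, round(p * gain + bias))) for p in row] for row in image]

def occlude(image, box, rng, max_frac=0.4, fill=0):
    """Cover a random rectangle inside `box` (x_min, y_min, x_max, y_max) to mimic a
    partially hidden pack; each side of the rectangle is at most `max_frac` of the box side."""
    x1, y1, x2, y2 = box
    w = max(1, int((x2 - x1) * rng.uniform(0.1, max_frac)))
    h = max(1, int((y2 - y1) * rng.uniform(0.1, max_frac)))
    ox, oy = rng.randint(x1, x2 - w), rng.randint(y1, y2 - h)
    out = [list(row) for row in image]
    for y in range(oy, oy + h):
        for x in range(ox, ox + w):
            out[y][x] = fill
    return out

def shadow(image, rng, strength=0.5):
    """Darken a random vertical band, a crude stand-in for shadows cast by shelf fittings."""
    width = len(image[0])
    start = rng.randint(0, width - 1)
    end = rng.randint(start + 1, width)
    return [[round(p * strength) if start <= x < end else p for x, p in enumerate(row)]
            for row in image]
```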

3. Few-Shot Learning for New SKU Onboarding

For introducing new product classes, a knowledge distillation approach was adopted: a teacher model guides a student model through a KL divergence loss with temperature scaling. Selective layer freezing concentrates learning on the higher-level feature layers. A new class can be integrated with 10 to 50 synthetic images, typically generated from product packaging mockups, without a full training run.
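The distillation loss named above is usually computed as the KL divergence between the teacher's and the student's temperature-softened class distributions, multiplied by T² to keep gradient scale comparable across temperatures. A minimal sketch for a single prediction's class logits follows. In a DETR-style detector it would be applied per matched query, and the temperature of 4.0 is an arbitrary illustrative default, not a project setting.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

A higher temperature flattens the teacher's distribution, so the student also learns how the teacher ranks the non-target classes rather than only its top prediction.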

Catastrophic forgetting of existing classes is controlled through a mixture of new and historical samples alongside a weighted distillation schedule, preserving recognition quality for the established assortment while extending coverage to new products.
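The replay mixture and weighted distillation schedule could look roughly like the sketch below. The mixing fraction, the linear decay, and the start and end weights are hypothetical; the source does not specify the actual schedule or mixing ratio.

```python
import random

def mixed_batch(new_samples, old_samples, batch_size, new_fraction, rng):
    """Draw a shuffled batch mixing new-class samples with replayed historical samples."""
    n_new = round(batch_size * new_fraction)
    n_old = batch_size - n_new
    batch = rng.sample(new_samples, min(n_new, len(new_samples)))
    batch += rng.sample(old_samples, min(n_old, len(old_samples)))
    rng.shuffle(batch)
    return batch

def distill_weight(epoch, total_epochs, start=1.0, end=0.3):
    """Hypothetical schedule: linearly move the distillation weight from `start` to `end`."""
    if total_epochs <= 1:
        return end
    t = min(epoch, total_epochs - 1) / (total_epochs - 1)
    return start + (end - start) * t

def total_loss(detection_loss, kd_loss, epoch, total_epochs):
    """Detection loss on the mixed batch plus the weighted distillation term."""
    return detection_loss + distill_weight(epoch, total_epochs) * kd_loss
```

For example, `mixed_batch(new, old, 16, 0.25, random.Random(0))` returns 4 new-class and 12 historical samples, so every update still sees the established assortment.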

4. Inference Engine and Shelf Rule Verification

Detection outputs are post-processed using non-maximum suppression tuned for densely packed pharmacy shelf configurations. Product positions are then clustered using DBSCAN with a weighted distance function that accounts for both vertical and horizontal shelf structure, enabling accurate identification of shelf-level product groupings.
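Both post-processing steps can be sketched in plain Python: greedy IoU-based NMS, followed by DBSCAN over box centres with a weighted distance. The weights `wx` and `wy`, the IoU threshold, and `eps` are illustrative assumptions; the project's actual weighting function is not described. Weighting vertical offsets more heavily keeps products on different shelves apart while neighbouring facings on the same shelf chain together into one cluster.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: visit boxes by descending score, keep one only if it does not
    overlap any already-kept box above the threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

def shelf_distance(p, q, wx=1.0, wy=4.0):
    """Weighted Euclidean distance between box centres; vertical gaps count more."""
    return ((wx * (p[0] - q[0])) ** 2 + (wy * (p[1] - q[1])) ** 2) ** 0.5

def dbscan(points, eps, min_samples, dist=shelf_distance):
    """Plain DBSCAN returning one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbours) < min_samples:
            labels[i] = -1  # may later become a border point of some cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point; not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(nbrs) >= min_samples:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels
```

With `eps=60` and `min_samples=2`, centres at y=100 spaced 50 px apart form one cluster, centres at y=300 form a second, and an isolated centre is labelled -1 (noise).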

Configurable planogram rules are applied to the detected spatial layouts, evaluating product count, orientation, and display density within adjustable tolerance thresholds. Non-compliant arrangements are flagged for review. Training and inference run on a cloud GPU cluster and are designed as a continuous workflow rather than a one-off batch job.
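A configurable rule check might be structured as below. `ShelfRule`, its fields, the detection dictionary format, and density measured in facings per metre of shelf are all hypothetical; the source describes count, orientation, and density checks with tolerances but not their concrete representation.

```python
from dataclasses import dataclass

@dataclass
class ShelfRule:
    """Hypothetical rule for one product class within one clustered shelf group."""
    product: str
    shelf_group: int
    expected_count: int
    count_tolerance: int = 0
    required_orientation: str = "front"
    min_density: float = 0.0  # facings per metre of shelf (illustrative unit)

def check_rule(rule, detections, shelf_width_m):
    """Return violation messages for one rule; an empty list means compliant.

    detections: list of dicts with keys 'product', 'shelf_group', 'orientation'.
    """
    mine = [d for d in detections
            if d["product"] == rule.product and d["shelf_group"] == rule.shelf_group]
    issues = []
    if abs(len(mine) - rule.expected_count) > rule.count_tolerance:
        issues.append(f"{rule.product}: {len(mine)} facings, expected "
                      f"{rule.expected_count}±{rule.count_tolerance}")
    wrong = [d for d in mine if d["orientation"] != rule.required_orientation]
    if wrong:
        issues.append(f"{rule.product}: {len(wrong)} facing(s) not '{rule.required_orientation}'")
    density = len(mine) / shelf_width_m if shelf_width_m > 0 else 0.0
    if density < rule.min_density:
        issues.append(f"{rule.product}: density {density:.1f}/m below {rule.min_density}/m")
    return issues
```

Keeping thresholds in data rather than code is what lets the rules be adjusted per agreement without retraining or redeploying the model.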

Technology Stack (tools and frameworks by area):
Model & Architecture
  • RT-DETR v2 (object detection model)
  • ResNet-50 (backbone)
  • Vision Transformer (hybrid backbone component)
Training & Optimisation
  • Optuna (hyperparameter optimisation, TPE sampling)
  • Knowledge Distillation (KL divergence loss with temperature scaling)
Data & Annotation
  • Label Studio (annotation tool)
Inference & Clustering
  • DBSCAN (spatial clustering of product positions)
  • Non-Maximum Suppression / NMS (post-processing)
Infrastructure
  • Databricks (clustered GPU environment for training and inference)

RESULTS & IMPACT

From Reactive Auditing to Automated Shelf Intelligence

By embedding an AI computer vision layer directly into its shelf monitoring operations, the client has moved from periodic manual inspection to a systematic, data-driven compliance model. The impact spans both operational efficiency and strategic capability.

The system represents a meaningful shift in how shelf compliance is managed: from a labour-intensive review cycle to a scalable, continuous monitoring workflow that surfaces exceptions rather than requiring end-to-end manual review.

  • Automated shelf layout assessment.
    Spatial compliance is evaluated automatically against planogram rules, replacing reliance on periodic manual audits and enabling consistent shelf standards across all locations simultaneously.
  • Reduced inspection workload.
    Operational teams are redirected from repetitive image review toward exception handling and commercial decision-making, increasing the value of each hour of human review time.
  • Scalable SKU onboarding.
    New products are integrated via few-shot learning using limited synthetic data, eliminating full retraining cycles and reducing the time from product launch to monitored shelf presence.
  • Transparent ML-to-business reporting.
    Multi-dimensional metrics, including mAP (mean average precision), F-beta scores, and superclass-level compliance rates, bridge the communication gap between ML teams and commercial stakeholders, enabling clearer performance accountability.
  • A production-grade foundation for scaling.
    The transformer architecture, standardised data pipeline, and configurable rule engine support continuous improvement and expansion without requiring the system to be rebuilt.
  • A defined path to edge deployment.
    Model quantisation is scoped as the next development phase and is intended to enable inference in the field on resource-constrained devices, without depending on cloud connectivity.
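The F-beta score mentioned among the reporting metrics combines precision and recall into one number, with β controlling how much recall counts relative to precision. A short sketch:

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R); beta > 1 favours recall, beta < 1 precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With precision 0.9 and recall 0.6, F1 is 0.72 and F2 is about 0.64. Choosing β > 1 would suit a setting where missing a non-compliant shelf is assumed to cost more than a false alarm.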

CONTACT US

Looking to Scale Your AI Initiatives?

If your organization is facing capacity challenges in AI & Data Analytics, or is ready to move from AI pilots to production-scale platforms, our team can provide the expertise, speed, and track record you need – contact us.