Fraud Detection in Retail: ML Model for Contactless Shopping

OUR CLIENT

Eurocash is one of the largest grocery distributors in Poland, owning a network of discount wholesale stores operating nationwide.

Our client places great importance on customer satisfaction, which translates into continuous improvement of customer service. One of their goals was to automate shopping cart verification before customers exit the store, reducing both the time it takes and the workload it places on staff.

To ensure seamless service and protect the company’s profits, the client wanted to improve the detection of errors and fraud at the self-service cash registers while decreasing the number of carts that require manual verification. Manual checks are not well received by customers and increase costs.

NEEDS & REQUIREMENTS

Our goal in this project was to reconcile two seemingly contradictory objectives: effectively detecting customers’ fraudulent behaviors while reducing the number of shopping cart checks.

From a technical standpoint, our aim was to minimize financial losses caused by discrepancies between invoices and the actual contents of the shopping carts. Special attention was given to cases where the cart contained products not accounted for in the invoice, as these were the primary drivers of losses.

The first step involved installing floor scales specifically designed to weigh the shopping carts in the stores. From that point forward, all customers who completed their shopping and settled their invoices were required to weigh their full carts on the scales before leaving. Any discrepancy between the cart’s measured weight and the total weight of the goods listed on the invoice could trigger further investigation. To make that decision more reliable than a weight comparison alone, project management decided to reinforce the monitoring system with Machine Learning technology.
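The weight comparison described above can be sketched as follows. This is a minimal illustration, not the client's implementation: the InvoiceLine structure, the 25 kg cart tare, and the tolerance values are all hypothetical and chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    sku: str               # hypothetical product identifier
    quantity: int
    unit_weight_kg: float  # catalogue weight of one unit


def expected_cart_weight(lines, cart_tare_kg):
    """Empty-cart weight plus the catalogue weight of every invoiced item."""
    return cart_tare_kg + sum(l.quantity * l.unit_weight_kg for l in lines)


def exceeds_tolerance(measured_kg, expected_kg, abs_tol_kg=0.5, rel_tol=0.02):
    """True when the scale reading deviates from the invoice by more than the
    larger of an absolute and a relative tolerance (both values are assumed)."""
    allowed = max(abs_tol_kg, rel_tol * expected_kg)
    return abs(measured_kg - expected_kg) > allowed


invoice = [
    InvoiceLine("water-6x1.5l", 4, 9.3),
    InvoiceLine("flour-10kg", 2, 10.0),
]
expected = expected_cart_weight(invoice, cart_tare_kg=25.0)  # about 82.2 kg
suspicious = exceeds_tolerance(88.0, expected)   # 5.8 kg over the invoice
acceptable = exceeds_tolerance(82.5, expected)   # 0.3 kg over the invoice
```

A fixed tolerance like this cannot tell an unpaid item from a coat left in the cart, which is one reason a plain weight check benefits from a learned model on top of it.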

The second step entailed implementing Machine Learning algorithms to analyze historical data and identify complex patterns that could indicate potential issues within the carts.

This step was divided into three parts:

Business analysis – determining the goals, definitions, and key performance indicators (KPIs) of the analytical model.
Initial modeling – developing the cart classification model and implementing reports.
Expanded modeling – constructing additional analytical models to support and enhance fraud detection, such as customer classification.

OUR APPROACH

The primary challenge for the project team was to develop a tool that accurately classifies carts for manual inspection. During the business analysis phase, the following key performance indicators (KPIs) were established:

The model should flag a maximum of 20% of carts for inspection to keep inspection costs (including inspectors’ salaries, extended service duration, and customer dissatisfaction) at an acceptable level.
The model must correctly identify at least 80% of carts whose contents do not match the invoice to significantly reduce losses.
The model should raise a false alarm in no more than one in five inspections (20%), since false alarms lead to customer dissatisfaction and increased costs.
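The three KPIs above can be checked against a labelled sample of carts with a few lines of Python. This is a sketch under one stated assumption: the "one in five" criterion is read as the share of flagged carts that turn out to be correct (one minus precision). The function names and the sample data are hypothetical.

```python
def inspection_kpis(flagged, discrepant):
    """Compute the three KPI values from two equal-length lists of booleans:
    flagged[i] is the model's decision, discrepant[i] the inspection outcome."""
    n_carts = len(flagged)
    n_flagged = sum(flagged)
    n_discrepant = sum(discrepant)
    true_pos = sum(f and d for f, d in zip(flagged, discrepant))
    return {
        # Share of all carts sent to manual inspection (target: at most 20%).
        "flag_rate": n_flagged / n_carts,
        # Share of discrepant carts that were flagged (target: at least 80%).
        "detection_rate": true_pos / n_discrepant if n_discrepant else 0.0,
        # Share of inspections that found nothing wrong (target: at most 20%).
        "false_alarm_rate": (n_flagged - true_pos) / n_flagged if n_flagged else 0.0,
    }


def meets_kpis(kpis):
    return (
        kpis["flag_rate"] <= 0.20
        and kpis["detection_rate"] >= 0.80
        and kpis["false_alarm_rate"] <= 0.20
    )


# 20 carts, the first 5 of which are truly discrepant.
discrepant = [True] * 5 + [False] * 15
good_model = [True] * 4 + [False] * 16                 # flags 4 discrepant carts
weak_model = [True] * 3 + [False] * 2 + [True] * 2 + [False] * 13

good = inspection_kpis(good_model, discrepant)
weak = inspection_kpis(weak_model, discrepant)
```

In this sample the first model flags 4 of 20 carts and catches 4 of the 5 discrepant ones with no false alarms, so it meets all three targets; the second catches only 3 of 5 and wastes 2 of its 5 inspections, so it fails two of them.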

Operating the model online posed another challenge. After weighing the cart, the model had to analyze the list of goods on the invoice and provide an accurate response regarding whether the cart should be directed for manual inspection. This response had to be generated within the shortest possible time to minimize customer waiting time for the final decision.

Collecting, integrating, and preparing scattered data sources from throughout the organization was also a significant task. Additionally, the installation of weighing devices was still in progress, resulting in constantly changing data formats.

Customer behaviors needed to be incorporated into the machine learning scenarios as well. Unexpected behaviors, such as placing a coat in the cart, buying goods in collective packaging, or bringing a personal shopping bag, could distort the measurements and had to be accounted for.

And what if the cart itself suddenly loses a screw or two? It does affect the total weight and needs to be considered in the model.

OUR SOLUTION

To develop an analytical model for cart classification, our team utilized the Microsoft Azure (Databricks) environment. Based on the available data, we constructed statistical models using Python and its libraries, such as scikit-learn.

After thorough verification, we ultimately selected CatBoost and tuned its hyperparameters using the Hyperopt optimization library.

A critical factor in the project’s success was the meticulous preparation of the data. As we had control over the entire data collection process, we ensured that we obtained accurate data in the appropriate formats and quality.

An illustration of an analytical model in a Microsoft Azure (Databricks) environment.

Another important aspect was identifying the features that had a significant impact on making accurate decisions. By combining the expertise of our client’s specialists with advanced data analysis techniques, we identified the key factors and their relative influence on the model.

Last but certainly not least, the training of the algorithm played a vital role in the overall solution. The model already meets the established KPIs, and its ability to learn from new incoming data is expected to deliver even more precise results.

The first bar graph shows the rate of detected discrepancies increasing by 50%; the second shows the rate of false alarms decreasing by 67%.

THE RESULTS

Replacing simple business rules with Machine Learning technology increased the rate of detected discrepancies by 50%.

At the same time, the rate of false alarms fell more than threefold, to well below the 20% KPI threshold. The model also directed fewer carts to manual inspection than the rule-based approach, staying within the KPI ceiling of 20% of carts. The discrepancies it identified accounted for approximately 90% of the total value of errors.
