Recent advances in machine learning have shown great potential for solving logistics problems. In this article, we show one possible approach: using intelligent agents to manage inventory.

Conventional inventory control models rely on simplifying assumptions, for example that demand is constant or follows a known distribution. This limits their applicability, because such assumptions rarely hold in practice. Only some products have stable demand, while many other economically valuable products face ever-changing customer needs. As a result, a solution that an optimization algorithm reports as optimal, often at the cost of more time and data, might not be optimal in reality. In fact, heuristic methods often yield better real-world results. This, together with their ease of implementation, is why the industry prefers heuristic approaches to exact optimization methods such as mixed-integer programming.

These days, inventory can be replenished using a strategy learned through reinforcement learning. A reinforcement learning agent can be thought of as a trained employee who learns from the consequences of their actions. For example, when adjusting safety lead time, the agent receives a reward when it suggests a safety factor that achieves the target service level during the next order cycle without holding excessive inventory. Conversely, the agent receives a penalty if it fails to achieve this outcome. By repeatedly taking actions and receiving rewards (or penalties), the agent gradually learns to make better decisions. It is as if the agent gains working experience and eventually becomes an expert in this particular subject.
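To make the reward idea concrete, here is a minimal sketch in Python. It is a simplified, bandit-style illustration rather than the exact model described in this article. The function names, the normal demand model, the 95% target, and the leftover threshold are all assumptions chosen for the example.

```python
import random


def cycle_reward(fill_rate, target_fill_rate, leftover_units, max_leftover_units):
    """Return +1 if the order cycle met the service target without excess stock, else -1."""
    met_target = fill_rate >= target_fill_rate
    no_excess = leftover_units <= max_leftover_units
    return 1.0 if (met_target and no_excess) else -1.0


def train_safety_factor_agent(candidates, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of the average reward earned by each safety factor."""
    rng = random.Random(seed)
    mean_demand, std_demand = 100.0, 20.0   # hypothetical demand model (assumption)
    values = {k: 0.0 for k in candidates}   # running average reward per safety factor
    counts = {k: 0 for k in candidates}

    for _ in range(episodes):
        # Explore occasionally; otherwise pick the factor with the best estimate so far.
        if rng.random() < epsilon:
            k = rng.choice(candidates)
        else:
            k = max(candidates, key=lambda c: values[c])

        # Simulate one order cycle with this safety factor.
        stock = mean_demand + k * std_demand
        demand = max(1.0, rng.gauss(mean_demand, std_demand))
        fill_rate = min(1.0, stock / demand)
        leftover = max(0.0, stock - demand)

        # Reward or penalize, then update the running average for this factor.
        r = cycle_reward(fill_rate, 0.95, leftover, 60.0)
        counts[k] += 1
        values[k] += (r - values[k]) / counts[k]

    return values


values = train_safety_factor_agent([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
best_factor = max(values, key=values.get)
```

After many simulated cycles, `best_factor` holds the safety factor with the highest average reward under these assumed conditions. A real system would replace the simulated demand with observed sales and would likely use a richer state than a single number.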

Reinforcement learning in inventory control is based on a simple heuristic approach. When the set point is too low, the agent is told to raise it. When the set point is too high, it lowers the setting. This straightforward and transparent logic is the main advantage of the method when it comes to practical considerations. Most people in industry do not like anything they cannot control. Black-box models, complex algorithms that are difficult to understand, and approaches whose causes and effects cannot be explained usually do not get buy-in. At the end of the day, the decision process will not be purely automated. A practical solution is more likely to be a combination of a sophisticated model and human judgment. People still want the power to control and adjust the system when unusual situations arise. A reinforcement learning model, which is intuitive and can easily be tuned by users, is a promising alternative for this task. In our case, when a store offers a big discount, we could assign a larger penalty to the model if its setting leads to a product going out of stock in the future.
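The raise-or-lower rule and the user-tunable penalty can be sketched in a few lines of Python. This is an illustration under assumed names and values: `adjust_set_point`, `cycle_penalty`, the step size, and the promotion multiplier are hypothetical and not taken from a production system.

```python
def adjust_set_point(set_point, stockout, excess, step=1.0):
    """Raise the set point after a stockout, lower it after excess stock."""
    if stockout:
        return set_point + step            # set point was too low
    if excess:
        return max(0.0, set_point - step)  # set point was too high
    return set_point                       # outcome was acceptable, keep the setting


def cycle_penalty(stockout, excess, on_promotion,
                  stockout_weight=1.0, excess_weight=1.0, promo_multiplier=3.0):
    """Penalty for one cycle; stockouts cost more while a promotion is running."""
    penalty = 0.0
    if stockout:
        weight = stockout_weight * (promo_multiplier if on_promotion else 1.0)
        penalty += weight
    if excess:
        penalty += excess_weight
    return penalty
```

Because the weights are plain parameters, a planner can raise `promo_multiplier` before a big discount campaign and see directly how that changes what the agent is penalized for.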

Written by Sertis Team