Learning and repeated decision-making
One of the problems I am interested in studying is learning the behavioral dynamics of humans: can we learn how people make decisions (i.e. how people learn), and can we use the corresponding models to predict their future decisions? To answer these questions, I use the framework of sequential decision-making to model how humans learn and make decisions. Players making repeated decisions, learning over time to optimize their choices, can be efficiently modeled by a sequential process in which they optimize a payoff function at each step, linked to the “reward” they experience. The techniques developed leverage known models, such as the replicator dynamics and the hedge algorithm (both sketched below). To understand whether these decision-making models converge to an equilibrium, we use optimization methods commonly developed in machine learning, such as mirror descent and stochastic gradient descent. Depending on the assumptions made on the learning process humans use to make their decisions, we prove convergence of these processes to a set of equilibria (in some cases Nash equilibria of an underlying game). The more specific the assumptions on the learning process, the stronger the convergence guarantees (convergence on average, no-regret, almost-sure convergence, etc.), and potentially the convergence rates.

For illustration purposes, we have implemented an online gaming framework on Mechanical Turk (Amazon’s crowdsourcing service for parallelizable tasks). In this online game (see the video to gain a better understanding of the experiments), distributed online players use Mechanical Turk to play against each other, while we track the convergence of their game and use our algorithms to predict the decisions they will make, based on what we observed they did in the past. The results are very exciting: humans indeed converge to the equilibria predicted by our models, and the learning rates we infer capture their behavior well enough to predict their actions a few steps ahead in time (a sketch of this inference appears below).

This type of work has numerous applications, in particular to systems in which players compete selfishly for a shared resource and improve their decision making over time by learning the dynamics of their agents. One example is road traffic: companies such as Waze/Google, INRIX, and Apple Maps route motorists running their apps, competing for road capacity.
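As a concrete illustration of the hedge algorithm, here is a minimal sketch for a single player repeatedly choosing among a finite set of actions; the payoff sequence and learning rate are toy placeholders, not the ones from our experiments. (Hedge is mirror descent on the simplex with an entropic regularizer, which is one way the optimization machinery mentioned above enters the analysis.)

```python
import numpy as np

def hedge(payoffs, eta=0.1):
    """Run the hedge (multiplicative weights) algorithm.

    payoffs: (T, n) array; payoffs[t, i] is the payoff of action i at step t,
             revealed only after the player commits to a strategy at step t.
    eta:     learning rate; smaller values give slower, more conservative updates.
    Returns the sequence of mixed strategies (distributions over actions).
    """
    T, n = payoffs.shape
    weights = np.ones(n)
    strategies = []
    for t in range(T):
        x = weights / weights.sum()          # current mixed strategy
        strategies.append(x)
        # multiplicative update: exponentially reweight by observed payoffs
        weights *= np.exp(eta * payoffs[t])
        weights /= weights.sum()             # renormalize for numerical stability
    return np.array(strategies)

# toy usage: two actions, action 1 slightly better on average
rng = np.random.default_rng(0)
payoffs = rng.normal(loc=[0.0, 0.2], scale=1.0, size=(500, 2))
X = hedge(payoffs)
print(X[-1])  # mass concentrates on the better action
```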
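The replicator dynamics are the continuous-time counterpart of this kind of update. Below is a minimal sketch simulating them with a forward-Euler discretization on a hypothetical two-player coordination game; the payoff matrices are illustrative, not from our experiments.

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of the two-population replicator dynamics.

    x, y: mixed strategies of players 1 and 2 (points on the simplex).
    A, B: payoff matrices of players 1 and 2.
    """
    fx = A @ y                      # payoffs of player 1's pure actions
    fy = B.T @ x                    # payoffs of player 2's pure actions
    x = x + dt * x * (fx - x @ fx)  # grow actions doing better than average
    y = y + dt * y * (fy - y @ fy)
    return x, y

# toy usage: a 2x2 coordination game; play converges to a pure Nash equilibrium
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = A
x, y = np.array([0.7, 0.3]), np.array([0.6, 0.4])
for _ in range(5_000):
    x, y = replicator_step(x, y, A, B)
print(x, y)  # both close to (1, 0): coordination on the first action
```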
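Finally, a hypothetical sketch of the prediction step, assuming the observed player follows hedge with an unknown learning rate: estimate the rate by maximum likelihood on the observed choices, then roll the fitted model forward to predict the next mixed strategy. The inference we actually run on the Mechanical Turk data is more involved; this only conveys the idea.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def hedge_probs(payoffs, eta):
    """Mixed strategies a hedge player with rate eta would use at each step."""
    T, n = payoffs.shape
    # strategy at step t depends on cumulative payoffs up to step t-1
    cum = np.vstack([np.zeros(n), np.cumsum(payoffs, axis=0)[:-1]])
    logits = eta * cum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def fit_eta(payoffs, actions):
    """Maximum-likelihood estimate of the learning rate from observed actions."""
    def nll(eta):
        p = hedge_probs(payoffs, eta)
        return -np.log(p[np.arange(len(actions)), actions]).sum()
    return minimize_scalar(nll, bounds=(1e-4, 5.0), method="bounded").x

# toy usage: simulate a hedge player, recover its rate, predict its next strategy
rng = np.random.default_rng(1)
payoffs = rng.normal(size=(300, 3))
true_eta = 0.3
p = hedge_probs(payoffs, true_eta)
actions = np.array([rng.choice(3, p=pi) for pi in p])
eta_hat = fit_eta(payoffs, actions)           # approximately recovers 0.3
# append a dummy round so the last row gives the strategy at the next step
next_strategy = hedge_probs(np.vstack([payoffs, np.zeros(3)]), eta_hat)[-1]
print(eta_hat, next_strategy)
```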