Call for PhD students: Efficient Sequential Learning in Structured and Constrained Environments, SequeL team, INRIA, Lille, France

Call for PhD students: Efficient Sequential Learning in Structured and Constrained Environments, SequeL team, INRIA, Lille, France

Posted by Rebecca Martin on Tue, 11/02/2014 - 01:00

Efficient Sequential Learning in Structured and Constrained Environments, SequeL team, INRIA, Lille, France (PhD position)

Keywords: multi-arm bandit, stochastic optimization, reinforcement learning, apprenticeship learning, learning on graphs, transfer learning

This Ph.D. program is focused on the problem of sequential learning in structured and constrained environments. The key aspect of this problem is that relatively little knowledge of the environment is available beforehand, and the learner (a virtual or physical agent) has to sequentially interact with environment to learn its structure and then act optimally. This problem encompasses a wide range of applications depending on the definition of the structure of the environment, the sequential nature of the interaction between learner and environment, and the type of constraints. In particular, this Ph.D. program will be driven by two application domains of great scientific and commercial importance: recommender systems and crowdsourcing. In recommender systems, the sequential interaction is particularly complex since the user-base keeps changing over time (users subscribe and unsubscribe to the service) and the set of items that can be recommended continuously evolve (products may be added or removed or change characteristics). On the other hand, the environment is highly structured, since users may have similar preferences and items share similar features that could be exploited to improve the quality of the recommendations. In crowdsourcing, users are payed to solve subtasks assigned by an automatic system. The main challenge in this scenario is that users may have very heterogeneous skills and the crowdsourcing campaign has strict, and often contradicting, constraints on resources, such as the money invested in the campaign, the time, the desired accuracy in the solution of the global task, etc. In this case, it is crucial that the learner identifies the best allocation of subtasks to users in order to meet most of the desired constraints. Finally, both applications usually involve large amounts of data (e.g., large user-base) which asks for efficient learning algorithms with minimal time, space, and sample complexity. In general, a successful sequential learning strategy should efficiently allocate the constrained resources to exploitation (making the best decision based on our current, but possibly imperfect, knowledge) and to exploration (decisions that may appear sub-optimal but which may reduce the uncertainty and, as a result, could improve the relevance of future decisions).


The main mathematical models of reference to deal with the problem of sequential learning are multi-armed bandits and reinforcement learning. The multi-armed bandit problem captures the essence of the exploration-exploitation trade-off in an unknown environment where only noisy observations of the performance of a set of arms (e.g., the preference of a user for a product) are available and the objective is to identify the arm with the best performance. On the other hand, reinforcement learning formalizes the more general problem of decision-making under uncertainty when the state of the environment evolves in response to the action taken by the learner.


The objective of this research program is to answer to a number of fundamental questions in sequential learning in structured and constrained environments:

• Sequential transfer learning from multiple and concurrent sources: It is often the case that similar problems belonging to the same domain should be solved over time (e.g., in crowdsourcing the user-base may be almost the same across different tasks). The objective of sequential transfer learning is to construct general knowledge from past experience and reuse it to improve the learning performance over subsequent tasks. The objective is to develop transfer algorithms for exploration-exploitation problems and understand how effective it could be and under which assumptions, negative transfer can be avoided. Furthermore, we also intend to investigate the scenario of transfer from parallel sources, like in distributed learning problems. In this case, the main constraints are time and communication. In fact, each source acts as a node in a network so that at each point in time the information available at different sources is limited and each source may be constrained in communicating the knowledge learned so far to other nodes.

• Learning on Graphs: Graphs offer a natural representation for the structured problems. The users of a social network, the similarity between the content, the location proximity of the sensors: all these can be represented by a (weighted) graph. A decision-making strategy then often queries either nodes or edges, e.g., readings from a sensor network, product offers to a particular user in a social network, etc. Graph representation however comes with several challenges. A large number of nodes poses significant storage and computational difficulties and efficient approximations need to be often designed. Furthermore, real-world graphs are typically dynamic, with changing number of nodes and connections. Finally, there are cases when even the graph structure itself is unknown and its discovery needs to be a part of an efficient strategy. Summing up, learning in realistic graph needs to account for astructure discovery.

• Sequential apprenticeship learning: In reinforcement learning, it is not always easy to define the desired behavior of an agent through a reward function. For instance, in recommender systems the objective is to maximize the long-term revenue, but other criteria may be of interest such as avoiding suggesting the same item too often, favor novel products, and introduce promotions. These behaviors are desirable but difficult to encode as a reward. Nonetheless, an expert may sequentially provide feedbacks to the strategy currently implemented by the learner over time and even propose corrections. The objective is to integrate the imitation learning and reinforcement learning in a unique learning system that can optimize the reward (e.g., long-term revenue) exploiting the corrections and examples provided by the expert. Another issue that should be considered is that the learner may have a finite budget of requests of feedbacks from the supervisor and should use them when it is more needed.

Job Description:

The PhD candidate will focus on one or more issues related to sequential learning in structured and constrained problems. The PhD candidate will first acquire expertise in different topics of machine learning such as online learning, multi-armed bandit, statistical learning theory, reinforcement learning, approximate dynamic programming, and algorithmic game theory. Then, the PhD candidate is expected to contribute to the advancement of the literature on this problem along many different lines: methodological (e.g., definition of general abstract models for a wide range of decision-making problems), theoretical (e.g., near optimality performance guarantees), and algorithmic (e.g., development of novel algorithms for specific decision-making problems). The research activity of the PhD candidate will be closely related to EU CompLACS project ( This will allow the PhD candidate to develop collaborations with other researchers participating in these research projects and it may also allow him/her to spend part of his research activity at partner laboratories such as University College London (UK), Montanuniversitat Leoben (Austria), TU Darmstadt, etc. Possibility of internships in the industry research labs such as: Technicolor Research or INTEL Research in California. The starting date of PhD program flexible after October 15th, 2014.


The applicant must have a Master of Science in Computer Science, Statistics, or related fields, possibly with background in reinforcement learning, bandits, or optimization. The working language in the lab is English, a good written and oral communication skills are required.


The application should include a brief description of research interests and past experience, a CV, degrees and grades, a copy of Master thesis (or a draft thereof), motivation letter (short but pertinent to this call), relevant publications, and other relevant documents. Candidates are encouraged to provide letter(s) of recommendation and contact information to reference persons. Please send your application in one single pdf to or The deadline for the application is April 15th, 2014, but we encourage the applicants to contact us as soon as possible. The final decision will be communicated in the beginning of June.

• Application closing date: April 15, 2014
• Duration: 3 years (a full time position)
• Starting date: October 15st, 2014
• Supervisors: Michal Valko, Alessandro Lazaric, and Remi Munos
• Place: SequeL, INRIA Lille - Nord Europe

Working Environment:

The PhD candidate will work at SequeL ( lab at INRIA Lille - Nord Europe located in Lille. INRIA ( is France's leading institution in Computer Science, with over 2800 scientists employed, of which around 250 in Lille. Lille is the capital of the north of France, a metropolis with 1 million inhabitants, with excellent train connection to Brussels (30 min), Paris (1h) and London (1h30). The SequeL lab is a dynamic lab at INRIA with over 25 researchers (including PhD students) which covers several aspects of machine learning from theory to applications, including statistical learning, reinforcement learning, and sequential learning.


• Duration: 36 months – starting date of the contract : October 2014, 15th
• Salary: 1957,54 € the first two years and 2058,84 € the third year
• Salary after taxes: around 1597,11€ the 1st two years and 1679,76 € the 3rd year (benefits included).
• Possibility of French courses
• Help for housing
• Participation for public transport
• Scientific Resident card and help for husband/wife visa


1. Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT'08), 2008.
2. Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Stochastic Convex Optimization. In Proceedings of the Conference on Learning Theory (COLT'09), 2009.
3. Pieter Abbeel, Andrew Ng: Apprenticeship learning via Inverse Reinforcement Learning, in Proceedings of the 21st International Conference on Machine Learning, (ICML) 2004.
4. P Cesa-Bianchi, Nicolo , Gentile, Claudio, and Zappella, Giovanni. A Gang of Bandits. In Neural Information Processing Systems (NIPS), 2013.
5. Li, Lihong, Chu, Wei, Langford, John, and Schapire, Robert E. A Contextual-Bandit Approach to Personalized News Article Recommendation. WWW 10, 2010.
6. Alon, Noga, Cesa-Bianchi, Nicolo , Gentile, Claudio, and Mansour, Yishay. From Bandits to Experts: A Tale of Domination and Independence. In Neural Information Processing Systems (NIPS), 2013.
7. A. Lazaric. Transfer in Reinforcement Learning: a Framework and a Survey. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art, Springer, 2011.
This call is posted at with the most up-to-date information.