PhD position in Latent Topic Models for Big Data

Posted by Rebecca Martin on Sun, 29/06/2014 - 15:22

The LIG (Grenoble, France) and AAI (Sydney, Australia) offer a
fully funded 3-year PhD position on "Scaling Latent Topic/Class Models
to Big Data Collections and Streams".


Numerous pieces of content are currently exchanged on social media,
making them an important source of information. For example, people
share, per month, 30 billion pieces of content on Facebook and over 5
billion tweets. This importance
is also reflected in the fact that, when searching for information
online, 18% of users search directly on social media sites (such as
Twitter, Facebook or blog sites), a proportion that is constantly growing.
Searching, filtering, enriching and organizing this information, as well
as being able to rapidly identify important new events, are major
challenges faced by researchers from different communities, such as
information retrieval, data mining and machine learning.

Several approaches have been developed in the past to address these
challenges, even though not at the scale and speed required by current
data collections and streams. Among these different approaches, the ones
based on latent topic/class analysis (such as Latent Dirichlet Allocation and
its hierarchical extensions) are particularly interesting as they
yield state-of-the-art results and allow one to categorize/annotate
documents with existing taxonomies (filtering and enriching), to infer
new taxonomies or complement existing ones (organizing) and to detect
outliers and new events (event detection). However, current latent topic
models have two major drawbacks that prevent their use on large-scale
collections and high-speed streams: (a) they are mainly static and do
not take into account the dynamics of the data, and (b) the inference
and learning mechanisms usually rely on Markov chain Monte Carlo (MCMC)
methods, which are too slow to be used in the big data era. The goal of
this project is precisely to address these two problems, by constructing
new latent topic models able to handle dynamic data, and by designing
new learning and inference methods able to provide good estimates of the
parameters of the new models under real-time and one-pass constraints.
The models and methods developed and implemented during the PhD will be
tested on real data collections and streams.

To apply

The application should include a brief description of research interests
and past experience, a CV, degrees and grades, a copy of the Master's thesis
(or a draft thereof), a motivation letter (short but pertinent to this
call), and relevant publications if any. Candidates are encouraged to
provide letter(s) of recommendation and contact information for
referees. Please send your application as a single PDF to

Duration: 3 years (2 years in Grenoble and 1 year in Sydney)
Starting date: October 2014
Supervisors: Marianne Clausel (UJF/LJK, France), Massih-Reza Amini
(UJF/LIG, France), Eric Gaussier (UJF/LIG, France), Guandong Xu
(UTS/AAI, Australia), LongBing Cao (UTS/AAI, Australia)

Working Environment:
The PhD candidate will work in the AMA team of the
LIG lab in Grenoble, France, and at the Advanced Analytics Institute
of UTS, Australia. LIG is a leading institution
in Computer Science in France. Grenoble is the capital of the Alps in
France, with excellent train connections to Geneva (2h), Paris (3h) and
Turin (4h). The AMA team is a dynamic group of over 20 researchers
(including PhD students) working in Machine Learning and connected
scientific domains; it covers several aspects of machine learning from
theory to applications, including statistical learning, data mining, and
cognitive science. AAI’s vision is to be a world-leading,
interdisciplinary facility with a focus on innovation, practice-driven
analytics, decision-making research and education in broad-based
analytics areas.