Summary
Efficient approximate inference in
large Hybrid Networks (graphical models with discrete and continuous variables)
is one of the major unsolved problems in machine learning, and insight into good
solutions would be beneficial in advancing the application of sophisticated
machine learning to a wide range of real-world problems.
Such research could
benefit applications in Speech Recognition, Visual Object Tracking
and Machine Vision, Robotics, Music Scene Analysis, the analysis of complex time
series, the understanding and modelling of complex computer networks, Condition
Monitoring, and the modelling of other complex phenomena.
This theory challenge specifically
addresses a central component area of PASCAL, namely Bayesian Statistics and
statistical modelling, and is also related to the other central areas of
Computational Learning, Statistical Physics and Optimisation
techniques.
One aim of this challenge is to bring together leading researchers
in graphical models and related areas to develop and improve on existing
methods for tackling the fundamental intractability of Hybrid Networks (HNs). We do not
expect a single best approach to emerge, although we would
expect successes in one application area to be transferable to related
areas. Many leading machine learning researchers are currently working on applications
that involve HNs, and we invite participants to suggest their own
applications. Ideally, this would be in the form of a dataset along the lines
of PASCAL.
Graphical Models
Graphical
models are a powerful framework for formulating and solving difficult problems
in machine learning. Essentially a marriage between graph theory and probability
theory, they provide a theoretically elegant approach to thinking about complex
machine learning problems and have had widespread success; they are now one of the
dominant approaches. A great many problems in machine learning can be formulated
as latent or hidden variable problems in which an underlying mechanism, which
cannot be directly observed, is responsible for generating observable values --
based on these observations, we wish to learn something about the fundamental
generating process.
Hybrid Networks
Hybrid networks are graphical models which contain both
discrete and continuous variables. Often, but not exclusively, natural
application areas arise in the temporal domain, for example speech recognition,
or visual tracking, in which time plays a fundamental role; one reason for this is
that many physical processes are inherently Markovian.
In the continuous case, a widely used
model is the Kalman Filter (KF), which is based on a linear dynamical
system with Gaussian additive noise. The discrete analogue of the Kalman Filter
is the Hidden Markov Model (HMM), in which hidden states are discrete,
and the output (visible) states may be discrete or continuous. Such models
are used in leading speech recognition software and tracking applications. Both the KF and the HMM are
computationally tractable. Recently, there has
been a recognition that many machine learning models would naturally have both
continuous (as in the KF) and discrete (as in
the HMM) hidden variables.
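To make the tractability of the continuous case concrete, the following is a
minimal sketch of filtering with a scalar Kalman Filter. The scalar model
h(t+1) = a h(t) + noise, v(t) = b h(t) + noise, the function names, and the
parameter names a, b, q and r are assumptions chosen for illustration only.

```python
def kalman_step(mean, var, v, a=1.0, b=1.0, q=0.1, r=0.5):
    """One filtering step for a scalar linear-Gaussian model (illustrative).

    Assumed model: h(t+1) = a*h(t) + N(0, q),  v(t) = b*h(t) + N(0, r).
    (mean, var) describe the Gaussian belief over the previous hidden state.
    """
    # Predict: push the belief through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: correct the prediction using the new observation v.
    gain = var_pred * b / (b * b * var_pred + r)
    mean_new = mean_pred + gain * (v - b * mean_pred)
    var_new = (1.0 - gain * b) * var_pred
    return mean_new, var_new


def kalman_filter(observations, mean0=0.0, var0=1.0, **params):
    """Run kalman_step over a sequence and return the belief after each step."""
    mean, var = mean0, var0
    beliefs = []
    for v in observations:
        mean, var = kalman_step(mean, var, v, **params)
        beliefs.append((mean, var))
    return beliefs
```

Each step costs a fixed amount of work and the belief remains a single
Gaussian, so filtering a sequence of length T is linear in T.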
Hybrid Networks (HNs) are such Graphical Models with both continuous and
discrete variables. For example, imagine that a musical instrument is played at
a time t, which we can model with a switch variable s(t)=1: future sound
generation can be modelled as a hidden Gaussian linear dynamical system with
transition dynamics p(h(t+1)|h(t),s(t)=1). When the musical instrument is
turned off at a future time t, s(t)=0, and a different dynamics occurs,
p(h(t+1)|h(t),s(t)=0). The observation (visible) process p(v(t)|h(t)) is
typically a noisy projection of the hidden state h(t), say in the case of
sound, to a one-dimensional pressure displacement. Based on the observed
sequence v(1),...,v(T), we may wish to infer the switch variables s(1),...,s(T).
This particular kind of HN is called a switching Kalman Filter.
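The generative process just described can be sketched as a small sampler. This
is a toy illustration, not a model of a real instrument: the two-dimensional
damped rotation used for h(t), every numerical value, the function name, and
the assumption that the switch follows a first-order Markov chain with
probability p_stay of keeping its state are all hypothetical.

```python
import math
import random


def sample_switching_lds(T, p_stay=0.95, seed=0):
    """Draw s(1..T) and v(1..T) from a toy switching linear dynamical system.

    All numerical values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    theta = 0.3                      # rotation per step (sets the pitch)
    damping = {1: 0.99, 0: 0.50}     # s=1: ringing note, s=0: fast decay
    noise_std = {1: 0.10, 0: 0.01}   # transition noise for each switch state
    obs_std = 0.05                   # observation noise

    s = 0                            # instrument starts switched off
    h = (0.0, 0.0)                   # two-dimensional hidden state h(t)
    switches, visibles = [], []
    for _ in range(T):
        switches.append(s)
        # v(t): noisy projection of h(t) onto one dimension.
        visibles.append(h[0] + rng.gauss(0.0, obs_std))
        # h(t+1) ~ p(h(t+1) | h(t), s(t)): damped rotation plus Gaussian noise.
        c, sn = math.cos(theta), math.sin(theta)
        d, sd = damping[s], noise_std[s]
        h = (d * (c * h[0] - sn * h[1]) + rng.gauss(0.0, sd),
             d * (sn * h[0] + c * h[1]) + rng.gauss(0.0, sd))
        # s(t+1): assumed first-order Markov switch.
        if rng.random() > p_stay:
            s = 1 - s
    return switches, visibles
```

Calling sample_switching_lds(200) returns the switch sequence and the
one-dimensional observations; the inference task is to recover the former
given only the latter.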
The Challenge
A fundamental difficulty with Hybrid Networks is their intractability -- unfortunately,
the marriage of two tractable models, the Kalman Filter and Hidden
Markov Model, does not result in a tractable Hybrid Network. Recently, developing approximate
inference and learning methods in HNs has been a major research activity, across
a large range of fields, including speech, music transcription, condition
monitoring, control, robotics, and computer-brain interfaces.
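The source of the intractability can be stated concretely. Conditioned on a
complete switch sequence the model is an ordinary Kalman Filter, so the exact
posterior over h(t) is a mixture with one Gaussian per switch sequence, a
number that grows exponentially with t (of the order S^t for S switch states).
Many approximation schemes limit this growth by collapsing the mixture to fewer
components. The sketch below counts the components and shows moment matching
for scalar Gaussians, one common collapsing step; it is given as an
illustration rather than as a method prescribed by this challenge, and the
function names are hypothetical.

```python
def num_exact_components(num_switch_states, t):
    """Number of switch sequences of length t, one Gaussian component each."""
    return num_switch_states ** t


def collapse_to_gaussian(weights, means, variances):
    """Moment-match a scalar Gaussian mixture with a single Gaussian.

    Returns the mean and variance of the mixture, which define the single
    Gaussian with the same first two moments.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]          # normalise the weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    second_moment = sum(wi * (vi + mi * mi)
                        for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean * mean
```

For two switch states, num_exact_components(2, 100) already exceeds 10^30,
which is why exact inference is impractical for all but very short sequences.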
Whilst this challenge is largely theoretical in nature,
in order to have at least one concrete
problem, we will make available a dataset of acoustic recordings,
for example a live piano recording. The challenge would be to infer which notes
were played and when -- that is, to perform a transcription of the performance.
We will know the ground truth (since we generated the data), against which we can
compare competing solutions. Music transcription is a difficult, largely
unsolved problem, although initial attempts using HNs have demonstrated the
effectiveness of the approach. However, developing faster approximation schemes
in this area would help overcome the current barrier
to commercialisation of such techniques.
Although we have framed a music example here for concreteness,
this is merely one instance of many application areas, and we do not intend any bias towards one particular application
area.