| |
Abstracts for Talks and Related Material
FALL 2003
|
|
Talks
Title: Fast Methods for Kernel-based Text Analysis
Presenter: Taku Kudo
Abstract:
Kernel-based learning (e.g., Support Vector Machines) has been
successfully applied to many hard problems in Natural Language
Processing (NLP). In NLP, although feature combinations
are crucial to improving performance, they are heuristically
selected. Kernel methods change this situation.
The merit of the kernel methods is that effective feature
combination} is implicitly expanded without loss of generality and
increasing the computational costs. Kernel-based text analysis shows an excellent
performance in terms in accuracy; however,
these methods are usually too slow to apply to large-scale text analysis.
In this paper, we extend a Basket Mining algorithm to convert a
kernel-based classifier into a simple and fast linear classifier.
Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency
Parsing show that our new classifiers are about 30 to 300 times faster
than the standard kernel-based classifiers.
Link: PDF, PPT slides
Title: Machine learning for evolutionary computation. Evolutionary computation for machine learning.
Presenter: Xavier Llora
Abstract:
Machine learning and evolutionary computation present several areas of
intersection for mutual benefit. Machine learning provides tools for
improving the competence of evolutionary algorithms, that is solve
problems quickly, efficiently, and reliable. Some of the examples of the
usage of machine learning for evolutionary computation mainly deal with
decision making facets of such evolutionary processes. On the other
hand, evolutionary computation provides new ways for approaching machine
learning problems. Among others, multiobjective evolutionary computation
easily analyze hypothesis tradeoffs for a given learning problem.
Title: Learning Control Knowledge Using Approximate Policy Iteration
Presenter: Bob Givan
Abstract:
Approximate policy iteration is a well known technique for heuristically solving large stochastic planning problems. This technique has generally
failed to generalize to highly structured domains where the state-value functions that must be represented are very complex. In this talk, I discuss
approximate policy iteration with the usual value-function learning step replaced by a learning step in plan (policy) space. I present policy-language biases
for this learning step that enable the solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular,
the technique discussed can induce high-quality, domain-specific planners for classical planning domains (both deterministic and stochastic variants) by
solving such domains as extremely large MDPs.
Title: Learning from positive and unlabeled examples
Presenter: Bing Liu
Abstract:
In traditional supervised learning, a classifier is built using labeled
training examples of every class. The resulting classifier is then
used to assign classes to future data. Although this classic model
is important, in practice one also encounters a related problem.
That is, one has a set of examples of a particular class P (called
positive class) and a set of unlabeled examples U that contains
instances from class P and also other types of instances (called
negative instances). One wants to build a classifier to classify the
instances in the unlabeled set U or some future data into positive and
negative classes. Due to the fact that there is no labeled negative
example available for learning, traditional classification techniques
are not directly applicable. In this talk, I will first discuss some
theoretical foundations for learning in this setting and show that
the problem can be posed as a constrained optimization problem
and that under appropriate conditions the solution to the constrained
optimization problem will give us a good classifier. After discussing
a few existing (mainly heuristics) techniques for solving the problem,
I will present two more principled approaches based on logistic
regression and a biased formulation of SVM. Experimental results
in the domain of text classification will also be discussed.
Title: Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination
Presenter: Dan Gusfield
Abstract:
A phylogenetic network is a generalization of a phylogenetic tree,
allowing structural properties that are not tree-like. With the growth
of genomic data, much of which does not fit ideal tree models, and the
increasing appreciation of the genomic role of such phenomena as
recombination, recurrent and back mutation, horizontal gene transfer,
gene conversion, and mobile genetic elements, there is greater need to
understand the algorithmics and combinatorics of phylogenetic
networks. Understanding recombination is particularly important,
because recombination is the key element of ``gene mapping experiments"
that nature has already done for us. Those experiments can help
determine the location of genes involved in gene-influenced diseases
and of economically important traits. But in order to interpret these
natural experiments, we must better understand the algorithmics and
combinatorics of recombination in networks. To date, only a few papers
have addressed this issue.
Wang et al. studied the problem of constructing a phylogenetic network
for a set of n binary sequences derived from the all-0 ancestral
sequence, when each site in the sequence can change from 0 to 1 at most
once in the network, and recombination between sequences is allowed.
They showed that the problem of finding a phylogenetic network that
minimizes the number of recombination events is NP-hard, but gave a
polynomial-time algorithm (O(nm + n^4)-time, for n sequences of length
m each) that was intended to determine whether the sequences could be
derived on a phylogenetic network in which the recombination cycles are
node disjoint. We call such a network a ``galled-tree". The paper by
Wang et al. is seminal in defining this problem and asserting that it
has an efficient solution. Unfortunately, the algorithm given there is
incomplete and only constitutes a sufficient, but not a necessary, test
for the existence of a galled-tree for the data.
In this talk we do several things. By more deeply analyzing the
combinatorial constraints on galled-trees, we obtain an algorithm that
runs in O(nm + n^3)-time and is guaranteed to be both a necessary and
sufficient test for the existence of a galled-tree for the data. If
there is a galled-tree, the algorithm constructs a ''reduced"
galled-tree that minimizes the number of mutations occurring on
recombination cycles. We prove that every reduced galled-tree for the
data uses the same number of recombintions, which is the minimum number
of
recombinations needed for the data, over all possible
phylogenetic networks for the data, even
if multiple cross-overs are allowed. Hence, when a galled-tree exists,
the problem of minimizing the number of recombinations can be solved
efficiently. The effect is that the galled-tree is a phylogenetic
network that explains the input sequences with the ``littlest
deviation" from a true tree model. We show that the reduced galled-tree
is ``nearly-unique", but when it is not unique, the algorithm also
obtains a count of the number of galled-trees that exist for the input
data, and can create these in linear time for each one, starting from
the canonical galled-tree. We also show two additional results: first,
any set of sequences that can be derived on a galled tree can be
derived on a true tree (without recombination cycles), where at most
one back mutation is allowed per site; second, the site compatibility
problem (which is NP-hard in general) can be solved in polynomial time
for any set of sequences that can be derived on a galled tree.
Perhaps more important than the specific results obtained for
galled-tree, we introduce several tools for thinking about
recombination in phylogenetic networks, and discuss how they may
generalize to reconstructing phylogenetic networks where cycles are not
always node-disjoint.
Professor Gusfield's primary interests involve the efficiency of
algorithms, particularly for problems in combinatorial optimization and
graph theory. These algorithms have been applied to study data and
computer security, stable matching, network flow, matroid
optimization,and string/pattern matching problems. Currently, Professor
Gusfield is focused on string and combinatorial problems that arise in
computational biology, particularly involving bioinformatics and
genomics. He is the author the book
"Algorithms on Strings, Trees, and Sequences: Computer Science and
Computational Biology", which has been referred to as the definitive
text on the subject.
Title: Learning From Networks of Examples
Presenter: Pedro Domingos
Abstract:
Most machine learning algorithms assume that examples are independent
of each other, but many (or most) domains violate this assumption.
For example, in real markets customers' buying decisions are
influenced by their friends and acquaintances, but data mining for
marketing ignores this (as does traditional economics). In this talk I
will describe how we can learn models that account for example
dependences, and use them to make better decisions. For example, in
the marketing domain we are able to pinpoint the most influential
customers, "seed" the network by marketing to them, and unleash a wave
of word of mouth. We mine these models from collaborative filtering
systems and knowledge-sharing Web sites, and show that they are
surprisingly robust to imperfect knowledge of the network. I will also
survey other applications of learning from networks of examples we are
working on, including combining link and content information in
Google-style Web search and predicting the evolution of scientific
communities.
Title: Learning Retrieval Functions from Implicit Feedback
Presenter: Thorsten Joachims
Abstract:
A central goal of information retrieval is the design of functions that rank documents according to their
relevance to a query. In this talk, I will present an approach to automatically learning such ranking
functions. Taking an empirical-risk-minimization approach with Kendall's Tau as the loss function, I will
explore a Support Vector algorithm that leads to a convex training problem.
An important property of this algorithm is that it can be trained with partial information about the
target rankings like "for query Q, document A should be ranked higher than document B". I will
demonstrate that in WWW search engines such preference data is available in abundance, since it can be
inferred from the clicking behavior of users. Experiments show that the method can effectively adapt the
retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms
of retrieval quality after only a couple of hundred training queries.
Bio:
Thorsten Joachims is an Assistant Professor in the Department of Computer Science at Cornell University.
In 2001, he finished his dissertation with the title "The Maximum-Margin Approach to Learning Text
Classifiers: Methods, Theory, and Algorithms", advised by Prof. Katharina Morik at the University of
Dortmund. From there he also received his Diplom in Computer Science in 1997 with a thesis on WebWatcher,
a browsing assistant for the Web. His research interests center on a synthesis of theory and system
building in the field of machine learning, with a focus on Support Vector Machines and machine learning
with text. He authors the SVM-Light algorithm and software for support vector learning. From 1994 to
1996 he was a visiting scientist at Carnegie Mellon University with Prof. Tom Mitchell.
Email: tj@cs.cornell.edu,
WWW
Title: Efficient Algorithms for Online Decision Problems
Presenter: Adam Kalai
Abstract:
In an online decision problem, one makes a sequence of decisions without
knowledge of the future. Modern learning tools such as Weighted
Majority and its many variants provide algorithms with guaranteed low
"regret," even when there are many possible decisions. In applications
of this approach to adaptive algorithms, the challenge has been to
design efficient algorithms for particular problems.
We show how an old algorithm due to Hannan [H57] can be extended to
solve these problems efficiently and intuitively. We give a simple
analysis and several examples.
This is joint work with Santosh Vempala.
Title: A Tale of Two Bounds
Presenter: John Langford
Abstract:
One subbranch of learning theory is concerned with attempting to
predict future performance for a learning algorithm. This subbranch
has only very recently achieved predictions which are meaningful for
typical learning algorithms on typical learning problems.
These new results provide an interesting tool for learning algorithm
design and analysis. I'll discuss where this tool works and where it
breaks.
Title: Learning Models and Formulas for a Temporal Event Logic
Presenter: Alan Fern
Abstract:
I describe novel learning algorithms for temporal, relational data and
their application to trainable video interpretation. I extend an existing
visual-event recognition system, Leonard (Siskind 2001), with two new
learning components:
- A new relational sequential inference method that learns mappings
from relational observation sequences to relational state
sequences. Using this method, we infer force-dynamic world models
from raw video, with results comparing favorably to pre-existing
hand-coded model reconstructors.
- A supervised learning method for logical event definitions, which
are grounded in the force-dynamic models constructed in (1)
above. We give a specific-to-general learning algorithm that,
empirically, learns definitions that outperform published
definitions written by a human expert.
Title: Identifying Structure across Pre-partitioned Data
Presenter: Zvika Marx
Abstract:
We propose an information-theoretic clustering approach that
incorporates a pre-known partition of the data, aiming to identify common
clusters that cut across the given partition. In the standard clustering
setting the formation of clusters is guided by a single source of feature
information. The newly utilized pre-partition factor introduces an
additional bias that counterbalances the impact of the features whenever
they become correlated with this known partition. The resulting
algorithmic framework was applied successfully to synthetic data, as well
as to identifying text-based cross-religion correspondences.
.
|