.  
     
       
 
Abstracts for Talks and Related Material
FALL 2003
  Talks



Title: Fast Methods for Kernel-based Text Analysis
Presenter: Taku Kudo
Abstract: Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of the kernel methods is that effective feature combination} is implicitly expanded without loss of generality and increasing the computational costs. Kernel-based text analysis shows an excellent performance in terms in accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
Link: PDF, PPT slides


Title: Machine learning for evolutionary computation. Evolutionary computation for machine learning.
Presenter: Xavier Llora
Abstract: Machine learning and evolutionary computation present several areas of intersection for mutual benefit. Machine learning provides tools for improving the competence of evolutionary algorithms, that is solve problems quickly, efficiently, and reliable. Some of the examples of the usage of machine learning for evolutionary computation mainly deal with decision making facets of such evolutionary processes. On the other hand, evolutionary computation provides new ways for approaching machine learning problems. Among others, multiobjective evolutionary computation easily analyze hypothesis tradeoffs for a given learning problem.


Title: Learning Control Knowledge Using Approximate Policy Iteration
Presenter: Bob Givan
Abstract: Approximate policy iteration is a well known technique for heuristically solving large stochastic planning problems. This technique has generally failed to generalize to highly structured domains where the state-value functions that must be represented are very complex. In this talk, I discuss approximate policy iteration with the usual value-function learning step replaced by a learning step in plan (policy) space. I present policy-language biases for this learning step that enable the solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, the technique discussed can induce high-quality, domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.


Title: Learning from positive and unlabeled examples
Presenter: Bing Liu
Abstract: In traditional supervised learning, a classifier is built using labeled training examples of every class. The resulting classifier is then used to assign classes to future data. Although this classic model is important, in practice one also encounters a related problem. That is, one has a set of examples of a particular class P (called positive class) and a set of unlabeled examples U that contains instances from class P and also other types of instances (called negative instances). One wants to build a classifier to classify the instances in the unlabeled set U or some future data into positive and negative classes. Due to the fact that there is no labeled negative example available for learning, traditional classification techniques are not directly applicable. In this talk, I will first discuss some theoretical foundations for learning in this setting and show that the problem can be posed as a constrained optimization problem and that under appropriate conditions the solution to the constrained optimization problem will give us a good classifier. After discussing a few existing (mainly heuristics) techniques for solving the problem, I will present two more principled approaches based on logistic regression and a biased formulation of SVM. Experimental results in the domain of text classification will also be discussed.


Title: Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination
Presenter: Dan Gusfield
Abstract: A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. With the growth of genomic data, much of which does not fit ideal tree models, and the increasing appreciation of the genomic role of such phenomena as recombination, recurrent and back mutation, horizontal gene transfer, gene conversion, and mobile genetic elements, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks. Understanding recombination is particularly important, because recombination is the key element of ``gene mapping experiments" that nature has already done for us. Those experiments can help determine the location of genes involved in gene-influenced diseases and of economically important traits. But in order to interpret these natural experiments, we must better understand the algorithmics and combinatorics of recombination in networks. To date, only a few papers have addressed this issue.
Wang et al. studied the problem of constructing a phylogenetic network for a set of n binary sequences derived from the all-0 ancestral sequence, when each site in the sequence can change from 0 to 1 at most once in the network, and recombination between sequences is allowed. They showed that the problem of finding a phylogenetic network that minimizes the number of recombination events is NP-hard, but gave a polynomial-time algorithm (O(nm + n^4)-time, for n sequences of length m each) that was intended to determine whether the sequences could be derived on a phylogenetic network in which the recombination cycles are node disjoint. We call such a network a ``galled-tree". The paper by Wang et al. is seminal in defining this problem and asserting that it has an efficient solution. Unfortunately, the algorithm given there is incomplete and only constitutes a sufficient, but not a necessary, test for the existence of a galled-tree for the data.
In this talk we do several things. By more deeply analyzing the combinatorial constraints on galled-trees, we obtain an algorithm that runs in O(nm + n^3)-time and is guaranteed to be both a necessary and sufficient test for the existence of a galled-tree for the data. If there is a galled-tree, the algorithm constructs a ''reduced" galled-tree that minimizes the number of mutations occurring on recombination cycles. We prove that every reduced galled-tree for the data uses the same number of recombintions, which is the minimum number of recombinations needed for the data, over all possible phylogenetic networks for the data, even if multiple cross-overs are allowed. Hence, when a galled-tree exists, the problem of minimizing the number of recombinations can be solved efficiently. The effect is that the galled-tree is a phylogenetic network that explains the input sequences with the ``littlest deviation" from a true tree model. We show that the reduced galled-tree is ``nearly-unique", but when it is not unique, the algorithm also obtains a count of the number of galled-trees that exist for the input data, and can create these in linear time for each one, starting from the canonical galled-tree. We also show two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation is allowed per site; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree.
Perhaps more important than the specific results obtained for galled-tree, we introduce several tools for thinking about recombination in phylogenetic networks, and discuss how they may generalize to reconstructing phylogenetic networks where cycles are not always node-disjoint.

Professor Gusfield's primary interests involve the efficiency of algorithms, particularly for problems in combinatorial optimization and graph theory. These algorithms have been applied to study data and computer security, stable matching, network flow, matroid optimization,and string/pattern matching problems. Currently, Professor Gusfield is focused on string and combinatorial problems that arise in computational biology, particularly involving bioinformatics and genomics. He is the author the book "Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology", which has been referred to as the definitive text on the subject.


Title: Learning From Networks of Examples
Presenter: Pedro Domingos
Abstract: Most machine learning algorithms assume that examples are independent of each other, but many (or most) domains violate this assumption. For example, in real markets customers' buying decisions are influenced by their friends and acquaintances, but data mining for marketing ignores this (as does traditional economics). In this talk I will describe how we can learn models that account for example dependences, and use them to make better decisions. For example, in the marketing domain we are able to pinpoint the most influential customers, "seed" the network by marketing to them, and unleash a wave of word of mouth. We mine these models from collaborative filtering systems and knowledge-sharing Web sites, and show that they are surprisingly robust to imperfect knowledge of the network. I will also survey other applications of learning from networks of examples we are working on, including combining link and content information in Google-style Web search and predicting the evolution of scientific communities.

Title: Learning Retrieval Functions from Implicit Feedback
Presenter: Thorsten Joachims
Abstract: A central goal of information retrieval is the design of functions that rank documents according to their relevance to a query. In this talk, I will present an approach to automatically learning such ranking functions. Taking an empirical-risk-minimization approach with Kendall's Tau as the loss function, I will explore a Support Vector algorithm that leads to a convex training problem. An important property of this algorithm is that it can be trained with partial information about the target rankings like "for query Q, document A should be ranked higher than document B". I will demonstrate that in WWW search engines such preference data is available in abundance, since it can be inferred from the clicking behavior of users. Experiments show that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training queries.

Bio: Thorsten Joachims is an Assistant Professor in the Department of Computer Science at Cornell University. In 2001, he finished his dissertation with the title "The Maximum-Margin Approach to Learning Text Classifiers: Methods, Theory, and Algorithms", advised by Prof. Katharina Morik at the University of Dortmund. From there he also received his Diplom in Computer Science in 1997 with a thesis on WebWatcher, a browsing assistant for the Web. His research interests center on a synthesis of theory and system building in the field of machine learning, with a focus on Support Vector Machines and machine learning with text. He authors the SVM-Light algorithm and software for support vector learning. From 1994 to 1996 he was a visiting scientist at Carnegie Mellon University with Prof. Tom Mitchell.
Email: tj@cs.cornell.edu, WWW

Title: Efficient Algorithms for Online Decision Problems
Presenter: Adam Kalai
Abstract: In an online decision problem, one makes a sequence of decisions without knowledge of the future. Modern learning tools such as Weighted Majority and its many variants provide algorithms with guaranteed low "regret," even when there are many possible decisions. In applications of this approach to adaptive algorithms, the challenge has been to design efficient algorithms for particular problems. We show how an old algorithm due to Hannan [H57] can be extended to solve these problems efficiently and intuitively. We give a simple analysis and several examples. This is joint work with Santosh Vempala.

Title: A Tale of Two Bounds
Presenter: John Langford
Abstract: One subbranch of learning theory is concerned with attempting to predict future performance for a learning algorithm. This subbranch has only very recently achieved predictions which are meaningful for typical learning algorithms on typical learning problems. These new results provide an interesting tool for learning algorithm design and analysis. I'll discuss where this tool works and where it breaks.

Title: Learning Models and Formulas for a Temporal Event Logic
Presenter: Alan Fern
Abstract: I describe novel learning algorithms for temporal, relational data and their application to trainable video interpretation. I extend an existing visual-event recognition system, Leonard (Siskind 2001), with two new learning components:
  1. A new relational sequential inference method that learns mappings from relational observation sequences to relational state sequences. Using this method, we infer force-dynamic world models from raw video, with results comparing favorably to pre-existing hand-coded model reconstructors.

  2. A supervised learning method for logical event definitions, which are grounded in the force-dynamic models constructed in (1) above. We give a specific-to-general learning algorithm that, empirically, learns definitions that outperform published definitions written by a human expert.


Title: Identifying Structure across Pre-partitioned Data
Presenter: Zvika Marx
Abstract: We propose an information-theoretic clustering approach that incorporates a pre-known partition of the data, aiming to identify common clusters that cut across the given partition. In the standard clustering setting the formation of clusters is guided by a single source of feature information. The newly utilized pre-partition factor introduces an additional bias that counterbalances the impact of the features whenever they become correlated with this known partition. The resulting algorithmic framework was applied successfully to synthetic data, as well as to identifying text-based cross-religion correspondences. .


       
 
Official inquiries about AIML should be directed to Dav Zimak (davzimak@uiuc.edu).
Last update: 11/09/2002