Einat Minkov

Learning to Query Heterogeneous Data

 

Structured data, describing entities and their inter-relations, can be stored and processed using relational databases. However, much information is available only from unstructured or semi-structured sources that we would also like to query and reason about. In this talk, I will describe a query language that is applied to a graph containing a heterogeneous mixture of textual and non-textual objects. Random graph walk paradigms (e.g., Personalized PageRank) are used to rank the entities in the graph by their similarity, or relatedness, to a query. I will show that multiple tasks in a given domain can be cast as search queries in this framework. While graph walks alone provide good performance, machine learning techniques can be applied to adapt the generated similarity metric to each task. I will present an experimental evaluation of several classes of similarity queries from the domain of personal information management, where email messages, meeting entries and social network information extracted from a personal workstation are represented as a graph; for instance, we use similarity search to find people likely to attend a meeting. A second domain evaluated is the processing of parsed text as an entity-relation graph, where we use the graph-based similarity measure to extract city and person names from textual corpora.
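
To make the ranking idea concrete, here is a minimal sketch of a Personalized PageRank style walk over a toy personal-information graph. It is an illustration only, not the system described in the talk: the node names, the uniform edge weights, the restart probability of 0.15, and the function name `personalized_pagerank` are all assumptions, and the learning step that adapts the similarity metric per task is not shown.

```python
from collections import defaultdict


def personalized_pagerank(edges, query_nodes, restart=0.15, iterations=100):
    """Score nodes by relatedness to the query via a random walk with restart.

    Assumptions (illustrative, not from the talk): edges are undirected and
    equally weighted, and every query node appears in at least one edge.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)

    # The restart distribution is uniform over the query nodes.
    start = {n: 0.0 for n in nodes}
    for q in query_nodes:
        start[q] = 1.0 / len(query_nodes)

    scores = dict(start)
    for _ in range(iterations):
        # With probability `restart` the walker jumps back to the query;
        # otherwise it moves to a uniformly chosen neighbor.
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        scores = nxt
    return scores


# Hypothetical toy graph: a meeting entry, email messages, and people.
edges = [
    ("meeting", "email1"), ("email1", "alice"), ("email1", "bob"),
    ("meeting", "email2"), ("email2", "alice"),
    ("carol", "email3"), ("email3", "bob"),
]

scores = personalized_pagerank(edges, ["meeting"])
# Keep only person nodes and rank them by relatedness to the meeting.
people = sorted(["alice", "bob", "carol"], key=scores.get, reverse=True)
print(people)
```

In this sketch, a query such as "who is likely to attend this meeting" amounts to starting the walk at the meeting node and keeping only nodes of type person in the ranked output.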


Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007