Yuancheng Tu
10103 Bravern 1
Bellevue, WA, 98004
myFirstName_myLastName (at) hotmail dot com

[About Me] [Publication] [Projects] [Resources]
About Me

I recently graduated and joined the Microsoft Speech Recognition and Synthesis team. Before that, I was a Ph.D. student at the University of Illinois at Urbana-Champaign, majoring in computational linguistics. I worked in the Cognitive Computation Group, and my thesis advisor was Dan Roth. My primary research interests are natural language processing, machine learning, and computational lexical semantics. I am also interested in structure learning in NLP and text mining.

Publications



Y. Tu, Ph.D. Dissertation: English complex verb constructions: identification and inference, University of Illinois at Urbana-Champaign, USA, 2012.

Y. Tu and D. Roth, Sorting out the Most Confusing English Phrasal Verbs, First Joint Conference on Lexical and Computational Semantics, Montreal, Canada, 2012. [PDF] [bibitem] [Data]

Y. Tu and D. Roth, Learning English Light Verb Constructions: Contextual or Statistical, ACL-HLT workshop: Multiword Expressions: from Parsing and Generation to the Real World (MWE 2011), Portland, Oregon, 2011. [PDF] [bibitem] [Data] [Annotation] [slides]

Y. Tu, Book Review of Idiom Treatment Experiments in Machine Translation by Dimitra Anastasiou, Linguists List Review, 2011. [web]

Y. Tu, N. Johri, D. Roth and J. Hockenmaier, Citation Author Topic Model in Expert Search, The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, 2010. [PDF] [bibitem] [Poster] [Data and Preprocessing Toolkit] [CAT Model]

N. Johri, D. Roth and Y. Tu, Experts' Retrieval with Multiword-Enhanced Author Topic Model, Human Language Technologies: 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2010, Semantic Search Workshop), Los Angeles, California, 2010. [PDF] [bibitem] [slides]

D. Roth and Y. Tu, Aspect Guided Text Categorization with Unobserved Labels, International Conference on Data Mining (ICDM), Miami, Florida, 2009. [PDF] [bibitem] [slides]

M. Chang, D. Goldwasser, D. Roth, and Y. Tu, Unsupervised Constraint Driven Learning for Transliteration Discovery, NAACL, Boulder, Colorado, 2009. [PDF] [bibitem]

D. Goldwasser, M. Chang, Y. Tu, and D. Roth, Constraint Driven Transliteration Discovery, Recent Advances in Natural Language Processing, 2009. [PDF] [bibitem]

M. Sammons, V. Vydiswaran, T. Vieira, N. Johri, M. Chang, D. Goldwasser, V. Srikumar, G. Kundu, Y. Tu, K. Small, J. Rule, Q. Do, D. Roth, Relation Alignment for Textual Entailment Recognition, The Second Text Analysis Conference (TAC-2009), NIST, Gaithersburg, Maryland, Nov. 16-17, 2009. [PDF]

Q. Do, D. Roth, M. Sammons, Y. Tu, V. Vinod, Robust, Light-weight Approaches to compute Lexical Similarity, Computer Science Research and Technical Reports, University of Illinois (2009). [PDF] [bibitem]

V. Punyakanok, D. Roth, W. Yih, D. Zimak, and Y. Tu, Semantic Role Labeling via Generalized Inference over Classifiers (Shared Task Paper), Proc. of the Annual Conference on Computational Natural Language Learning (CoNLL), pp. 130-133, 2004. [PDF] [bibitem]

X. Li, D. Roth, and Y. Tu, Phrasenet: Towards context sensitive lexical semantics, In Proceedings of CoNLL03, pages 67-74, 2003. [PDF] [bibitem]

Y. Tu, Book Review of From the Colt's Mouth... And Others, Breivik, Leiv Egil and Angela Hasselgren, Linguists List Review Issues, 2002. [web]

Y. Tu, Phrasenet: A Context Sensitive Lexical Semantics Knowledge Base. My previous Ph.D. Proposal, University of Illinois at Urbana-Champaign, 2003. [PDF] [HTML] [bib]


Current Projects

DSSI-2011 Yahoo-Award Winning Project: Expert Search by Non-Experts

Project Demo
Please enter keywords or paper abstracts into the search box to retrieve relevant experts. Currently, the search covers only the UIUC domain. This project was carried out by thirteen DSSI-2011 students and led by Alex Kotov and me.

We propose to develop an expert and expertise search system that tracks and retrieves researchers and professors based on their research expertise and visualizes their related information via automatically generated webpages. Our system will have many useful applications in the academic, commercial, and government fields. For example, academic funding agencies frequently search for researchers who are not only experts in their fields but also satisfy geographical constraints. Commercial applications include, but are not limited to, an email agent that notifies people of events when and only when they are likely to be interested in them. Government applications include military intelligence gathering, in that a ranking intelligence officer may want to communicate with a nearby expert on a given topic.

Structure Learning in Context Sensitive Recognition and Substitution of MWEs

This project introduces a novel unsupervised sequence learning model for identifying English Multiword Expressions (MWEs). We model the compositionality of MWEs in a given context as a sequence learning problem over each component of the given MWE. When idiomatic uses of an MWE are indistinguishable on the surface from its literal uses in a given context, our model makes a prediction based on the learned best hidden sequence combination and identifies which component can be simplified by examining the binary value of each hidden state. This model proposes a unified way to deal with all types of MWEs in a computational system and provides a less ambiguous lexical representation for further knowledge acquisition and inference in other applications such as machine translation, information extraction, and data mining.
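As a rough illustration of the decoding step only, the sketch below runs standard Viterbi decoding over binary hidden states, one per MWE component. It assumes that state 1 marks a component as a candidate for simplification and state 0 marks it as literal. The function name, the state interpretation, and all probabilities are made up for illustration. They are not the project's actual model or learned parameters, and the unsupervised learning of those parameters is not shown.

```python
import math

def viterbi_binary(emissions, transition, initial):
    """Return the most likely sequence of binary hidden states.

    emissions:  list of (p_state0, p_state1) pairs, one per MWE component,
                giving the assumed likelihood of the observed context
                under each state.
    transition: transition[i][j] = P(next state j | current state i).
    initial:    (P(state 0), P(state 1)) for the first component.
    """
    states = (0, 1)
    # Log-scores of the best path ending in each state at the first position.
    scores = [math.log(initial[s]) + math.log(emissions[0][s]) for s in states]
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for s in states:
            # Best previous state for reaching state s at this position.
            best_prev = max(states, key=lambda p: scores[p] + math.log(transition[p][s]))
            new_scores.append(
                scores[best_prev] + math.log(transition[best_prev][s]) + math.log(emit[s])
            )
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy example for a three-component expression (all numbers are invented).
toy_emissions = [(0.1, 0.9), (0.8, 0.2), (0.9, 0.1)]
toy_transition = [[0.6, 0.4], [0.5, 0.5]]
toy_initial = (0.5, 0.5)
print(viterbi_binary(toy_emissions, toy_transition, toy_initial))  # [1, 0, 0]
```

In this toy run, only the first component receives state 1, so under the assumed interpretation it is the one flagged for simplification.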

Lexical Simplification in Textual Inference

In this research, we propose to learn the natural language entailment relation by exploiting the logical entailment or polarity expressed by the natural language lexicon, such as factive verbs and imperative verbs. The model is rule-based and can be used to generate the entailed statement for a given sentence with a factive or imperative verb as its predicate. This model can be used in textual entailment to produce a less ambiguous lexical representation, and it can also help to achieve better recall in information retrieval or data mining systems.
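A minimal sketch of what one such rule might look like for factive verbs is given below. It assumes sentences of the simple form "subject [does/did not] verb that complement" and relies on the fact that a factive verb presupposes its complement whether or not the verb is negated. The tiny lexicon, the regular expression, and the function name are hypothetical and only for illustration. They are not the project's actual rules or lexicon.

```python
import re

# Hypothetical mini-lexicon of factive verb forms (illustrative only).
FACTIVE_VERBS = {
    "know", "knows", "knew",
    "regret", "regrets", "regretted",
    "realize", "realizes", "realized",
}

# Matches "<subject> [does/did not] <verb> that <complement>[.]".
PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?:(?:does|did)\s+not\s+)?"
    r"(?P<verb>\w+)\s+that\s+(?P<complement>.+?)\.?$",
    re.IGNORECASE,
)

def entailed_statement(sentence):
    """Return the complement entailed by a factive predicate, or None."""
    match = PATTERN.match(sentence.strip())
    if match is None or match.group("verb").lower() not in FACTIVE_VERBS:
        # Not of the expected shape, or the verb is not in the factive lexicon.
        return None
    complement = match.group("complement")
    # Present the complement as a standalone sentence.
    return complement[0].upper() + complement[1:] + "."

print(entailed_statement("John regrets that Mary left."))          # Mary left.
print(entailed_statement("John does not regret that Mary left."))  # Mary left.
print(entailed_statement("John believes that Mary left."))         # None
```

The third call returns None because "believes" is not factive: believing something does not entail that it is true.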

Some of my previous projects

Resources



What Paper Are You Working On?
My advisor passed this essay to me when I asked him how to do good research. I think it is some of the best advice I have received, and it is definitely worth sharing.
How to Write a Great Research Paper
My colleague Ming-Wei pointed me to this great presentation by Simon Peyton Jones from Microsoft Research Cambridge. I really enjoyed the content as well as the presentation itself.
Having Fun in Research
This is a wonderful presentation by YuanYuan Zhou, formerly a CS professor at UIUC and now at UC San Diego. What I like most about this presentation is "you need to enjoy the journey!".
Really Achieving Your Childhood Dreams
This is the truly inspiring last lecture given by former CMU professor Randy Pausch.
List of NLP/CL Conference Deadlines by Joel Tetreault
A nice way to keep track of deadlines.