Lin Tan

Leveraging Code Comments to Improve Software Reliability

 

Software reliability is critically important. This work addresses two fundamental challenges of software reliability: obtaining accurate program specifications and discovering the limitations of current tools and languages. In this talk, I will show that comments are a rich data source for both, providing specifications as well as evidence of problems in current tools and languages. First, I will present a novel approach, iComment, which is the first work to automatically extract specifications from comments written in natural language and use these specifications to detect comment-code inconsistencies, i.e., software bugs and bad comments. Our evaluation on large real-world software such as the Linux kernel, Mozilla, Apache and Wine, covering two types of comments, shows that iComment effectively extracted 1832 specifications and detected 60 new bugs and bad comments. iComment combines techniques from different areas, including natural language processing (NLP), machine learning, information retrieval, program analysis and statistics. To help explain the pros and cons of extracting specifications from comments compared to extracting them from code, I will briefly discuss AutoISES, which infers security specifications by statically analyzing source code and then directly uses these specifications to automatically detect security bugs/violations. I will also briefly present cComment, which studies comment semantics and characteristics to further understand what other comments can be utilized, how we can utilize them, and what important problems/limitations they reveal. We discovered many interesting findings that can guide the design of new languages and tools for improving reliability, programmer productivity, software evolution, etc.

Official inquiries about AIIS should be directed to Alexandre Klementiev (klementi AT uiuc DOT edu)
Last update: 08/30/2007