Generating Update Summaries for DUC 2007
Abstract
Update summaries, as defined for the new DUC 2007 task, deliver focused information to a user who has already read a set of older documents covering the same topic. In this paper, we show how to generate this kind of summary from the same data structure (fuzzy coreference cluster graphs) as all other generic and focused multi-document summaries. Our system, ERSS 2007, which implements this algorithm, also participated in the DUC 2007 main task without any changes from the 2006 version.
Creating a Fuzzy Believer to Model Human Newspaper Readers

Abstract
We present a system capable of modeling human newspaper readers. It is based on the extraction of reported speech, which is subsequently converted into a fuzzy theory-based representation of single statements. A domain analysis then assigns statements to topics. A number of fuzzy set operators, including fuzzy belief revision, are applied to model different belief strategies. In the end, our system holds certain beliefs while rejecting others.
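The abstract does not say which fuzzy set operators the system uses. As a minimal illustrative sketch, the following assumes the standard Zadeh operators (maximum for union, minimum for intersection, one minus the degree for complement) and a hypothetical threshold-based acceptance rule. The function names, the statement identifiers, and the threshold are assumptions made for illustration; fuzzy belief revision itself is not shown.

```python
from typing import Dict, Set

# A fuzzy set maps a statement id to a membership degree in [0, 1].
# The statement ids used here ("s1", "s2", ...) are hypothetical.
FuzzySet = Dict[str, float]


def fuzzy_union(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard (Zadeh) union: the maximum of the two degrees."""
    keys = a.keys() | b.keys()
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys}


def fuzzy_intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard (Zadeh) intersection: the minimum of the two degrees."""
    keys = a.keys() | b.keys()
    return {k: min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys}


def fuzzy_complement(a: FuzzySet) -> FuzzySet:
    """Standard complement: one minus the membership degree."""
    return {k: 1.0 - v for k, v in a.items()}


def accepted(beliefs: FuzzySet, threshold: float = 0.5) -> Set[str]:
    """Hypothetical strategy: hold every statement whose degree
    reaches the threshold and reject the rest."""
    return {k for k, v in beliefs.items() if v >= threshold}


# Degrees with which two (hypothetical) sources support some statements.
source_a = {"s1": 0.9, "s2": 0.25}
source_b = {"s1": 0.5, "s3": 0.75}

# A credulous reader combines sources by union,
# a skeptical reader by intersection.
credulous = accepted(fuzzy_union(source_a, source_b))
skeptical = accepted(fuzzy_intersection(source_a, source_b))
```

Under these assumptions, the credulous reader ends up holding `s1` and `s3`, while the skeptical reader holds only `s1`, which illustrates how different operator choices lead to different sets of held and rejected beliefs.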
Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

Abstract
Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common and distinctive topics within a document set, together with the generation of multi-document summaries, can greatly ease the burden of information management. We show how this can be achieved with a clustering algorithm based on fuzzy set theory, which (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) also delivers excellent performance.
Text Mining and Software Engineering: An Integrated Source Code and Document Analysis Approach

Abstract
Documents written in natural languages constitute a major part of the artifacts produced during the software engineering lifecycle. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents. A particular novelty is the integration of results from automated source code analysis into an NLP pipeline, which allows software artifacts represented in code and natural language to be cross-linked on a semantic level.
Empowering Software Maintainers with Semantic Web Technologies

Abstract
Software maintainers routinely have to deal with a multitude of artifacts, such as source code or documents, which often end up disconnected due to their different representations and the size and complexity of legacy systems. One of the main challenges in software maintenance is to establish and maintain the semantic connections among all the different artifacts. In this paper, we show how Semantic Web technologies can deliver a unified representation to explore, query, and reason about these software artifacts. A novel feature is the automatic integration of two important types of software maintenance artifacts, source code and documents, by populating their corresponding sub-ontologies through code analysis and text mining. We demonstrate how the resulting "Software Semantic Web" can support typical maintenance tasks, such as security analysis, architectural evolution, and traceability recovery between code and documents, through ontology queries and DL reasoning.
Keywords: Software Maintenance, Ontology Population, Text Mining.
Towards a Systematic Evaluation of Protein Mutation Extraction Systems
Abstract
The development of text analysis systems targeting the extraction of information about mutations from research publications is an emerging topic in biomedical research. Current systems differ in both scope and approach, which prevents a meaningful comparison of their performance and, in turn, hinders possible synergies. To overcome this "evaluation bottleneck," we developed a comprehensive framework for the systematic analysis of mutation extraction systems, precisely defining tasks and corresponding evaluation metrics that will allow a comparison of existing and future applications.
Keywords: mutation extraction systems; mutation evaluation tasks; mutation evaluation metrics
Ontological Text Mining of Software Documents

Abstract
Documents written in natural languages constitute a major part of the artifacts produced during the software engineering lifecycle. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents.
Task-Dependent Visualization of Coreference Resolution Results

Abstract
Graphical visualizations of coreference chains support a system developer in analyzing the behavior of a resolution algorithm. In this paper, we state explicit use cases for coreference chain visualizations and show how they can be addressed by transforming chains into other, standardized data formats, namely Topic Maps and Ontologies.
Processing of Beliefs extracted from Reported Speech in Newspaper Articles

Abstract
The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our artificial believer system presented in this paper is to extract and analyze statements of opinion from newspaper articles.
Beliefs are modeled using a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.
Next-Generation Summarization: Contrastive, Focused, and Update Summaries

Abstract
Classical multi-document summaries focus on the common topics of a document set and omit distinctive themes particular to a single document, thereby often suppressing precisely the kind of information a user might need for a specific task. This can be avoided through advanced multi-document summaries that take a user's context and history into account by delivering focused, contrastive, or update summaries. To facilitate the generation of these different summaries, we propose to generate all types from a single data structure, topic clusters, which provide an abstract representation of a set of documents. Evaluations carried out on five years' worth of data from the DUC summarization competition demonstrate the feasibility of this approach.


