Connecting Wikis and Natural Language Processing Systems

Abstract
We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.
General Terms: Design, Human Factors, Languages
Keywords: Self-aware Wiki System, Wiki/NLP Integration
Fuzzy Belief Revision
Abstract

Fuzzy sets, having been the long-standing mainstay of modeling and manipulating imperfect information, are an obvious candidate for representing uncertain beliefs.
Unfortunately, unadorned fuzzy sets are too limited to capture complex or potentially inconsistent beliefs, because all too often they reduce to absurdities ("nothing is possible") or trivialities ("everything is possible").
However, we show that by combining the syntax of propositional logic with the semantics of fuzzy sets a rich framework for expressing and manipulating uncertain beliefs can be created, admitting Gärdenfors-style expansion, revision, and contraction operators and being moreover amenable to easy integration with conventional "crisp" information processing.
The model presented here addresses many of the shortcomings of traditional approaches for building fuzzy data models, which will hopefully lead to a wider adoption of fuzzy technologies for the creation of information systems.
Keywords
fuzzy belief revision, fuzzy information systems, soft computing, fuzzy object-oriented data model
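
The limitation of unadorned fuzzy sets mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard pointwise minimum and maximum operators for fuzzy intersection and union; the possible worlds, the belief values, and the function names are hypothetical and are not taken from the paper's model.

```python
# Illustrative sketch only: beliefs are fuzzy sets over possible worlds,
# stored as dicts mapping each world to a possibility degree in [0, 1].

def fuzzy_and(a, b):
    """Pointwise minimum (standard fuzzy intersection)."""
    return {world: min(a[world], b[world]) for world in a}

def fuzzy_or(a, b):
    """Pointwise maximum (standard fuzzy union)."""
    return {world: max(a[world], b[world]) for world in a}

# Two conflicting beliefs (hypothetical values).
belief_rain = {"rain": 1.0, "sun": 0.0}
belief_sun = {"rain": 0.0, "sun": 1.0}

# Conjunction: every world gets degree 0.0 ("nothing is possible").
conjunction = fuzzy_and(belief_rain, belief_sun)

# Disjunction: every world gets degree 1.0 ("everything is possible").
disjunction = fuzzy_or(belief_rain, belief_sun)
```

In this sketch the conflict between the two beliefs is lost in either combination, which is the kind of collapse that motivates a richer, logic-based representation.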
Fuzzy Coreference Resolution for Summarization

Abstract
We present a fuzzy-theory-based approach to coreference resolution and its application to text summarization.
Automatic determination of coreference between noun phrases is fraught with uncertainty. We show how fuzzy sets can be used to design a new coreference algorithm which captures this uncertainty in an explicit way and allows us to define varying degrees of coreference.
The algorithm is evaluated within a system that participated in the 10-word summary task of the DUC 2003 competition.
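
One way to picture "varying degrees of coreference" is a chain stored as a fuzzy set of noun phrases. The sketch below is an assumption made for illustration: the membership values, the noun phrases, and the `alpha_cut` helper are hypothetical and do not reproduce the algorithm evaluated in the paper.

```python
# Illustrative sketch only: a coreference chain as a fuzzy set, mapping
# each noun phrase to its degree of membership in the chain.

def alpha_cut(chain, threshold):
    """Return the crisp set of noun phrases whose degree is >= threshold."""
    return {phrase for phrase, degree in chain.items() if degree >= threshold}

# Hypothetical chain with made-up membership degrees.
chain = {"the president": 1.0, "he": 0.8, "the company": 0.2}

strict = alpha_cut(chain, 0.9)   # keeps only near-certain members
lenient = alpha_cut(chain, 0.5)  # also admits less certain members
```

Choosing the threshold late, rather than deciding coreference once and for all, keeps the uncertainty explicit until a consumer such as a summarizer needs a crisp answer.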
Using Knowledge-poor Coreference Resolution for Text Summarization
Abstract

We present a system that produces 10-word summaries based on the single summarization strategy of outputting noun phrases representing the most important text entities (as represented by noun phrase coreference chains). The coreference chains were computed using fuzzy set theory combined with knowledge-poor coreference heuristics.
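
The summarization strategy described above can be sketched roughly as follows. The sketch rests on assumptions that the abstract does not state: chain length stands in for entity importance, the first mention of a chain serves as its representative noun phrase, and the `summarize` function and sample chains are hypothetical.

```python
# Illustrative sketch only: build a short summary from coreference chains.

def summarize(chains, max_words=10):
    """Pick representative noun phrases from the longest chains.

    Assumptions: importance is approximated by chain length, and the
    first mention in a chain is used as its representative.
    """
    ranked = sorted(chains, key=len, reverse=True)
    summary, used = [], 0
    for chain in ranked:
        representative = chain[0]
        size = len(representative.split())
        if used + size > max_words:
            continue  # skip phrases that would exceed the word budget
        summary.append(representative)
        used += size
    return summary

# Hypothetical chains, each a list of mentions of one entity.
chains = [
    ["Acme Corp", "the company"],
    ["the merger", "it", "the deal"],
    ["a very long noun phrase with far too many words in it"],
]
result = summarize(chains)
```

Here `result` lists the representative of the three-mention chain first, then that of the two-mention chain, and drops the twelve-word phrase because it does not fit the budget.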
An Integration Architecture for User-Centric Document Creation, Retrieval, and Analysis

Abstract
The different stages in the life-cycle of content (creation, storage, retrieval, and analysis) are usually regarded as distinct and isolated steps. In this paper we examine the synergies resulting from their integration within a single architecture.
Our goal is to employ such an architecture to improve user support for knowledge-intensive tasks. We present an ongoing case study from the area of building architecture.
Supporting Reverse Engineering Tasks with a Fuzzy Repository Framework
Abstract

Software reverse engineering (RE) is often hindered not by the lack of available data, but by an overabundance of it: the (semi-)automatic analysis of static and dynamic code information, data, and documentation results in a huge heap of often incomparable data. Additionally, the gathered information is typically fraught with various kinds of imperfections, for example conflicting information found in software documentation vs. program code.
Our approach to this problem is twofold: for the management of the diverse RE results we propose the use of a repository, which supports an iterative and incremental discovery process with the aid of a reverse engineer. To deal with imperfections, we propose to enhance the repository model with additional representation and processing capabilities based on fuzzy set theory and fuzzy belief revision.
Keywords
fuzzy reverse engineering, meta model, extension framework, iterative process, knowledge evolution
Multi-ERSS and ERSS 2004
Abstract
Last year, we presented a system, ERSS, which constructed 10-word summaries in the form of a list of noun phrases. It was based on a knowledge-poor extraction of noun phrase coreference chains implemented on a fuzzy set theoretic base. This year we present the performance of an improved version, ERSS 2004, and an extension of the same basic system: Multi-ERSS constructs 100-word extract summaries for clusters of texts. With very few modifications we ran ERSS 2004 on Tasks 1 and 3 and Multi-ERSS on Tasks 2, 4, and 5, scoring generally above average in all but the linguistic quality aspects.
Enriching Protein Structure Visualizations with Mutation Annotations Obtained by Text Mining Protein Engineering Literature

Abstract
Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context-specific information relating to a particular protein or protein family is not easily integrated and must be uploaded from databases or provided through manual curation of input files. We describe an approach that combines natural language processing and sequence analysis to retrieve mutation-specific annotations from full-text articles for rendering with protein structures.
Keywords
Text Mining, Protein Structure Annotation, Protein Function, ProSAT, Xylanase
Engineering a Semantic Desktop for Building Historians and Architects

Abstract
We analyse the requirements for advanced semantic support of users (building historians and architects) of a multi-volume encyclopedia of architecture from the late 19th century. Novel requirements include the integration of content retrieval, content development, and automated content analysis based on natural language processing.
We present a system architecture for the detected requirements and its current implementation. A complex scenario demonstrates how a desktop supporting semantic analysis can contribute to specific, relevant user tasks.
Combining Biological Databases and Text Mining to Support New Bioinformatics Applications

Abstract
A large amount of biological knowledge today is only available from full-text research papers. Since neither manual database curators nor users can keep up with the rapidly expanding volume of scientific literature, natural language processing approaches are becoming increasingly important for bioinformatic projects.
In this paper, we go beyond simply extracting information from full-text articles by describing an architecture that supports targeted access to information from biological databases using the results derived from text mining of research papers, thereby integrating information from both sources within a biological application.
The described architecture is currently being used to extract information about protein mutations from full-text research papers. Text mining results drive the retrieval of sequence information from protein databases and the employment of algorithmic sequence analysis tools, which facilitate further data access from protein structure databases. Complex mapping of NLP-derived text annotations to protein structures allows the rendering, with 3D structure visualization, of information not available in databases of mutation annotations.
