NLP

Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles

Abstract

Reported speech in the form of direct and indirect reported speech is an important indicator of evidentiality in traditional newspaper texts, but also increasingly in the new media that rely heavily on citation and quotation of previous postings, as for instance in blogs or newsgroups. This paper details the basic processing steps for reported speech analysis and reports on performance of an implementation in form of a GATE resource.

Generating Update Summaries for DUC 2007

Abstract

Update summaries as defined for the new DUC 2007 task deliver focused information to a user who has already read a set of older documents covering the same topic. In this paper, we show how to generate this kind of summary from the same data structure—fuzzy coreference cluster graphs—as all other generic and focused multi-document summaries. Our system ERSS 2007 implementing this algorithm also participated in the DUC 2007 main task, without any changes from the 2006 version.

An Initial Fuzzy Coreference Cluster Graph

Creating a Fuzzy Believer to Model Human Newspaper Readers

Montreal 2007

Abstract

We present a system capable of modeling human newspaper readers. It is based on the extraction of reported speech, which is subsequently converted into a fuzzy theory-based representation of single statements. A domain analysis then assigns statements to topics. A number of fuzzy set operators, including fuzzy belief revision, are applied to model different belief strategies. At the end, our system holds certain beliefs while rejecting others.

Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

Montreal 2007

Abstract

Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common and distinctive topics within a document set, together with the generation of multi-document summaries, can greatly ease the burden of information management. We show how this can be achieved with a clustering algorithm based on fuzzy set theory, which (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) also delivers excellent performance.

Towards a Systematic Evaluation of Protein Mutation Extraction Systems

Abstract

The development of text analysis systems targeting the extraction of information about mutations from research publications is an emergent topic in biomedical research. Current systems differ in both scope and approaches, which prevents a meaningful comparison of their performance and therefore possible synergies. To overcome this "evaluation bottleneck," we developed a comprehensive framework for the systematic analysis of mutation extraction systems, precisely defining tasks and corresponding evaluation metrics that will allow a comparison of existing and future applications.

Keywords: mutation extraction systems; mutation evaluation tasks; mutation evaluation metrics

Protein Domains

Task-Dependent Visualization of Coreference Resolution Results

A single coreference chains visualized as a Topic Map

Abstract

Graphical visualizations of coreference chains support a system developer in analyzing the behavior of a resolution algorithm. In this paper, we state explicit use cases for coreference chain visualizations and show how they can be resolved by transforming chains into other, standardized data formats, namely Topic Maps and Ontologies.

Processing of Beliefs extracted from Reported Speech in Newspaper Articles

A fuzzy believer?

Abstract

The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our artificial believer system presented in this paper is to extract and analyze statements of opinion from newspaper articles.

Beliefs are modeled using a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.

Next-Generation Summarization: Contrastive, Focused, and Update Summaries

Conference Hotel, Borovets, Bulgaria

Abstract

Classical multi-document summaries focus on the common topics of a document set and omit distinctive themes particular to a single document—thereby often suppressing precisely that kind of information a user might need for a specific task. This can be avoided through advanced multi-document summaries that take a user's context and history into account, by delivering focused, contrastive, or update summaries. To facilitate the generation of these different summaries, we propose to generate all types from a single data structure, topic clusters, which provide for an abstract representation of a set of documents. Evaluations carried out on five years' worth of data from the DUC summarization competition prove the feasibility of this approach.

Connecting Wikis and Natural Language Processing Systems

Palais de Congres, Montreal, Canada

Abstract

We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results from an initial implementation based on the GATE framework for NLP and the MediaWiki system.

General Terms: Design, Human Factors, Languages
Keywords: Self-aware Wiki System, Wiki/NLP Integration

Fuzzy Coreference Resolution for Summarization

Venice

Abstract

We present a fuzzy-theory based approach to coreference resolution and its application to text summarization.

Automatic determination of coreference between noun phrases is fraught with uncertainty. We show how fuzzy sets can be used to design a new coreference algorithm which captures this uncertainty in an explicit way and allows us to define varying degrees of coreference.

The algorithm is evaluated within a system that participated in the 10-word summary task of the DUC 2003 competition.

Syndicate content