Agents and Databases: Friends or Foes?

Friendly Meetings in Vancouver

Abstract

On first glance agent technology seems more like a hostile intruder into the database world. On the other hand, the two could easily complement each other, since agents carry out information processes whereas databases supply information to processes. Nonetheless, to view agent technology from a database perspective seems to question some of the basic paradigms of database technology, particularly the premise of semantic consistency of a database. The paper argues that the ensuing uncertainty in distributed databases can be modelled by beliefs, and develops the basic concepts for adjusting peer-to-peer databases to the individual beliefs in single nodes and collective beliefs in the entire distributed database.

The FungalWeb Ontology: Application Scenarios

Abstract

The FungalWeb Ontology aims to support the data integration needs of enzyme biotechnology from inception to product roll out. Serving as a knowledge base for decision support, the conceptualization seeks to link fungal species with enzymes, enzyme substrates, enzyme classifications, enzyme modifications, enzyme retail and applications. We demonstrate how the FungalWeb Ontology supports this remit by presenting application scenarios, conceptualizations of the ontological frame able to support these scenarios and semantic queries typical of a Biotech Manager. Queries to the knowledge base are answered with description logic (DL) and automated reasoning tools.

A Self-Learning Context-Aware Lemmatizer for German

Vancouver Waterfront

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

ERSS 2005: Coreference-Based Summarization Reloaded

Abstract

Friendly Meetings in Vancouver
We present ERSS 2005, our entry to this year's DUC competition. With only slight modifications from last year's version to accommodate the more complex context information present in DUC 2005, we achieved a similar performance to last year's entry, ranking roughly in the upper third when examining the ROUGE-1 and Basic Element score.

We also participated in the additional manual evaluation based on the new Pyramid method and performed further evaluations based on the Basic Elements method and the automatic generation of Pyramids. Interestingly, the ranking of our system differs greatly between the different measures; we attempt to analyse this effect based on correlations between the different results using the Spearman coefficient.

Context-based Multi-Document Summarization using Fuzzy Coreference Cluster Graphs

The IPD cluster computing cluster summaries using a clustering algorithm :)

Abstract

Constructing focused, context-based multi-document summaries requires an analysis of the context questions, as well as their corresponding document sets. We present a fuzzy cluster graph algorithm that finds entities and their connections between context and documents based on fuzzy coreference chains and describe the design and implementation of the ERSS summarizer implementing these ideas.

Mutation Mining - A Prospector's Tale

Screenshot of ProSAT/Webmol with MutationMiner annotations

Abstract

Protein structure visualization tools render images that allow the user to explore structural features of a protein. Context specific information relating to a particular protein or protein family is, however, not easily integrated and must be uploaded from databases or provided through manual curation of input files. Protein Engineers spend considerable time iteratively reviewing both literature and protein structure visualizations manually annotated with mutated residues. Meanwhile, text mining tools are increasingly used to extract specific units of raw text from scientific literature and have demonstrated the potential to support the activities of Protein Engineers.

The transfer of mutation specific raw-text annotations to protein structures requires integrated data processing pipelines that can co-ordinate information retrieval, information extraction, protein sequence retrieval, sequence alignment and mutant residue mapping. We describe the Mutation Miner pipeline designed for this purpose and present case study evaluations of the key steps in the process. Starting with literature about mutations made to protein families; haloalkane dehalogenase, bi-phenyl dioxygenase, and xylanase we enumerate relevant documents available for text mining analysis, the available electronic formats, and the number of mutations made to a given protein family. We review the efficiency of NLP driven protein sequence retrieval from databases and report on the effectiveness of Mutation Miner in mapping annotations to protein structure visualizations. We highlight the feasibility and practicability of the approach.

Keywords

Text mining - Protein structure annotation - Protein mutation - Data mining - Haloalkane dehalogenase - Biphenyl dioxygenase - Xylanase

Fuzzy Set Theory-Based Belief Processing for Natural Language Texts

Introduction

The growing number of publicly available information sources makes it impossible for individuals to keep track of all the various opinions on one topic. The goal of our artificial believer system we present in this paper is to extract and analyze opinionated statements from newspaper articles.

Beliefs are modeled with a fuzzy-theoretic approach applied after NLP-based information extraction. A fuzzy believer models a human agent, deciding what statements to believe or reject based on different, configurable strategies.

Workshop on Traceability at CASCON 2007

Together with Juergen Rilling from Concordia University and Philippe Charland from the DRDC Canada I'm organizing a workshop at CASCON 2007: Traceability in Software Engineering—Past, Present and Future. It's on October 25 at the Sheraton Parkway Toronto North Hotel and Convention Centre, Ontario, Canada.

Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

Multi-lingual Noun Phrase Chunker Updated

I just posted a small update to my multi-lingual noun phrase chunker (MuNPEx) for GATE.

Changes in v0.2 are:
o preliminary Spanish support (see below)
o renamed from "NPE" to "MuNPEx" in a blatant attempt on Googlewhacking
o small cleanups
o now comes with a sample NE transducer for number markup to improve chunking
Supported languages are now English, German, French, and Spanish (beta).

Syndicate content