Durm

A Semantic Wiki Approach to Cultural Heritage Data Management

Abstract

Providing access to cultural heritage data beyond book digitization and information retrieval projects is important for delivering advanced semantic support to end users, in order to address their specific needs. We introduce a separation of concerns for heritage data management by explicitly defining different user groups and analyzing their particular requirements. Based on this analysis, we developed a comprehensive system architecture for accessing, annotating, and querying textual historic data. Novel features are the deployment of a Wiki user interface, natural language processing services for end users, metadata generation in OWL ontology format, SPARQL queries on textual data, and the integration of external clients through Web Services. We illustrate these ideas with the management of a historic encyclopedia of architecture.

Text Mining: Wissensgewinnung aus natürlichsprachigen Dokumenten (Text Mining: Knowledge Acquisition from Natural-Language Documents)

(This webpage is about a technical report on Text Mining, written in German. Try Google Translate for an English version.)

Internal Report 2006-5, Faculty of Informatics (Fakultät für Informatik), Universität Karlsruhe (TH), Germany

Edited by René Witte and Jutta Mülle

ISSN 1432-7864

200 pages, 75 figures

An Integration Architecture for User-Centric Document Creation, Retrieval, and Analysis

Abstract

The different stages in the life-cycle of content—creation, storage, retrieval, and analysis—are usually regarded as distinct and isolated steps. In this paper we examine the synergies resulting from their integration within a single architecture.

Our goal is to employ such an architecture to improve user support for knowledge-intensive tasks. We present an ongoing case study from the area of building architecture.

Engineering a Semantic Desktop for Building Historians and Architects

Page scan from 'Handbuch der Architektur'

Abstract

We analyse the requirements for advanced semantic support of the users of a multi-volume encyclopedia of architecture from the late 19th century, namely building historians and architects. Novel requirements include the integration of content retrieval, content development, and automated content analysis based on natural language processing.

We present a system architecture that addresses the identified requirements, together with its current implementation. A complex scenario demonstrates how a desktop supporting semantic analysis can contribute to specific, relevant user tasks.

A Self-Learning Context-Aware Lemmatizer for German

Abstract

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.
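To make the idea of a learned full-form lexicon more concrete, here is a minimal sketch in Python. It is not the Durm algorithm: the suffix list, the function names `candidate_lemmas` and `learn_lexicon`, and the rule "pick the shortest stem that also occurs in the text" are all assumptions made for illustration. The sketch also ignores context (part of speech, case, umlaut plurals), which the actual system takes into account.

```python
# Illustrative German noun endings (an assumption, not an exhaustive list).
SUFFIXES = ("ern", "en", "er", "es", "e", "n", "s")

def candidate_lemmas(form):
    """Return the form itself plus every stem obtained by stripping one suffix."""
    candidates = [form]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if form.endswith(suffix) and len(form) - len(suffix) >= 3:
            candidates.append(form[: -len(suffix)])
    return candidates

def learn_lexicon(tokens):
    """Build a full-form lexicon {word form: lemma} from capitalized tokens.

    Capitalization is used as a crude stand-in for noun detection.
    """
    forms = {t for t in tokens if t[:1].isupper() and t.isalpha()}
    lexicon = {}
    for form in forms:
        # Prefer the shortest candidate that was itself observed in the text;
        # the form is always its own candidate, so the list is never empty.
        attested = [c for c in candidate_lemmas(form) if c in forms]
        lexicon[form] = min(attested, key=len)
    return lexicon

text = "Der Tisch steht neben den Tischen und die Farbe des Tisches passt"
lexicon = learn_lexicon(text.split())
```

With this input, both "Tischen" and "Tisches" are mapped to "Tisch" because the bare stem occurs in the same text, while "Farbe" stays unchanged because "Farb" is never observed. The lexicon grows as more documents are processed, which is the sense in which such an approach is self-learning.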

Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language. The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

Technical Report on Text Mining (in German)

A new technical report on Text Mining (in German) is now available. It collects reports written by students in a Hauptseminar (advanced seminar) taught by yours truly and Jutta Mülle at Universität Karlsruhe, Germany.
