GATE Components

Minding the Source: Automatic Tagging of Reported Speech in Newspaper Articles


Abstract

Reported speech, both direct and indirect, is an important indicator of evidentiality in traditional newspaper texts, and increasingly also in new media that rely heavily on citation and quotation of previous postings, such as blogs or newsgroups. This paper details the basic processing steps for reported speech analysis and reports on the performance of an implementation in the form of a GATE resource.

Durm German Lemmatizer v1.0 Released

I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.

The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.

Multi-lingual Noun Phrase Chunker Updated

I just posted a small update to my multi-lingual noun phrase chunker (MuNPEx) for GATE. Changes in v0.2 are:

o preliminary Spanish support (see below)
o renamed from "NPE" to "MuNPEx" in a blatant attempt at Googlewhacking
o small cleanups
o now comes with a sample NE transducer for number markup to improve chunking

Supported languages are now English, German, French, and Spanish (beta).