Efficient Multi-Profile Filtering using Finite Automata

Lukas C. Faulstich
Institut für Informatik
Freie Universität Berlin
email: faulstic@inf.fu-berlin.de

Report B 01-03
March 2001

The task of an alerting system is to observe events, produce a stream of event notifications, and deliver each notification to all clients whose profiles match this notification. The details of the delivery are specified by a notification policy that is part of a profile.

This report deals with the problem of matching an incoming document against a set of profiles. Focusing on this aspect, we can model profiles as unary predicates on the set of documents. If profiles are arbitrary unary predicates on documents, each profile must be checked separately. However, in many application domains such as digital libraries, much simpler predicates are common.

In this paper we focus on substring matching. We extend the well-known technique (cf. [1]) of constructing a deterministic finite state machine from a single search string and feeding the document as input to it. We show that any finite number of simple substring profiles can be checked simultaneously by a single deterministic finite state machine in linear time. We give a construction for this automaton and show that it has very modest storage requirements. We present methods for the incremental maintenance of this automaton in case of insertion or deletion of profiles.

Extensions of simple substring profiles to exact word matching, phrase matching, and profiles against different parts (fields) of a document are straight-forward. An extension to Boolean profiles based on post-processing by an additional layer of finite state machines is discussed. However, the advantage of linear running time is lost; the performance of this approach is similar to the methods presented in [8].

Get the report in Gzipped PostScript