Stopword Lists and Stemming

Implemented by Andreas Hetey
Documented on the basis of the Java documentation

Words contained in the stopword lists are removed from the text. The remaining words are processed according to the stemming rules listed below. The language in use is detected from the words that occur in the text (by comparison with selected terms, e.g. German articles).
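The pipeline described above (detect the language, then drop stopwords) can be sketched as follows. This is only an illustration, not the original implementation: the class name StopwordFilterSketch, the marker words used for language detection, the tie-breaking towards English, and the tokenizer are all assumptions, and the stopword sets are short excerpts of the full lists given below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names and marker words are not taken from the original code.
class StopwordFilterSketch {
    // Assumed marker words: the source only says "selected terms, e.g. German articles".
    static final Set<String> GERMAN_MARKERS =
            new HashSet<>(Arrays.asList("der", "die", "das", "und", "ein", "eine"));
    static final Set<String> ENGLISH_MARKERS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "is"));

    // Short excerpts of the stopword lists documented below.
    static final Set<String> GERMAN_STOPWORDS = new HashSet<>(Arrays.asList(
            "der", "die", "das", "dem", "den", "und", "ein", "eine", "auf", "mit", "in"));
    static final Set<String> ENGLISH_STOPWORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "and", "of", "on", "is", "in"));

    // Lower-case the text and split it on anything that is not a letter or apostrophe.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Count marker words per language; ties fall back to English (an arbitrary choice).
    static String detectLanguage(List<String> tokens) {
        int german = 0;
        int english = 0;
        for (String t : tokens) {
            if (GERMAN_MARKERS.contains(t)) german++;
            if (ENGLISH_MARKERS.contains(t)) english++;
        }
        return german > english ? "de" : "en";
    }

    // Keep only the tokens that are not in the stopword list of the given language.
    static List<String> removeStopwords(List<String> tokens, String language) {
        Set<String> stopwords = "de".equals(language) ? GERMAN_STOPWORDS : ENGLISH_STOPWORDS;
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Die Katze sitzt auf dem Dach und liest");
        String language = detectLanguage(tokens);
        // prints: de [katze, sitzt, dach, liest]
        System.out.println(language + " " + removeStopwords(tokens, language));
    }
}
```

The tokens that survive this step would then be passed to the stemmer for the detected language.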

German Word List

The following stopword list applies to German:

ab, aber, alle, allen, aller, am, an, andere, anderem, anderen, anderer, anderes, ans, auf, aufwaerts, aus, bei, beim, das, dein, dem, den, denn, der, des, dich, die, diese, diese, diesem, diesen, dieser, dieser, dieses, dir, drei, dreie, dreien, dreier, du, ein, eine, einem, einen, einer, eines, einige, einigen, einiger, er, es, euch, euer, für, heraus, herein, herunter, hinaus, hinein, hinter, hinunter, ich, ihm, ihn, ihnen, ihr, im, in, ins, jede, jedem, jeden, jeder, jedes, jemand, jene, jenem, jenen, jener, jenes, keine, keinem, keinen, keiner, keines, man, mein, mich, mir, mit, nach, neben, niemand, ob, ohne, sein, selbst, sich, sie, so, über, um, und, uns, unser, unter, verschiedene, verschiedenen, verschiedener, viele, vielen, vieler, von, vor, wann, warum, was, wegen, weil, welche, welchem, welchen, welcher, welches, wem, wen, wer, wes, wessen, wie, wieviele, wievielem, wievielen, wievieler, wievieles, wir, wo, zehn, zu, zum, zur, zwei, zweie, zweien, zweier

Stemming of German words follows these rules:

  • leave the word unchanged if it ends with ee or ss
  • remove the ending e, en, n, er, or ern if the remaining word has the minimum word length (e.g. 2)
  • vowels and umlauts:
      • change au => aeue
      • change vowels (append 'e'), e.g. a => ae
      • change ß => ss
      • change umlauts, e.g. ä => ae
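The German rules above leave several details open, so the following is only a rough sketch under stated assumptions rather than the original code. It assumes that words ending in ee or ss skip all later steps, that at most one ending is removed (longest first), that the vowel rule applies to a, o and u (the vowels that have umlaut forms, which is consistent with au => aeue), and that input is lower-cased first. The class name GermanStemmerSketch is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of the German rules; not the original implementation.
class GermanStemmerSketch {
    // "Minimum word length (e.g. 2)" from the rules above.
    static final int MIN_STEM_LENGTH = 2;
    // Assumption: endings are tried longest first and at most one is removed.
    static final String[] ENDINGS = {"ern", "er", "en", "e", "n"};

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);

        // Assumption: such words are returned as-is (apart from lower-casing).
        if (w.endsWith("ee") || w.endsWith("ss")) {
            return w;
        }

        // Remove one ending if enough of the word remains.
        for (String ending : ENDINGS) {
            if (w.endsWith(ending) && w.length() - ending.length() >= MIN_STEM_LENGTH) {
                w = w.substring(0, w.length() - ending.length());
                break;
            }
        }

        // Normalise vowels, umlauts and sharp s so that e.g. "a" and "\u00e4" coincide.
        StringBuilder out = new StringBuilder();
        for (char c : w.toCharArray()) {
            switch (c) {
                case 'a': case '\u00e4': out.append("ae"); break; // a, ä => ae
                case 'o': case '\u00f6': out.append("oe"); break; // o, ö => oe (assumed)
                case 'u': case '\u00fc': out.append("ue"); break; // u, ü => ue (assumed)
                case '\u00df': out.append("ss"); break;           // ß => ss
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stem("Haus"));        // haeues
        System.out.println(stem("H\u00e4user")); // haeues
        System.out.println(stem("Kindern"));     // kind
    }
}
```

Under these assumptions the singular "Haus" and the plural "Häuser" map to the same stem, which appears to be the purpose of appending e to plain vowels.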

English Word List

The following stopword list applies to English:

a, above, according, across, actually, adj, after, afterwards, again, against, almost, alone, along, also, although, always, among, amongst, an, and, another, any, anyhow, anyone, anything, anywhere, are, aren, aren't, around, as, at, be, became, because, become, becomes, been, before, beforehand, begin, behind, being, below, beside, besides, between, both, but, by, can, can't, cannot, co, could, couldn, couldn't, did, didn, didn't, do, does, doesn, doesn't, don, don't, down, during, each, eg, else, elsewhere, end, ending, enough, etc, even, ever, every, everyone, everything, everywhere, except, few, for, from, further, had, has, hasn, hasn't, have, haven, haven't, he, hence, her, here, hereby, herein, hereupon, hers, herself, him, himself, his, how, however, ie, i.e., if, in, inc, inc., indeed, instead, into, is, isn, isn't, it, its, itself, last, later, latterly, least, less, let, like, likely, ll, ltd, made, make, makes, many, maybe, me, meantime, meanwhile, might, miss, more, moreover, most, mostly, mr, mrs, much, must, my, myself, namely, next, no, nobody, none, nonetheless, noone, nor, not, nothing, now, nowhere, of, off, often, on, one, only, onto, or, others, otherwise, our, ours, ourselves, out, over, own, per, perhaps, rather, re, recent, recently, same, seem, seemed, seeming, seems, several, she, should, shouldn, shouldn't, since, so, some, somehow, someone, something, sometime, sometimes, somewhere, still, such, taking, than, that, the, their, them, themselves, then, thence, there, thereafter, thereby, therefore, therein, thereupon, these, they, this, those, though, through, throughout, thru, thus, to, together, too, toward, towards, under, unless, unlike, unlikely, until, up, upon, us, used, using, ve, very, via, was, wasn, we, we, well, were, weren, weren't, what, whatever, when, whence, whenever, where, whereafter, whereas, whereby, wherein, whereupon, wherever, whether, which, while, who, whoever, whole, whom, whomever, whose, why, will, with, within, without, would, wouldn, wouldn't, yes, yet, you, your, yours, yourself, yourselves

Stemming of English words follows these rules:

  • remove the ending:
      • if a word ends with a consonant other than s, followed by an s, delete the s.
      • if a word ends in es, drop the s.
      • if a word ends in ing, delete the ing unless the remaining word consists only of one letter or of th.
      • if a word ends with ed, preceded by a consonant, delete the ed unless this leaves only a single letter.
  • transform the remaining word:
      • if a word ends with "ies" but not "eies" or "aies", replace "ies" with "y".
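The English rules above can be sketched in Java as shown here. This is an illustration under assumptions, not the original code: the class name EnglishStemmerSketch is hypothetical, only the first matching rule is applied, every letter other than a, e, i, o, u counts as a consonant, and the "ies" rule is checked before the "es" rule, because a word ending in "ies" would otherwise always be caught by the "es" rule first.

```java
import java.util.Locale;

// Hypothetical sketch of the English rules; not the original implementation.
class EnglishStemmerSketch {
    // Assumption: any letter other than a, e, i, o, u is a consonant (including y).
    static boolean isConsonant(char c) {
        return Character.isLetter(c) && "aeiou".indexOf(c) < 0;
    }

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int n = w.length();

        // "ies" => "y", but not for "eies" or "aies" (checked first, see assumption above).
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, n - 3) + "y";
        }
        // Ends in "es": drop the s.
        if (w.endsWith("es")) {
            return w.substring(0, n - 1);
        }
        // Consonant other than s, followed by s: delete the s.
        if (n >= 2 && w.endsWith("s")
                && w.charAt(n - 2) != 's' && isConsonant(w.charAt(n - 2))) {
            return w.substring(0, n - 1);
        }
        // Ends in "ing": delete it unless too little remains or only "th" is left.
        if (w.endsWith("ing")) {
            String rest = w.substring(0, n - 3);
            return (rest.length() > 1 && !rest.equals("th")) ? rest : w;
        }
        // Ends in "ed" after a consonant: delete it unless a single letter remains.
        if (n >= 3 && w.endsWith("ed") && isConsonant(w.charAt(n - 3)) && n - 2 > 1) {
            return w.substring(0, n - 2);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("ponies"));  // pony
        System.out.println(stem("walking")); // walk
        System.out.println(stem("thing"));   // thing
    }
}
```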

Last modified: Mon Feb 15 09:53:01 MET 1999