Name | Class | Output Extension | Further Arguments | Description |
---|---|---|---|---|
adjust | de.fu_berlin.ties.xml.XMLAdjuster | xml | | Tries to fix corrupt XML documents, especially documents containing nesting errors |
analyze | de.fu_berlin.ties.eval.MistakeAnalyzer | mistakes | | Analyzes the types of prediction errors that occurred during a test run |
answers | de.fu_berlin.ties.extract.AnswerBuilder | ans | | Builds answer keys from an annotated text (in XML format) |
avg-length | de.fu_berlin.ties.eval.AverageLength | avl | | Calculates the average length for extractions of different types and evaluation statuses |
class-train | de.fu_berlin.ties.classify.ClassTrain | cls | | Classifies a list of files, training the text classifier on each error |
dsv2xml | de.fu_berlin.ties.xml.convert.DSVtoXMLConverter | xml | | Converts data in DSV (delimiter-separated values) format into XML |
eval-preds | de.fu_berlin.ties.eval.PredictionEvaluator | | | Reads a set of files containing predictions and evaluates them against the corresponding answer keys (`*.ans` files) |
externalize | de.fu_berlin.ties.io.Externalize | dsv | | Externalizes the contents of a file in DSV format. For each entry, the value of the field named by the `externalize.key` configuration parameter is written to an external file, and the output DSV file stores that file's name in place of the value. |
extract | de.fu_berlin.ties.extract.Extractor | pred | | Extracts relevant information from texts |
filter | de.fu_berlin.ties.classify.TextFilter | | | A simple filter for classifying and/or training text files |
preprocess | de.fu_berlin.ties.preprocess.PreProcessor | aug | | Preprocesses documents by converting them to a suitable XML format and adding linguistic information |
re-eval | de.fu_berlin.ties.eval.ReEvaluator | ext | | Re-evaluates already evaluated extractions (useful for switching the match mode set by the `eval.match.all` parameter) |
shuffle | de.fu_berlin.ties.eval.ShuffleGenerator | | | Creates random "shuffles" of input arguments (e.g. files or URLs) |
shuffle-lines | de.fu_berlin.ties.eval.LineShuffleGenerator | rand | | Randomly reshuffles the lines in a file |
simple-quotes | de.fu_berlin.ties.text.SimplifyQuotes | txt | | Simplifies different kinds of quotes that can occur in text files |
split | de.fu_berlin.ties.io.Split | | | Splits an input file into a series of output files |
strip | de.fu_berlin.ties.xml.dom.XMLStripper | txt | | Strips all markup from an XML document and stores the resulting plain text |
train | de.fu_berlin.ties.extract.Trainer | | | Trains the classifier used to extract information |
train-eval | de.fu_berlin.ties.extract.TrainEval | metrics | | Trains an extractor and evaluates extraction quality |
unflatten | de.fu_berlin.ties.xml.convert.AttributeUnflatten | xml | | Unflattens an XML document, reading labels for a combination strategy from an XML attribute ("class" by default) |
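
To illustrate the kind of processing behind one of the simpler goals, `shuffle-lines`, here is a minimal Java sketch of random line shuffling, assuming the whole input file fits in memory. The class `LineShuffleSketch` and its methods are hypothetical; this is not the `de.fu_berlin.ties.eval.LineShuffleGenerator` implementation, and it prints the shuffled lines to standard output instead of writing a `.rand` file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch in the spirit of the "shuffle-lines" goal.
 * NOT the TIES LineShuffleGenerator implementation.
 */
public class LineShuffleSketch {

    /** Returns a randomly reordered copy of the given lines; the input list is left unchanged. */
    static List<String> shuffleLines(List<String> lines, Random random) {
        List<String> copy = new ArrayList<>(lines);
        // Collections.shuffle permutes the copy using the supplied random source
        Collections.shuffle(copy, random);
        return copy;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java LineShuffleSketch <file>");
            System.exit(1);
        }
        // Assumption: the input is UTF-8 text small enough to load completely
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String line : shuffleLines(lines, new Random())) {
            System.out.println(line);
        }
    }
}
```

For example, `java LineShuffleSketch data.txt > data.rand` would write a shuffled copy of `data.txt`. Passing a `Random` created with a fixed seed to `shuffleLines` makes the resulting order reproducible, which can matter when shuffles are used to build evaluation splits.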