Data Formats

Many processing goals use the Delimiter-Separated Values (DSV) format documented below.

Delimiter-Separated Values (DSV)

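A DSV file stores a list of entries, one per line, each made up of multiple fields separated by a pipe character (|). The first line is a header that names the fields.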

Example file:

    Class|File
    spam|spam1.txt
    nonspam|other.txt
    spam|spam2.txt

Fields are identified by their header, so the following would be equivalent:

    File|Class
    spam1.txt|spam
    other.txt|nonspam
    spam2.txt|spam
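
The header-based lookup can be sketched in Python as follows. This is a minimal illustration, not the project's own parser: the function name `parse_dsv` is hypothetical, and the sketch assumes that blank lines are ignored, that every entry has as many fields as the header, and that field values never contain the pipe character (the format description above does not say how such values would be escaped).

```python
def parse_dsv(text, delimiter="|"):
    """Parse DSV text into a list of dicts keyed by header name.

    Hypothetical helper: assumes no escaping of the delimiter inside fields.
    """
    # Skip blank lines (an assumption; the format description is silent on them).
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []

    # The first line names the fields.
    header = lines[0].split(delimiter)

    records = []
    for line in lines[1:]:
        values = line.split(delimiter)
        if len(values) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(values)}: {line!r}"
            )
        # Pair each value with its header so column order no longer matters.
        records.append(dict(zip(header, values)))
    return records


class_first = parse_dsv(
    "Class|File\nspam|spam1.txt\nnonspam|other.txt\nspam|spam2.txt\n"
)
file_first = parse_dsv(
    "File|Class\nspam1.txt|spam\nother.txt|nonspam\nspam2.txt|spam\n"
)

# Both column orders yield the same records.
print(class_first == file_first)  # True
```

Because each entry becomes a mapping from field name to value, a consumer reads `record["Class"]` or `record["File"]` without caring which column came first.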

This format is very similar to the CSV format common on Windows and to the format used on Unix systems for files such as /etc/group and /etc/passwd. The main differences are the use of a pipe as the field separator and the identification of fields by header instead of by position.