java.lang.Object
  de.fu_berlin.ties.ConfigurableProcessor
    de.fu_berlin.ties.TextProcessor
      de.fu_berlin.ties.io.Split
public class Split
Uses a given pattern to split an input file into a series of output files. The base name and extension of the output files are taken from the input file. For example, if the input file is named file.data and contains 87 sections (separated by the configured pattern), 87 output files named file01.data, file02.data, ..., file87.data are created. File numbers are padded with as many leading zeros as needed so that all file names have the same length.
If a split sequence is empty or contains only whitespace characters, it is neither stored nor counted. Because the number of empty or blank sequences is not known in advance, they are still included when determining how many leading zeros to prepend to file names.
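The naming rule described above can be illustrated with a small, self-contained sketch. It is not the TIES implementation: the class `SplitNamingSketch` and the method `outputNames` are hypothetical names, and the handling of names without an extension is an assumption. The sketch derives the padding width from the total number of sections, blank ones included, and numbers only the non-blank sections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration of the documented naming scheme; not part of TIES.
final class SplitNamingSketch {

    /** Returns the output file names for the non-blank sections of the text. */
    static List<String> outputNames(String localName, CharSequence text, Pattern pattern) {
        String[] sections = pattern.split(text);

        // Split "file.data" into base name "file" and extension ".data".
        // Assumption: a name without a dot gets no extension.
        int dot = localName.lastIndexOf('.');
        String base = (dot > 0) ? localName.substring(0, dot) : localName;
        String ext = (dot > 0) ? localName.substring(dot) : "";

        // The width is derived from ALL sections, blank ones included.
        int width = String.valueOf(sections.length).length();

        List<String> names = new ArrayList<>();
        int counter = 0;
        for (String section : sections) {
            if (section.trim().isEmpty()) {
                continue; // blank sections are neither stored nor counted
            }
            counter++;
            names.add(String.format(Locale.ROOT, "%s%0" + width + "d%s", base, counter, ext));
        }
        return names;
    }

    public static void main(String[] args) {
        Pattern separator = Pattern.compile("\\n---\\n");
        String text = "first\n---\n   \n---\nthird";
        // Three sections, one of them blank: prints [file1.data, file2.data]
        System.out.println(outputNames("file.data", text, separator));
    }
}
```

With ten sections of which one is blank, the same sketch would produce file01.data through file09.data: the width of two digits comes from the ten sections found, while only nine files are numbered.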
Instances of this class are thread-safe. It is also possible to use the static split(Reader, File, String, String, Pattern) method without creating an instance.
Field Summary | |
---|---|
static String | CONFIG_PATTERN: Configuration key: The default pattern used to split input: "split.pattern". |
Fields inherited from class de.fu_berlin.ties.TextProcessor |
---|
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL |
Constructor Summary | |
---|---|
Split() | Creates a new instance, using the standard configuration. |
Split(TiesConfiguration conf) | Creates a new instance. |
Method Summary | |
---|---|
protected void | doProcess(Reader reader, Writer writer, ContextMap context): Processes the contents of a reader, writing a modified version to a writer. This implementation delegates to split(Reader, File, String, String). |
void | split(Reader reader, File directory, String localName, String charset): Delegates to the static split(Reader, File, String, String, Pattern) method, using the configured default pattern. |
static void | split(Reader reader, File directory, String localName, String charset, Pattern pattern): Splits an input file into a series of output files, calling Pattern.split(java.lang.CharSequence) and storing each member of the returned array in a separate file. |
Methods inherited from class de.fu_berlin.ties.TextProcessor |
---|
getOutFileExt, process, process, process, process, process, process, toString |
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor |
---|
getConfig |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String CONFIG_PATTERN
Configuration key: The default pattern used to split input: "split.pattern".
Constructor Detail |
---|
public Split()
Creates a new instance, using the standard configuration.
public Split(TiesConfiguration conf)
Creates a new instance.
Parameters:
conf - used to configure this instance
Method Detail |
---|
public static void split(Reader reader, File directory, String localName, String charset, Pattern pattern) throws IllegalArgumentException, IOException
Splits an input file into a series of output files, calling Pattern.split(java.lang.CharSequence) and storing each member of the returned array in a separate file.
Parameters:
reader - reader containing the text to process; not closed by this method
directory - the directory to use for storing the output files; if null, the working directory is used
localName - the name of the input file, used to determine the names of output files
charset - the character set to use for the external files; if null, the default charset of the current platform is used
pattern - the pattern used to split the input
Throws:
IllegalArgumentException - if pattern is null
IOException - if an I/O error occurs while writing the output files
protected void doProcess(Reader reader, Writer writer, ContextMap context) throws IOException, ProcessingException
Processes the contents of a reader, writing a modified version to a writer. This implementation delegates to split(Reader, File, String, String).
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the implemented process methods in this class, it will contain mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from TextProcessor.KEY_OUT_DIRECTORY to the output directory (File); from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the input directory (File, in case of a local file) or from TextProcessor.KEY_URL to the URL (otherwise) of the processed document
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing
public void split(Reader reader, File directory, String localName, String charset) throws IllegalArgumentException, IOException
Delegates to the static split(Reader, File, String, String, Pattern) method, using the configured default pattern.
Parameters:
reader - reader containing the text to process; not closed by this method
directory - the directory to use for storing the output files; if null, the working directory is used
localName - the name of the input file, used to determine the names of output files
charset - the character set to use for the external files; if null, the default charset of the current platform is used
Throws:
IllegalArgumentException - if the configured default pattern is null
IOException - if an I/O error occurs while writing the output files
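The contract of the static split method documented above can be approximated with plain JDK classes. The following is a minimal sketch under stated assumptions, not the TIES source: `SplitSketch` is a hypothetical class, it returns the number of files written (the documented method returns void) purely to make the result easy to inspect, and it treats a section as blank when `trim()` leaves it empty. It follows the documented parameter rules: a null directory means the working directory, a null charset means the platform default, a null pattern raises IllegalArgumentException, and the reader is not closed.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-in for the documented behaviour; not part of TIES.
final class SplitSketch {

    /** Splits the reader's text and stores each non-blank section in its own file. */
    static int split(Reader reader, File directory, String localName,
                     String charset, Pattern pattern) throws IOException {
        if (pattern == null) {
            throw new IllegalArgumentException("pattern must not be null");
        }
        // Read the whole input; the reader is left open for the caller.
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }

        // Fall back to the working directory and the platform charset.
        File dir = (directory != null) ? directory : new File(".");
        Charset cs = (charset != null) ? Charset.forName(charset) : Charset.defaultCharset();

        int dot = localName.lastIndexOf('.');
        String base = (dot > 0) ? localName.substring(0, dot) : localName;
        String ext = (dot > 0) ? localName.substring(dot) : "";

        String[] sections = pattern.split(text);
        int width = String.valueOf(sections.length).length(); // blank sections included
        int stored = 0;
        for (String section : sections) {
            if (section.trim().isEmpty()) {
                continue; // neither stored nor counted
            }
            stored++;
            String name = String.format(Locale.ROOT, "%s%0" + width + "d%s", base, stored, ext);
            try (Writer out = Files.newBufferedWriter(new File(dir, name).toPath(), cs)) {
                out.write(section);
            }
        }
        return stored;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("split-sketch").toFile();
        int written = split(new StringReader("alpha\n\nbeta\n\ngamma"), dir,
                "file.data", "UTF-8", Pattern.compile("\\n\\n"));
        System.out.println(written + " files written to " + dir);
    }
}
```

Note that Pattern.split(CharSequence) discards trailing empty strings, so separators at the very end of the input do not add to the section count that determines the padding width.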