de.fu_berlin.ties.io
Class Split

java.lang.Object
  extended by de.fu_berlin.ties.ConfigurableProcessor
      extended by de.fu_berlin.ties.TextProcessor
          extended by de.fu_berlin.ties.io.Split
All Implemented Interfaces:
Processor

public class Split
extends TextProcessor

Uses a given pattern to split an input file into a series of output files. Base name and extension of the output files are determined from the input file. For example, if the input file is named file.data and contains 87 sections (separated by the configured pattern), 87 output files named file01.data, file02.data, ..., file87.data will be created (the number of leading zeros is determined as required to ensure that all file names have the same length).

If a split sequence is empty or contains only whitespace characters, it is neither stored nor counted. However, because the number of empty or blank sequences is not known in advance, such sequences are still included when determining how many leading zeros are prepended to file names.
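
The naming rules described above can be illustrated with a minimal sketch. It is not the actual implementation: the class SplitNamingSketch and the helper outputNames are hypothetical, the extension is assumed to start at the last dot of the local name, and "blank" is approximated with String.trim(). The sketch only computes the file names and writes nothing to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class SplitNamingSketch {

    /**
     * Hypothetical helper (not part of the Split API): returns the file names
     * that the non-blank sections of the text would receive.
     */
    public static List<String> outputNames(String localName, String text, Pattern pattern) {
        String[] parts = pattern.split(text);

        // The padding width is derived from the total number of parts,
        // including blank ones that are skipped later.
        int width = String.valueOf(parts.length).length();

        // Assumption: the extension starts at the last dot of the local name.
        int dot = localName.lastIndexOf('.');
        String base = (dot < 0) ? localName : localName.substring(0, dot);
        String ext = (dot < 0) ? "" : localName.substring(dot);

        List<String> names = new ArrayList<>();
        int counter = 0;
        for (String part : parts) {
            if (part.trim().isEmpty()) {
                continue; // empty or blank: neither stored nor counted
            }
            counter++;
            names.add(base + String.format(Locale.ROOT, "%0" + width + "d", counter) + ext);
        }
        return names;
    }

    public static void main(String[] args) {
        Pattern separator = Pattern.compile("\n---\n");
        String text = "first\n---\n \n---\nsecond";
        // prints [file1.data, file2.data]
        System.out.println(outputNames("file.data", text, separator));
    }
}
```

In this sketch the blank middle section produces no file name, so the numbering of the remaining sections stays contiguous.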

Instances of this class are thread-safe. It's also possible to use the static split(Reader, File, String, String, Pattern) method without creating an instance.

Version:
$Revision: 1.6 $, $Date: 2006/10/21 16:04:22 $, $Author: siefkes $
Author:
Christian Siefkes

Field Summary
static String CONFIG_PATTERN
          Configuration key: The default pattern used to split input: "split.pattern".
 
Fields inherited from class de.fu_berlin.ties.TextProcessor
CONFIG_POST, KEY_DIRECTORY, KEY_LOCAL_NAME, KEY_OUT_DIRECTORY, KEY_URL
 
Constructor Summary
Split()
          Creates a new instance, using the standard configuration.
Split(TiesConfiguration conf)
          Creates a new instance.
 
Method Summary
protected  void doProcess(Reader reader, Writer writer, ContextMap context)
          Processes the contents of a reader, writing a modified version to a writer. This implementation delegates to split(Reader, File, String, String).
 void split(Reader reader, File directory, String localName, String charset)
          Delegates to the static split(Reader, File, String, String, Pattern) method, using the configured default pattern.
static void split(Reader reader, File directory, String localName, String charset, Pattern pattern)
          Splits an input file into a series of output files, calling Pattern.split(java.lang.CharSequence) and storing each member of the returned array in a separate file.
 
Methods inherited from class de.fu_berlin.ties.TextProcessor
getOutFileExt, process, process, process, process, process, process, toString
 
Methods inherited from class de.fu_berlin.ties.ConfigurableProcessor
getConfig
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CONFIG_PATTERN

public static final String CONFIG_PATTERN
Configuration key: The default pattern used to split input: "split.pattern".

See Also:
Constant Field Values

Constructor Detail

Split

public Split()
Creates a new instance, using the standard configuration.


Split

public Split(TiesConfiguration conf)
Creates a new instance.

Parameters:
conf - used to configure this instance

Method Detail

split

public static void split(Reader reader,
                         File directory,
                         String localName,
                         String charset,
                         Pattern pattern)
                  throws IllegalArgumentException,
                         IOException
Splits an input file into a series of output files, calling Pattern.split(java.lang.CharSequence) and storing each member of the returned array in a separate file.

Parameters:
reader - reader containing the text to process; not closed by this method
directory - the directory to use for storing the output files; if null, the working directory is used
localName - the name of the input file, used to determine the names of output files
charset - the character set to use for the external files; if null, the default charset of the current platform is used
pattern - the pattern used to split the input
Throws:
IllegalArgumentException - if pattern is null
IOException - if an I/O error occurs while writing the output files
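
Because Pattern.split(java.lang.CharSequence) operates on a character sequence rather than a Reader, the input presumably has to be read completely before it is split. The following sketch shows only that step under stated assumptions: the class PatternSplitSketch and its methods readAll and sections are hypothetical stand-ins, no files are written, and the point at which the real method checks for a null pattern is not documented.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternSplitSketch {

    /** Reads everything from the reader without closing it. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    /** Hypothetical stand-in for the splitting step only; writes no files. */
    public static String[] sections(Reader reader, Pattern pattern) throws IOException {
        if (pattern == null) {
            throw new IllegalArgumentException("pattern must not be null");
        }
        return pattern.split(readAll(reader));
    }

    public static void main(String[] args) throws IOException {
        Pattern separator = Pattern.compile("---");

        // Trailing empty strings are dropped by Pattern.split(CharSequence).
        System.out.println(Arrays.toString(sections(new StringReader("a---b---"), separator)));
        // prints [a, b]

        // A match at the very start of the input yields an empty leading element.
        System.out.println(Arrays.toString(sections(new StringReader("---a---b"), separator)));
        // prints [, a, b]
    }
}
```

An empty leading element such as the one in the second call is the kind of sequence that, according to the class description, is neither stored nor counted.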

doProcess

protected void doProcess(Reader reader,
                         Writer writer,
                         ContextMap context)
                  throws IOException,
                         ProcessingException
Processes the contents of a reader, writing a modified version to a writer. This implementation delegates to split(Reader, File, String, String).

Specified by:
doProcess in class TextProcessor
Parameters:
reader - reader containing the text to process; should not be closed by this method
writer - the writer to write the processed text to; might be flushed but not closed by this method; if this method does not use the writer, the underlying file will be deleted afterwards
context - a map of objects that are made available for processing; when called from the process methods implemented in this class, it contains mappings from IOUtils.KEY_LOCAL_CHARSET to the character set of the output writer; from TextProcessor.KEY_OUT_DIRECTORY to the output directory (File); from ContentType.KEY_MIME_TYPE to the document's MIME type; from TextProcessor.KEY_LOCAL_NAME to the local name (String); and either from TextProcessor.KEY_DIRECTORY to the input directory (File) in the case of a local file, or otherwise from TextProcessor.KEY_URL to the URL of the processed document
Throws:
IOException - if an I/O error occurs
ProcessingException - if an error occurs during processing

split

public void split(Reader reader,
                  File directory,
                  String localName,
                  String charset)
           throws IllegalArgumentException,
                  IOException
Delegates to the static split(Reader, File, String, String, Pattern) method, using the configured default pattern.

Parameters:
reader - reader containing the text to process; not closed by this method
directory - the directory to use for storing the output files; if null, the working directory is used
localName - the name of the input file, used to determine the names of output files
charset - the character set to use for the external files; if null, the default charset of the current platform is used
Throws:
IllegalArgumentException - if the configured default pattern is null
IOException - if an I/O error occurs while writing the output files
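
The null handling documented for the directory and charset parameters can be sketched as follows. This is an illustration, not the library's code: the class SplitDefaultsSketch and both helpers are hypothetical, and "working directory" is assumed to mean the user.dir system property.

```java
import java.io.File;
import java.nio.charset.Charset;

public class SplitDefaultsSketch {

    /** Falls back to the working directory (assumed: user.dir) if directory is null. */
    public static File resolveDirectory(File directory) {
        return (directory != null) ? directory : new File(System.getProperty("user.dir"));
    }

    /** Falls back to the platform default charset if no charset name is given. */
    public static Charset resolveCharset(String charset) {
        return (charset != null) ? Charset.forName(charset) : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Both calls pass null, so the platform-dependent defaults are printed.
        System.out.println(resolveDirectory(null));
        System.out.println(resolveCharset(null));
    }
}
```

Passing an explicit charset name such as "UTF-8" avoids depending on the platform default, which can differ between machines.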


Copyright © 2003-2007 Christian Siefkes. All Rights Reserved.