java.lang.Object
   |
   +----java.util.Vector
           |
           +----BRobotsTxt
This class uses a loose implementation of the Robots Exclusion Standard
as it is commonly used throughout the web. There is an Internet-Draft
for a more rigid version. Since that is not yet a standard, and the author
believes that the existing convention offers a good opportunity to protect
websites from being indexed against their will, the following implementation
fits the requirements ;-)
For more information on this topic visit
http://info.webcrawler.com/mak/projects/robots/.
The program works as follows:
- For each URL, look for the file robots.txt
- Parse the file and check whether the URL is forbidden for robots
If required, an important part of the code could follow here (one that explains something; this is just an example):

    private String[] init(URL url, String robotName) {
        Vector v = getRobotFile(url, robotName.toLowerCase());
        String[] stringArray = new String[v.size()];
        v.copyInto(stringArray);
        return stringArray;
    }
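The two steps above can be sketched roughly as follows. This is a minimal, standalone illustration of the common robots.txt convention, not the actual BRobotsTxt code; the class and method names (RobotsCheck, isAllowed for a path string) are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse a robots.txt body and test a path against the
// Disallow prefixes recorded for a given robot name (or "*").
class RobotsCheck {
    private final List<String> disallowed = new ArrayList<>();

    RobotsCheck(String robotsTxt, String robotName) {
        boolean applies = false;
        for (String line : robotsTxt.split("\n")) {
            // Strip comments and surrounding whitespace.
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash);
            line = line.trim();
            if (line.isEmpty()) continue;

            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                // A record applies to us if it names our robot or "*".
                applies = value.equals("*")
                        || value.toLowerCase().contains(robotName.toLowerCase());
            } else if (field.equals("disallow") && applies && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
    }

    // A path is allowed unless it starts with a recorded Disallow prefix.
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

Note that, per the convention, an empty Disallow value means "nothing is forbidden", which is why empty values are skipped when collecting prefixes.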
public BRobotsTxt(URL url, String robotName) throws NoRobotInformationException
public boolean isAllowed(URL url)