Class BRobotsTxt

java.lang.Object
   |
   +----java.util.Vector
           |
           +----BRobotsTxt

public class BRobotsTxt
extends Vector
An object of this class represents the robot-exclusion information of one internet site (URL).
Robot Exclusion: URLs can be declared forbidden to robots; this specification and additional information are contained in the robots.txt file on the server side.

This class uses a lenient implementation of the Robot Exclusion Standard as it is commonly used throughout the web. There is an Internet-Draft for a stricter version. Since that draft is not yet a standard and the author believes that the existing convention offers a good opportunity to protect websites from being indexed against their will, the following implementation fits the requirements ;-)
For more information on this topic visit http://info.webcrawler.com/mak/projects/robots/.
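For illustration, a robots.txt file lists, per robot name (user agent), the URL paths that may not be fetched. The entries below are made up:

User-agent: *           # rules that apply to all robots
Disallow: /cgi-bin/     # nothing below /cgi-bin/ may be fetched
Disallow: /private/

User-agent: BadBot      # rules for one specific robot, matched by name
Disallow: /             # the whole site is forbidden to this robot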

The program works as follows:
- For each URL, look for the file robots.txt on the server.
- Parse the file and check whether the URL is forbidden for robots.
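
A minimal sketch of this flow follows; the helper name and parsing details are hypothetical and need not match the actual implementation of this class:

// Hypothetical sketch: fetch /robots.txt and collect the Disallow rules
// that apply to the given robot name. Not the class's actual code.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Vector;

class RobotsSketch {
    static Vector fetchDisallowRules(URL site, String robotName) throws Exception {
        URL robotsUrl = new URL(site.getProtocol(), site.getHost(), "/robots.txt");
        BufferedReader in = new BufferedReader(new InputStreamReader(robotsUrl.openStream()));
        Vector rules = new Vector();
        boolean applies = false;
        String line;
        while ((line = in.readLine()) != null) {
            // strip trailing comments and surrounding whitespace
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash);
            line = line.trim();
            if (line.length() == 0) continue;
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                String agent = line.substring("user-agent:".length()).trim().toLowerCase();
                applies = agent.equals("*") || agent.equals(robotName.toLowerCase());
            } else if (applies && lower.startsWith("disallow:")) {
                String rule = line.substring("disallow:".length()).trim();
                if (rule.length() > 0) rules.addElement(rule); // empty Disallow means "allow all"
            }
        }
        in.close();
        return rules; // a URL is forbidden if its path starts with one of these rules
    }
}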


If required, an important part of the code could be shown here
(one that explains something; this is just an example):
private String[] init(URL url, String robotName)
{
    // fetch the robots.txt entries that apply to this robot (name compared case-insensitively)
    Vector v = getRobotFile(url, robotName.toLowerCase());
    // copy the Vector into a plain String array
    String[] stringArray = new String[v.size()];
    v.copyInto(stringArray);
    return stringArray;
}

Version:
0.1 7-Jan-1999
Author:
Simon Berg; Sabisch, Hinze (documentation changes, Jan-99)

Constructor Index

 o BRobotsTxt(URL, String)
Requests the robots.txt from the server; uses isAllowed.

Method Index

 o isAllowed(URL)
Checks if robot access is allowed.

Constructors

 o BRobotsTxt
 public BRobotsTxt(URL url,
                   String robotName) throws NoRobotInformationException
Requests the robots.txt from the server; uses isAllowed.

Parameters:
url - uniform resource locator, directory and filename.
robotName - specifies the robot class; matched case-insensitively
Returns:
none
Throws: NoRobotInformationException
if no information is found.

Methods

 o isAllowed
 public boolean isAllowed(URL url)
Checks if robot access is allowed.

Parameters:
url - uniform resource locator, directory and filename.
Returns:
true if robot access to url is allowed, false otherwise
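
A short usage sketch based on the constructor and method above; the URL and robot name are made up:

import java.net.URL;

public class BRobotsTxtDemo {
    public static void main(String[] args) throws Exception {
        URL page = new URL("http://www.example.com/docs/index.html"); // hypothetical URL
        try {
            // fetch and parse the site's robots.txt for our robot name
            BRobotsTxt robots = new BRobotsTxt(page, "MyRobot");
            if (robots.isAllowed(page)) {
                // safe to fetch and index this page
            }
        } catch (NoRobotInformationException e) {
            // no robots.txt information found for this site
        }
    }
}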