Package: collective - CMFWebAgent

An application that allows you to monitor the entire web for new pages related to topics of interest. Information found is stored as news items on your portal.

a test of SfIndex product

Releases

Release Date        Release Name    Download
2002-06-21 22:00    1.0             sourceforge

Package README (CVS ver. 1.1.1.1)

CMF Web Agent

  The CMF Web Agent is an application that allows you to monitor the
  entire web (or as much as is indexed by several popular search
  engines, anyway) for new pages related to topics of interest.

  For example, monitor the web for any mention of your new startup
  company, and display the results as a news list on your company
  intranet.  Alternatively, monitor the net for your own name or email
  address and keep the results in your private content management
  portal.

Disclaimer

  This application makes use of the public web search engines
  (currently Google, http://www.google.com, and AllTheWeb,
  http://www.alltheweb.com) to detect new content.  The author of this
  program strongly urges users not to abuse the services provided by
  these search engines.

  **Do not configure your agent to run the same query more than once
  per day.**

  **Do not configure your agent to run more than a few different
  queries.**

Requirements

  The agent operates directly on the database for your CMF-based Zope
  portal.  You need to be running a CMF portal on a Zope server with a
  ZEO backend to allow the agent to connect and insert content into
  the database.

  Since all of that software, including the agent, is written in
  Python, you also need to have Python installed.  Versions 2.1.1+ are
  supported.

Operation

  The agent creates "CMF News Items" in your portal.  To do this, it
  needs to connect to the object database and assume the identity of a
  portal member.  The portal member to be used can be specified in the
  configuration file (see below), but it is most appropriate to create
  a new user such as 'google' or 'agent' for this purpose.  The name
  does not matter, but the user should not be used for other purposes.

  When the agent connects to the database, it looks through the member
  home directory in the portal.  The agent examines each 'Folder'
  object it finds.  The 'description' property is interpreted as a
  search query for the search engines.  The agent runs the search,
  providing additional parameters to control the output from the
  search engine (for example, to maximize the number of items per page
  or select a specific catalog in which to search).  Each "hit" in the
  search results is stored as a News Item in the current directory.
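
  The loop described above can be sketched in plain Python.  This is
  only an illustration under stated assumptions, not the agent's
  actual code: the 'Folder' and 'NewsItem' classes are stand-ins for
  the real CMF objects, 'search' is a hypothetical callable that takes
  a query string and returns (title, url) pairs, and keying stored
  hits by URL is an assumption about how repeat hits are skipped.  The
  sketch is written for a current Python 3, not the Python 2.1 the
  agent itself targets.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """Stand-in for a CMF News Item (not the real CMF class)."""
    title: str
    url: str


@dataclass
class Folder:
    """Stand-in for a portal Folder; 'description' holds the query."""
    id: str
    description: str
    items: dict = field(default_factory=dict)


def run_agent(member_folders, search):
    """Run one search per folder and store each new hit in that folder.

    'search' is a hypothetical callable: query string -> [(title, url)].
    Returns (folder id, url) pairs for hits that were not already stored.
    """
    new_hits = []
    for folder in member_folders:
        query = folder.description.strip()
        if not query:
            continue  # a folder without a description defines no search
        for title, url in search(query):
            if url in folder.items:
                continue  # assumption: a URL seen before is not a new hit
            folder.items[url] = NewsItem(title=title, url=url)
            new_hits.append((folder.id, url))
    return new_hits
```

  Running 'run_agent' twice with the same search results returns the
  hits the first time and an empty list the second time, which mirrors
  the idea that only new hits are reported.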

Installation

  Create an Identity for the Agent

    First, create the user to be used by the agent.  For the purposes
    of this document, the name 'agent' will be used.  If the agent is
    to be allowed to publish documents directly, give the new user the
    'Reviewer' role.

      **NOTE** -- Currently only the default workflow is supported.
      This should not be difficult to change, if someone wants to
      contribute a patch.

      **NOTE** -- Currently the user must have the 'Reviewer' role.
      An enhancement to make this requirement optional has been
      submitted by Tres Seaver, but I have not integrated it yet.

  Install and Configure the Agent Software

    Once the user is created, the next step is to install the agent
    software.  The agent can run on any computer system which has
    network access to the ZEO server for the portal.  For the purposes
    of these instructions, we will assume that the agent is going to
    run on the same computer as the ZEO server.

    1. Log in to the server as the user that owns the Zope
       installation.  Change directory to one level above the
       INSTANCE_HOME for the Zope server.  For the purposes of these
       instructions we will assume that the Zope installation and the
       INSTANCE_HOME are separate, and that your INSTANCE_HOME is in a
       directory called 'InstanceHome'.

    2. Extract the tarball containing the CMF Agent source files to
       create a new directory called CMFAgent-X where X is a version
       number that will depend on which version of the package you
       have downloaded.  Change directory into the CMFAgent-X
       directory, hereafter referred to as the AGENT_HOME.

    3. In order for the agent to connect to the ZEO server, you must
       tell it where to find the server process.  Two files are
       necessary for this: the 'custom_zodb.py' file tells the Zope
       libraries how to connect to the database, and the 'zope.conf'
       file tells 'custom_zodb.py' where that database is on the
       network.  It is normally sufficient to copy both files from
       the INSTANCE_HOME of your server.

       For example::

         % cp ../InstanceHome/custom_zodb.py .
         % cp ../InstanceHome/zope.conf .

    4. As the agent connects to the ZEO server like any other ZEO
       client, it will have access to the data stored within.  It will
       not, however, know how to interpret that data until all of the
       Zope software and associated Products are made available to it.
       The 'zope.conf' file from the previous step sets things up so
       the agent can access the Zope core software.  Products which
       are installed outside of your INSTANCE_HOME should also be
       available as a result of copying this file over.

       Products installed within the INSTANCE_HOME, however, will not
       be.  If there are any product directories within your
       INSTANCE_HOME, it is important to make them visible in the
       AGENT_HOME.  On a system which supports it, symbolic links are
       the easiest way to do this.

       For example::

         % ln -s ../InstanceHome/Products .

  Configuring the Agent Software

    Now all that remains is to configure the agent so that it knows
    its identity within the portal, and set up the search settings to
    be monitored.

    1. The agent configuration is read from the file 'agent.conf' in
       the AGENT_HOME directory.  For a new installation, create a
       fresh version of this file by copying 'agent.conf.in'.  For an
       upgrade installation, merge your changes from the file in your
       existing AGENT_HOME with the new 'agent.conf.in' to create a
       new file.

    2. The format of the configuration file is standard Python.  The
       file is read into memory and executed in a protected sandbox
       when the agent starts up.  Required values are extracted from
       the resulting namespace, and other values are ignored.  This
       means you can litter the configuration file with whatever you
       might need in order to build up the configuration values.

    3. Edit the configuration file to set the 'member_name' and
       'portal_name' properties.  The value 'member_name' should be
       set to the name of the portal user created earlier.  The
       'portal_name' should refer to the dotted path from the root of
       the ZODB to the CMF Portal.

       For example, if you reach the portal through the URL::

         http://www.madeupname.com/path/to/Portal

       then the 'portal_name' would be::

         portal_name = 'path.to.Portal'

         **NOTE** -- For 'SiteAccess' users, the full path from the
         true root of the database must be provided.  This includes
         any folders with 'SiteRoot' objects.
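
    As a rough illustration of the configuration steps above, the
    following sketch executes configuration text as Python and keeps
    only the required names.  It is not the agent's own loader: the
    function names 'load_config' and 'portal_path' are hypothetical,
    and a bare 'exec' is not a security sandbox.  It is written for a
    current Python 3, not the Python 2.1 the agent targets.

```python
def load_config(source, required=("member_name", "portal_name")):
    """Execute configuration source and return only the required values."""
    namespace = {}
    # exec() runs arbitrary Python; this is NOT a protected sandbox.
    exec(source, namespace)
    missing = [name for name in required if name not in namespace]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    # Everything else defined by the file is simply ignored.
    return {name: namespace[name] for name in required}


def portal_path(portal_name):
    """Split a dotted portal name into the object ids to traverse."""
    return [part for part in portal_name.split(".") if part]


example = """
# Helper values like this one are ignored by the loader.
site = 'Portal'
member_name = 'agent'
portal_name = 'path.to.' + site
"""

config = load_config(example)
print(config["member_name"], portal_path(config["portal_name"]))
# prints: agent ['path', 'to', 'Portal']
```

    The example configuration builds 'portal_name' from a helper
    variable to show that extra names in the file do no harm.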

  Configuring the Search Topics

    And now the software is ready to use!  All that remains is to set
    up the searches in the Portal.  That configuration is done in the
    portal as the user created earlier.  Log in to the portal through
    your web browser as the user specified as 'member_name' above,
    then for each search topic follow these instructions.

    1. Create a folder to contain the results of the search topic.

    2. Set the 'id' of the folder to a meaningful name.

    3. Set the 'description' of the folder to the search terms to be
       used.  Enter the terms exactly as you would type them into the
       search form on the search engine's main page.  The format and
       encoding will be changed as needed before the search is
       submitted to the search engine.
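
    To give a feel for the kind of encoding applied to a folder
    description, here is a minimal sketch using the standard library.
    The endpoint 'search.example.com' and the parameter names 'q' and
    'num' are assumptions made for illustration; the agent's real
    requests and each engine's real parameters may differ.  The sketch
    is written for a current Python 3.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
SEARCH_URL = "https://search.example.com/search"


def build_search_url(description, per_page=100):
    """Turn a folder description into a URL-encoded search request."""
    # Collapse runs of whitespace so stray newlines do not reach the query.
    query = " ".join(description.split())
    params = {"q": query, "num": per_page}
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url('"Made Up Name" startup'))
# prints: https://search.example.com/search?q=%22Made+Up+Name%22+startup&num=100
```

    The description itself stays readable in the portal; quoting and
    spacing are only adjusted when the request is built.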

  Running the Agent

    To run the agent, change directory to the AGENT_HOME and run the
    program 'agent.py'.  The configuration data is detected from the
    location of the software, and the search terms are pulled out of
    the portal.  As the search runs, new hits will be printed to
    stdout.  This makes it easy for you to set up a cron job to run
    daily (not more frequently, please) and send you email with new
    results as well as update the portal contents.

    Additional command line arguments, mostly to control the amount of
    progress information printed during operation, are available.  Use
    -h to get a list.
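
    If you want a safeguard for the once-a-day limit in addition to
    the cron schedule, a small wrapper could refuse to start when the
    previous run was too recent.  The sketch below is an assumption
    about how such a check might look and is not part of the agent;
    the function name 'may_run' and the stamp file are hypothetical.
    It is written for a current Python 3.

```python
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # seconds


def may_run(stamp_file, now=None, min_interval=ONE_DAY):
    """Return True and record the time if enough time has passed.

    'stamp_file' is a hypothetical file holding the time of the last
    run as seconds since the epoch.
    """
    now = time.time() if now is None else now
    stamp = Path(stamp_file)
    if stamp.exists():
        last_run = float(stamp.read_text().strip())
        if now - last_run < min_interval:
            return False  # too soon; leave the stamp untouched
    stamp.write_text(str(now))
    return True
```

    A wrapper script would call 'may_run' first and only start the
    agent when it returns True.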

 
