Monday, January 16, 2012

ATG Content Repository Loader

atg.adapter.gsa.ContentRepositoryLoader is the Java class that implements automated content loading. A component of this class scans a given set of disk folders and synchronizes the database with their contents. This includes reading meta tag properties and populating the corresponding fields in the database.

The loader can populate the content repository database when it starts, or it can be configured to scan the disk folders on a schedule. In a production environment you probably won't want to configure the schedule: it wastes cycles on production servers, and you probably already have an event you can fire when content is changed and approved for publication, so it's better to start the loader in response to that event. In a site with more than one Dynamo server, remember that they all share the same repository database, so you don't want to set up a loader on every server.
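
The event-driven approach can be sketched roughly as follows. This is a minimal sketch, not ATG code: the class LoadOnPublishSketch, its onContentPublished method, the startLoader hook, and the designatedServer flag are all hypothetical names invented for illustration. It assumes you have some way to start a load (represented here as a plain Runnable) and some way to mark exactly one server as the one that runs it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: run a content load when a "content published" event
 * arrives, on one designated server only, never two loads at once.
 * None of these names come from ATG.
 */
final class LoadOnPublishSketch {

    // Hypothetical hook that kicks off a load (stands in for starting the loader).
    private final Runnable startLoader;

    // True only on the single server chosen to run the loader.
    private final boolean designatedServer;

    // Guards against overlapping loads if events arrive close together.
    private final AtomicBoolean running = new AtomicBoolean(false);

    LoadOnPublishSketch(Runnable startLoader, boolean designatedServer) {
        this.startLoader = startLoader;
        this.designatedServer = designatedServer;
    }

    /** Returns true if this call actually ran a load. */
    boolean onContentPublished() {
        if (!designatedServer) {
            return false; // other servers share the database, so they skip the load
        }
        if (!running.compareAndSet(false, true)) {
            return false; // a load is already in progress
        }
        try {
            startLoader.run();
            return true;
        } finally {
            running.set(false);
        }
    }
}
```

The point of the flag is simply that only one server in the cluster does the work; how you pick that server is up to your deployment.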

Repository Definition Tags used by the Loader

The Hybrid2 module contains a db\hybrid.xml file which will be combined with the file from Hybrid. Here are the main points of the combination:

<gsa-template>
    <header>
        <description>This file adds the properties used by the content loader</description>
    </header>
    <!-- set the content-path-property to be used by the content loader -->
    <item-descriptor name="folder" content-path-property="path">
        <table name="hsqlc_folders" type="primary" id-column-name="id">
            <property name="path" data-type="string"/>
        </table>
    </item-descriptor>
    <item-descriptor name="article" content-path-property="path">
        <table name="hsqlc_articles" type="primary" id-column-name="id">
            <property name="path" data-type="string"/>
        </table>
    </item-descriptor>
</gsa-template>


Folder Item Descriptor

The content loader requires a content-path-property to be specified; we'll set it to path:

<item-descriptor name="folder" content-path-property="path">
    <table name="hsqlc_folders" type="primary" id-column-name="id">
        <property name="path" data-type="string"/>
    </table>
</item-descriptor>


Article Item Descriptor

Here too we set the content-path-property to path, which we now store in the database:

<item-descriptor name="article" content-path-property="path">
    <table name="hsqlc_articles" type="primary" id-column-name="id">
        <property name="path" data-type="string"/>
    </table>
</item-descriptor>


Configuring the Loader

The loader uses a component of type HTMLMetaTagParser. This component doesn't require much setup; we just need to create it at /db/HTMLMetaTagParser:

$class=atg.adapter.html.HTMLMetaTagParser
$scope=global
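
To give a feel for what reading meta tag properties involves, here is a small standalone sketch that pulls name/content pairs out of HTML meta tags. It is an illustration only and is not how atg.adapter.html.HTMLMetaTagParser is implemented; the class name MetaTagSketch is made up, and the regular expression assumes double-quoted attributes with name appearing before content.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: not the ATG HTMLMetaTagParser implementation. */
final class MetaTagSketch {

    // Matches <meta name="..." content="..."> (case-insensitive, name first,
    // double-quoted values). Real HTML is messier than this pattern allows.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    /** Returns meta tag names mapped to their content, in document order. */
    static Map<String, String> parse(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }
}
```

Each entry in the returned map is the kind of value a loader could copy into a repository item property.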


The loader component itself requires a little more work. I created mine in /db/ContentRepositoryLoader:

$class=atg.adapter.gsa.ContentRepositoryLoader
$scope=global
HTMLMetaTagParser=/db/HTMLMetaTagParser
contentItemDescriptorName=article
ignoreMissingUpdatedStorageFile=true
lastUpdatedStorage=hybrid_auto_loader_update.txt
loggingDebug=true
monitoredPaths=articles
relativePathParent=..\\hybrid\\doc
removeStaleContentOnUpdate=true
repository=/db/HybridRepository
repositoryType=HTML
scanForUpdates=true
schedule=every\ 1\ minute
scheduler=/atg/dynamo/service/Scheduler
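
As a rough mental model of what a scan with scanForUpdates and removeStaleContentOnUpdate enabled has to work out, the sketch below compares the files found on disk with the paths already stored in the database. This is my reading of the property names, not the loader's actual algorithm. The class SyncPlanSketch and its parameters are hypothetical, and treating the last-updated storage file as a single "last scan" timestamp is an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a disk-versus-database comparison; not ATG code. */
final class SyncPlanSketch {
    final Set<String> toAdd = new TreeSet<>();     // on disk, not yet in the database
    final Set<String> toUpdate = new TreeSet<>();  // in both, modified since the last scan
    final Set<String> toRemove = new TreeSet<>();  // in the database, gone from disk

    /**
     * @param filesOnDisk    content path mapped to last-modified time in millis
     * @param pathsInDb      content paths currently stored in the repository
     * @param lastScanMillis time of the previous scan (assumed to be persisted somewhere)
     * @param removeStale    whether items whose files disappeared should be removed
     */
    static SyncPlanSketch plan(Map<String, Long> filesOnDisk,
                               Set<String> pathsInDb,
                               long lastScanMillis,
                               boolean removeStale) {
        SyncPlanSketch p = new SyncPlanSketch();
        for (Map.Entry<String, Long> e : filesOnDisk.entrySet()) {
            if (!pathsInDb.contains(e.getKey())) {
                p.toAdd.add(e.getKey());
            } else if (e.getValue() > lastScanMillis) {
                p.toUpdate.add(e.getKey());
            }
        }
        if (removeStale) {
            for (String path : pathsInDb) {
                if (!filesOnDisk.containsKey(path)) {
                    p.toRemove.add(path);
                }
            }
        }
        return p;
    }
}
```

With removeStale set to false, items whose files have been deleted would simply stay in the database.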


Running the Loader

The loader can be started manually from the DCC. Before starting it, it's probably best to remove the manually entered folders and articles from the database, either via the DCC or with SQL:

delete from hsqlc_articles

delete from hsqlc_folders

After you start the loader component, check the Dynamo console for messages. With debug logging turned on, I got these messages:

**** debug      Tue Mar 27 18:16:18 PST 2001    985745778617    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100004(/articles/diary.jhtml)
**** debug      Tue Mar 27 18:16:18 PST 2001    985745778637    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100005(/articles/thoughts.jhtml)
**** debug      Tue Mar 27 18:16:18 PST 2001    985745778657    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100006(/articles/journal.jhtml)


To view your new data in the DCC By File view, you may need to disconnect and reconnect the DCC from the server.

Summary

We've created the basics of a Hybrid SQL Content Repository, and demonstrated how to configure a content loader component to synchronize the database with the content on the file system.
