atg.adapter.gsa.ContentRepositoryLoader is the Java class that implements automated content loading. A component of this class scans a given set of disk folders and synchronizes the database with their contents. This includes reading meta tag properties and populating the corresponding fields in the database.
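To make "synchronize" a little more concrete, here is a minimal sketch of the idea in plain Java. It is not the ContentRepositoryLoader implementation and uses no ATG classes; the class name SyncPlan, its methods, and the sample path /articles/old.jhtml are all made up for illustration. It only shows the set arithmetic: files on disk with no database row get added, and rows with no file are stale.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a hypothetical helper, not part of ATG Dynamo.
final class SyncPlan {

    // Paths found on disk that have no matching row yet: candidates to add.
    static Set<String> toAdd(Set<String> onDisk, Set<String> inDatabase) {
        Set<String> result = new TreeSet<>(onDisk);
        result.removeAll(inDatabase);
        return result;
    }

    // Rows whose file no longer exists on disk: candidates to remove as stale.
    static Set<String> toRemove(Set<String> onDisk, Set<String> inDatabase) {
        Set<String> result = new TreeSet<>(inDatabase);
        result.removeAll(onDisk);
        return result;
    }

    public static void main(String[] args) {
        // Sample data; /articles/old.jhtml is an invented stale entry.
        Set<String> onDisk = Set.of("/articles/diary.jhtml", "/articles/journal.jhtml");
        Set<String> inDatabase = Set.of("/articles/diary.jhtml", "/articles/old.jhtml");

        System.out.println("add:    " + toAdd(onDisk, inDatabase));
        System.out.println("remove: " + toRemove(onDisk, inDatabase));
    }
}
```

The real loader also reads meta tags and tracks when it last ran, which this sketch leaves out.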
The loader can populate the content repository database when it starts, or it can be configured to scan the disk folders on a schedule. In a production environment you probably won't want to configure the schedule: it wastes cycles on production servers, and you probably already have an event you can fire when content is changed and approved for publication, so it's better to start the loader in response to that event. In a site with more than one Dynamo server, remember that they all share the same repository database, so you don't want to set up a loader on every server.
Repository Definition Tags used by the Loader
The Hybrid2 module contains a db\hybrid.xml file which will be combined with the file from Hybrid. Here are the main points of the combination:
<gsa-template>
<header>
<description>This file adds the properties used by the content loader</description>
</header>
<!-- set the content-path-property to be used by the content loader -->
<item-descriptor name="folder" content-path-property="path">
<table name="hsqlc_folders" type="primary" id-column-name="id">
<property name="path" data-type="string"/>
</table>
</item-descriptor>
<item-descriptor name="article" content-path-property="path">
<table name="hsqlc_articles" type="primary" id-column-name="id">
<property name="path" data-type="string"/>
</table>
</item-descriptor>
</gsa-template>
Folder Item Descriptor
The content loader requires a content-path-property to be specified; here it is set to path.
<item-descriptor name="folder" content-path-property="path">
<table name="hsqlc_folders" type="primary" id-column-name="id">
<property name="path" data-type="string"/>
</table>
</item-descriptor>
Article Item Descriptor
Here too we set the content-path-property to path, which we now store in the database.
<item-descriptor name="article" content-path-property="path">
<table name="hsqlc_articles" type="primary" id-column-name="id">
<property name="path" data-type="string"/>
</table>
</item-descriptor>
Configuring the Loader
The loader uses a component of type HTMLMetaTagParser. This component doesn't require much setup; we just need to create the component /db/HTMLMetaTagParser:
$class=atg.adapter.html.HTMLMetaTagParser
$scope=global
The loader component itself requires a little more work. I created mine in /db/ContentRepositoryLoader:
$class=atg.adapter.gsa.ContentRepositoryLoader
$scope=global
HTMLMetaTagParser=/db/HTMLMetaTagParser
contentItemDescriptorName=article
ignoreMissingUpdatedStorageFile=true
lastUpdatedStorage=hybrid_auto_loader_update.txt
loggingDebug=true
monitoredPaths=articles
relativePathParent=..\\\\hybrid\\\\doc
removeStaleContentOnUpdate=true
repository=/db/HybridRepository
repositoryType=HTML
scanForUpdates=true
schedule=every\ 1\ minute
scheduler=/atg/dynamo/service/Scheduler
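If you want to check how escapes such as the backslash-space pairs in the schedule line are read, one quick way is to load a few lines with the standard java.util.Properties parser. This is only a sketch: it assumes Nucleus follows the same escaping rules as java.util.Properties, which I haven't verified, and the class name LoaderPropertiesCheck is made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: uses the JDK parser, not Nucleus itself.
final class LoaderPropertiesCheck {

    // Parse properties-file text with the standard JDK rules.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Java string literals double each backslash; the file text itself
        // contains "every\ 1\ minute" exactly as in the component above.
        String text = String.join("\n",
                "$class=atg.adapter.gsa.ContentRepositoryLoader",
                "monitoredPaths=articles",
                "schedule=every\\ 1\\ minute");

        Properties props = parse(text);
        System.out.println("$class   = " + props.getProperty("$class"));
        System.out.println("schedule = [" + props.getProperty("schedule") + "]");
    }
}
```

Under those JDK rules a backslash followed by a space is read as a plain space, so the schedule value comes out as three space-separated words.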
Running the Loader
The loader can be started manually from the DCC. Before starting the loader it's probably best to remove the manually entered folders and articles from the database, either via the DCC or with SQL:
delete from hsqlc_articles
delete from hsqlc_folders
After you start the loader component, check the Dynamo console for messages. With debug logging turned on I got these messages:
**** debug Tue Mar 27 18:16:18 PST 2001 985745778617 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100004(/articles/diary.jhtml)
**** debug Tue Mar 27 18:16:18 PST 2001 985745778637 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100005(/articles/thoughts.jhtml)
**** debug Tue Mar 27 18:16:18 PST 2001 985745778657 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100006(/articles/journal.jhtml)
To view your new data in the DCC By File view, you may need to disconnect the DCC from the server and reconnect.
Summary
We've created the basics of a Hybrid SQL Content Repository, and demonstrated how to configure a content loader component to synchronize the database with the content on the file system.