dca_interface
6.3.4
The SCA supports URL Custom Databases, which can be used in addition to or instead of the internal URL signature database used by the URL Classification module.
Custom Database functionality is provided by a module that closely resembles a classification module. With this module you can create a new database or connect to an existing one.
For the URLs stored in the database, category descriptions specified as XML files need to be provided. Alternatively, the category descriptions used internally by the SCA for URL Classification can be reused for a Custom Database.
For a complete description of the XML formats, refer to Custom Categories.
The dca::UrlCustomDb class provides a maintenance interface for adding URLs to the database.
URLs can be removed from the database, or added to the database with the associated category ids.
To access a Custom Database for URL Classification, use a dca::DbConnection as shown in the section URL Classification.
Set the dbType of the DbConnectionData to DBT_Custom and specify the folder in which your Custom Database file(s) are to be stored.
Custom Databases are created once by calling the createCustomDb method on dca::UrlCustomDbModule. The resulting database is empty and contains an initial version (e.g. 6.0).
Once the database has been created, you can connect to it by using a standard dca::DbConnection. Create an instance of dca::UrlCustomDb using this DbConnection.
You need to fill the Custom Database with URLs and categorizations by using the maintenance interface of your UrlCustomDb class.
You can query the Custom Database by using a dca::UrlDbClassifier at any time, even while you are adding new URLs to the database.
Changes to the Custom Database are applied straight away, but the database file itself is not updated immediately. Updates are first collected in memory and then written to an update file after a given interval. Several times a day the update file(s) are merged into a new database file, which replaces the old one. The settings that control how often update files are written and how often they are merged into a new database file are specified in the dca::DbConnectionData object that is used to create a new DbConnection.
The database file merge process is implemented in the Scheduler Task (dca::DcaInstance::schedule). You must provide a thread that calls this function in a loop, in order to write the changes to the Custom Database.
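The scheduler thread described above could be sketched as follows. This is a minimal illustration only: the signature of dca::DcaInstance::schedule is not shown in this section, so the call is passed in as a generic callback. The class name SchedulerThread, the pause between calls and the stop mechanism are assumptions made for this example and are not part of the dca API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper, not part of the dca API.
// Calls a scheduler callback in a loop on a background thread until stopped.
// The callback stands in for the call to dca::DcaInstance::schedule.
class SchedulerThread {
public:
    SchedulerThread(std::function<void()> scheduleCall,
                    std::chrono::milliseconds pause)
        : scheduleCall_(std::move(scheduleCall)),
          pause_(pause),
          running_(true) {
        assert(scheduleCall_ && "a schedule callback is required");
        worker_ = std::thread([this] { run(); });
    }

    ~SchedulerThread() { stop(); }

    SchedulerThread(const SchedulerThread&) = delete;
    SchedulerThread& operator=(const SchedulerThread&) = delete;

    // Ask the loop to finish and wait for the thread to exit.
    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        while (running_) {
            scheduleCall_();                      // e.g. forward to the scheduler task
            std::this_thread::sleep_for(pause_);  // assumed pause between calls
        }
    }

    std::function<void()> scheduleCall_;
    std::chrono::milliseconds pause_;
    std::atomic<bool> running_;
    std::thread worker_;
};
```

In an application the callback would wrap the call to the scheduler task on your DcaInstance. Whether a pause between calls is needed at all depends on how the scheduler task behaves, which this section does not specify.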
Note:
For each new update file the version number of the Custom Database will be increased.
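The staged write path (in-memory changes, update files, merged database file) can be pictured with a small toy model. It does not use the dca API: StagedUrlStore and all of its members are hypothetical names, the version is modelled as a plain counter, and the model assumes that no update file is written when there are no pending changes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of the staged write path. None of these names belong to the dca API.
class StagedUrlStore {
public:
    typedef std::set<int> CategoryIds;

    // Add (or overwrite) a URL; lookups see the change at once.
    void add(const std::string& url, const CategoryIds& ids) {
        live_[url] = ids;
        Change c; c.url = url; c.removed = false; c.ids = ids;
        pending_.push_back(c);
    }

    // Remove a URL; lookups see the change at once.
    void remove(const std::string& url) {
        live_.erase(url);
        Change c; c.url = url; c.removed = true;
        pending_.push_back(c);
    }

    // Returns the category ids for a URL, or nullptr if it is unknown.
    const CategoryIds* lookup(const std::string& url) const {
        std::map<std::string, CategoryIds>::const_iterator it = live_.find(url);
        return it == live_.end() ? nullptr : &it->second;
    }

    // Models the periodic write of collected changes to a new update file.
    // Each new update file increases the version counter.
    bool writeUpdateFile() {
        if (pending_.empty()) return false;
        updateFiles_.push_back(pending_);
        pending_.clear();
        ++version_;
        return true;
    }

    // Models the merge of all update files into a new database file.
    void mergeUpdateFiles() {
        for (const std::vector<Change>& file : updateFiles_) {
            for (const Change& c : file) {
                if (c.removed) databaseFile_.erase(c.url);
                else databaseFile_[c.url] = c.ids;
            }
        }
        updateFiles_.clear();
        // With nothing pending, the merged file matches what lookups see.
        assert(!pending_.empty() || databaseFile_ == live_);
    }

    unsigned version() const { return version_; }
    std::size_t updateFileCount() const { return updateFiles_.size(); }
    std::size_t databaseFileSize() const { return databaseFile_.size(); }

private:
    struct Change { std::string url; bool removed; CategoryIds ids; };

    std::map<std::string, CategoryIds> live_;          // state seen by lookups
    std::vector<Change> pending_;                      // collected in memory
    std::vector<std::vector<Change> > updateFiles_;    // written, not yet merged
    std::map<std::string, CategoryIds> databaseFile_;  // merged database file
    unsigned version_ = 0;
};
```

In this model a URL added with add() is returned by lookup() immediately, while databaseFileSize() stays unchanged until writeUpdateFile() and mergeUpdateFiles() have both run.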
For best performance, access to a Custom Database can be cached.
The size of the cache is configurable and should be related to the interval of the database update process.
If the cache is too small and you are adding URLs to your Custom Database, the maximum number of cache entries can be exceeded. This condition is treated as an error and returned to the caller.
To avoid this, you should either increase the size of your cache or reduce the interval of the database update process, or both.
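The trade-off between cache size and update interval can be estimated with simple arithmetic. The sketch below assumes that every URL added since the last update write occupies one cache entry until that write happens. The section does not state this, so treat the result as a rough starting point only. The function names and the headroom factor are hypothetical and not part of the dca API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Rough estimate of the cache entries needed to absorb all URLs added during
// one update interval, under the assumption of one cache entry per added URL.
// headroom >= 1.0 leaves spare capacity for bursts.
inline std::uint64_t estimateCacheEntries(double addsPerSecond,
                                          double updateIntervalSeconds,
                                          double headroom = 1.5) {
    assert(addsPerSecond >= 0.0 && updateIntervalSeconds >= 0.0 && headroom >= 1.0);
    return static_cast<std::uint64_t>(
        std::ceil(addsPerSecond * updateIntervalSeconds * headroom));
}

// Inverse view: the longest update interval (in seconds) that a fixed cache
// size can sustain at a given add rate, under the same assumption.
inline double maxUpdateIntervalSeconds(std::uint64_t cacheEntries,
                                       double addsPerSecond,
                                       double headroom = 1.5) {
    assert(addsPerSecond > 0.0 && headroom >= 1.0);
    return static_cast<double>(cacheEntries) / (addsPerSecond * headroom);
}
```

For example, at 20 added URLs per second, an update interval of 300 seconds and a headroom factor of 1.5, estimateCacheEntries returns 9000. Halving the interval halves the estimate, which mirrors the advice to increase the cache, reduce the interval, or both.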
To change the settings for a Custom Database DbConnection object, fill in the related customData member of your DbConnectionData structure.
customdbsamples/createdbsample, customdbsamples/customdbsample, customdbsamples/customdbsample_extended