dca_interface
6.3.4
URL Classification returns the categories of a given URL object (if the URL is found in the database), or reports that the URL is unknown or simply not categorized.
To use the URL Classification functions, the URL Classification Package must first be initialized. To do this, create an instance of the dca::UrlClassification module using dca::UrlClassification::create().
Next, a connection to a URL database must be set up. Refer to Setting up a Database Connection for the steps required to do this.
Once a connection to a URL database has been established, an instance of dca::UrlDbClassifier must be created. Use dca::UrlClassification::createDbClassifier(), passing the newly created database connection object as a parameter. If you wish to specify options for the classifier (for example, options for Embedded URLs or to enable the Feedback mechanism), you should additionally create and initialize a dca::UrlDbClassifierOptions object.
The UrlDbClassifier class classifies dca::Url objects. To create a Url object from a URL text string, use dca::Url::create().
To classify a URL, use dca::UrlDbClassifier::classify(). This function analyzes the URL using the database specified by the database connection and returns the results of a classification in a dca::UrlClassificationResults object. This object is a container for individual results (one result per matched category), and can be iterated over to obtain information on each category matched.
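The sequence of steps above can be sketched in C++ as follows. This is not the SDK's API: the namespace `sketch`, the toy `DbConnection` that maps URL strings to category names, and every signature below are hypothetical stand-ins chosen only to mirror the documented order (create the module, create a classifier from a database connection, create a `Url`, classify, iterate over the results). Consult the SDK reference for the real declarations, error handling and ownership rules.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// HYPOTHETICAL stand-ins for the dca types named in the text. They mirror
// only the documented order of calls; the real SDK signatures differ.
namespace sketch {

// Toy "database connection": maps a URL string to its category names.
struct DbConnection {
    std::map<std::string, std::vector<std::string>> entries;
};

struct Url {
    std::string text;
    static Url create(const std::string& s) { return Url{s}; }
};

// Container with one entry per matched category; iterable with range-for.
struct UrlClassificationResults {
    bool unknown = false;
    std::vector<std::string> categories;
    bool isUnknownUrl() const { return unknown; }
    std::vector<std::string>::const_iterator begin() const { return categories.begin(); }
    std::vector<std::string>::const_iterator end() const { return categories.end(); }
};

class UrlDbClassifier {
public:
    explicit UrlDbClassifier(const DbConnection& db) : db_(db) {}
    UrlClassificationResults classify(const Url& url) const {
        UrlClassificationResults results;
        const auto it = db_.entries.find(url.text);
        if (it == db_.entries.end()) results.unknown = true;  // not in database
        else results.categories = it->second;                 // may be empty
        return results;
    }
private:
    const DbConnection& db_;  // the connection must outlive the classifier
};

class UrlClassification {
public:
    static std::unique_ptr<UrlClassification> create() {
        return std::make_unique<UrlClassification>();
    }
    std::unique_ptr<UrlDbClassifier> createDbClassifier(const DbConnection& db) const {
        return std::make_unique<UrlDbClassifier>(db);
    }
};

}  // namespace sketch

// Runs the steps in the documented order; `db` is the connection the caller
// has already set up. Returns the matched categories (empty if unknown).
std::vector<std::string> classifyOnce(const sketch::DbConnection& db,
                                      const std::string& urlText) {
    auto module = sketch::UrlClassification::create();   // initialize module
    auto classifier = module->createDbClassifier(db);    // classifier from connection
    const auto url = sketch::Url::create(urlText);       // Url from text
    const auto results = classifier->classify(url);      // classify
    std::vector<std::string> categories;
    if (results.isUnknownUrl()) return categories;
    for (const auto& category : results) categories.push_back(category);  // iterate
    return categories;
}
```

For example, after adding a made-up entry with `db.entries["http://example.com/"] = {"example-category"};`, calling `classifyOnce(db, "http://example.com/")` returns one category, while a URL missing from the toy database yields an empty vector.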
A complete list of supported URL categories can be found at https://exchange.xforce.ibmcloud.com/faq#info_for_url_report
Before a URL is looked up in our URL database, it must be normalized in the same way as on our content analysis servers, where the URL database is created.
If you are using the SCA in a proxy behind a web browser, this normalization is generally already done by the web browser.
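The normalization rules used by our content analysis servers are not spelled out in this section, so the following is only a generic illustration and not a replacement for them. It applies the case normalization from RFC 3986 (section 6.2.2.1): the scheme and host are case-insensitive and are lowercased, while userinfo, path, query and fragment are left unchanged. The function names `isScheme` and `normalizeCase` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if s matches the RFC 3986 scheme syntax:
// ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isScheme(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) return false;
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && c != '+' && c != '-' && c != '.') return false;
    }
    return true;
}

// Lowercases the scheme and host of "scheme://[userinfo@]host[:port]...".
// URLs without a valid "scheme://" prefix are returned unchanged.
std::string normalizeCase(std::string url) {
    const std::size_t sep = url.find("://");
    if (sep == std::string::npos || !isScheme(url.substr(0, sep))) return url;
    auto lower = [&url](std::size_t from, std::size_t to) {
        for (std::size_t i = from; i < to; ++i)
            url[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(url[i])));
    };
    lower(0, sep);  // scheme
    const std::size_t authStart = sep + 3;
    std::size_t authEnd = url.find_first_of("/?#", authStart);
    if (authEnd == std::string::npos) authEnd = url.size();
    // Skip userinfo ("user:pass@"), which is case-sensitive.
    std::size_t hostStart = authStart;
    const std::size_t at = url.find('@', authStart);
    if (at != std::string::npos && at < authEnd) hostStart = at + 1;
    lower(hostStart, authEnd);  // host (and port, which is digits only)
    return url;
}
```

For instance, `normalizeCase("HTTP://User@WWW.Example.COM/Path?Q=A")` returns `"http://User@www.example.com/Path?Q=A"`.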
You may use either
The following schemes are supported by URL classification:
If a URL does not include a scheme, http is assumed.
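A minimal sketch of this default, assuming a scheme is recognized only in the `scheme://` form: if the text before `://` is not a syntactically valid scheme (RFC 3986, section 3.1), `http://` is prepended. This can be handy, for example, for logging a URL the way it will be interpreted. The helpers `isValidScheme` and `withDefaultScheme` are hypothetical, and the sketch does not check the scheme against the supported list.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if s matches the RFC 3986 scheme syntax:
// ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string& s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) return false;
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (!std::isalnum(u) && c != '+' && c != '-' && c != '.') return false;
    }
    return true;
}

// Prepends "http://" unless the URL already starts with "scheme://".
// A "://" appearing later (e.g. inside a query string) is not a scheme.
std::string withDefaultScheme(const std::string& url) {
    const std::size_t sep = url.find("://");
    if (sep != std::string::npos && isValidScheme(url.substr(0, sep)))
        return url;
    return "http://" + url;
}
```

For example, `withDefaultScheme("www.example.com/index.html")` returns `"http://www.example.com/index.html"`, while `"https://example.com"` is returned unchanged.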
If a URL is not found in the database, the URL is classified as unknown. The function dca::UrlClassificationResults::isUnknownUrl() can be used to check this.
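A classification therefore has three possible outcomes: unknown, known but not categorized, or categorized. The sketch below distinguishes them by checking `isUnknownUrl()` first. `ResultsStandIn` is a hypothetical stand-in for dca::UrlClassificationResults in which only `isUnknownUrl()` corresponds to a function named in this document; treating an empty category list as "not categorized" is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for dca::UrlClassificationResults.
struct ResultsStandIn {
    bool unknown;                         // URL has no entry in the database
    std::vector<std::string> categories;  // one entry per matched category
    bool isUnknownUrl() const { return unknown; }
};

enum class Outcome { Unknown, NotCategorized, Categorized };

// An unknown URL is not in the database at all, whereas a known URL may
// still have no matched category, so isUnknownUrl() is checked first.
Outcome outcomeOf(const ResultsStandIn& r) {
    if (r.isUnknownUrl()) return Outcome::Unknown;
    if (r.categories.empty()) return Outcome::NotCategorized;
    return Outcome::Categorized;
}
```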
The Feedback option helps us improve the quality of our classifications. Unknown URLs are collected and uploaded to our servers at a given interval. Some statistics about matched categories and about classifier calls that returned no category are also collected and submitted.
This information is uploaded during the dca::UpdateModule::performUpdate() call.
To enable the Feedback option for a UrlDbClassifier, set the enable_Feedback option of dca::UrlDbClassifierOptions to true before creating the UrlDbClassifier.
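The requirement to set enable_Feedback before creating the classifier suggests that the options are read when the UrlDbClassifier is created. The sketch below models that assumption with hypothetical stand-in types: the classifier copies its options at construction, so setting the flag afterwards has no effect on an existing classifier. Only the field name enable_Feedback is taken from the text; everything else is invented for illustration.

```cpp
#include <cassert>

// HYPOTHETICAL stand-in for dca::UrlDbClassifierOptions.
struct OptionsStandIn {
    bool enable_Feedback = false;
};

// HYPOTHETICAL stand-in for dca::UrlDbClassifier. Assumption: options are
// copied when the classifier is created and never re-read afterwards.
class ClassifierStandIn {
public:
    explicit ClassifierStandIn(const OptionsStandIn& opts) : opts_(opts) {}
    bool feedbackEnabled() const { return opts_.enable_Feedback; }
private:
    OptionsStandIn opts_;  // private copy taken at construction
};

// Documented order: configure the options first, then create the classifier.
bool feedbackWithCorrectOrder() {
    OptionsStandIn opts;
    opts.enable_Feedback = true;
    ClassifierStandIn classifier(opts);
    return classifier.feedbackEnabled();
}

// Wrong order: the flag is set after the classifier already exists.
bool feedbackWithWrongOrder() {
    OptionsStandIn opts;
    ClassifierStandIn classifier(opts);
    opts.enable_Feedback = true;  // not seen by the existing classifier
    return classifier.feedbackEnabled();
}
```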
The following code demonstrates the classification of a URL.