url_samples: urldbsample


Introduction

This sample shows how to classify URLs using a local URL database.

Syntax:

urldbsample <dca-redist-folder> <ticket> <product> <url-list-file>
url-list-file
File that contains the URLs to classify (one per line). Some sample data is provided in the initial URL database; the corresponding input for the sample program can be found in url_test_data.txt.
Note:
To get classification results, you first need to install a local URL database. This can be done with the url_samples: urldbdownloadsample.

Files

file  url_samples/urldbsample/main.cpp
 

URL Classification using a local URL database sample program.


Defines

#define DCA_BINDIR   "bin/linux"
 DCA subdirectory of the DCA binaries.
#define DCA_INITDIR   "init"
 DCA subdirectory of the DCA initialization data.
#define DCA_LOGDIR   "./logs"
 Relative directory for logfile(s).

Functions

static void SetupInitData (const std::string &redist_folder, InitData &initData)
 Sets up the given initData by substituting the given redist_folder with DCA subdirectories.
static bool StartupLibraries ()
 Initializes the third-party library libcurl and sets the OpenSSL callbacks to the standard implementation.
static void ShutdownLibraries ()
 Shuts down the third-party libraries. On Windows, WSACleanup is also called to shut down Windows sockets for this process.
static void SetupLicense (const std::string &ticket, const std::string &product, LicenseData &licenseData)
 Sets up the given licenseData by copying the given ticket and product strings.
static void SetupConnectionData (DbConnectionData &cData)
 Sets up the given cData to use a local URL database.
static void PrintLicenseInfo (const License &aLicense)
 Prints out the information about the provided License.
static void PrintDbConnectionInfo (const DbConnection &aDbConnection)
 Prints out the version and datestamp of the local database.
static void PrintResults (const CategoriesInfo &catinfos, const UrlClassificationResults &cats)
 Prints out the classification results and uses the categories info for textual representation of the matched categories.
static void PrintToolHeader ()
 Prints out the name and the version of this sample.
static void PrintUsage (const char *name)
 Prints out the syntax of the sample.
static void LoadUrlFile (const std::string &fileName, std::vector< std::string > &urlList)
 Loads the given fileName and appends each line to the given urlList, removing trailing CR/LF characters.
void TestUrlClassification (const std::string &aUrlListFile, const DcaInstance &myDca, const UrlDbClassifier &myUrlDbClassifier, const CategoriesInfo &myCategoriesInfo)
 Performs the URL database classification with all URLs found in a given text file.
int main (int argc, char *argv[])
 The main routine.

Variables

const std::string S_UsageString
 Usage string, displayed if a parameter is missing.

Function Documentation

static void SetupInitData ( const std::string &  redist_folder,
InitData &  initData 
) [static]

Sets up the given initData by substituting the given redist_folder with DCA subdirectories.

Parameters:
[in] redist_folder The folder where the DCA has been installed (a trailing slash is assumed)
[out] initData The InitData structure to set up
Note:
Only DCA_BINDIR differs between Windows and Linux
The directory ./logs is used for the logfile(s)
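
The path substitution described above can be sketched as follows. This is only an illustration: InitDataSketch and its field names are hypothetical stand-ins for the real InitData structure, whose layout is not shown on this page. The macro values are the ones listed under Defines.

```cpp
#include <string>

// Values as documented in the Defines section (Linux build).
#define DCA_BINDIR  "bin/linux"
#define DCA_INITDIR "init"
#define DCA_LOGDIR  "./logs"

// Hypothetical stand-in for InitData; the field names are assumptions
// made for this sketch only.
struct InitDataSketch {
    std::string binDir;
    std::string initDir;
    std::string logDir;
};

// Builds the DCA subdirectory paths from redist_folder, which is assumed
// to end with a path separator already.
static void SetupInitDataSketch(const std::string &redist_folder,
                                InitDataSketch &initData)
{
    initData.binDir  = redist_folder + DCA_BINDIR;
    initData.initDir = redist_folder + DCA_INITDIR;
    initData.logDir  = DCA_LOGDIR;  // not prefixed with redist_folder
}
```

Because the folder is concatenated without adding a separator, a caller that passes "/opt/dca" instead of "/opt/dca/" would end up with "/opt/dcabin/linux".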

Definition at line 100 of file url_samples/urldbsample/main.cpp.

static bool StartupLibraries (  )  [static]

Initializes the third-party library libcurl and sets the OpenSSL callbacks to the standard implementation.

On Windows it is necessary to initialize Windows sockets to support IP(v6) addresses as input data.

Returns:
true on success; false only on Windows, if WSAStartup returned an error
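
The Windows-specific part of this startup and the matching shutdown might look like the sketch below. It covers only the socket handling; the libcurl and OpenSSL setup of the real sample is left out. The function names and the requested Winsock version 2.2 are assumptions, not taken from main.cpp.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#endif

// Sketch of the socket part of library startup. On non-Windows platforms
// there is nothing to do, so the function always succeeds there.
static bool StartupSocketsSketch()
{
#ifdef _WIN32
    WSADATA wsaData;
    // Assumption: request Winsock 2.2. WSAStartup returns 0 on success.
    if (WSAStartup(MAKEWORD(2, 2), &wsaData) != 0) {
        return false;
    }
#endif
    return true;
}

// Counterpart for shutdown; WSACleanup is only called on Windows.
static void ShutdownSocketsSketch()
{
#ifdef _WIN32
    WSACleanup();
#endif
}
```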

Definition at line 118 of file url_samples/urldbsample/main.cpp.

static void SetupLicense ( const std::string &  ticket,
const std::string &  product,
LicenseData &  licenseData 
) [static]

Sets up the given licenseData by copying the given ticket and product strings.

Parameters:
[in] ticket This is the ticket data as provided with your DCA license
[in] product This is the product shortcut, e.g. DC or MS
[out] licenseData The LicenseData structure to set up

Definition at line 164 of file url_samples/urldbsample/main.cpp.

static void SetupConnectionData ( DbConnectionData &  cData  )  [static]

Sets up the given cData to use a local URL database.

Parameters:
[out] cData The DbConnectionData structure to set up
Note:
This assumes that the DCA has already downloaded an initial URL database. If not, set useLocalDatabase to false.

Definition at line 178 of file url_samples/urldbsample/main.cpp.

static void PrintLicenseInfo ( const License &  aLicense  )  [static]

Prints out the information about the provided License.

Parameters:
[in] aLicense The license for which information should be displayed.

Definition at line 189 of file url_samples/urldbsample/main.cpp.

static void PrintDbConnectionInfo ( const DbConnection &  aDbConnection  )  [static]

Prints out the version and datestamp of the local database.

Parameters:
[in] aDbConnection The database connection for which a version should be displayed.

Definition at line 217 of file url_samples/urldbsample/main.cpp.

static void PrintResults ( const CategoriesInfo &  catinfos,
const UrlClassificationResults &  cats 
) [static]

Prints out the classification results and uses the categories info for textual representation of the matched categories.

Parameters:
[in] catinfos The CategoriesInfo class associated with the given URL database
[in] cats The results of a URL classification
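
To illustrate the idea of turning matched categories into text, here is a minimal sketch. It does not use the real CategoriesInfo or UrlClassificationResults classes: the integer category IDs, the map type and the function name are all hypothetical, and the sample's actual output format is not reproduced.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for CategoriesInfo: category ID mapped to its name.
typedef std::map<int, std::string> CategoryNamesSketch;

// Returns a comma-separated list of the matched category names. IDs that
// have no entry in the map are shown as "category <id>".
static std::string FormatCategoriesSketch(const CategoryNamesSketch &names,
                                          const std::vector<int> &matched)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < matched.size(); ++i) {
        if (i > 0) {
            out << ", ";
        }
        CategoryNamesSketch::const_iterator it = names.find(matched[i]);
        if (it != names.end()) {
            out << it->second;
        } else {
            out << "category " << matched[i];
        }
    }
    return out.str();
}
```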

Definition at line 234 of file url_samples/urldbsample/main.cpp.

static void PrintUsage ( const char *  name  )  [static]

Prints out the syntax of the sample.

Parameters:
[in] name The name of the executable

Definition at line 273 of file url_samples/urldbsample/main.cpp.

static void LoadUrlFile ( const std::string &  fileName,
std::vector< std::string > &  urlList 
) [static]

Loads the given fileName and appends each line to the given urlList, removing trailing CR/LF characters.

Parameters:
[in] fileName The file that contains the input URLs
[out] urlList The list to be filled with the URLs found in fileName
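
A function with this behavior could be written roughly as follows, using only the standard library. The name LoadUrlFileSketch is hypothetical, and skipping empty lines is an assumption that the page does not confirm.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads fileName line by line and appends each line to urlList.
// If the file cannot be opened, urlList is left unchanged.
static void LoadUrlFileSketch(const std::string &fileName,
                              std::vector<std::string> &urlList)
{
    std::ifstream in(fileName.c_str());
    std::string line;
    while (std::getline(in, line)) {
        // std::getline removes '\n'; strip any CR/LF that is still there,
        // for example the '\r' of a file written with Windows line endings.
        while (!line.empty() &&
               (line[line.size() - 1] == '\r' ||
                line[line.size() - 1] == '\n')) {
            line.erase(line.size() - 1);
        }
        if (!line.empty()) {  // assumption: blank lines are not URLs
            urlList.push_back(line);
        }
    }
}
```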

Definition at line 285 of file url_samples/urldbsample/main.cpp.

void TestUrlClassification ( const std::string &  aUrlListFile,
const DcaInstance &  myDca,
const UrlDbClassifier &  myUrlDbClassifier,
const CategoriesInfo &  myCategoriesInfo 
)

Performs the URL database classification with all URLs found in a given text file.

The given aUrlListFile contains one URL per line. The URLs will be added to a vector and for each URL a URL database classification is invoked. The results are printed out by using the PrintResults() function.

Parameters:
[in] aUrlListFile The file that contains the input URLs
[in] myDca A properly set up DcaInstance
[in] myUrlDbClassifier A properly set up UrlDbClassifier
[in] myCategoriesInfo A properly set up CategoriesInfo
Note:
A classification returns one of three results: "URL is unknown", "not categorized", or a set of matched categories.
When creating a multi-threaded application, you can include something similar to this function in each thread's worker function.
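
The per-URL loop and the three possible outcomes can be sketched without the DCA types as shown below. ResultSketch, ClassifyFnSketch and both functions are hypothetical; in the real sample the classification is done by UrlDbClassifier and the output by PrintResults().

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type modeling the three documented outcomes.
struct ResultSketch {
    bool known;                   // false: "URL is unknown"
    std::vector<int> categories;  // empty while known: "not categorized"
};

// Hypothetical classifier callback standing in for the real classifier.
typedef ResultSketch (*ClassifyFnSketch)(const std::string &url);

// Maps a result to one of the three outcome descriptions.
static std::string DescribeResultSketch(const ResultSketch &r)
{
    if (!r.known) {
        return "URL is unknown";
    }
    if (r.categories.empty()) {
        return "not categorized";
    }
    std::ostringstream out;
    out << "matched categories: " << r.categories.size();
    return out.str();
}

// Classifies every URL in the list and writes one result line per URL.
static void ClassifyAllSketch(const std::vector<std::string> &urls,
                              ClassifyFnSketch classify, std::ostream &out)
{
    for (std::size_t i = 0; i < urls.size(); ++i) {
        out << urls[i] << ": "
            << DescribeResultSketch(classify(urls[i])) << "\n";
    }
}
```

In a multi-threaded program, each worker could run a loop like ClassifyAllSketch over its own share of the URLs. Whether the classifier objects may be shared between threads is not stated on this page.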

Definition at line 320 of file url_samples/urldbsample/main.cpp.

int main ( int  argc,
char *  argv[] 
)

The main routine.

Parameters:
[in] argc The count of arguments provided
[in] argv An array of provided arguments
Returns:
5 on a usage error, 10 on an exception or internal error, and 0 on success
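
The return code convention can be sketched as a small wrapper around the actual work. Only the three codes and the four parameters come from this page; RunSketch, WorkFnSketch, the constant names and the way exceptions are caught are assumptions.

```cpp
#include <exception>
#include <iostream>

// Exit codes as documented for main().
enum { kExitOk = 0, kExitUsage = 5, kExitError = 10 };

// Hypothetical callback that performs the actual work of the sample.
typedef void (*WorkFnSketch)(int argc, char *argv[]);

static int RunSketch(int argc, char *argv[], WorkFnSketch work)
{
    // Program name plus the four documented parameters.
    if (argc < 5) {
        std::cerr << "Usage: " << (argc > 0 ? argv[0] : "urldbsample")
                  << " <dca-redist-folder> <ticket> <product>"
                     " <url-list-file>\n";
        return kExitUsage;
    }
    try {
        work(argc, argv);
    } catch (const std::exception &e) {
        std::cerr << "Error: " << e.what() << "\n";
        return kExitError;
    } catch (...) {
        std::cerr << "Unknown error\n";
        return kExitError;
    }
    return kExitOk;
}
```

A main() built on this sketch would simply return RunSketch(argc, argv, work), where work is the function that does the classification.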

Definition at line 389 of file url_samples/urldbsample/main.cpp.


Variable Documentation

const std::string S_UsageString
Initial value:
        "<dca-redist-folder> <ticket> <product> <url-list-file>\n"
        "  dca-redist-folder - the folder where the DCA is installed to\n"
        "  ticket - a valid ticket\n"
        "  product - the product associated with your ticket\n"
        "  url-list-file - file that includes the URLs to classify\n\n"

Usage string, displayed if a parameter is missing.

Definition at line 63 of file url_samples/urldbsample/main.cpp.


Generated on 26 Sep 2016 for dca_interface by  doxygen 1.6.1