XFE SDK API Concepts

The main goal of the SCA API is to make very complex functionality available through an easy-to-use API.

Overview of packages and modules

The SCA comes with several packages such as
Each module can be licensed separately.
Before you can use any function of a module, the module must be loaded using your license (ticket data).
Once the module has been loaded, you can create a classifier to analyze and classify the related data objects.
For URL Classification, the following steps must be performed:
For HTML Text Classification, the following steps must be performed:
  • Create a dca::TextClassification object (using license data)
  • Create a dca::HtmlTextClassifier object
  • For each dca::HtmlText to classify:
    • Create a dca::HtmlText object with your HTML file contents
    • Create a dca::TextClassificationResults object
    • Call the dca::HtmlTextClassifier::classify() method
    • Enumerate the dca::TextClassificationResults
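Parsing the results of a Text Classification or a URL Classification works in much the same way. In the fragment below, AnyClassificationResults and AnyClassificationResult stand for the results classes of the classifier in use (e.g. dca::TextClassificationResults).
AnyClassificationResults myResults;
dca::FunctionResult fr = myClassifier.classify(object, myResults);
if (!fr) // an error occurred
        return fr.getReturnCode();
if (!myResults.isCategorized()) // not categorized: nothing to enumerate
        return DCA_SUCCESS;
for (DCA_INDEX_TYPE i = 0; i < myResults.size(); ++i) {
        AnyClassificationResult myResult = myResults[i];
        // ... evaluate myResult here ...
}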

Using native C++

All classes of the SCA API are native C++ classes. Internally, they use the SCA DLLs / shared objects provided in the distribution.
No class hierarchy has been implemented: there are no common base classes for classifiers or modules, because the data types they deal with are quite different. However, they all share the same look and feel, so once you know how to use one classification class, you can easily use another.
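Because the classes share a common look and feel rather than a common base class, code that should work with several of them can be written as a C++ template. The sketch below is hypothetical: MockTextResults and MockUrlResults are stand-ins, not SCA classes. It only assumes that a results class offers isCategorized(), size() and operator[], as in the result-parsing fragment earlier in this section, and that each item exposes a categoryId field (an assumption made for illustration).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical item types: unrelated structs, no common base class.
struct MockTextResult { int categoryId; };
struct MockUrlResult  { int categoryId; };

// Two independent results classes that merely use the same member names.
class MockTextResults {
public:
    void add(int categoryId) { items_.push_back({categoryId}); }
    bool isCategorized() const { return !items_.empty(); }
    std::size_t size() const { return items_.size(); }
    const MockTextResult& operator[](std::size_t i) const { return items_[i]; }
private:
    std::vector<MockTextResult> items_;
};

class MockUrlResults {
public:
    void add(int categoryId) { items_.push_back({categoryId}); }
    bool isCategorized() const { return !items_.empty(); }
    std::size_t size() const { return items_.size(); }
    const MockUrlResult& operator[](std::size_t i) const { return items_[i]; }
private:
    std::vector<MockUrlResult> items_;
};

// Works with any results type that provides isCategorized(), size() and
// operator[]; the template relies on the shared interface, not on inheritance.
template <typename Results>
std::vector<int> collectCategoryIds(const Results& results) {
    std::vector<int> ids;
    if (!results.isCategorized())
        return ids;
    for (std::size_t i = 0; i < results.size(); ++i)
        ids.push_back(results[i].categoryId);
    return ids;
}
```

The same collectCategoryIds() call compiles for both classes because the compiler checks only the member names the template actually uses.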

Supported C++ features

Instances

All instances in the SCA API are real C++ class instances; you never work with pointers directly.
The raw pointers needed to access the DLLs and shared objects are hidden from the user and managed internally as private smart pointers.
// myDca is a real C++ instance
dca::DcaInstance myDca = dca::DcaInstance::create(...);
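One common way to hide such pointers is to keep the raw handle in a private std::shared_ptr whose custom deleter calls the library's release function. The sketch below is an assumption about the general technique, not the SCA implementation: raw_open(), raw_close(), RawHandle, Instance and the g_openHandles counter are all hypothetical names that stand in for functions exported by a DLL or shared object.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical C-style interface standing in for DLL/shared-object exports.
struct RawHandle { std::string name; };
int g_openHandles = 0;  // counts handles not yet released (illustration only)

RawHandle* raw_open(const std::string& name) {
    ++g_openHandles;
    return new RawHandle{name};
}
void raw_close(RawHandle* h) {
    --g_openHandles;
    delete h;
}

// Value-type wrapper: callers never see RawHandle*. The pointer lives in a
// private std::shared_ptr whose deleter calls raw_close() exactly once.
class Instance {
public:
    static Instance create(const std::string& name) {
        return Instance(std::shared_ptr<RawHandle>(raw_open(name), raw_close));
    }
    std::string name() const { return handle_->name; }
private:
    explicit Instance(std::shared_ptr<RawHandle> h) : handle_(std::move(h)) {}
    std::shared_ptr<RawHandle> handle_;
};
```

User code handles only Instance values, created through a static create() factory in the same spirit as dca::DcaInstance::create(...).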

Auto-Destructor-Cleanup

In addition, the API relies on automatic destructor cleanup in C++: whenever an SCA API object goes out of scope, all related handles, structures and memory are freed without an explicit delete or cleanup call.
int main( int argc, char *argv[] )
{
        if( argc > 1 ) {
                // when myDca goes out of scope at the closing brace, it is safely
                // destructed and all related resources are freed
                dca::DcaInstance myDca = dca::DcaInstance::create(...);
        }
        return 0;
}
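This scope-based cleanup is the general C++ idiom known as RAII, and it also covers scopes left through an exception. The following self-contained sketch makes that observable with a hypothetical ScopedResource class and a g_liveResources counter; neither is part of the SCA.

```cpp
#include <stdexcept>

// Hypothetical counter that makes acquisition and release observable.
int g_liveResources = 0;

// RAII class: acquires in the constructor, releases in the destructor.
class ScopedResource {
public:
    ScopedResource() { ++g_liveResources; }
    ~ScopedResource() { --g_liveResources; }
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;
};

// Leaves the scope either normally or via an exception; in both cases the
// destructor of res runs and releases the resource.
void useResource(bool fail) {
    ScopedResource res;
    if (fail)
        throw std::runtime_error("simulated failure");
}
```

Calling useResource(true) throws, yet g_liveResources is back to 0 once the exception has been caught.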

Operator Overloading

Comparisons use the native C++ comparison operator, and assignments use the C++ assignment operator of the related classes, just as you would expect with native C++ classes.
// C++ class assignment
dca::Category myCategory = myCategories.byId( CAT_ID_URL_PORN );

// C++ class comparison
if( myCategory == NullCategory ) ...
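A sketch of how a value class can overload == and != and use a "null object" as the result of a failed lookup, similar to the NullCategory comparison above. Category, NullCategory and findById() here are hypothetical stand-ins; comparing by numeric id, and reserving id 0 for the null object, are assumptions, not documented SCA behavior.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical category value class; equality compares the numeric id.
class Category {
public:
    explicit Category(int id = 0, std::string name = "")
        : id_(id), name_(std::move(name)) {}
    int id() const { return id_; }
    const std::string& name() const { return name_; }
    friend bool operator==(const Category& a, const Category& b) { return a.id_ == b.id_; }
    friend bool operator!=(const Category& a, const Category& b) { return !(a == b); }
private:
    int id_;
    std::string name_;
};

// Hypothetical null object returned when a lookup fails (id 0 assumed unused).
const Category NullCategory{};

Category findById(const std::vector<Category>& categories, int id) {
    for (const Category& c : categories)
        if (c.id() == id)
            return c;
    return NullCategory;
}
```

With the operators defined as friends, expressions such as myCategory == NullCategory read exactly like comparisons of built-in types.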

Assignment Operator

Assignment between SCA objects is always done by reference.
If object B is assigned to object A of the same type, A is not a copy of B; instead, A refers to the same underlying object as B.
Therefore, if you change A, B changes as well, and vice versa.
dca::Url urlA = urlB;   // urlA now refers to the same object as urlB
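Reference semantics of this kind fall out naturally when a class holds its state in a std::shared_ptr: the compiler-generated copy constructor and assignment operator copy only the pointer. The sketch below assumes that technique; the Url class, setText() and sameObjectAs() are hypothetical and not the dca::Url API.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical URL class with reference semantics: copies share one
// underlying string through a std::shared_ptr.
class Url {
public:
    static Url create(std::string text) {
        return Url(std::make_shared<std::string>(std::move(text)));
    }
    const std::string& text() const { return *data_; }
    void setText(std::string text) { *data_ = std::move(text); }
    // True if both objects refer to the same underlying data.
    bool sameObjectAs(const Url& other) const { return data_ == other.data_; }
private:
    explicit Url(std::shared_ptr<std::string> d) : data_(std::move(d)) {}
    std::shared_ptr<std::string> data_;
};
```

After Url urlA = urlB; a call to urlA.setText(...) is visible through urlB, which matches the behavior described above.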

Collections and Items

All collections of items (e.g. a ClassificationResults collection) are handled the same way.
A collection class has a plural name: Results, Categories, Groups, etc.
An item of a collection has a singular name: Result, Category, Group, etc.
  • Items are accessed with the overloaded C++ operator [] (alternatively, the at() function is available)
  • All collections support the size() method
  • Some collections provide convenience functions, e.g. byId() to look up an item by its (numeric) id
Enumeration sample (Results stands for any collection, Result for its item type):
const DCA_SIZE_TYPE countOfResults = Results.size();
for( DCA_INDEX_TYPE i = 0; i < countOfResults; ++i ) {
        Result aResult = Results[i];
        // ... process aResult ...
}

Error Handling

Errors are handled in C++ fashion, either through return-code classes or through exception classes, which are easy to catch with a try...catch block.
Functions that return a dca::FunctionResult never throw an SCA exception, but all other functions may throw one!
// return-code style: a function returning dca::FunctionResult never throws
dca::FunctionResult fr = myUrlDbClassifier.classify(...);
if( !fr ) // an error occurred
        std::cout << "error: " << fr.getReturnCode() << std::endl;

// exception style: all other functions may throw an SCA exception
try {
        dca::DcaInstance myDca;
        myDca.createXYZ(...); // assume this could throw an exception
}
catch( const dca::ExDca& ex ) {
        std::cout << "error: " << ex.getReturnCode() << " occurred" << std::endl;
}
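The two error-handling styles can be sketched in plain C++ as follows. FunctionResult, ExSample, classifySample() and createSample() are hypothetical stand-ins modeled on the usage shown above (a result object that tests false on error and exposes getReturnCode(), and an exception carrying a return code); treating 0 as "success" and the error codes 2 and 3 are arbitrary assumptions for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical return-code class: tests true on success (code 0 assumed).
class FunctionResult {
public:
    explicit FunctionResult(int code = 0) : code_(code) {}
    explicit operator bool() const { return code_ == 0; }
    int getReturnCode() const { return code_; }
private:
    int code_;
};

// Hypothetical exception class carrying the same kind of return code.
class ExSample : public std::runtime_error {
public:
    explicit ExSample(int code)
        : std::runtime_error("error " + std::to_string(code)), code_(code) {}
    int getReturnCode() const { return code_; }
private:
    int code_;
};

// Return-code style: never throws; reports failure through FunctionResult.
FunctionResult classifySample(const std::string& input) {
    return FunctionResult(input.empty() ? 2 : 0);
}

// Exception style: throws ExSample on failure.
int createSample(int value) {
    if (value < 0)
        throw ExSample(3);
    return value * 2;
}
```

The explicit operator bool still allows if( !fr ), because the operand of ! is contextually converted to bool, while it prevents accidental conversions such as int n = fr;.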

STL string and container support

STL strings and containers are used wherever possible and useful.
std::string myStlString( "www.ibm.com" );
dca::Url myUrl = dca::Url::create( myStlString );

Multithreading support

Because the SCA is an API, it does not create or start any threads by itself.
Whenever an asynchronous task is needed, the user must create and start a thread that calls the relevant SCA functions.
This leaves the user free to choose thread priorities, the threading model, etc. for best scalability.
For a first impression of how this works, take a look at the extended URL sample (samples/urldbsample_extended). It demonstrates how to create and start threads that check for SCA updates and download a URL database, an asynchronous task that needs just two SCA calls (the sample is available in two versions, one for Windows and one for Linux).
Classes used for classification can be used from multiple threads, but the data objects to classify (URLs, emails, HTML text, etc.) should not be shared among threads. Create them inside the classifying thread itself.
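The following sketch illustrates that rule with std::thread: one classifier instance is shared by all worker threads, while each worker builds its own data objects and writes only to its own result slot. KeywordClassifier and classifyInThreads() are hypothetical; the sketch assumes a classifier whose classify() is const and touches no shared mutable state, which is what makes concurrent use safe here.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical classifier: classify() is const and reads only immutable
// state, so a single instance can be used from several threads at once.
class KeywordClassifier {
public:
    explicit KeywordClassifier(std::string keyword) : keyword_(std::move(keyword)) {}
    bool classify(const std::string& text) const {
        return text.find(keyword_) != std::string::npos;
    }
private:
    const std::string keyword_;
};

// One thread per batch. The classifier is shared; the data objects passed to
// classify() are created inside each worker, and each worker writes only to
// its own element of hits, so no locking is needed.
std::vector<int> classifyInThreads(const KeywordClassifier& classifier,
                                   const std::vector<std::vector<std::string>>& batches) {
    std::vector<int> hits(batches.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < batches.size(); ++t) {
        workers.emplace_back([&classifier, &batches, &hits, t] {
            int count = 0;
            for (const std::string& input : batches[t]) {
                std::string text = input;  // per-thread data object
                if (classifier.classify(text))
                    ++count;
            }
            hits[t] = count;
        });
    }
    for (std::thread& w : workers)
        w.join();
    return hits;
}
```

Thread creation and joining stay entirely in user code, which is consistent with the SCA not starting any threads itself.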

Generated on 26 Sep 2016 for dca_interface by doxygen 1.6.1