The main goal of the SCA API is to make very complex functionality available through an easy-to-use API.
Overview of packages and modules
- The SCA comes with several packages such as
- Each module can be licensed separately.
- To access any of the functions of a module, the module must first be loaded by using your license (ticket data).
- After the module has been loaded you can create a classifier to analyze/classify the related data objects.
- For URL Classification, the following steps must be performed:
-
- For HTML Text Classification, the following steps must be performed:
-
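- Parsing the results of a Text Classification or a URL Classification is very similar:
AnyClassificationResults myResults;
// ... the classification call that fills myResults and returns fr is elided here ...
if (!fr) {
    // the call failed; handle the error
}
if (!myResults.isCategorized()) {
    // no category was assigned
}
for (size_t i = 0; i < myResults.size(); i++) {
    AnyClassificationResult myResult = myResults[i];
    ...
}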
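The workflow described above (load a module with license data, create a classifier, classify, then parse the results) can be sketched with stand-in classes. Everything in the `sketch` namespace is hypothetical and written only for illustration; `UrlModule`, `UrlClassifier`, `CallStatus`, `load()`, `classify()` and the category name are assumptions, not SCA API names or signatures.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from the SCA API.
namespace sketch {

struct Result { int categoryId; std::string categoryName; };

class Results {
public:
    bool isCategorized() const { return !items_.empty(); }
    std::size_t size() const { return items_.size(); }
    const Result& operator[](std::size_t i) const { return items_[i]; }
    void add(const Result& r) { items_.push_back(r); }
private:
    std::vector<Result> items_;
};

// Return-code object: converts to false when the call failed.
class CallStatus {
public:
    explicit CallStatus(bool ok) : ok_(ok) {}
    explicit operator bool() const { return ok_; }
private:
    bool ok_;
};

class UrlModule {
public:
    // Step 1: the module is loaded with license (ticket) data.
    CallStatus load(const std::string& ticket) {
        loaded_ = !ticket.empty();
        return CallStatus(loaded_);
    }
    bool loaded() const { return loaded_; }
private:
    bool loaded_ = false;
};

class UrlClassifier {
public:
    // Step 2: a classifier is created from a loaded module.
    explicit UrlClassifier(const UrlModule& m) : usable_(m.loaded()) {}
    // Step 3: classify one URL and fill the results collection.
    CallStatus classify(const std::string& url, Results& out) const {
        if (!usable_) return CallStatus(false);
        if (url.find("ibm.com") != std::string::npos)
            out.add(Result{1, "demo-category"});
        return CallStatus(true);
    }
private:
    bool usable_;
};

// Step 4: parse the results; returns the names of all matched categories.
inline std::vector<std::string> categoriesFor(const std::string& ticket,
                                              const std::string& url) {
    std::vector<std::string> names;
    UrlModule module;
    if (!module.load(ticket)) return names;      // loading failed
    UrlClassifier classifier(module);
    Results results;
    CallStatus fr = classifier.classify(url, results);
    if (!fr) return names;                       // the call failed
    if (!results.isCategorized()) return names;  // nothing matched
    for (std::size_t i = 0; i < results.size(); ++i)
        names.push_back(results[i].categoryName);
    return names;
}

} // namespace sketch
```

Calling `sketch::categoriesFor("ticket", "www.ibm.com")` walks through all four steps; an empty ticket or an unmatched URL yields an empty list.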
Using native C++
- All classes of the SCA API are native C++ classes. Internally they use the SCA DLLs / Shared Objects provided in the distribution.
- No class-hierarchy has been implemented - there are no base classes of classifiers or modules, because the data types they deal with are quite different. But they all use the same look-and-feel, so once you know how to use one classification class, you will easily be able to use another.
Supporting C++ features
Instances
- All instances in the SCA API are implemented as real C++ classes - pointers are not used.
- Real pointers necessary for accessing DLLs and Shared Objects are invisible to the user, and are handled by the API internally as private smart pointers.
Auto-Destructor-Cleanup
- Additionally, the auto-destructor cleanup features of C++ are used. Whenever an SCA API object goes out of scope, all related handles, structures and memory are freed without the need for an explicit delete or cleanup call.
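int main(
    int argc,
    char *argv[] )
{
    if( argc > 1 ) {
        // SCA API objects created in this scope ...
    }   // ... are cleaned up automatically here, when the scope ends
    return 0;
}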
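How a wrapper class can hide a native handle behind a private smart pointer and release it automatically is sketched below. This is a generic C++ illustration under stated assumptions, not the SCA implementation: `NativeHandle`, `openNative()`, `closeNative()` and `Session` are hypothetical names, and the handle counter exists only to make the cleanup observable.

```cpp
#include <memory>

namespace sketch {

// Counts live native handles so the cleanup can be observed.
inline int& openHandles() { static int n = 0; return n; }

// Hypothetical stand-in for a handle obtained from a DLL / shared object.
struct NativeHandle {};

inline NativeHandle* openNative() { ++openHandles(); return new NativeHandle; }
inline void closeNative(NativeHandle* h) { --openHandles(); delete h; }

// Wrapper whose destruction releases the handle; users never see the pointer.
class Session {
public:
    Session() : handle_(openNative(), &closeNative) {}
private:
    std::shared_ptr<NativeHandle> handle_;  // private smart pointer with custom deleter
};

// Returns the number of open handles observed inside the inner scope.
inline int handlesInsideScope() {
    int inside = 0;
    {
        Session s;               // handle acquired here
        inside = openHandles();
    }                            // s goes out of scope: handle released
    return inside;
}

} // namespace sketch
```

After `handlesInsideScope()` returns, `sketch::openHandles()` is back at zero, although no explicit delete or cleanup call appears in the calling code.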
Operator Overloading
- Variables are compared with the native C++ comparison operators, and assignments use the C++ assignment operator of the related classes, just as you would expect from native C++ classes.
Assignment Operator
- An assignment between SCA objects is always done by reference.
- If an object A is assigned to another object B of the same type, no independent copy is created; afterwards both A and B refer to the same underlying object.
- Therefore, if you change object A, object B changes as well, and vice versa.
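One common way to obtain assignment-by-reference semantics in C++ is a handle class that stores its state behind a `std::shared_ptr`. The sketch below illustrates the behaviour described above; the `Category` class and its `rename()` method are hypothetical, and whether the SCA uses this technique internally is an assumption.

```cpp
#include <memory>
#include <string>

namespace sketch {

// Hypothetical handle class: copies share one underlying state object.
class Category {
public:
    explicit Category(const std::string& name)
        : state_(std::make_shared<std::string>(name)) {}
    // The compiler-generated assignment copies the shared_ptr, not the string.
    const std::string& name() const { return *state_; }
    void rename(const std::string& n) { *state_ = n; }
private:
    std::shared_ptr<std::string> state_;
};

// Shows that a change made through one object is visible through the other.
inline bool changeIsShared() {
    Category a("first");
    Category b("second");
    b = a;                          // b now refers to the same state as a
    a.rename("changed");            // a change through a ...
    return b.name() == "changed";   // ... is visible through b
}

} // namespace sketch
```

With such a design, code that needs an independent copy has to create one explicitly instead of relying on `=`.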
Collections and Items
- Whenever a collection of items is used (e.g. a ClassificationResults collection), it is handled the same way.
- A collection class is named using plural notation -> Results, Categories, Groups etc.
- An item of a collection is named using singular notation -> Result, Category, Group etc.
- To access an item of a collection, the C++ operator [] has been overloaded (alternatively, an at() function is available).
- All collections support the size() method.
- Some collections support convenience functions, e.g. byId() to look up a specific item by its (numeric) id.
- Enumeration sample:
for (size_t i = 0; i < Results.size(); i++) {
    Result aResult = Results[i];
}
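The collection conventions above (plural collection class, singular item class, `operator[]`, `at()`, `size()`, `byId()`) can be illustrated with a small stand-in. `Group` and `Groups` here are hypothetical classes built on `std::vector`; the bounds checking in `at()` and the exception thrown by `byId()` are assumptions of this sketch, not documented SCA behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

struct Group { int id; std::string name; };   // item: singular name

class Groups {                                 // collection: plural name
public:
    void add(const Group& g) { items_.push_back(g); }
    std::size_t size() const { return items_.size(); }
    const Group& operator[](std::size_t i) const { return items_[i]; }
    // Bounds-checked in this sketch (throws std::out_of_range).
    const Group& at(std::size_t i) const { return items_.at(i); }
    // Convenience lookup by numeric id.
    const Group& byId(int id) const {
        for (const Group& g : items_)
            if (g.id == id) return g;
        throw std::out_of_range("no group with this id");
    }
private:
    std::vector<Group> items_;
};

// Enumerates the collection by index and joins the item names with commas.
inline std::string joinNames(const Groups& groups) {
    std::string out;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        Group g = groups[i];
        if (i > 0) out += ",";
        out += g.name;
    }
    return out;
}

} // namespace sketch
```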
Error Handling
- Errors are handled in a C++ fashion, by using either return code classes or exception classes, which are easy to catch using a try...catch block.
- Functions that return a dca::FunctionResult never throw an SCA exception, but all other functions may throw an SCA exception!
try {
    myDca.createXYZ(...);
}
catch (...) {
    // handle the SCA exception here; the exception class is not named in this excerpt
}
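The two error-reporting styles described above, return-code classes and exceptions, can be contrasted in a short sketch. `Status`, `SketchError`, `loadTicket()` and `createThing()` are hypothetical names invented for this example; they only mimic the pattern and do not reproduce `dca::FunctionResult` or any SCA exception class.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical return-code class: never throws, converts to false on failure.
class Status {
public:
    Status(bool ok, std::string text) : ok_(ok), text_(std::move(text)) {}
    explicit operator bool() const { return ok_; }
    const std::string& text() const { return text_; }
private:
    bool ok_;
    std::string text_;
};

// Hypothetical exception class for functions that do not return a Status.
class SketchError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Style 1: failure is reported through the return value.
inline Status loadTicket(const std::string& ticket) {
    if (ticket.empty()) return Status(false, "empty ticket");
    return Status(true, "ok");
}

// Style 2: failure is reported by throwing.
inline int createThing(int size) {
    if (size <= 0) throw SketchError("size must be positive");
    return size;
}

// Uses both styles and maps every outcome to a short message.
inline std::string run(const std::string& ticket, int size) {
    Status fr = loadTicket(ticket);
    if (!fr) return "status: " + fr.text();
    try {
        createThing(size);
    } catch (const SketchError& e) {
        return std::string("exception: ") + e.what();
    }
    return "ok";
}

} // namespace sketch
```

The return-code style keeps the check next to the call, while the exception style lets several calls share one handler.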
STL string and container support
- STL strings and containers have been used wherever it is possible and useful.
std::string myStlString( "www.ibm.com" );
Multithreading support
- Because the SCA is an API, it does not create or start any threads by itself.
- Whenever it is necessary to perform an asynchronous task, the user must create and start a thread which calls the related SCA functions.
- For best scalability the user can define thread priorities, the threading model etc.
- For a first impression of how this works, take a look at the extended URL sample (samples/urldbsample_extended). This sample demonstrates how to create and start threads that check for SCA updates and download a URL database, an asynchronous task that uses just two SCA calls (the sample is available in two versions, one for Windows and one for Linux).
- Classes used for classification can be used from multiple threads, but the data objects to classify (URLs, emails, HTML text etc.) should not be shared among threads. These must be created inside the classifying thread itself.
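The threading rule above (share the classifier, create the data objects inside each thread) can be sketched with `std::thread`. The `Classifier` class is a hypothetical stand-in whose `classify()` is const and touches no shared mutable state; how the real SCA classes achieve thread safety is not covered here.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

namespace sketch {

// Hypothetical classifier: classify() is const and keeps no mutable state.
class Classifier {
public:
    int classify(const std::string& url) const {
        return url.find("ibm") != std::string::npos ? 1 : 0;
    }
};

// Starts one user-created thread per host; the classifier is shared,
// the object to classify is built inside the thread that uses it.
inline std::vector<int> classifyInThreads(const std::vector<std::string>& hosts) {
    Classifier shared;
    std::vector<int> categories(hosts.size(), -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        workers.emplace_back([&shared, &hosts, &categories, i] {
            std::string url = "www." + hosts[i];   // data object local to this thread
            categories[i] = shared.classify(url);  // each thread writes only its own slot
        });
    }
    for (std::thread& t : workers) t.join();       // wait before the results are read
    return categories;
}

} // namespace sketch
```

On some toolchains, code that uses `std::thread` must be built with a threading flag such as `-pthread`.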