dca_interface  6.3.4
dca::HtmlTextClassifier Class Reference

#include <text_classifier.h>

Detailed Description

HTML text classifier object for text classification.

See also
TextClassification, TextClassificationResult, HtmlText, CategoriesInfo

The following sample shows the text classification workflow:

// assume we have a valid DcaInstance (myDca) and License (myLicense)
// initialize the Text Classification module
dca::TextClassification myTextClassification =
    dca::TextClassification::create( myDca, myLicense );
// create an HTML Text Classifier
dca::HtmlTextClassifier myHtmlTextClassifier =
    myTextClassification.createHtmlClassifier();
// create an HTML text object
// assume we have the contents of a valid HTML page in std::string
// myHtmlTextContents
dca::HtmlText myHtmlText =
    dca::HtmlText::create( myDca, myHtmlTextContents );
// declare the classification results
dca::TextClassificationResults myTextClassificationResults;
// run the classification and keep the returned FunctionResult
dca::FunctionResult myResult =
    myHtmlTextClassifier.classify( myHtmlText, myTextClassificationResults );
// if myResult evaluates to false, an error occurred
if( !myResult ) {
    std::cout << "Received error from Text Classification (Error code: " <<
        myResult.getReturnCode() << ", Description: " <<
        myResult.getDescription() << ")." << std::endl;
    return;
}
// no error, but the text may not match any category
if( !myTextClassificationResults.isCategorized() ) {
    std::cout << "No categories found for given HTML data." << std::endl;
    return;
}
// we received results and simply want to print them out
const DCA_SIZE_TYPE count = myTextClassificationResults.size();
// iterate through all matched categories
for( DCA_INDEX_TYPE i = 0; i < count; ++i ) {
    const dca::TextClassificationResult myTCResult =
        myTextClassificationResults[ i ];
    std::cout << "Got result #" << (i+1) << " category id:" <<
        myTCResult.id() << ", Score:" <<
        myTCResult.score() << std::endl;
}
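
The sample prints every matched category. Callers often want only the confident matches, so the following is a minimal sketch of filtering by score. It assumes score() returns a value from 0.0 to 1.0. StubResult and filterByScore are hypothetical names used only for illustration and are not part of the DCA interface. StubResult stands in for dca::TextClassificationResult so that the sketch compiles without the DCA headers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dca::TextClassificationResult. Only the two
// accessors used in the sample (id() and score()) are modelled.
struct StubResult {
    int    categoryId;   // value that id() reports
    double scoreValue;   // value that score() reports, expected in [0.0, 1.0]
    int    id() const    { return categoryId; }
    double score() const { return scoreValue; }
};

// Returns the results whose score is at least minScore, in their original
// order. minScore is clamped to [0.0, 1.0] before comparing.
template <typename Result>
std::vector<Result> filterByScore( const std::vector<Result>& results,
                                   double minScore ) {
    if( minScore < 0.0 ) minScore = 0.0;
    if( minScore > 1.0 ) minScore = 1.0;
    std::vector<Result> kept;
    for( std::size_t i = 0; i < results.size(); ++i ) {
        if( results[ i ].score() >= minScore ) {
            kept.push_back( results[ i ] );
        }
    }
    return kept;
}
```

With the real library, one option is to copy the entries of the results container into a std::vector inside the loop shown in the sample and pass that vector to a helper like this one. A suitable threshold depends on the application and is not specified by this page.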

Definition at line 33 of file text_classifier.h.

Public Member Functions

FunctionResult classify (const HtmlText &aText, TextClassificationResults &aTextResults) const
 The HTML text classification method. It takes an initialized HtmlText object and writes the results to a TextClassificationResults object.

Member Function Documentation

◆ classify()

FunctionResult dca::HtmlTextClassifier::classify ( const HtmlText & aText,
TextClassificationResults & aTextResults
) const

The HTML text classification method. It takes an initialized HtmlText object and writes the results to a TextClassificationResults object.

An HtmlText object can be classified into one or more of the following categories:

  • CATEGORY_ID_TEXT_PORNOGRAPHY
  • CATEGORY_ID_TEXT_WAREZ
  • CATEGORY_ID_TEXT_GAMBLING
  • CATEGORY_ID_TEXT_ANONYMOUS_PROXIES
  • CATEGORY_ID_TEXT_ILLEGAL_DRUGS
  • CATEGORY_ID_TEXT_WEAPONS
Parameters
[in] aText: The HTML text object to be classified.
[out] aTextResults: The results of the text classification.
Returns
DCA_SUCCESS on success; otherwise an error code.
See also
TextClassification, Text, TextClassificationResult
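
Since id() reports a category as a numeric identifier, log output is easier to read when each identifier is mapped to a label. The following is a rough sketch of such a mapping. The STUB_TEXT_* enumerators and their numeric values are placeholders invented for this example. The real CATEGORY_ID_TEXT_* constants come from the DCA headers, and this page does not give their values. categoryLabel is likewise a hypothetical helper.

```cpp
#include <string>

// Placeholder identifiers with made-up values. In real code, use the
// CATEGORY_ID_TEXT_* constants from the DCA headers instead.
enum StubCategoryId {
    STUB_TEXT_PORNOGRAPHY       = 1,
    STUB_TEXT_WAREZ             = 2,
    STUB_TEXT_GAMBLING          = 3,
    STUB_TEXT_ANONYMOUS_PROXIES = 4,
    STUB_TEXT_ILLEGAL_DRUGS     = 5,
    STUB_TEXT_WEAPONS           = 6
};

// Maps a category identifier to a short label. Identifiers that are not in
// the list above map to "Unknown".
std::string categoryLabel( int id ) {
    switch( id ) {
        case STUB_TEXT_PORNOGRAPHY:       return "Pornography";
        case STUB_TEXT_WAREZ:             return "Warez";
        case STUB_TEXT_GAMBLING:          return "Gambling";
        case STUB_TEXT_ANONYMOUS_PROXIES: return "Anonymous proxies";
        case STUB_TEXT_ILLEGAL_DRUGS:     return "Illegal drugs";
        case STUB_TEXT_WEAPONS:           return "Weapons";
        default:                          return "Unknown";
    }
}
```

A switch like this assumes that the real constants are integral compile-time constants. If they are not, a chain of if comparisons achieves the same mapping.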

The documentation for this class was generated from the following file:

  • text_classifier.h