dca_interface
6.3.4
HTML Text Classification returns the categories that match the content of a given, valid HTML page.
To use the HTML classification functions, the Text Classification Package must first be initialized. To do this, create an instance of the dca::TextClassification module using dca::TextClassification::create().
Once the Text Classification module has been initialized, a dca::HtmlTextClassifier object must be created. Use dca::TextClassification::createHtmlClassifier() to create an HTML Text Classifier.
The HtmlTextClassifier class classifies dca::HtmlText objects. To create an HtmlText object from an HTML text string, use dca::HtmlText::create(). Note that the HTML text must represent a complete HTML web page; analysis of partial HTML text is not supported.
To classify an HtmlText object, use dca::HtmlTextClassifier::classify(). This function analyzes the HtmlText object and returns the results of a classification in a dca::TextClassificationResults object. This object is a container for individual dca::TextClassificationResult results (one result per matched category), and can be iterated over to obtain information on each category matched.
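The call sequence described above (initialize the package, create a classifier, wrap the page in an HtmlText, classify it, then iterate over the per-category results) can be sketched as follows. This is not the dca API. Everything inside the `sketch` namespace is a hypothetical stand-in that exists only so the example compiles on its own. That includes the `shared_ptr` return types, the `str()` and `add()` members, the `category` field and the toy keyword-matching "classifier". The real signatures, ownership rules and error handling should be taken from the dca class reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL stand-ins for the dca types named in the text. Only the order
// of calls mirrors the documentation; the signatures are assumptions.
namespace sketch {

// One matched category (stands in for dca::TextClassificationResult).
struct TextClassificationResult {
    std::string category;
};

// Iterable container of per-category results
// (stands in for dca::TextClassificationResults).
class TextClassificationResults {
public:
    void add(TextClassificationResult r) { results_.push_back(std::move(r)); }
    std::vector<TextClassificationResult>::const_iterator begin() const { return results_.begin(); }
    std::vector<TextClassificationResult>::const_iterator end() const { return results_.end(); }
private:
    std::vector<TextClassificationResult> results_;
};

// Stands in for dca::HtmlText: wraps the text of one complete HTML page.
class HtmlText {
public:
    static std::shared_ptr<HtmlText> create(const std::string& html) {
        return std::shared_ptr<HtmlText>(new HtmlText(html));
    }
    const std::string& str() const { return html_; }
private:
    explicit HtmlText(std::string html) : html_(std::move(html)) {}
    std::string html_;
};

// Stands in for dca::HtmlTextClassifier. The "classification" here is a toy
// case-insensitive keyword lookup, NOT the real categorization engine.
class HtmlTextClassifier {
public:
    TextClassificationResults classify(const HtmlText& text) const {
        std::string lower = text.str();
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        TextClassificationResults results;
        for (const auto& rule : rules_) {
            if (lower.find(rule.keyword) != std::string::npos) {
                results.add({rule.category});  // one result per matched category
            }
        }
        return results;
    }
private:
    struct Rule { std::string keyword; std::string category; };
    std::vector<Rule> rules_{{"football", "Sports"}, {"stock", "Finance"}};
};

// Stands in for dca::TextClassification, the package entry point.
class TextClassification {
public:
    static std::shared_ptr<TextClassification> create() {
        return std::make_shared<TextClassification>();
    }
    std::shared_ptr<HtmlTextClassifier> createHtmlClassifier() const {
        return std::make_shared<HtmlTextClassifier>();
    }
};

}  // namespace sketch

// Follows the documented order: initialize the package, create an HTML
// classifier, wrap the page text, classify it, and iterate over the results.
std::vector<std::string> classifyPage(const std::string& html) {
    auto package = sketch::TextClassification::create();
    auto classifier = package->createHtmlClassifier();
    assert(classifier);
    auto page = sketch::HtmlText::create(html);
    std::vector<std::string> categories;
    for (const auto& result : classifier->classify(*page)) {
        categories.push_back(result.category);
    }
    return categories;
}
```

With these stand-ins, `classifyPage("<html><body>Football and stock news</body></html>")` yields `{"Sports", "Finance"}`, and a page matching no keyword yields an empty vector. The point of the sketch is the shape of the loop over the results container, not the categories themselves.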
The following code demonstrates the classification of some HTML text.