Content Classification Routines

Different vendors provide different content classification routines, including the following.

  • Keywords. A set of related search keywords linked to a preferred term. Often a mix of query expansion and regular expressions provides the evidence for a term.
  • Natural language processing rules [Smartlogic capability]. A rule language with grammatical, syntactic, natural-language, proximity/positional, and Boolean operators that allows complex combinations of words and phrases.
  • Statistical. Statistical analysis (often Bayesian, but not necessarily) of word frequencies and proximities, based on a sample set.
  • Entity. Algorithms that identify types of entity and match them against a reference dictionary, sometimes coupled with syntactic analysis [Smartlogic capability].
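
As a rough illustration of the keyword routine in the list above, the following sketch links regular-expression evidence to preferred terms and scores a text by the number of matches. The rule set, the term names, the `classify` function, and the `min_hits` threshold are all hypothetical and chosen only for illustration; this is not any vendor's implementation, and a real product would typically add query expansion, weighting, and much larger vocabularies.

```python
import re
from collections import Counter

# Hypothetical rule set: each preferred term maps to regex "evidence" patterns.
# The terms and patterns are illustrative assumptions, not taken from any vendor.
RULES = {
    "Mergers and Acquisitions": [
        r"\bmerg(?:er|ers|ed|ing)\b",
        r"\bacquisitions?\b",
        r"\btakeovers?\b",
    ],
    "Data Protection": [
        r"\bGDPR\b",
        r"\bdata\s+protection\b",
        r"\bpersonal\s+data\b",
    ],
}

# Compile once, case-insensitively, so repeated classification stays cheap.
COMPILED = {
    term: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for term, patterns in RULES.items()
}


def classify(text, min_hits=1):
    """Return (preferred term, hit count) pairs, highest count first.

    A term is reported only if its patterns match at least `min_hits` times.
    """
    scores = Counter()
    for term, patterns in COMPILED.items():
        hits = sum(len(p.findall(text)) for p in patterns)
        if hits >= min_hits:
            scores[term] = hits
    return scores.most_common()


if __name__ == "__main__":
    print(classify("The merger follows two earlier acquisitions."))
```

Counting raw matches is the simplest possible scoring choice; raising `min_hits` trades recall for precision, which is the same tuning decision the more sophisticated routines make with rule logic or statistical thresholds.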