Classification Server

Rule-based classification and natural language processing (NLP)

Eliminate the costs, time and hassles of manual classification and let Semaphore Classification Server analyze and auto-classify your content. Classification Server combines precise rulebases and natural language processing to give users the best of both worlds – precise, complete and consistent classification results with the ease of automation.

Classification Server uses multiple strategies to precisely describe the information within your content to unlock its hidden value:

  • Aboutness classification: the process of combining evidence (concepts and relationships found within a model) to return relevant metadata tags.
  • Entity and fact extraction: Classification Server identifies important information in your content that is not part of your model but improves classification results:
    • Named entities such as people, places, measurements, dates and URIs.
    • Parts of speech: noun, verb, etc.
    • Zones of text - the sentence, phrase or group of words following a named entity.
    • Relationships between entity types, such as a person and an address or birth date.
    • Relationships between concepts within your model and entities not in your model, for example relating the drug name “Aspirin” (a modeled concept) to the dosage “325mg” (a measurement entity).
    • Groups of facts – combining a list of ingredients found in a recipe: “1 cup” (entity), “flour” (modeled concept), “sieved” (modeled concept).

  • Entity classification: the use of named entity rulebases to normalize concepts. For example, a geographic listing that mentions “USA” might also return “United States of America” as an agreed label.
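
The entity classification step above can be illustrated with a minimal Python sketch. It is not Semaphore's rulebase format or API: the `RULEBASE` dictionary, `build_lookup` and `normalize_entities` are hypothetical names chosen for illustration, and the matching is a simple case-insensitive lookup of known surface forms.

```python
import re

# Hypothetical named-entity rulebase: agreed label -> alternative surface forms.
# Illustrative only; this is not Semaphore's rulebase format.
RULEBASE = {
    "United States of America": ["USA", "U.S.A.", "United States"],
    "United Kingdom": ["UK", "U.K.", "Great Britain"],
}


def build_lookup(rulebase):
    """Invert the rulebase into a lowercase surface form -> agreed label map."""
    lookup = {}
    for label, variants in rulebase.items():
        lookup[label.lower()] = label
        for variant in variants:
            lookup[variant.lower()] = label
    return lookup


def normalize_entities(text, rulebase=RULEBASE):
    """Return (mention, agreed label) pairs in the order they appear in text."""
    lookup = build_lookup(rulebase)
    # Try longer forms first so "United States of America" is preferred
    # over its prefix "United States".
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(form) for form in forms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.group(1), lookup[m.group(1).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for mention, label in normalize_entities("Offices in the USA and the UK."):
        print(mention, "->", label)
```

A production rulebase would typically also handle ambiguity and context (for example, “Georgia” the state versus the country), which a plain lookup like this one does not attempt.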

These sophisticated classification strategies help global organizations leverage the valuable information within their organization to make key business decisions that manage information, improve analytics and drive innovation.

NLP and semantic techniques

Semaphore Classification Server uses NLP and sophisticated semantic techniques to drive precise and consistent metadata tagging. Classification Server applies these techniques in a precise manner to ensure the highest quality results.

  • Natural Language Processing (NLP) – the identification, analysis and description of the structure of a language’s linguistic units, used to precisely tag text.
  • The model is then applied to the output of the NLP process to perform named entity extraction and topic, subject and thematic classification.
  • Following the entity extraction and thematic classification processes, entities, topics, subjects and themes are examined to determine how they relate to other elements of the content using:
    • Fact extraction – processing text to look for patterns identified by their proximity to a phrase or an entity, such as references, project codes, prices and credit card numbers.
    • Relationship extraction – fine-grained entity and fact extraction rules that allow for discovery of entity relationships within documents, as well as the ability to correlate relationships into groups to derive enhanced meaning.
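
As a rough illustration of proximity-based fact and relationship extraction, the Python sketch below pairs a modeled concept (a drug name) with a measurement entity that follows it closely in the same sentence. Everything here is an assumption made for illustration: `MODEL_CONCEPTS`, the `MEASUREMENT` pattern, `extract_dosage_facts` and the `max_gap` parameter are hypothetical and do not reflect how Classification Server expresses its rules.

```python
import re

# Hypothetical modeled concepts; a real model would come from your ontology.
MODEL_CONCEPTS = {"aspirin", "ibuprofen"}

# Simple measurement-entity pattern: a number followed by a unit.
MEASUREMENT = re.compile(r"\b(\d+(?:\.\d+)?)\s?(mg|g|ml)\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z]+")


def extract_dosage_facts(text, max_gap=30):
    """Pair each modeled concept with the first measurement that starts
    within max_gap characters after it, inside the same sentence."""
    facts = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        measurements = list(MEASUREMENT.finditer(sentence))
        for word in WORD.finditer(sentence):
            if word.group().lower() not in MODEL_CONCEPTS:
                continue
            for measurement in measurements:
                gap = measurement.start() - word.end()
                if 0 <= gap <= max_gap:
                    value = measurement.group(1) + measurement.group(2).lower()
                    facts.append((word.group(), value))
                    break
    return facts


if __name__ == "__main__":
    print(extract_dosage_facts("Take Aspirin 325mg daily. Ibuprofen was not prescribed."))
```

Restricting matches to the same sentence and to a small character window is one simple way to approximate a "zone of text" following an entity; real rules would likely be more fine-grained.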

Semaphore’s classification process combines linguistic and rule-based logic. It examines information assets and applies precise and consistent metadata tags that improve search and retrieval in content management, search engine, database and workflow systems.

Semaphore Classification Review and Classification Analysis tools refine results

Semaphore’s Classification Review Tool allows you to examine classification results using sample content to identify inconsistencies, anomalies and opportunities for ontology, rulebase or classification strategy refinement. Following an agile approach, you can make changes, re-classify content and examine the results to ensure the highest quality metadata is returned.

The Classification Analysis tool provides you with in-depth classification results for a single document. It lets you quickly see how concepts, labels and relationships within a document contribute to results, and how the combination of concepts and their weights produces an overall document score. This process allows you to review results and make adjustments to your rules and model to increase the precision of classification output.
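
The idea of weighted evidence combining into a document score can be sketched in Python as follows. The scoring scheme is an assumption made purely for illustration (a capped sum of per-label weights compared against a threshold), not Semaphore's actual scoring algorithm; `RULES`, `explain_scores` and `threshold` are hypothetical names.

```python
import re

# Hypothetical rules: concept -> {evidence label: weight}.
# The weights and the capped-sum scoring are illustrative assumptions.
RULES = {
    "Baking": {"flour": 40, "oven": 30, "sieved": 20},
    "Pharmacology": {"aspirin": 50, "dosage": 30},
}


def explain_scores(text, rules=RULES, threshold=50):
    """Report, per concept, which evidence labels were found, the resulting
    score (sum of matched weights, capped at 100) and whether the concept
    would be returned as a tag."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    report = {}
    for concept, evidence in rules.items():
        hits = {label: weight for label, weight in evidence.items() if label in tokens}
        score = min(100, sum(hits.values()))
        report[concept] = {
            "score": score,
            "evidence": hits,
            "tagged": score >= threshold,
        }
    return report


if __name__ == "__main__":
    for concept, details in explain_scores("Add 1 cup of sieved flour.").items():
        print(concept, details)
```

Keeping the matched evidence alongside the score mirrors the kind of per-document breakdown described above: you can see which labels contributed, then adjust weights or rules and re-run the same document.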