Describe your content with precise, complete and consistent metadata using Semaphore Classification Server

With Semaphore you begin with a model that defines what is important to your business: the concepts, topics, resources, vocabulary and the relationships among them. Semaphore uses the model to create rulebases, which analyze and automatically classify content and tag it with metadata. The tagged content is then available to other applications, such as search engines, content management systems and data stores.

Sophisticated classification strategies

Classification Server uses a number of strategies to precisely describe the information within content to unlock its hidden value. This valuable information is leveraged by organizations worldwide to make key business decisions that manage information, improve analytics and drive innovation.

Semaphore’s classification strategies:

  • Aboutness classification: the process of combining evidence (concepts and relationships found in a taxonomy or ontology) to return relevant metadata tags. Some types of evidence score higher than others:
    • Preferred labels score higher than alternative labels.
    • Matches in document titles score higher than matches in the document body.
    • Multi-word matches score higher than single-word matches.
    • Proximate words score higher than distant words.
  • Entity and fact extraction: Classification Server identifies important information in your information assets that is not part of your model but improves classification results:
    • Named entities, such as people, places, measurements, dates and URIs.
    • Parts of speech: noun, verb, etc.
    • Zones of text: the sentence, phrase or group of words following a named entity.
    • Relationships between entity types, such as a person and an address or birth date.
    • Relationships between known concepts in your model and entities not in your model. For example, relating the drug name “Aspirin”, a concept in the model, to the dosage “325mg”, a measurement entity.
    • Groups of facts, such as the items in a recipe's ingredient list: “1 cup” (entity), “flour” (modeled concept), “sieved” (modeled concept).
  • Entity classification: the use of named entity rulebases to normalize concepts. For example, a geographic listing that mentions “USA” might also return “United States of America” as the agreed label.
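To make the idea of weighted evidence more concrete, here is a minimal sketch of aboutness scoring. It is not Semaphore's rulebase format or scoring algorithm: the `Evidence` record, the multiplier values, the distance decay and the `aboutness_score` function are all hypothetical. They were chosen only so that each higher-scored evidence type in the aboutness list outweighs its lower-scored counterpart, and each concept's score is the sum of its evidence.

```python
from dataclasses import dataclass

# Hypothetical multipliers, one per "higher vs lower scored evidence" pair.
# These are illustrative values, not Semaphore's actual weights.
LABEL_WEIGHT = {"preferred": 1.0, "alternative": 0.6}
ZONE_WEIGHT = {"title": 2.0, "body": 1.0}
MULTI_WORD_BONUS = 1.5


@dataclass
class Evidence:
    """One hypothetical match of a model concept's label in a document."""
    concept: str      # concept in the taxonomy or ontology
    label: str        # the label text that matched
    label_type: str   # "preferred" or "alternative"
    zone: str         # "title" or "body"
    distance: int     # words to the nearest other piece of evidence


def evidence_weight(ev: Evidence) -> float:
    """Weight a single match according to the evidence pairs."""
    weight = LABEL_WEIGHT[ev.label_type] * ZONE_WEIGHT[ev.zone]
    if len(ev.label.split()) > 1:
        weight *= MULTI_WORD_BONUS            # multi-word beats single word
    weight *= 1.0 / (1.0 + ev.distance / 10)  # proximate beats distant (hypothetical decay)
    return weight


def aboutness_score(evidence: list[Evidence]) -> dict[str, float]:
    """Sum the weighted evidence for each concept into one score."""
    scores: dict[str, float] = {}
    for ev in evidence:
        scores[ev.concept] = scores.get(ev.concept, 0.0) + evidence_weight(ev)
    return scores


evidence = [
    Evidence("Heart disease", "heart disease", "preferred", "title", 0),
    Evidence("Heart disease", "cardiac", "alternative", "body", 20),
    Evidence("Exercise", "exercise", "preferred", "body", 40),
]
scores = aboutness_score(evidence)
print({concept: round(score, 2) for concept, score in scores.items()})
# → {'Heart disease': 3.2, 'Exercise': 0.2}
```

Summing contributions means that several weak signals can add up to a strong one. A single preferred-label match in the title, however, already outweighs scattered body mentions.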

Our sophisticated classification strategies result in precise and consistent metadata tagging of your information assets to drive enterprise information management, guide corporate compliance policies and provide complete information for corporate decision-making.

NLP + Sophisticated semantics = Precise metadata

Semaphore uses sophisticated semantic techniques to drive precise and consistent metadata tagging.

  • Natural Language Processing (NLP) – advanced NLP identifies, analyzes and describes the structure of a language's linguistic units, using lemmatization, part-of-speech tagging and part-of-speech sequence characterization to tag text precisely.
  • The model is then applied to the output of the NLP process to perform:
    • Named entity extraction – uses patterns and dictionaries to locate and classify elements in a block of text, such as persons, organizations, locations, expressions of time, monetary units and quantities.
    • Topic, subject and thematic classification – combines concept evidence from a taxonomy or ontology into detailed linguistic processing rules, so documents are tagged with the topics they are “about” rather than the themes they merely “mention”.
  • After entity extraction and topic classification, entities, topics, subjects and themes can be analyzed to determine how they relate to other elements of the content using:
    • Fact extraction – processes text to find patterns, such as references, project codes, prices and credit card numbers, based on their proximity to a phrase or an entity.
    • Relationship extraction – fine-grained entity and fact extraction describes the entities and facts within a document. Fact extraction rules discover relationships between entities within a document and can correlate those relationships into groups to derive richer meaning.
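The sketch below shows how dictionaries, patterns and proximity can work together. A dictionary finds modeled concepts, a regular expression finds a measurement entity and a second dictionary normalizes “USA” to an agreed label. A concept and a measurement that occur within a few words of each other are then related. The concept set, the dosage pattern, the three-token window and `extract_facts` are illustrative assumptions, not Semaphore's extraction rule syntax.

```python
import re

# Hypothetical stand-ins for model content and named entity rulebases.
MODEL_CONCEPTS = {"aspirin", "ibuprofen"}             # modeled drug concepts
AGREED_LABELS = {"usa": "United States of America"}   # normalization dictionary

# Tokens: a number plus dosage unit ("325mg", "200 mg") or a plain word.
TOKEN = re.compile(r"\d+(?:\.\d+)?\s?(?:mg|g|ml)\b|\w+", re.IGNORECASE)
# Measurement entity pattern, applied to a token with spaces removed.
DOSAGE = re.compile(r"\d+(?:\.\d+)?(?:mg|g|ml)", re.IGNORECASE)


def extract_facts(text: str, window: int = 3) -> dict[str, list]:
    tokens = TOKEN.findall(text)
    # Dictionary lookup finds modeled concepts, keeping the original spelling.
    concepts = [(i, t) for i, t in enumerate(tokens) if t.lower() in MODEL_CONCEPTS]
    # Pattern match finds measurement entities that are not in the model.
    dosages = [
        (i, t.replace(" ", ""))
        for i, t in enumerate(tokens)
        if DOSAGE.fullmatch(t.replace(" ", ""))
    ]
    # Entity classification: map mentions to their agreed labels.
    normalized = [AGREED_LABELS[t.lower()] for t in tokens if t.lower() in AGREED_LABELS]
    # Relationship: a concept and a measurement within `window` tokens are related.
    relations = [
        (concept, dose)
        for ci, concept in concepts
        for di, dose in dosages
        if abs(ci - di) <= window
    ]
    return {"relations": relations, "normalized": normalized}


print(extract_facts("Take Aspirin 325mg daily. Sold in the USA; ibuprofen is an alternative."))
# → {'relations': [('Aspirin', '325mg')], 'normalized': ['United States of America']}
```

In this example, “ibuprofen” is found as a concept but is not paired with “325mg”. It lies outside the proximity window, which shows how distance limits which facts are grouped together.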

Multi-step classification

Our multi-step classification process combines complex linguistic analysis and rule-based logic to examine a document and apply precise and consistent metadata tags using sophisticated classification strategies. Classification Server:

  • Splits each information asset into articles based on word sequences, formatting and layout
  • Searches documents of any file format for evidence: concepts from the model, concept variants, phrases, entities and patterns
  • Applies weights to evidence (or combination of evidence) to build an overall score for every concept
  • Adjusts weightings based on concept frequency, concept location within a document (header, body or footer), its proximity to other concepts within the document, and the text format
  • Applies metadata tags when threshold levels are met or exceeded; search engines, content management systems and data stores can then use these tags
  • Generates classification results in standard RDF triple format for use in external data stores
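The scoring, threshold and output steps can be sketched as follows. This sketch rests on several assumptions. It covers only frequency and location adjustments, leaving out article splitting and proximity. The location multipliers, the threshold and the example.org URIs are hypothetical. Dublin Core's `dcterms:subject` is used only as an illustrative predicate, since Semaphore's actual output vocabulary is not described here. The output lines follow the W3C N-Triples serialization of RDF.

```python
# Hypothetical location multipliers and tagging threshold (illustrative values).
LOCATION_WEIGHT = {"header": 1.5, "body": 1.0, "footer": 0.5}
THRESHOLD = 2.0
# Dublin Core "subject" term, used here only as an example predicate.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def score_concepts(hits: list[tuple[str, str, float]]) -> dict[str, float]:
    """Sum weighted evidence per concept; each hit is (concept_uri, location, base_weight)."""
    scores: dict[str, float] = {}
    for concept, location, base in hits:
        # Every repeat adds to the total (frequency); location scales each hit.
        scores[concept] = scores.get(concept, 0.0) + base * LOCATION_WEIGHT[location]
    return scores


def to_ntriples(doc_uri: str, scores: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Emit an N-Triples line for each concept whose score meets or exceeds the threshold."""
    return [
        f"<{doc_uri}> <{DCTERMS_SUBJECT}> <{concept}> ."
        for concept, score in sorted(scores.items())
        if score >= threshold
    ]


hits = [
    ("http://example.org/concept/heart-disease", "header", 1.0),
    ("http://example.org/concept/heart-disease", "body", 1.0),
    ("http://example.org/concept/exercise", "body", 1.0),
    ("http://example.org/concept/exercise", "footer", 1.0),
]
scores = score_concepts(hits)  # heart-disease: 2.5, exercise: 1.5
for line in to_ntriples("http://example.org/doc/42", scores):
    print(line)
# → <http://example.org/doc/42> <http://purl.org/dc/terms/subject> <http://example.org/concept/heart-disease> .
```

Only the concept that clears the threshold becomes a triple. Because each triple is a self-contained line, the output can be loaded directly into an RDF triple store.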

Refine results with Semaphore Classification Review and Classification Analysis tools

Semaphore’s Classification Review Tool allows you to examine classification results on sample content to identify inconsistencies, anomalies and opportunities to refine the ontology, rulebase or classification strategy. Using an agile approach, you can make changes, reclassify content and examine the results to ensure the highest quality metadata is returned.

Our Classification Analysis tool provides in-depth classification results for a single document. You can quickly see how the concepts, labels and relationships within a document contribute to the results, and how the combination of concepts and their weights produces an overall document score. This lets you review results and adjust your rules and model to increase the precision of classification output.

Semaphore’s classification tools and strategies help organizations capture and manage the untapped information and insight in their unstructured information and use it to drive successful business outcomes.
