Auto-classification

Classification is the process of describing what a piece of information is (a contract, a proposal, a policy), what it’s about (subjects, topics, themes), and how it should be managed (distributed, secured, archived) by applying one or more metadata tags. Metadata can be applied manually by humans or automatically using technology. Manual classification relies on a human to read the information and select the right metadata tags; automatic classification uses statistical methods or model-driven rules to apply the metadata precisely, consistently and efficiently.

Model-driven auto-classification

Semaphore eliminates the costs, time and hassles of manual classification by combining rulebases and natural language processing to give you the best of both worlds – precise, complete and consistent classification results with the ease of automation.

Semaphore derives its rules from a taxonomy or ontology. You can start with an internal model, an existing industry vocabulary or an out-of-the-box model from Smartlogic, or you can mine sample content with Text Miner and import the results into Semaphore Ontology Editor. Rules are published to create rulebases: a series of templates that can be modified to achieve specific classification outcomes. Model-driven classification lets you:

  • Quickly examine the behavior of a rule so problems can be immediately addressed. This transparent process is key to precise and consistent results.
  • Use rule-based classifiers across document collections – Semaphore rulebases are not dependent on the information in a specific document collection, so they can be applied to new collections without modification or lengthy processes.
  • Simplify rule generation and publication – Semaphore rulebases are generated from the model and can be quickly customized to reflect your content. Rule publication has never been simpler; rules are published with a single click of a button.
  • Gain greater flexibility than statistical classification offers. Statistical algorithms and learning methods require re-training when new information is encountered. This learning process is time-consuming and costly, and it impairs an organization's ability to make strategic business decisions based on newly acquired information.
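
To make the idea of rules generated from a model concrete, here is a minimal Python sketch. It is not Semaphore's API or rule format: the `Concept` and `Rule` classes, `generate_rulebase` and `classify` are hypothetical names invented for this illustration, and the rule logic (tag a document when any label of a concept appears as a whole word) is a deliberately simple assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a hypothetical taxonomy: a preferred label plus synonyms."""
    preferred_label: str
    alt_labels: list[str] = field(default_factory=list)
    narrower: list["Concept"] = field(default_factory=list)


@dataclass
class Rule:
    """A generated rule: apply `tag` when any evidence term is found."""
    tag: str
    evidence_terms: list[str]


def generate_rulebase(root: Concept) -> list[Rule]:
    """Walk the model and emit one rule per concept (illustrative only)."""
    rules = []
    stack = [root]
    while stack:
        concept = stack.pop()
        terms = [concept.preferred_label, *concept.alt_labels]
        rules.append(Rule(concept.preferred_label, [t.lower() for t in terms]))
        stack.extend(concept.narrower)
    return rules


def classify(text: str, rulebase: list[Rule]) -> list[str]:
    """Return the sorted tags of every rule with a whole-word match in text."""
    lowered = text.lower()
    tags = []
    for rule in rulebase:
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered)
               for term in rule.evidence_terms):
            tags.append(rule.tag)
    return sorted(tags)


# A tiny assumed model; real taxonomies are far larger.
model = Concept("Legal documents", narrower=[
    Concept("Contract", alt_labels=["agreement"]),
    Concept("Policy", alt_labels=["guideline"]),
])
rulebase = generate_rulebase(model)
print(classify("This agreement supersedes the earlier guideline.", rulebase))
```

Because the rules come from the model rather than from a training set, the same `rulebase` can be run against any new collection without modification, which is the property the list above describes.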

Semaphore puts your model to work within the enterprise and lets you immediately realize the benefits and cost savings of auto-classification.

Classification Server provides superior results

Semaphore’s Classification Server uses multiple classification strategies to precisely describe the information within your content:

  • Aboutness classification: the process of combining evidence – concepts and relationships found within a model – to return relevant metadata tags.
    • Rules are created to match evidence terms and entities to make decisions on how content should be categorized and tagged with the correct metadata.
    • Terms are weighted according to how discriminatory they are for a topic, and classification scores are adjusted according to the frequency of the term, where it is located and its context.
  • Entity and fact extraction: Classification Server identifies important information in your content that is not part of your model but improves classification results:
    • Named entities such as people, places, measurements, dates and URIs.
    • Parts of speech: noun, verb, etc.
    • Zones of text – the sentence, phrase or group of words following a named entity.
    • Relationships between entity types, such as a person and an address or birth date.
    • Relationships between concepts within your model and entities not in your model, for example, relating the drug name “Aspirin” found within a model to the dosage “325mg”, a measurement entity.
    • Groups of facts – combining a list of ingredients found in a recipe: “1 cup” (entity), “flour” (modeled concept), “sieved” (modeled concept).
  • Entity classification: the use of named entity rulebases to normalize concepts. For example, a geographic listing that mentions “USA” might also return “United States of America” as the agreed label.
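
The strategies above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Classification Server is implemented: the weights, the title boost, the regular expression for measurements and the label table are all invented for the example, and names such as `score_topic`, `extract_measurements` and `normalize_entity` are hypothetical.

```python
import re
from collections import Counter

# Assumed evidence weights for one topic: more discriminatory terms score higher.
EVIDENCE_WEIGHTS = {"aspirin": 3.0, "dosage": 1.5, "tablet": 1.0}
# Assumed location adjustment: a term found in the title counts double.
TITLE_BOOST = 2.0
# Assumed table mapping entity mentions to an agreed label.
AGREED_LABELS = {"usa": "United States of America"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_topic(title: str, body: str) -> float:
    """Sum weight x frequency for each evidence term, boosting title hits."""
    title_counts = Counter(tokenize(title))
    body_counts = Counter(tokenize(body))
    score = 0.0
    for term, weight in EVIDENCE_WEIGHTS.items():
        score += weight * TITLE_BOOST * title_counts[term]
        score += weight * body_counts[term]
    return score


def extract_measurements(text: str) -> list[str]:
    """Find simple measurement entities such as '325mg' (regex stand-in)."""
    return re.findall(r"\b\d+(?:\.\d+)?\s?(?:mg|g|ml)\b", text, flags=re.IGNORECASE)


def normalize_entity(mention: str) -> str:
    """Return the agreed label for a mention, or the mention itself if unknown."""
    return AGREED_LABELS.get(mention.strip().lower(), mention)


title = "Aspirin dosage guidance"
body = "Take one aspirin tablet; the usual dosage is 325mg."
print(score_topic(title, body))    # 14.5
print(extract_measurements(body))  # ['325mg']
print(normalize_entity("USA"))     # United States of America
```

In this sketch a document would be tagged with the topic when its score passes a threshold chosen by the model owner. Context-sensitive adjustments and relationships between entities are left out to keep the example short.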

Enterprise-wide classification needs

In today’s enterprise, complete information is key to good decision making and successful outcomes. Information is spread throughout the organization in disparate silos, and the vocabulary is specific to each department, group and use case. New information arrives at a pace humans simply cannot manage; auto-classification is imperative.

The ability to understand enterprise information – to control it and to enable users to access, find, reuse and repurpose it – improves organizational efficiency and drives change. Semaphore is an enterprise-grade solution that integrates with existing IT infrastructure so that organizations can take control of all information through the consistent application of metadata.

Improving search and retrieval and driving information governance

Download our whitepaper to learn how Smartlogic’s Semaphore provides automatic and assisted, rule-based, information classification to improve search and retrieval and drive information governance.  

Download The ABC's of Content Classification
