Metadata Extraction and Classification: Ontology or Statistically Derived?

Posted on: July 02, 2015, by: Ann Kelly

Let’s face it: if you’re like most organizations, you have more data than you can manage, and the amount flowing into your systems is growing faster than you can handle. You’ve most likely implemented a content management system, confident that your problems are solved. Another item checked off your list, right?

Yet as you begin to organize data, your folder structures become complex. As document volumes grow, accurate manual tagging and classification of your content becomes a burden, and it is often done in a hurried and imprecise manner (or not at all). The result is that content is difficult to find, and the metadata you depend on for proper information governance is missing.

The next logical step is autoclassification: the process of using technology, rather than relying on humans, to review content and apply metadata, with the goal of precise, consistent and time-saving classification. Yet not all metadata extraction and classification processes are equal. Some organizations “teach” or train their software using machine learning algorithms, while others, like Smartlogic, use a model-driven approach.

When classification is statistically derived, each time a new kind of information is encountered, the training data needs to be extended and the training process repeated. This learning process is time-consuming and costly, and it delays an organization’s ability to make strategic business decisions based on newly acquired information.
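To make the retraining cycle concrete, here is a deliberately tiny sketch of a statistically derived classifier. It is a toy word-count model standing in for a real learning algorithm, not any vendor’s implementation; the function names (`train`, `predict`), the labels and the sample documents are all hypothetical.

```python
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count word frequencies per label from (text, label) pairs."""
    counts = defaultdict(Counter)
    for text, label in labelled_docs:
        counts[label].update(text.lower().split())
    return counts


def predict(text, counts):
    """Pick the label whose training words overlap the text the most."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


# Hypothetical training data: the classifier only knows these two labels.
training_set = [
    ("invoice total amount due", "Finance"),
    ("payment received for invoice", "Finance"),
    ("employee leave request approved", "HR"),
]
classifier = train(training_set)

# A document about an unseen topic is forced into one of the known labels.
before = predict("privacy breach notification", classifier)

# Supporting the new topic means collecting labelled examples and retraining.
training_set.append(("privacy breach notification sent to regulator", "Compliance"))
classifier = train(training_set)
after = predict("privacy breach notification", classifier)
```

In this sketch, `before` can only ever be one of the labels seen during training; the document is recognised as `Compliance` only after labelled examples are added and `train` is run again.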

At Smartlogic we begin with an ontology. We can start with an existing industry model and import it directly into Semaphore Ontology Manager; given the sophistication of the models in the public domain, there’s a good chance you’ll find one that fits. We then publish the model and create rulebases, which drive the automatic classification process. The tools within Semaphore Workbench allow users to review the results of the autoclassification process and fine-tune them, so that information ends up precisely tagged.
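As a rough illustration of the model-driven idea, the sketch below turns a small ontology into matching rules and uses them to tag text. It is not Semaphore’s API or rule language: the `Concept` class, `build_rulebase`, `classify` and the sample concepts are hypothetical names, and simple phrase matching is assumed in place of a real rulebase.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical ontology concept: a preferred label plus synonyms."""
    label: str
    synonyms: list = field(default_factory=list)

    def terms(self):
        return [self.label, *self.synonyms]


def build_rulebase(concepts):
    """Compile one case-insensitive, whole-phrase pattern per concept."""
    rulebase = {}
    for concept in concepts:
        alternatives = "|".join(re.escape(t) for t in concept.terms())
        rulebase[concept.label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return rulebase


def classify(text, rulebase, min_hits=1):
    """Return {concept label: hit count} for concepts with at least min_hits."""
    tags = {}
    for label, pattern in rulebase.items():
        hits = len(pattern.findall(text))
        if hits >= min_hits:
            tags[label] = hits
    return tags


# A made-up two-concept model, standing in for an imported industry ontology.
model = [
    Concept("Contract", ["agreement", "master services agreement"]),
    Concept("Invoice", ["bill", "statement of account"]),
]
doc = "This agreement replaces the prior contract. No invoice is attached."
tags = classify(doc, build_rulebase(model))

# Extending the model is an edit plus a rebuild of the rules; nothing is retrained.
model.append(Concept("Data Protection", ["GDPR", "personal data"]))
tags_after = classify(
    "The agreement covers personal data under GDPR.", build_rulebase(model)
)
```

The `min_hits` threshold is one place where the review-and-tune step described above could apply: raising it for a concept that fires too readily trades recall for precision without touching the documents themselves.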

So what’s the best way? That depends on your motivation. With Semaphore you begin to reap the benefits quickly, with a level of precision that others can achieve only after a long period of “teaching and tuning.” Semaphore’s approach lets you immediately incorporate new information into decision-making processes to gain and maintain a competitive advantage. Here are the facts; I’ll let you decide.