Adding semantic capability means adding metadata

The Linked Data initiative suggests that metadata increases the utility of published data for data consumers and helps them assess its quality.

Using the RDFa standard, this metadata can be embedded directly into content. But again, the metadata still has to be generated by some process, whether manual or automated.
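To make the RDFa idea concrete, here is a minimal sketch of what embedded metadata looks like once it has been generated. It assumes the schema.org vocabulary and uses RDFa Lite attributes (`vocab`, `typeof`, `property`); the function name `rdfa_article` and the choice of `headline`, `author` and `about` properties are illustrative assumptions, not part of any particular product.

```python
from html import escape


def rdfa_article(title: str, author: str, topics: list[str]) -> str:
    """Wrap metadata in an HTML fragment annotated with RDFa Lite attributes.

    Hypothetical helper: assumes the schema.org vocabulary; the property
    names used here are illustrative choices.
    """
    lines = ['<div vocab="https://schema.org/" typeof="Article">']
    # Visible content doubles as machine-readable metadata via `property`.
    lines.append(f'  <h1 property="headline">{escape(title)}</h1>')
    lines.append(f'  <span property="author">{escape(author)}</span>')
    # Subject tags (e.g. produced by a classifier) carried as hidden meta elements.
    for topic in topics:
        lines.append(f'  <meta property="about" content="{escape(topic)}">')
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rdfa_article("Quarterly results", "J. Smith", ["Finance", "Earnings"]))
```

The markup itself is trivial; the hard part, as noted above, is producing accurate values for the `about` tags in the first place.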

Semaphore Classification Server automates that process. It takes a piece of existing text and returns a number of useful results:

  • Entity Extraction – The system can automatically identify up to 30 different entity types (towns, addresses, people's names, organization names, measures, etc.) within the content and extract them, so that clearly defined metadata is made available.
  • Ontology Classification – The system generates a classification rule base directly from the ontology. A complex natural language processing system splits the content into tokens (phrases, sentences, words, titles, etc.), which are matched against our weighted rule logic: a term found in the title carries more weight than the same term in the body. This means that a single mention of a word in the content is not necessarily enough to return the ontology “tag”; enough evidence has to be found that the content is genuinely about the concept, which is what we call “about-ness” tagging.
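The weighted-evidence idea behind “about-ness” tagging can be illustrated with a deliberately simplified sketch. This is not Semaphore's rule language or API: the ontology, the title/body weights and the threshold below are all hypothetical, and plain regular-expression matching stands in for a real natural language processing tokenizer. It only shows the principle that evidence is accumulated per concept and a tag is returned only when the total score clears a threshold.

```python
import re
from collections import defaultdict

# Hypothetical ontology: each concept maps to terms (labels, synonyms) that
# count as evidence for it. A real rule base would be generated from the ontology.
ONTOLOGY = {
    "Renewable energy": ["solar", "wind power", "renewable"],
    "Banking": ["bank", "loan", "interest rate"],
}

# Assumed weights: a term in the title counts more than the same term in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}
THRESHOLD = 4.0  # assumed minimum evidence score before a tag is returned


def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of a (possibly multi-word) term."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def classify(title: str, body: str) -> dict[str, float]:
    """Return only the concepts whose weighted evidence reaches THRESHOLD."""
    scores: dict[str, float] = defaultdict(float)
    fields = {"title": title, "body": body}
    for concept, terms in ONTOLOGY.items():
        for term in terms:
            for field, text in fields.items():
                scores[concept] += FIELD_WEIGHTS[field] * count_term(text, term)
    return {c: s for c, s in scores.items() if s >= THRESHOLD}


if __name__ == "__main__":
    print(classify("Solar farms expand", "The bank approved a loan for new solar panels."))
```

With these assumed settings, the sample text scores 4.0 for “Renewable energy” (one title mention at weight 3 plus one body mention) but only 2.0 for “Banking”, so only the first tag is returned even though banking terms do appear in the content.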