Text Analytics

Text analytics is a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources to support business intelligence, data analysis, and research.

At its core, text analytics breaks a stream of text into meaningful words or phrases. But "meaningful" is a relative term: how do you decide or discover just what information is important? What’s more, how do you do this over hundreds of thousands of documents?

Semaphore helps you define what is meaningful to your organization so you can analyze your content for important, valuable insights. It uses a rulebase system to auto-classify content, and the output of that classification is machine-readable metadata describing the core concepts in the text. Semaphore also extracts facts from text, and outputs metadata describing both concepts and facts.

Text analytics is particularly difficult because language is fluid: a word may have many different meanings, and there are many ways to describe the same thing. Semaphore accounts for this variety with a classification model, also known as a taxonomy or ontology. Semaphore generates its classification rules directly from the model; consequently, the classification reflects what is meaningful to your organization.
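To make the idea of generating rules from a model concrete, here is a minimal sketch in Python. It is not Semaphore's rule language or API. The toy taxonomy, the names build_rules and classify, and the simple hit-count score are all assumptions made for illustration. Each concept lists the labels (a preferred term plus synonyms) that count as evidence for it, one matching rule is derived per concept, and the output is machine-readable metadata describing which concepts appear in the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    concept: str
    pattern: re.Pattern


# Hypothetical toy model: concept -> labels (preferred term and synonyms).
TAXONOMY = {
    "Cardiology": ["cardiology", "heart disease", "cardiac"],
    "Oncology": ["oncology", "cancer", "tumor", "tumour"],
}


def build_rules(taxonomy):
    """Derive one case-insensitive, whole-word matching rule per concept."""
    rules = []
    for concept, labels in taxonomy.items():
        # Try longer labels first so multi-word phrases are preferred.
        ordered = sorted(labels, key=len, reverse=True)
        alternatives = "|".join(re.escape(label) for label in ordered)
        pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
        rules.append(Rule(concept, pattern))
    return rules


def classify(text, rules):
    """Return metadata for every concept with at least one match in the text."""
    results = []
    for rule in rules:
        hits = [m.group(0).lower() for m in rule.pattern.finditer(text)]
        if hits:
            results.append({
                "concept": rule.concept,
                "hits": len(hits),                # illustrative score only
                "evidence": sorted(set(hits)),    # distinct labels that matched
            })
    # Most frequently evidenced concepts first; ties broken by name.
    return sorted(results, key=lambda r: (-r["hits"], r["concept"]))


if __name__ == "__main__":
    sample = ("The patient has a cardiac condition; heart disease runs "
              "in the family. No tumour was found.")
    for record in classify(sample, build_rules(TAXONOMY)):
        print(record)
```

Because the rules are derived from the model rather than written by hand, adding a synonym to a concept changes how documents are classified without touching the matching code. A production system would also need to handle ambiguity, context, and weighting, which this sketch leaves out.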

Semaphore’s text analytics is highly accurate and scales to very large sets of documents – letting you focus on discovering new value in text.