Uncover Actionable Intelligence from Unstructured Content with Text Analytics

Why is text analytics important?

Organizations worldwide are inundated with unstructured information: emails, online reviews, contracts, call center support notes, engineering diagrams, product documentation and other content. All of it contains valuable insight into customer wants and needs, but only if you can unlock its meaning. Text analytics extracts meaning from unstructured information and turns it into business intelligence so you can gain insight, uncover patterns and drive innovation.

Semaphore Advanced Language Packs

Semaphore Advanced Language Packs help you extract the vocabulary and relationships from your classification model so content can be put in the context of the business. Using text analytics and natural language processing strategies such as stemming, tokenization, lemmatization and part-of-speech tagging, you can identify the sentiment and context within unstructured information and use it to improve your organization.

The process begins with Semaphore Ontology Editor where you create a model that reflects the topics, concepts and unique characteristics of the organization. Semaphore Rulebase Generator creates rulebases directly from the model and Classification Server uses them to perform precise, complete and consistent metadata tagging.

Semaphore Advanced Language Packs manage tokenization, lemmatization and part-of-speech tagging for more than 20 languages. Processing results are deployed to Semaphore Classification Server, where part-of-speech tagging results can be used to identify and apply additional metadata tags such as:

  • Noun phrases – identifies candidate noun and adjective phrases used by Text Miner to further enrich a taxonomy or ontology
  • Suggested entities – using an algorithmic and dictionary approach, previously identified noun phrases are typed as likely company names, people names, place names, etc.
  • Facts – using a capture rule in association with rule and entity matching, Semaphore can extract facts such as social security numbers, project reference codes or protective markers.
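
The fact extraction described above can be illustrated with a minimal Python sketch. It is not Semaphore's capture rule syntax: the CAPTURE_RULES table, the extract_facts function and the PRJ-0000 reference code format are all assumptions made for illustration, and plain regular expressions stand in for rule and entity matching.

```python
import re

# Hypothetical capture rules: each maps a fact type to a regular expression.
# These patterns are illustrative only and are not Semaphore rule syntax.
CAPTURE_RULES = {
    # US social security number written as 3-2-4 digit groups.
    "social_security_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Assumed project reference format: "PRJ-" followed by four digits.
    "project_reference_code": re.compile(r"\bPRJ-\d{4}\b"),
}


def extract_facts(text):
    """Return a sorted list of (fact_type, matched_text) pairs found in text."""
    facts = []
    for fact_type, pattern in CAPTURE_RULES.items():
        for match in pattern.finditer(text):
            facts.append((fact_type, match.group(0)))
    return sorted(facts)


if __name__ == "__main__":
    sample = "Employee 123-45-6789 is assigned to PRJ-0042."
    for fact_type, value in extract_facts(sample):
        print(fact_type, value)
```

Each extracted pair could then be attached to the document as an additional metadata tag alongside the classification results.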

Advanced Language Packs increase the granularity of part-of-speech tagging, which identifies a word’s grammatical category (e.g. noun, verb). This information is used in conjunction with rule logic to perform complex matching. With Semaphore, content can be analyzed in bulk; Semaphore can then generate RDF triples and use graph-based technology to visualize results and drive information discovery.
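
To make the RDF step concrete, the following Python sketch serializes classification tags as N-Triples lines. It is an assumption-laden illustration, not Semaphore output: the to_ntriples function and the example.org URIs are hypothetical, and the Dublin Core "subject" predicate is simply one reasonable choice for linking a document to a concept.

```python
# Dublin Core "subject" property, used here as an assumed linking predicate.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def to_ntriples(doc_uri, concept_uris, predicate=DCTERMS_SUBJECT):
    """Return one N-Triples line per (document, predicate, concept) statement."""
    return [
        f"<{doc_uri}> <{predicate}> <{concept_uri}> ."
        for concept_uri in concept_uris
    ]


if __name__ == "__main__":
    # Hypothetical document and concept URIs.
    triples = to_ntriples(
        "http://example.org/doc/1",
        [
            "http://example.org/concept/contracts",
            "http://example.org/concept/customer-service",
        ],
    )
    print("\n".join(triples))
```

Triples in this form can be loaded into a graph store, where documents that share concepts become connected nodes.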

Advanced Language Pack strategies

Advanced Language Packs use a number of sophisticated linguistic strategies to analyze unstructured information and identify sentiment, context and meaning:

  • Language identification – automatic identification of the language found within the text (e.g. French, English or German) as well as the text format (e.g. plain text or HTML).
  • Document Analyzer – parses documents and identifies paragraphs and sentences.
  • Case Normalization – identifies case-normalized alternatives for words within your content based on document position, such as within a sentence or in a title.
  • Word Segmentation – performs basic tokenization by breaking text into syntactic units (tokens). Identifies abbreviations and multi-word tokens (e.g. out-of-the-box) so they can be processed as single words.
  • Stemmer – identifies the base form (stem) for each token found within the text. For example, the words speaks and speaking have a stem of speak.
  • Part-of-Speech tagging – identifies and labels the part of speech (e.g. noun, verb) as well as sub-class attributes (singular or plural for nouns, present or past tense for verbs) for each word, using the surrounding context.
  • Tagged Stemming – provides complete linguistic analysis of input text, including stemming with respect to part-of-speech information. This operation segments text into words and punctuation, performs document analysis, case normalization, and part-of-speech tagging.
  • Phrase Grouping – identifies sequences of tokens that function as a single syntactic unit in text. Given sequences of words labeled with part-of-speech tags, the phrase grouping uses grammar rules defined in the language-specific modules to form phrases.
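
Several of the strategies listed above (word segmentation, stemming, part-of-speech tagging and phrase grouping) can be sketched as a toy Python pipeline. This is a deliberately simplified illustration and not how the language packs work internally: the suffix list, the hand-written POS_LEXICON and every function name are assumptions, and a lexicon lookup replaces the context-sensitive tagging described above.

```python
import re

# Toy suffix list for a naive stemmer; production stemmers cover far more cases.
SUFFIXES = ("ing", "s")

# Tiny hand-written part-of-speech lexicon (hypothetical, for illustration only).
POS_LEXICON = {
    "the": "DET", "new": "ADJ", "support": "NOUN", "engineer": "NOUN",
    "speaks": "VERB", "clearly": "ADV",
}


def segment(text):
    """Split text into lowercase tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag(tokens):
    """Label each token from the lexicon; unknown words get the tag 'X'."""
    return [(token, POS_LEXICON.get(token, "X")) for token in tokens]


def noun_phrases(tagged):
    """Group runs of ADJ/NOUN tokens that end in a NOUN into phrases."""
    phrases, current = [], []
    # A sentinel entry flushes any phrase still open at the end of the input.
    for token, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "NOUN"):
            current.append((token, pos))
            continue
        # Drop trailing adjectives so every phrase ends with a noun.
        while current and current[-1][1] != "NOUN":
            current.pop()
        if current:
            phrases.append(" ".join(t for t, _ in current))
        current = []
    return phrases


if __name__ == "__main__":
    tokens = segment("The new support engineer speaks clearly.")
    print([stem(t) for t in tokens])
    print(noun_phrases(tag(tokens)))
```

The phrase grouping rule here (optional adjectives followed by one or more nouns) is a single hard-coded grammar rule; language-specific modules would supply a fuller rule set per language.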

Semaphore Advanced Language Packs are available for languages including Arabic, Bokmål, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Nynorsk, Polish, Portuguese, Russian, Serbian, Slovak, Slovenian and Spanish.

Improve classification outcomes and tame Big Data with Advanced Language Packs

Semaphore Advanced Language Packs provide organizations with sophisticated text analytics strategies and techniques to transform unstructured information into actionable data. With Semaphore, Big Data volumes can be quickly analyzed, information tagged and the results used for real-time decision support. Organizations that unlock Big Data knowledge can extract meaning, identify patterns, understand customer sentiment and examine results to improve customer service and drive innovation.
