Meta-tagging information feeds
Entity normalization and entity identification map your vocabulary
In an environment of complex business and, especially, regulatory requirements, supporting technology systems must accurately identify business entities across the organization and match them to records in other internal or external business systems. The Semaphore open semantic platform delivers this capability from a central, easy-to-administer service.
Entity normalization and entity identifier mapping
Semaphore helps organizations achieve a unified view of customer, prospect and third-party information through the process of entity identifier mapping.
Purchased data feeds from organizations such as Thomson Reuters, Factiva, LexisNexis and OneSource provide comprehensive mark-up of companies, topics, people, stock tickers, etc.
The problem is that each feed has its own master vocabulary. For example, the same company may appear as “Saint-Gobain”, “St Gobain”, “Compagnie de Saint-Gobain”, “SGO” or “SGO FP”.
Your organization is likely to have its own definitions of companies and people in its accounting or CRM systems. Generating a holistic view requires an open semantic platform that can:
- process huge volumes of incoming information, often loaded within a small time window
- locate and extract any existing entity identifiers
- use matching logic, maintained by an authoritative central team, to add the "corporate" layer of metadata over the external or local source layer.
Smartlogic and our partners can provide master control taxonomies covering 60,000 companies as well as sectors (various schemas), geography and others.
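The mapping step described above can be pictured with a small Python sketch. It is not Semaphore's API: the identifier CORP-0001, the MASTER table, the record layout and the function names are all hypothetical, and the matching logic is a simple normalized alias lookup rather than anything a production rule base would use.

```python
import re

# Hypothetical corporate master vocabulary: one canonical ID per company,
# with the variant names and tickers seen in different feeds.
MASTER = {
    "CORP-0001": {
        "label": "Compagnie de Saint-Gobain",
        "aliases": [
            "Saint-Gobain",
            "St Gobain",
            "Compagnie de Saint-Gobain",
            "SGO",
            "SGO FP",
        ],
    },
}


def normalize(name):
    """Lowercase and collapse punctuation/whitespace so variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Reverse index: normalized alias -> canonical corporate ID.
ALIAS_INDEX = {
    normalize(alias): corp_id
    for corp_id, entry in MASTER.items()
    for alias in entry["aliases"]
}


def add_corporate_layer(record):
    """Return a copy of a feed record with corporate IDs layered over the
    source tags. The original source tags are left untouched."""
    corporate_ids = []
    for tag in record.get("companies", []):
        corp_id = ALIAS_INDEX.get(normalize(tag))
        if corp_id and corp_id not in corporate_ids:
            corporate_ids.append(corp_id)
    return {**record, "corporate_ids": corporate_ids}


# Two illustrative records from different (made-up) feeds.
feed_a = {"source": "feed_a", "companies": ["St Gobain"]}
feed_b = {"source": "feed_b", "companies": ["SGO FP", "Unknown Co"]}
```

Both records resolve to the same corporate identifier even though their source vocabularies differ, while unrecognized names are simply left without a corporate ID for the central team to review.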
Entity identification from Semaphore
While managed external data feeds will always have some entity metadata, other sources of information might not. Your portal could be populated by crawling external websites, or by loading data from department-specific databases.
Semaphore can read the text and extract entities such as companies, people, locations and even weights and measures (useful, for example, for pulling square footage out of a property feed). A full list of supported entities is available here.
This process can, of course, be used in conjunction with entity normalization.
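As a rough illustration of extracting a measurement from free text, the square-footage case can be sketched in Python with a regular expression. This is an assumption-laden toy: the pattern and the function name extract_square_footage are hypothetical, and a real extractor would rely on far richer linguistic rules than a single regex.

```python
import re
from typing import List

# Matches figures such as "12,500 sq ft", "3200 square feet" or "850 sq. ft."
# (illustrative pattern only; it does not cover every way an area is written).
AREA_PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s*(?:sq\.?\s*ft\.?|square\s+(?:feet|foot))",
    re.IGNORECASE,
)


def extract_square_footage(text: str) -> List[int]:
    """Return every square-footage figure found in the text as an integer."""
    return [
        int(match.group(1).replace(",", ""))
        for match in AREA_PATTERN.finditer(text)
    ]


listing = "Office suite of 12,500 sq ft with a 3200 square feet annex."
areas = extract_square_footage(listing)
```

The extracted values could then be stored as metadata alongside any company or location entities found in the same document.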
Subject Identification
Classifying content often involves adding "tags" from several facets or metadata elements. The harder ones to populate involve mimicking the human capability to read a document and assess, for example, its subject matter or even its sentiment (optimistic, pessimistic, neutral).
The Semaphore classification server provides a cost-effective method to define complex linguistic rule-bases that assess and weight the evidence found in a document and return the tags only if a threshold of "about-ness" is passed.
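The idea of weighting evidence against an "about-ness" threshold can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not how the Semaphore classification server is implemented: the RULES table, its weights and thresholds, and the classify function are all hypothetical, and the evidence is plain keyword matching rather than linguistic rules.

```python
import re

# Hypothetical rule base: each subject lists weighted evidence terms and the
# minimum total score a document must reach before the tag is returned.
RULES = {
    "Mergers and acquisitions": {
        "threshold": 1.0,
        "evidence": {
            "acquisition": 0.6,
            "merger": 0.6,
            "takeover": 0.5,
            "shareholders": 0.2,
        },
    },
    "Commercial property": {
        "threshold": 1.0,
        "evidence": {
            "lease": 0.5,
            "square feet": 0.4,
            "tenant": 0.4,
            "landlord": 0.4,
        },
    },
}


def classify(text):
    """Return {subject: score} for subjects whose evidence passes the threshold."""
    lowered = text.lower()
    tags = {}
    for subject, rule in RULES.items():
        score = 0.0
        for term, weight in rule["evidence"].items():
            # Count each evidence term once, matched on word boundaries.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                score += weight
        if score >= rule["threshold"]:
            tags[subject] = round(score, 2)
    return tags


document = (
    "The board approved the acquisition after merger talks; "
    "the tenant was not affected."
)
result = classify(document)
```

Here the document mentions a tenant, but that single piece of evidence is too weak to pass the property threshold, so only the mergers tag is returned.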
















































