Posted on: May 23, 2015, by: Ann Kelly
Basic search works by examining the text in a group of documents and building an index of their contents. A user then queries that index with a particular word or phrase. Sounds as though it’ll do the job, doesn’t it? But what if millions of documents contain the same words? How do you know which documents have the information you’re looking for and, of those, which are the most relevant? That’s where things get a bit tricky.
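To make "plain old indexing" concrete, here is a minimal Python sketch of an inverted index, assuming whitespace-separated words and a handful of invented documents. The file names, contents, and function names are hypothetical, and real search engines do far more (stemming, stop words, phrase matching), but the core idea is the same.

```python
from collections import defaultdict

# Hypothetical sample documents, invented purely for illustration.
DOCS = {
    "report.pdf": "quarterly sales report for the widget product",
    "memo.docx": "memo about the widget recall",
    "notes.txt": "meeting notes on hiring",
}

def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word in the query, unranked."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(DOCS)
print(sorted(search(index, "widget")))  # ['memo.docx', 'report.pdf']
```

Notice that the result is just a set of matches. Nothing in the index says which of the two "widget" documents matters more, and that is exactly the relevance problem.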
Back in the day, we had internet search engines like AltaVista that were purely about indexing, and finding the right information was painful. Then one day along came Google with a new approach – knowing the web is made up of linked documents, they assumed the documents with the most incoming links were the most relevant (since other documents were citing them as a source), and so PageRank and Google took over the search world.
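The simplified idea described above, ranking documents by how many other documents link to them, can be sketched in a few lines of Python. This is only the incoming-link count, not PageRank itself, which is iterative and also weights each link by the rank of the page it comes from. The link graph and function name here are made up for illustration.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": [],
    "d.html": ["c.html", "b.html"],
}

def rank_by_incoming_links(links):
    """Order pages by number of distinct pages linking to them, most first."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore self-links
                counts[target] += 1
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts, key=lambda page: (-counts[page], page))

print(rank_by_incoming_links(LINKS))
# ['c.html', 'b.html', 'a.html', 'd.html']
```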
What’s that got to do with Enterprise Search? Google defines user expectations when it comes to search. Every search returns a long list of results in milliseconds and Google’s not afraid to tell us how quickly it serviced the request. Yet when it comes to Enterprise Search those expectations can’t be met. Why? PageRank, the key to Google’s relevance magic, doesn’t work for enterprise content because the content isn’t made up of web pages. More often than not enterprise content is made up of PDF files, Word documents, and spreadsheets. These documents don’t contain links, so there is no link structure from which to infer relevance.
So we’re back to plain old indexing. Let’s say you want to search for information about one of your products or customers. Plain old indexing will turn up everything that mentions them. Worse than that, it’ll turn up every version of everything, leaving you to wade through so much data that you’ll most likely give up or, worse, make a business decision based on inaccurate or incomplete information. Not a nice story!
Installing Semaphore adds some useful new capabilities to search, but to really make the most of what’s available, customers need to understand the limitations of enterprise search and adjust their expectations. They then need to look at what Semaphore can do to help overcome these limitations. Semaphore can provide a solution to the problem, but only once the problem has been clearly identified.
The Semaphore platform can help an organization close the enterprise search gap. Basic search relies on users to add accurate metadata manually and can’t derive the context in which a word or phrase is used. With Semaphore Ontology Manager, you create an ontology that defines the concepts relevant to your organization and the relationships between those concepts.
Rules are published from the ontology and used by Semaphore Classification Server to classify documents automatically. The result is content tagged with precise, accurate metadata, which an enterprise search engine can then use to enhance search results.
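As a rough illustration of the idea (and emphatically not Semaphore’s actual API or rule format, which this post doesn’t describe), here is a toy Python sketch in which a tiny hand-written "ontology" maps concepts to trigger terms, documents are tagged automatically, and search then filters on those tags. Every name, concept, and document below is hypothetical.

```python
# Hypothetical ontology: each concept lists the terms that signal it.
# A real ontology would also model relationships between concepts.
ONTOLOGY = {
    "Product/Widget": ["widget", "widgets"],
    "Process/Recall": ["recall", "recalled"],
}

# Hypothetical documents, invented for illustration.
DOCS = {
    "report.pdf": "quarterly sales report for the widget product",
    "memo.docx": "memo about the widget recall",
    "notes.txt": "meeting notes on hiring",
}

def classify(text, ontology):
    """Return the sorted concepts whose trigger terms appear in the text."""
    words = set(text.lower().split())
    return sorted(
        concept for concept, terms in ontology.items() if words & set(terms)
    )

def search_by_concept(tagged_docs, concept):
    """Return the sorted names of documents tagged with the given concept."""
    return sorted(name for name, tags in tagged_docs.items() if concept in tags)

# Tag every document once, up front, instead of relying on manual metadata.
tagged = {name: classify(text, ONTOLOGY) for name, text in DOCS.items()}

print(tagged["memo.docx"])                          # ['Process/Recall', 'Product/Widget']
print(search_by_concept(tagged, "Process/Recall"))  # ['memo.docx']
```

The point of the sketch is the shift in what gets indexed: instead of matching raw words, the search runs against consistent concept tags, so a question like "show me recall documents" returns only what was classified that way.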
Now users can ask more specific questions and SharePoint administrators can better tailor the search experience to suit particular audiences. Let’s be honest, at the end of the day, when it comes to enterprise search, there’s no getting away from plain old indexing, but what we can do is make sure that our index contains the aspects of our content that really matter to our users. The Semaphore platform makes that happen by applying intelligence to your content.