The language of data

on: June 27, 2016, by: Ann Kelly

With a broad range of experience in a variety of academic and professional venues, I’ve come to the conclusion that a large part of success in any discipline is learning its particular language. I first saw this about five years ago, when I began to investigate how Sentara could improve the SharePoint search on our Intranet to take advantage of all the valuable content it contained. And thus began my journey of learning about semantics, taxonomies, ontologies and content enrichment.

As this looked like a promising direction to solve the problem, I began the process of securing the funding, technology and resources. While it was relatively easy to secure those components, introducing an unfamiliar technology into the enterprise was challenging. The focus quickly shifted from funding to absorbing a number of terminologies associated with Human Resources, insurance, wellness programs, and payroll. The challenge came not just from the esoteric terms found in each department, but also from how our more than 30,000 employees perceive and search for information.

I soon realized that what separated me from my technology colleagues was a penchant for studying languages: five foreign languages (other than English) as well as medical and scientific terminology. It also occurred to me that much of the Big Data people talk about is mostly about “Big Words.” Even structured data originated from a human interaction that was reduced to numbers and cryptic metadata. This led me to an epiphany: data has more to do with language than technology.

Our initial Intranet experiment was successful (read Smartlogic’s blog post entitled: Sentara Health drives employee self-service portal and report search with Smartlogic Semaphore) and we’ve moved to the next phase: a robust Solr-Lucene search platform integrated with Semaphore Ontology Editor. Based on the success of our original work, we have been asked to expand our search to include report templates, pharmacy newsletters and supply-chain databases. Each of these presents the challenge of learning a new industry terminology, but they are similar in that the information must be categorized and enriched in order to enhance its meaning.

This brings me to a new dilemma: where does this discipline fit within a traditional organization? It requires IT and data support but approaches information from a totally different perspective. Last year my department received approval to hire an Ontologist. Since this position was not in our Human Resource classification system, it had to be submitted for review. Piecing together aspects from other industry job postings, I created a comprehensive position description, but our compensation department had little external supporting information with which to grade it, as the industry does not have consistent titles, salaries, and job requirements.

Another concern is how to describe what we do: is it semantic search, cognitive linguistics and NLP, or knowledge management? These are critical elements for defining your role in the organization, developing a budget, hiring staff and gaining credibility as an integral part of the organization. The picture becomes more blurred as machine learning, cognitive computing and IBM’s Watson compete for market acceptance.

Over time I have read numerous articles, spoken with highly respected people, and witnessed firsthand the need to make sense of the information that we have spent millions of dollars turning into some type of digital format. The overriding condition is that it needs to be understandable to a human as words, graphics or speech. True, some machines will work autonomously, but the ingested data still requires relevant meaning.

This is not a new experience for me. In the early 1990s I was, for many years, a lone voice in the organization on the future potential of the Internet. Only after significant demonstrations of its value, cost reduction and efficiency did it begin to become established as a business-critical function. The world is moving very quickly, but it still takes time for new ideas to move from the periphery into the core. As universities adopt curricula that include new digital library skills, technology platforms become more integrated, and the industry develops a consistent identity, I believe this discipline will follow a mainstream path similar to the growth of the web.

I am glad we started early and made our mistakes quickly, so that we are better prepared to deliver value to customers. We are learning daily and are engaging with people doing amazing work using various semantic technologies to extract meaning from data. It is encouraging that my team is able to discuss the technical and cognitive components intelligently with the most sophisticated practitioners, knowing that we still have a very long way to go on this journey.