How is information management evolving

Posted on: May 29, 2015, by: Jeremy Bentley


To answer that, we need to examine how information technology has changed from the days before computing, when information was housed in paper files or in people’s heads, to how it’s managed today.

When information was first transcribed from paper, the available technology restricted the amount and type of information that could be stored. Using and making sense of the technology required the specialized skills of programmers, who managed and explained the internal workings of this structured data to the rest of the organization. For 30 years these newly formed IT departments used rigid, hierarchical batch systems to manage the small subset of information that was stored in their databases. These systems were brittle in the sense that making a change to them took time and money and required specialized skills.

In the 1980s a fundamental shift took place: the older batch-and-brittle systems gave way to a new model of real-time, flexible systems as the structured information model changed from “hierarchical” to “relational.” This change was driven not by performance but by the self-describing nature of the relational model (albeit within the constraints of a SQL schema). Self-description allowed business users to query data without needing a programmer’s skills, systems were able to connect to other systems and provide straight-through processing, and business users became adept at analysing the self-describing information using business intelligence tools. The profound effect of self-description led to the $40 billion per year market we now call the “data processing” market.
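To make “self-describing” concrete, here is a minimal sketch using Python’s built-in SQLite: the database can be asked to describe its own schema, so a user can explore and query the data without knowing its internal workings in advance. The table and columns are invented for illustration.

```python
import sqlite3

# A relational schema is "self-describing": the database itself can be
# asked what tables and columns it contains.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders (customer, total) VALUES ('Acme', 120.50)")

# Ask the database to describe its own structure.
columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)  # ['id', 'customer', 'total']

# With the structure known, an ad-hoc query needs no programmer in the loop.
for customer, total in conn.execute("SELECT customer, total FROM orders"):
    print(customer, total)
```

It is this ability to interrogate the data’s own description that business intelligence tools build on.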

So to the present day, where we find that structured data systems still contain only a subset of the information universe found in every organization – 20% by most leading analysts’ accounts. What has happened to the 80% of originally paper-based information that did not become structured information?

To start with, it’s called “unstructured information,” and initially it remained in filing cabinets. Over time it migrated into image management systems, shared folders, and finally content management systems. IT departments indexed the unstructured information and created search engines for retrieval. Unfortunately, unstructured information still resides in a batch-and-brittle world: change requires specialized technical skills, the information is housed in silos that other systems cannot process, and, being content-bound, it is hard to unlock and analyse in the way that a structured data resource allows.

So the simple answer to the question “How is information management evolving?” is to compare what we do today with the definition that Gartner provides and expect business demand to close the gap. Ideally, information would be unconstrained by organizational and technological boundaries, yet today it is still separated as structured or unstructured, internal or external – hardly a unified information set. Business increasingly demands a complete view of information that is trustworthy and easy to process and analyse in an unconstrained way.

In order to take this step and enable unified information, unstructured information needs to break out of its batch-and-brittle environment and become “self-describing.” This requires a technology that solves a different set of challenges from those associated with structured data. Yet this is what needs to happen if the whole information set is to become process-able, analyse-able and addressable as a whole.

This is the job of Content Intelligence systems such as Smartlogic’s Semaphore.

Content Intelligence is the technology and approach that makes unstructured information self-describing. Such self-description means that content-based information can be described in the same way as structured information. Once in this form, structured and unstructured information can be unified.
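A minimal sketch of the idea (not Semaphore’s actual pipeline): match a document’s text against a controlled vocabulary so the document carries structured, self-describing metadata. The taxonomy and document here are invented for illustration.

```python
# Illustrative controlled vocabulary: concept -> indicative terms.
TAXONOMY = {
    "finance": {"invoice", "payment", "account"},
    "legal": {"contract", "clause", "liability"},
}

def classify(text: str) -> dict:
    """Return the document plus the taxonomy concepts its text mentions."""
    words = set(text.lower().split())
    tags = sorted(term for term, vocab in TAXONOMY.items() if vocab & words)
    return {"text": text, "tags": tags}

doc = classify("The contract includes a liability clause and a payment schedule")
print(doc["tags"])  # ['finance', 'legal']
```

The tags are the “self-description”: once attached, the document can be filtered, joined and analysed like a row of structured data.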

This unification takes place in a new type of information management model called the graph, or triple, store – yet another once-in-30-years change in the data organizational model. The impetus for this change comes from:

  • The performance constraints of existing systems in handling Big Data (even though Big Data only addresses the structured 20%).
  • The need for a flexible data model that can handle structured, semi-structured and unstructured data.
  • An end to the restrictions imposed by having to pre-suppose questions – a feature of schema-based data models.
  • The ability to include the 80% of information that is currently not part of the structured data set.
  • The ability to link to information that already exists in internal and external sources, so that available information is not duplicated.

Using this new unified model, organizations can leverage all of the information available to them – structured and unstructured, internal and external – to determine relationships, explore opportunities and answer questions they could not answer before.
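The triple model above can be sketched in a few lines: facts exported from a structured system and tags attached to unstructured content share one pool of (subject, predicate, object) triples, so a single query can cross the divide. All identifiers here are invented for illustration.

```python
# One graph of (subject, predicate, object) triples, drawn from both worlds.
triples = [
    ("customer:acme", "placedOrder", "order:1001"),   # from a database
    ("order:1001", "hasTotal", "120.50"),             # from a database
    ("doc:contract-7", "mentions", "customer:acme"),  # from tagged content
    ("doc:contract-7", "hasTopic", "liability"),      # from tagged content
]

def query(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Cross the structured/unstructured divide in one question:
# which documents mention customers that have placed orders?
customers = {s for s, _, _ in query(p="placedOrder")}
docs = [s for c in customers for s, _, _ in query(p="mentions", o=c)]
print(docs)  # ['doc:contract-7']
```

Because triples are self-describing (the predicate names the relationship), no schema has to pre-suppose which questions will be asked – new triples and new questions can be added at any time.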