Posted on: October 04, 2016, by: Ann Kelly
Recently I’ve been trying to explain the difference between “relational thinking” and “semantic thinking” to some very bright people who have significant knowledge of “data and analytics”. These folks have a deep background in relational data, but very little knowledge of semantics. And like many people, they have sort of conflated “semantic technology” with the “Semantic Web” and have mentally consigned the topic to the category of overwrought, futuristic and unrealistic musings.
I’ve been talking to these folks about why “semantic technology” and the “Semantic Web” are not the same thing, and about how our clients are using semantic technology to solve real business problems today. But that required me to explain why semantics requires you to think about information differently than we are used to.
This is important because I think we are at an inflection point much like the one we were at when we began to adopt relational data thinking in place of hierarchical data thinking. Up until the move to relational thinking, we had gotten some value out of information because we were able to automate transactions. Relational thinking opened up a huge amount of value: all of a sudden we had the ability not only to automate transactions but to do efficient reporting and actually analyze data for business intelligence. And as we added more data to the mix, moving away from silos and taking an enterprise approach, more and more value was created.
We would have expected that with the advent of big data, the slope of the value curve would continue to rise. But it didn’t. We added lots more data, in all kinds of formats from all kinds of sources, but we didn’t get as much value as we were expecting. My contention is that the reason for this is that we need to stop thinking about data relationally and start thinking about it semantically.
This is not an easy thing to do. Most of you (or at least the ones as old as I am) will remember the applications of the 1980s that had one table with 3000 columns. Those were built by people who had not made the shift from hierarchical thinking to relational thinking. Semantic thinking is the next shift. And we won’t need to do it for everything; relational systems are good enough to manage the transactions, and data warehouses are efficient at managing the reporting. But in order to realize the new wave of value that we are expecting, some of the data needs to be thought about differently.
Anyway, that’s why I went down this path and here’s what I came up with:
First, relational thinking is about entities and attributes. You have an entity, like “Order”, and it has attributes, such as “customer”, “product” and so on. You have another entity, “Customer”, and that entity has attributes such as “address”, “contact number”, etc. Now you want to establish some kind of relationship between the attribute “customer” of Order and the entity “Customer”, and you can do that, but only in a very restricted way. There’s only one kind of relationship, and it’s basically “is equal to”.
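To make the “is equal to” point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout and the sample values are my own assumptions for illustration; the point is only that the link between Order and Customer is an equality test on a key.

```python
import sqlite3

# Hypothetical tables and values, used only to illustrate the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        address TEXT,
        contact_number TEXT
    );
    CREATE TABLE "order" (
        id INTEGER PRIMARY KEY,
        customer INTEGER REFERENCES customer(id),
        product TEXT
    );
    INSERT INTO customer VALUES (1, '12 High Street', '555-0100');
    INSERT INTO "order" VALUES (100, 1, 'widget');
""")

# The only link between the two entities is an equality test on key values.
rows = conn.execute("""
    SELECT o.id, o.product, c.address
    FROM "order" AS o
    JOIN customer AS c ON o.customer = c.id
""").fetchall()
print(rows)  # [(100, 'widget', '12 High Street')]
```

Nothing in the schema says what the relationship means (placed by? billed to? shipped to?); it only says that two values match.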
When you think about information semantically, you stop thinking about “entities” and you start thinking about the classes of concepts that exist in the problem domain and about the relationships between them. Now you have a concept class “Transaction” (because there are lots of different transactions in the problem domain) with a subclass, “Order”. You have another concept class, “Counterparty”, and that has subclasses such as “Customer”, “Supplier” and “Partner”. Each one of these subclasses inherits properties from its concept class, but there are differences between them. There may be other concepts, such as “Location” and “Contact Details”.
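A rough sketch of that class hierarchy in plain Python might look like this. The class names come from the example above; the specific properties (“credit limit”, “payment terms” and so on) are invented for illustration, and a real semantic model would be expressed in an ontology language rather than in dictionaries.

```python
# Concept classes from the example; each subclass points at its parent.
SUBCLASS_OF = {
    "Order": "Transaction",
    "Customer": "Counterparty",
    "Supplier": "Counterparty",
    "Partner": "Counterparty",
}

# Properties declared directly on each concept class (hypothetical).
PROPERTIES = {
    "Counterparty": {"name", "location", "contact details"},
    "Customer": {"credit limit"},
    "Supplier": {"payment terms"},
}

def properties_of(concept):
    """Collect a concept's own properties plus everything it inherits."""
    collected = set()
    while concept is not None:
        collected |= PROPERTIES.get(concept, set())
        concept = SUBCLASS_OF.get(concept)  # None once we reach the top
    return collected

print(sorted(properties_of("Customer")))
# ['contact details', 'credit limit', 'location', 'name']
```

Customer and Supplier share everything Counterparty has, and each adds something of its own.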
The job of a semantic modeler is to define concepts and the relationships between them. Because the number and types of relationships are almost unlimited, your model is extremely flexible.
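One simple way to picture this is to hold each fact as a (subject, relationship, object) triple. The sketch below uses plain Python rather than a real semantic store, and every identifier and relationship name in it is hypothetical.

```python
# Each fact is a (subject, relationship, object) triple. All names are made up.
facts = {
    ("order-100", "is a", "Order"),
    ("order-100", "placed by", "acme"),
    ("order-100", "shipped to", "warehouse-7"),
    ("acme", "is a", "Customer"),
    ("acme", "is located in", "Dublin"),
    ("acme", "is a subsidiary of", "globex"),
    ("globex", "is a", "Supplier"),
}

def related(subject, relationship):
    """Return every object linked to `subject` by the named relationship."""
    return sorted(o for s, r, o in facts if s == subject and r == relationship)

print(related("order-100", "placed by"))      # ['acme']
print(related("acme", "is a subsidiary of"))  # ['globex']

# A new kind of relationship is just another fact; nothing is restructured.
facts.add(("acme", "is a", "Supplier"))
print(related("acme", "is a"))                # ['Customer', 'Supplier']
```

Here the relationships carry names and meaning of their own, instead of being reduced to “these two key values are equal”.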
Now you have a flexible way of modeling the problem domain, which is actually closer to the real world.
And then there’s vocabulary: Different relational data structures may refer to the same concept in different ways; in one system, the value of an order might be represented as “amount”, in another, it might be “total”. Semantic models allow you to establish equivalence between different terms that are used to refer to the same concept, and it’s a damn good thing because no one has yet managed to get every person and system in the enterprise to use the same words when describing the same thing. That is not a doable thing, and this is the reason why we are still sucking wind with MDM and canonical data models (god bless their little cotton socks).
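As a small sketch of what establishing equivalence can mean in practice, assume two systems that call the order value “amount” and “total”. The record layouts and the shared concept names below are invented for illustration.

```python
# Hypothetical records from two systems that name the order value differently.
billing_record = {"order_id": 100, "amount": 250.0}
shipping_record = {"order_id": 100, "total": 250.0}

# Each local term is declared equivalent to one shared concept.
SAME_AS = {
    "amount": "order value",
    "total": "order value",
    "order_id": "order identifier",
}

def to_concepts(record):
    """Re-key a record by shared concept, leaving the source record untouched."""
    return {SAME_AS.get(field, field): value for field, value in record.items()}

print(to_concepts(billing_record) == to_concepts(shipping_record))  # True
```

Neither system has to rename anything; the mapping lives in the model.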
A semantic model allows you to harmonize all the different variants of language within the enterprise without forcing people and systems to change the language they use. We see this a lot in the pharma industry, where R&D refers to something by the compound ID, the clinical trial group refers to it by the chemical formulation and the regulatory group, who are making the submission, refer to it by the brand name. They all want to know about the same thing, but they use different language.
But perhaps the most important thing is this: relational thinking contains a closed-world assumption. You’re not modeling the problem space; you are structuring the data (before you ever see it) in order to answer a specific question. You have to know everything there is to know about the question before you even engage with the data so that you can get it into a format that will provide the answer. And if a piece of data comes in that does not conform exactly to the model, the system either ignores it or pees on the floor (depending on how badly you get bored by error routines). Nothing can be considered but what is already included in the structure. Data must be transformed by a time-consuming and expensive process into the appropriate structure. And because the model is optimized to answer a specific question, answering any new or different questions requires either a new structure or adjustments to the existing structure.
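Here is a minimal sketch of the fixed-structure problem, again with Python’s built-in sqlite3 module. The table and the extra “gift_message” field are hypothetical; the behavior shown (the insert is rejected until the structure is changed) is what SQLite does with a column it was not told about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "order" (id INTEGER PRIMARY KEY, customer TEXT, product TEXT)'
)
conn.execute(
    'INSERT INTO "order" (id, customer, product) VALUES (1, ?, ?)',
    ("acme", "widget"),
)

# A record arrives carrying a field the structure never anticipated.
try:
    conn.execute(
        'INSERT INTO "order" (id, customer, product, gift_message) '
        "VALUES (2, ?, ?, ?)",
        ("acme", "gadget", "Happy birthday"),
    )
    accepted = True
except sqlite3.OperationalError as error:
    accepted = False
    print("rejected:", error)

# The data only fits after the structure itself has been adjusted.
conn.execute('ALTER TABLE "order" ADD COLUMN gift_message TEXT')
conn.execute(
    'INSERT INTO "order" (id, customer, product, gift_message) '
    "VALUES (2, ?, ?, ?)",
    ("acme", "gadget", "Happy birthday"),
)
row_count = conn.execute('SELECT COUNT(*) FROM "order"').fetchone()[0]
print(accepted, row_count)  # False 2
```

On a single in-memory table the adjustment is one line. The expense I am describing comes from making the same kind of change across production schemas, load jobs and reports.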
Semantic thinking, on the other hand, contains an open-world assumption. It assumes that the data informs the model and the model informs the data, and the model is never “complete”. The concepts and relationships in the problem domain are modeled and then the data is examined in the context of the model – if there is data that does not exactly conform to the model, you can examine it and infer (depending on how you are doing it) whether or not this new bit of data is a new subclass of an existing class because it has many, but not all, of the characteristics of other concepts in the class hierarchy. You can add new concepts to the model quite easily and expand the scope of questions you can answer and you don’t have to restructure the data in order to do that. Because you are traversing the data in the context of the model, you can identify all the relevant bits at query time and assemble them into an answer, irrespective of where they came from, what they are called or what format they are in.
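To illustrate, here is a sketch in plain Python of extending the model and answering a question at query time. All names are hypothetical, and the new subclass is declared by hand here; inferring it automatically from the data’s characteristics is the job of a reasoner and is not attempted.

```python
# Hypothetical facts as (subject, relationship, object) triples.
facts = {
    ("Customer", "subclass of", "Counterparty"),
    ("Supplier", "subclass of", "Counterparty"),
    ("acme", "is a", "Customer"),
    ("globex", "is a", "Supplier"),
}

def is_subclass(concept, ancestor):
    """True if `concept` equals `ancestor` or descends from it."""
    if concept == ancestor:
        return True
    parents = [o for s, r, o in facts if s == concept and r == "subclass of"]
    return any(is_subclass(parent, ancestor) for parent in parents)

def instances_of(ancestor):
    """Find instances at query time by walking the class hierarchy."""
    return sorted(
        s for s, r, o in facts if r == "is a" and is_subclass(o, ancestor)
    )

print(instances_of("Counterparty"))  # ['acme', 'globex']

# New data arrives describing a concept the model did not have yet.
facts.add(("Distributor", "subclass of", "Counterparty"))
facts.add(("initech", "is a", "Distributor"))

# The same question now picks up the new data; nothing was restructured.
print(instances_of("Counterparty"))  # ['acme', 'globex', 'initech']
```

The query itself never changed. Only the model grew.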
Semantic thinking is not going to replace relational thinking in many parts of the organization and it shouldn’t. Relational thinking has done a good job for us in many areas and will continue to do so. But in those places where new questions have to be answered, new insights are required and new types of information need to be exploited, semantic thinking will deliver the value that enterprises need today.