DEV Community

Cover image for Classification
Paula Gearon
Paula Gearon

Posted on • Edited on

Classification

My previous post showed how to use a reasoner like Pellet to make some interesting inferences about stated data. In that case, it demonstrated how a compound class could be determined with insufficient data to identify which of its components was satisfied.

Interesting though it was, this was a contrived question made against data that was specifically designed for it, and required a complex reasoner that is limited in its ability to scale. Less contrived scenarios are more interesting, precisely because they can be applied to real-world data at scale.

This leads to the question of what sorts of inferences are valid to make in OWL?

Specification

The OWL 2 specification describes the vocabulary and how this maps into Description Logic. Most of it is rather obtuse due to the need to provide a precise mathematical definition.

Fortunately, a number of the concepts are relatively straightforward to follow. Indeed, many of the concepts can be codified into rules to be applied mechanically.

Entailment

The basic structure of most rules is a description of conditions that must be met by existing data, and new data that may be asserted. Mathematical logic describes this using the Horn Clause, and this has become the standard tool for logic programming systems as well.

As an example, we can make a statement saying that all people are mortal, so if an individual is a person, they must be mortal:

Mortal(X) :- Person(X).
Enter fullscreen mode Exit fullscreen mode

Here, I have described Person and Mortal as classes, and X as an individual.

To read a Horn clause, the outcome is described first, given the conditions that come after the :- symbol. So the above says, "X is Mortal, if X is a Person".

The outcome of the clause is referred to as the Head of the clause, while the conditions are referred to as the Body.

Multiple necessary conditions may be described in the body by separating them with commas. For instance, if a person has a child, then they are a parent:

Parent(X) :- Person(X), hasChild(X,Y).
Enter fullscreen mode Exit fullscreen mode

Note that the body now includes an entity Y who is not referred to in the head.

A more mathematical approach for this statement might be:

∀x x∊Person: ∃y (x,y)∊hasChild
Enter fullscreen mode Exit fullscreen mode

i.e. For all x where x is a person, there exists a y where relationship of x,y is a member of the role-relationship hasChild. Or, less strictly, for x where x is a person, there exists a y where x.hasChild.y

To be honest, a lot of these notations are merely very precise and tedious way to express concepts that don't seem too difficult. But the precision allow mathematicians to eventually getting around to saying more interesting things, while proving that they got it right.

RDF and SPARQL

Data

The Resource Description Framework (RDF) and RDF Schema (RDFS) were designed to describe classes and data in a form that can be managed by a computer.

To declare the above classes we can say that the they are each instances of the schema type called Class, using Turtle syntax:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://quoll.gearon.org/data/demo.ttl#> .

:Person a rdfs:Class .
:Mortal a rdfs:Class .
:Parent a rdfs:Class .
Enter fullscreen mode Exit fullscreen mode

The prefixes here will be used throughout this post. Strictly speaking, they should appear with each block of Turtle and each SPARQL query, but I will elide them for brevity.

We can then use these classes to declare instances of data, say :mary and :jane:

:mary a :Person .
:jane a :Person .
Enter fullscreen mode Exit fullscreen mode

We can also declare the role :hasChild, and connect Mary and Jane with it:

:hasChild a rfds:Property .
:mary :hasChild :jane .
Enter fullscreen mode Exit fullscreen mode

There is a whole technology stack that can start working with this already, but for now, I'm going to stick to the Horn Clauses we've already discussed.

Querying

Mortality

Let's consider the first clause:

Mortal(X) :- Person(X).
Enter fullscreen mode Exit fullscreen mode

This means we are looking for all values of X that have a type of Person. Once we have those values for X, we want to declare that each X has a type of Mortal. We can create a graph based on this query:

construct { ?x a :Mortal }
where { ?x a :Person }
Enter fullscreen mode Exit fullscreen mode

This would result in:

:mary a :Mortal .
Enter fullscreen mode Exit fullscreen mode

Alternatively, we could assert all this back in with the original data by saying insert rather than construct:

insert { ?x a :Mortal }
where { ?x a :Person }
Enter fullscreen mode Exit fullscreen mode

With either expression, the where clause describes the conditions for the body of the Horn Clause, and the insert or construct describes the head of the clause.

Parenthood

Similarly, we can look at the second clause again in order to convert it:

Parent(X) :- Person(X), hasChild(X,Y).
Enter fullscreen mode Exit fullscreen mode

The two expressions in body will result in a pair of patterns in the where clause of the query:

construct { ?x a :Parent }
where { ?x a :Person . ?x :hasChild ?y }
Enter fullscreen mode Exit fullscreen mode

This would result in:

:mary a :Parent .
Enter fullscreen mode Exit fullscreen mode

RDFS Entailment

The above rules are fine for a specific application, but they aren't generally useful. Instead, we'd like to be able to describe data with greater expressivity, and identify rules that can be applied more generally.

In the early days of the Semantic Web, the W3C had planned on releasing RDF as a data model, along with an expressive language to describe that data. There was a lot of progress with prior systems like DAML, OIL and DAML+OIL, but getting the semantics of these systems fully defined and correct is a very difficult task, leading to the creation of each of these systems, which all attempted to correct the issues of their predecessors. These efforts culminated in the Web Ontology Language (OWL) which was formally released along with the first formal release of RDF in 2004.

Prior to the formal releases of these standards, there was already a lot of RDF in use, which meant that there was a need to describe that data. While the complicated details of ontologies were being worked out, the working groups published a minimal vocabulary that could describe a taxonomy of classes and roles. This was first published as a draft for RDF Schema, or RDFS.

Alongside RDFS came a set of "semantics" (a word that means "meaning") which described what the vocabulary meant when it said things. A blog post isn't the place to get into what I mean when I say, "Class" or "Role" (also called a "property"), but I expect that if you're reading this then you may already have an idea. Instead, I'm going to look at a few other elements of RDFS.

Subclass Descriptions

The first concept I'm going to consider is the "subclass". If individuals can be classified into a class, then a subclass is a refinement on that classification, resulting in a smaller group.

Considering the classes we had earlier, we can see that all instances of :Person must be in the class :Mortal, but there is nothing that says that all :Mortals must in the the class of :Person. So :Person is at most equal to :Mortal, and most likely a smaller group. One way to write this mathematically is:

Person ⊑ Mortal
Enter fullscreen mode Exit fullscreen mode

This is essentially the same notation as "subset" ⊆, and it means the same sort of thing, just in relation to classes.

RDFS expresses the same thing with the rdfs:subClassOf relationship:

:Person rdfs:subClassOf :Mortal .
Enter fullscreen mode Exit fullscreen mode

In a similar way, all Parents are Persons, but not all people are parents:

Parent ⊑ Person
Enter fullscreen mode Exit fullscreen mode
:Parent rdfs:subClassOf :Person .
Enter fullscreen mode Exit fullscreen mode

Domain and Range Descriptions

Whether we use the term "role", "property", "predicate" or "attribute", we are referring to some kind of relationship between things. Once we introduce relationships it starts being useful to declare what a type of things can be related to each other. For instance, hasChild is an interesting relationship to have between people, but is not typically associated with something like an item of furniture.

RDFS uses the term "domain" to describe the class of things that a property can be applied to. For instance, the hasChild property can be applied to the class of Parent. This is declared with:

:hasChild rdfs:domain :Parent .
Enter fullscreen mode Exit fullscreen mode

In the same way, the things that a property can refer to are referred to as the "range":

:hasChild rdfs:range :Person .
Enter fullscreen mode Exit fullscreen mode

Note that only parents have children, but any person can be someone's child.

Entailment

We now have enough to start looking at some of the entailments defined by RDFS.

Given that RDF uses an Open World Assumption (OWA), RDFS never considers applying a property to something whose type does not match to be an error. One reason for this is because the data model can always have new data added, and it is presumed that this data will be consistent with whatever exists.

One effect of this is that we can sometimes derive what that new data must be. For instance, say we introduce a new person:

:jane :hasChild :kathy .
Enter fullscreen mode Exit fullscreen mode

Until now, :jane has only been a :Person, but because :hasChild has a domain of :Parent we now know that Jane is also a parent. Similarly, we haven't seen Kathy before, but the appearance in this statement tells use that she is a person.

We can codify these with rules:

Parent(X) :- hasChild(X,Y).
Person(Y) :- hasChild(X,Y).
Enter fullscreen mode Exit fullscreen mode

But this is where having a vocabulary for describing the system becomes useful. Instead of describing rules for each property, we can base the rules on the vocabulary and create a more general form:

C(X) :- rdfs:domain(P,C), P(X,Y).
D(Y) :- rdfs:range(P,D), P(X,Y).
Enter fullscreen mode Exit fullscreen mode

Having variable predicates like this is often considered 2nd order logic, but in this case it's not really. It just looks like it is because of the syntax we use. Instead, I'll use the term that my supervisor called it: 1.5th order logic.

To explain why this is not 2nd order logic, consider it this way: We may represent (subject,predicate,object) as predicate(subject,object) but a true Predicate-Logic representation would be with ternary predicates like: statement(subject,predicate,object). This is not as useful for our purposes, and is aesthetically less inviting, but nevertheless more correct.

Domain and Range Entailment

The above rules can also be converted into SPARQL:

construct { ?x a ?c }
where { ?p rdfs:domain ?c . ?x ?p ?y }
Enter fullscreen mode Exit fullscreen mode
construct { ?y a ?d }
where { ?p rdfs:range ?d . ?x ?p ?y }
Enter fullscreen mode Exit fullscreen mode

The statements created from these queries are considered valid entailments from RDFS:

:mary a :Parent .
:jane a :Parent .
:jane a :Person .
:kathy a :Person .
Enter fullscreen mode Exit fullscreen mode

Note that some of the entailed statements already exist. This is fine, as graphs are considered to be a set of edges, so duplicates are ignored.

Subclass Entailment

Subclass entailment can also be described using queries. In this case, there are two entailments to consider.

We know that:

  • :Person is a subclass of :Mortal, therefore all instances of :Person will be an instance of :Mortal.
  • :Parent is a subclass of :Person, therefore all instances of :Parent will be an instance of :Person.

Putting this together:

  • Instances of :Parent must be instances of :Person.
  • Instances of :Person must be instances of :Mortal.
  • Therefore: instances of :Parent must be instances of :Mortal.

The formal term for this is that rdfs:subClassOf is a transitive property. This is something that OWL describes explicitly, but I'll leave that for later. More succinctly:

A ⊑ B ⊑ C  ⊢  A ⊑ C
Enter fullscreen mode Exit fullscreen mode

Or as a Horn Clause:

rdfs:subClassOf(A,C) :- rdfs:subClassOf(A,B), rdfs:subClassOf(B,C).
Enter fullscreen mode Exit fullscreen mode

This can be evaluated in SPARQL with:

construct { ?a rdfs:subClassOf ?c }
where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }
Enter fullscreen mode Exit fullscreen mode

The next part is considering the instances. We've already stated that if an entity is an instance of a class, then it will also be an instance of whatever that class is a subclass of (also called the superclass, just like in software development):

construct { ?x a ?d }
where { ?x a ?c . ?c rdfs:subClassOf ?d }
Enter fullscreen mode Exit fullscreen mode

With all of these rules in place, suddenly we have:

:Parent rdfs:subClassOf :Mortal .
:mary a :Parent .
:mary a :Mortal .
:jane a :Parent .
:jane a :Mortal .
:kathy a :Person .
:kathy a :Parent .
:kathy a :Mortal .
Enter fullscreen mode Exit fullscreen mode

Incidentally, both rdfs:domain and rdfs:range have a domain of rdf:Property and a range of rdfs:Class. So a full entailment regime means that we don't need to declare that :hasChild is a property, and we don't need to explicitly declare that any of our classes a rdfs:Class. I will sometimes include these declaration for the sake of clear documentation, but they're not needed.

More Entailment

The patterns of RDFS entailment are all described in the RDF Semantics specification, and collected in the section 9.2 of that document.

The rules described above correspond to several of the RDFS Entailment patterns:

# rdfs2
construct { ?x a ?c } where { ?p rdfs:domain ?c . ?x ?p ?y }
# rdfs3
construct { ?y a ?d } where { ?p rdfs:range ?d . ?x ?p ?y }
# rdfs11
construct { ?a rdfs:subClassOf ?c } where { ?a rdfs:subClassOf ?b . ?b rdfs:subClassOf ?c }
# rdfs9
construct { ?x rdfs:subClassOf ?d } where { ?x a ?c . ?c rdfs:subClassOf ?d }
Enter fullscreen mode Exit fullscreen mode

Using the patterns above, the process for converting all of the entailment patterns to executable rules takes very few steps.

# rdfs1
construct { ?aaa a rdfs:Datatype } where {?xxx ?yyy ?ddd FILTER isLiteral(?ddd) BIND (datatype(?ddd) AS ?aaa)}
# rdfs2
construct { ?yyy a ?xxx } where { ?aaa rdfs:domain ?xxx . ?yyy ?aaa ?zzz }
# rdfs3
construct { ?zzz a ?xxx } where { ?aaa rdfs:range ?xxx . ?yyy ?aaa ?zzz }
# rdfs4a
construct { ?xxx a rdfs:Resource } where { ?xxx ?aaa ?yyy }
# rdfs4b
construct { ?yyy a rdfs:Resource } where { ?xxx ?aaa ?yyy FILTER !isLiteral(?yyy)}
# rdfs5
construct { ?xxx rdfs:subPropertyOf ?zzz } where { ?xxx rdfs:subPropertyOf ?yyy . ?yyy rdfs:subPropertyOf ?zzz }
# rdfs6
construct { ?xxx rdfs:subPropertyOf ?xxx } where { ?xxx a rdf:Property }
# rdfs7
construct { ?xxx ?bbb ?yyy } where { ?aaa rdfs:subPropertyOf ?bbb . ?xxx ?aaa ?yyy }
# rdfs8
construct { ?xxx rdfs:subClassOf rdfs:Resource } where { ?xxx a rdfs:Class }
# rdfs9
construct { ?zzz a ?yyy } where { ?xxx rdfs:subClassOf ?yyy . ?zzz a ?xxx }
#rdfs10
construct { ?xxx rdfs:subClassOf ?xxx } where { ?xxx a rdfs:Class }
# rdfs11
construct { ?xxx rdfs:subClassOf ?zzz } where { ?xxx rdfs:subClassOf ?yyy . ?yyy rdfs:subClassOf ?zzz }
# rdfs12
construct { ?xxx rdfs:subPropertyOf rdfs:member } where { ?xxx a rdfs:ContainerMembershipProperty }
# rdfs13
construct { ?xxx rdfs:subClassOf rdfs:Literal } where { ?xxx a rdfs:Datatype }
Enter fullscreen mode Exit fullscreen mode

Executing them all together is a task for system that can coordinate the rules, and is typically tied to a specific database implementation though this is not necessary (e.g. Naga, though I have yet to port this to SPARQL).

Was This Needed?

Well… probably not. If you're reading this then maybe you already knew everything I had to say here. However, I also have more useful things to say, and I'm about to go on to say it. This further discussion will build on what I've discussed so far, and I wanted to establish a base of understanding before moving on.

Next

I continue this discussion in my next post...

Top comments (0)