Reasoning with Northwind
Reasoning is probably the most powerful feature and main selling point of RDF Graph Databases.
RDF Graph Databases with advanced reasoning capabilities are believed to be the future of AI, as Mike Tung put it in his Forbes article Knowledge Graphs Will Lead To Trustworthy AI:
The era of black-box AI systems is over. Next-generation systems will optimize the explainability and trustworthiness of the overall human-AI system, and knowledge graphs will serve as a key ingredient that makes these systems more explainable, inspectable, auditable and, ultimately, controllable.
What’s Semantic Reasoning?
Semantic reasoning is the ability of a system to infer new facts from existing data based on inference rules or ontologies. In simple terms, rules add new information to the existing dataset, adding context, knowledge, and valuable insights — Oxford Semantic Technologies
The Northwind Article Series
The Northwind sample database series started its journey with an introduction to SPARQL for professionals with a relational database background, followed by a few other publications covering topics such as Named Graphs, Exploring Graph Databases, and Future Proposals for SPARQL 1.2. For more details on those, please refer to the links in the “References” section at the end of this article.
This is the first article on reasoning where we are going to create inference rules to simplify and optimise queries and data management. We are planning to extend the reasoning capabilities of the Northwind database by adding axioms, an ontology and additional rules to answer more complex questions in future articles.
We are going to use RDFox, an in-memory, high-performance knowledge graph and semantic reasoning engine. RDFox uses the Datalog rule language to express rules. For more details on RDFox reasoning, Datalog, and rules, please refer to the links in the “References” section at the end of this article.
It’s a learning by example experience and not much theory will be covered here.
If you choose to set up the environment in order to execute the queries yourself, please refer to the “Setting up the demo environment” section further down in this article. Otherwise, you can just browse the queries and screenshots below.
Rule Examples
Rules define conditions to be matched in the data in order to infer new triples that become available to queries. They provide a mechanism that allows tailor-made performance improvements to specific queries.
In this section we are going to introduce three practical examples (use cases) to explain how rules work.
Each use case will contain an original query, a rule and a modified version of the query that uses the rule, producing the same result.
Use case 01 — list customers who bought a specific product
Original query
The original SPARQL query used to return a list of customers who bought product-61.
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Customers who bought product-61
SELECT DISTINCT # eliminates duplicates in case the same customer bought a product more than once
?customer
?companyName
?contactName
WHERE {
GRAPH kggraph:dataGraph {
?customer a :Customer ;
:companyName ?companyName ;
:contactName ?contactName .
?order a :Order ;
:hasCustomer ?customer .
?orderDetail a :OrderDetail ;
:hasProduct :product-61 ;
:belongsToOrder ?order .
}
}
ORDER BY ?customer
Original query result
Note: Query above completed in 12ms and screenshot displays 4 out of 22 results.
By using a property path, we can easily demonstrate the path that needs to be traversed to answer the question.
# Path: customer → order → orderDetail → product
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
SELECT DISTINCT
?customer
WHERE {
GRAPH ?graph {
?customer ^:hasCustomer/^:belongsToOrder/:hasProduct :product-61 .
}
}
ORDER BY ?customer
That’s quite a long way to answer such a typical question. We want to create a shortcut, which will not only speed things up but also make the query more intuitive and easier to maintain. This is achieved by rule 01 below.
Rule 01 — boughtProduct
A rule that defines which product was bought by a customer.
Rule definition
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
[?customer, :boughtProduct, ?product] :-
[?customer, a, :Customer],
[?order, a, :Order],
[?orderDetail, a, :OrderDetail],
[?product, a, :Product],
[?orderDetail, :hasProduct, ?product],
[?orderDetail, :belongsToOrder, ?order],
[?order, :hasCustomer, ?customer] .
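To make the effect of rule 01 more concrete, here is a minimal Python sketch that applies the same pattern to a handful of toy triples. It is only an illustration of what the rule derives, not how RDFox evaluates it: the data, the helper names (`objects`, `has_type`, `derive_bought_product`) and the shortened identifiers without prefixes are all assumptions made for this example.

```python
# Toy triples (hypothetical, not taken from the Northwind dataset).
# "a" stands for rdf:type, as in the rule above.
facts = {
    ("customer-A", "a", "Customer"),
    ("customer-B", "a", "Customer"),
    ("order-1", "a", "Order"),
    ("order-2", "a", "Order"),
    ("orderDetail-1-61", "a", "OrderDetail"),
    ("orderDetail-2-61", "a", "OrderDetail"),
    ("orderDetail-2-7", "a", "OrderDetail"),
    ("product-61", "a", "Product"),
    ("product-7", "a", "Product"),
    ("order-1", "hasCustomer", "customer-A"),
    ("order-2", "hasCustomer", "customer-A"),
    ("orderDetail-1-61", "belongsToOrder", "order-1"),
    ("orderDetail-2-61", "belongsToOrder", "order-2"),
    ("orderDetail-2-7", "belongsToOrder", "order-2"),
    ("orderDetail-1-61", "hasProduct", "product-61"),
    ("orderDetail-2-61", "hasProduct", "product-61"),
    ("orderDetail-2-7", "hasProduct", "product-7"),
}


def objects(facts, subject, predicate):
    """All objects o such that (subject, predicate, o) is a fact."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def has_type(facts, node, cls):
    """True if (node, a, cls) is a fact."""
    return (node, "a", cls) in facts


def derive_bought_product(facts):
    """Derive (customer, boughtProduct, product) for every match of the rule body."""
    derived = set()
    for (detail, predicate, product) in facts:
        if predicate != "hasProduct":
            continue
        if not (has_type(facts, detail, "OrderDetail") and has_type(facts, product, "Product")):
            continue
        for order in objects(facts, detail, "belongsToOrder"):
            if not has_type(facts, order, "Order"):
                continue
            for customer in objects(facts, order, "hasCustomer"):
                if has_type(facts, customer, "Customer"):
                    derived.add((customer, "boughtProduct", product))
    return derived


for triple in sorted(derive_bought_product(facts)):
    print(triple)
```

In this toy data customer-A bought product-61 in two different orders, yet only one boughtProduct triple is derived for that pair, because the result is a set of triples.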
Add rule to the data store
There are many ways of adding rules to an RDFox data store. The following example uses curl through a REST API.
curl -X POST -G --data-urlencode "default-graph-name=http://www.mysparql.com/resource/northwind/graph/dataGraph" -H "Content-Type:" -T "rules/01-customer-bought-product.dlog" "localhost:12110/datastores/Northwind/content"
Note that the destination named graph for the rule is specified in the curl command.
For those who set up the environment with RDFox running in a Docker container, the required authentication will need to be added to the curl command:
-u admin:admin
Modified query
The original query was modified to consume the new rule we have just created. The modified query produces the same result.
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Customers who bought product-61
SELECT
?customer
?companyName
?contactName
WHERE {
GRAPH kggraph:dataGraph {
?customer a :Customer ;
:boughtProduct :product-61 ;
:companyName ?companyName ;
:contactName ?contactName .
}
}
ORDER BY ?customer
Modified query result
Note: Query above completed in 8ms and screenshot displays 4 out of 22 results.
Since version 5.6, it’s possible to highlight reasoning on the RDFox web console. The following shows the new derived fact, which is materialised in RDFox as a new triple in the graph.
Use case 02 — top 5 customers by product count
Original query
Lists the top 5 customers by product count.
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Top 5 customers by product count
SELECT
?customer
?companyName
?contactName
(COUNT(?product) as ?count)
WHERE {
GRAPH kggraph:dataGraph {
?orderDetail :hasProduct ?product ;
:belongsToOrder ?order .
?order :hasCustomer ?customer .
?customer :companyName ?companyName ;
:contactName ?contactName .
}
}
GROUP BY ?customer ?companyName ?contactName
ORDER BY DESC(?count)
LIMIT 5
Original query result
Note: query above executed in 12ms.
Create Rule 02 — hasProductCount
The following rule defines relations based on the result of an aggregate calculation.
Rule definition
PREFIX : <http://www.mysparql.com/resource/northwind/>
[?customer, :hasProductCount, ?productCount] :-
AGGREGATE (
[?customer, a, :Customer],
[?order, a, :Order],
[?orderDetail, a, :OrderDetail],
[?product, a, :Product],
[?orderDetail, :hasProduct, ?product],
[?orderDetail, :belongsToOrder, ?order],
[?order, :hasCustomer, ?customer]
ON ?customer
BIND COUNT(?product) AS ?productCount
) .
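As a rough illustration of what the aggregate in rule 02 computes, the following Python sketch counts, per customer, the matches of the rule body on a few toy triples. The data and the helper names (`objects`, `has_type`, `derive_product_count`) are hypothetical, and the sketch assumes that each distinct match of the body contributes one to the count; it says nothing about how RDFox evaluates aggregates internally.

```python
from collections import Counter

# Toy triples (hypothetical). "a" stands for rdf:type.
facts = {
    ("customer-A", "a", "Customer"),
    ("customer-B", "a", "Customer"),
    ("customer-C", "a", "Customer"),  # has no orders
    ("order-1", "a", "Order"),
    ("order-2", "a", "Order"),
    ("order-3", "a", "Order"),
    ("order-1", "hasCustomer", "customer-A"),
    ("order-2", "hasCustomer", "customer-A"),
    ("order-3", "hasCustomer", "customer-B"),
    ("product-61", "a", "Product"),
    ("product-7", "a", "Product"),
    ("detail-1", "a", "OrderDetail"),
    ("detail-2", "a", "OrderDetail"),
    ("detail-3", "a", "OrderDetail"),
    ("detail-4", "a", "OrderDetail"),
    ("detail-1", "belongsToOrder", "order-1"),
    ("detail-2", "belongsToOrder", "order-1"),
    ("detail-3", "belongsToOrder", "order-2"),
    ("detail-4", "belongsToOrder", "order-3"),
    ("detail-1", "hasProduct", "product-61"),
    ("detail-2", "hasProduct", "product-7"),
    ("detail-3", "hasProduct", "product-61"),
    ("detail-4", "hasProduct", "product-61"),
}


def objects(facts, subject, predicate):
    """All objects o such that (subject, predicate, o) is a fact."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def has_type(facts, node, cls):
    """True if (node, a, cls) is a fact."""
    return (node, "a", cls) in facts


def derive_product_count(facts):
    """Group the matches of the rule body by customer and count them."""
    counts = Counter()
    for (detail, predicate, product) in facts:
        if predicate != "hasProduct":
            continue
        if not (has_type(facts, detail, "OrderDetail") and has_type(facts, product, "Product")):
            continue
        for order in objects(facts, detail, "belongsToOrder"):
            if not has_type(facts, order, "Order"):
                continue
            for customer in objects(facts, order, "hasCustomer"):
                if has_type(facts, customer, "Customer"):
                    counts[customer] += 1  # one per (customer, order, detail, product) match
    return {(customer, "hasProductCount", n) for customer, n in counts.items()}


for triple in sorted(derive_product_count(facts)):
    print(triple)
```

In this sketch a customer without any order details (customer-C) gets no hasProductCount triple at all, since there is no match to group.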
Add rule to the data store
curl -X POST -G --data-urlencode "default-graph-name=http://www.mysparql.com/resource/northwind/graph/dataGraph" -H "Content-Type:" -T "rules/02-customer-has-product-count.dlog" "localhost:12110/datastores/Northwind/content"
Modified query
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Top 5 customers by product count
SELECT
?customer
?companyName
?contactName
?productCount
WHERE {
GRAPH kggraph:dataGraph {
?customer :hasProductCount ?productCount ;
:companyName ?companyName ;
:contactName ?contactName .
}
}
ORDER BY DESC(?productCount)
LIMIT 5
Modified query result
Note: It’s very important to define the types and make rules as selective as possible to improve rule materialisation and query answering times. For example, adding the types [?customer, a, :Customer], [?order, a, :Order] and [?orderDetail, a, :OrderDetail] to the previous rule brought query execution time down from 10ms to 3ms. More guidelines on how to create rules can be found in The Do’s and Don’ts of Rule and Query Writing article.
The following illustration highlights the inferred facts (in cyan) as a result of rules 01 and 02.
Use Case 03 — customers who never placed an order
Original query
Lists the customers who never placed an order.
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Customers who never placed an order
SELECT DISTINCT
?customer
?companyName
?postalCode
?city
?country
WHERE {
GRAPH ?graph {
?customer a :Customer ;
:customerID ?customerID ;
:companyName ?companyName ;
:city ?city ;
:country ?country .
OPTIONAL { ?customer :postalCode ?postalCode } .
OPTIONAL {
?order a :Order .
?customer ^:hasCustomer ?order .
}
FILTER (!BOUND(?order))
}
}
ORDER BY ?customer
Original query result
The SPARQL query above can be re-written using MINUS or FILTER NOT EXISTS, producing the same result. For the differences in how these commands get evaluated, please refer to the comments in the file queries/03–1-customers-who-never-placed-an-order-before-rule-03.sparql from the demo GitHub repo.
Create Rule 03 — CustomerWithoutOrder
Negation as failure is a very powerful feature of rules in RDFox.
Rule definition
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
[?customer, a, :CustomerWithoutOrder] :-
[?customer, a, :Customer], # All customers
NOT EXISTS ?order IN (
[?order, a, :Order],
[?order, :hasCustomer, ?customer] # Only customers who placed orders
) .
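The idea behind negation as failure can be sketched in a few lines of Python: a customer is classified as CustomerWithoutOrder when no matching order can be found in the known facts. The toy data and the function name `derive_customers_without_order` are assumptions made for this illustration only.

```python
# Toy triples (hypothetical). "a" stands for rdf:type.
facts = {
    ("customer-A", "a", "Customer"),
    ("customer-B", "a", "Customer"),
    ("customer-C", "a", "Customer"),
    ("order-1", "a", "Order"),
    ("order-2", "a", "Order"),
    ("order-1", "hasCustomer", "customer-A"),
    ("order-2", "hasCustomer", "customer-A"),
}


def derive_customers_without_order(facts):
    """Derive (customer, a, CustomerWithoutOrder) when no order for the customer is known."""
    customers = {s for (s, p, o) in facts if p == "a" and o == "Customer"}
    orders = {s for (s, p, o) in facts if p == "a" and o == "Order"}
    # Customers for which the NOT EXISTS body has at least one match.
    customers_with_order = {
        o for (s, p, o) in facts if p == "hasCustomer" and s in orders
    }
    return {(c, "a", "CustomerWithoutOrder") for c in customers - customers_with_order}


for triple in sorted(derive_customers_without_order(facts)):
    print(triple)
```

The absence of an order is not stated anywhere in the data; it is concluded from the failure to find one, which is why the derived fact depends on the current contents of the store.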
Add rule to the data store
curl -X POST -G --data-urlencode "default-graph-name=http://www.mysparql.com/resource/northwind/graph/dataGraph" -H "Content-Type:" -T "rules/03-customer-without-order.dlog" "localhost:12110/datastores/Northwind/content"
Modified query
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Customers who never placed an order
SELECT DISTINCT
?customer
?companyName
?postalCode
?city
?country
WHERE {
GRAPH ?graph {
?customer a :CustomerWithoutOrder ;
:customerID ?customerID ;
:companyName ?companyName ;
:city ?city ;
:country ?country .
OPTIONAL {?customer :postalCode ?postalCode} .
}
}
ORDER BY ?customer
Modified query result
The following illustration highlights the derived facts (in cyan) as a result of rule 03.
And, finally, the following highlights (in cyan) the derived facts as a result of all previous rules created so far.
Let’s see what happens if a CustomerWithoutOrder places an order.
PREFIX : <http://www.mysparql.com/resource/northwind/>
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
# Add an order to a customer :customer-FISSA
INSERT DATA {
GRAPH kggraph:dataGraph {
:order-99999 a :Order ;
:hasCustomer :customer-FISSA .
:orderDetail-99999-61 a :OrderDetail ;
:hasProduct :product-61 ;
:belongsToOrder :order-99999 .
}
}
When we execute the modified query a second time, :customer-FISSA is not returned. That’s because the derived fact CustomerWithoutOrder was retracted when that customer placed an order.
And, what if we delete rule 03 from the Northwind data store altogether?
curl -X PATCH -G --data-urlencode "default-graph-name=http://www.mysparql.com/resource/northwind/graph/dataGraph" -H "Content-Type:" -T "rules/03-customer-without-order.dlog" "localhost:12110/datastores/Northwind/content?operation=delete-content"
Then, query 03 will not produce any results.
How Do Rules Work In RDFox?
- RDFox uses parallel reasoning and does incremental materialisation of inferred triples.
- Data updates will cause new materialised triples to be derived as a logical consequence or retracted when they are no longer justified. This happens automatically when adding or deleting facts or rules, and it’s done in an incremental and very efficient fashion.
- RDFox allows us to add inferred triples to Named Graphs other than the Default Graph.
The above are considered to be among the most desirable features of an advanced reasoning engine.
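The way derived facts follow data updates can be illustrated with a small Python sketch that mirrors the :customer-FISSA example from use case 03. Note the simplification: this sketch recomputes all derived facts from scratch after the update, whereas RDFox, as described above, maintains them incrementally. The function name `materialise`, the extra customer `customer-X` and `order-1` are hypothetical.

```python
def materialise(explicit):
    """Naively recompute the CustomerWithoutOrder facts from the explicit facts."""
    customers = {s for (s, p, o) in explicit if p == "a" and o == "Customer"}
    orders = {s for (s, p, o) in explicit if p == "a" and o == "Order"}
    with_order = {o for (s, p, o) in explicit if p == "hasCustomer" and s in orders}
    return {(c, "a", "CustomerWithoutOrder") for c in customers - with_order}


# Explicit facts before the update (toy data; "a" stands for rdf:type).
explicit = {
    ("customer-FISSA", "a", "Customer"),
    ("customer-X", "a", "Customer"),
    ("order-1", "a", "Order"),
    ("order-1", "hasCustomer", "customer-X"),
}
before = materialise(explicit)

# The update: customer-FISSA places an order.
update = {
    ("order-99999", "a", "Order"),
    ("order-99999", "hasCustomer", "customer-FISSA"),
}
after = materialise(explicit | update)

retracted = before - after  # derived facts that are no longer justified
added = after - before      # newly derived facts

print("retracted:", sorted(retracted))
print("added:", sorted(added))
```

A full recomputation like this becomes expensive as the data grows, which is the motivation for incremental maintenance in a reasoning engine.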
Conclusion
We started our journey with a simple demonstration on how inference rules can enrich an existing triplestore. We are planning to extend the reasoning capabilities of the Northwind sample database in future articles by adding axioms, an ontology and additional rules to answer more complex questions. Stay tuned!
Setting Up The Demo Environment
If you choose to run the queries in this demonstration, please follow the steps below to set up the demo environment.
Clone the Northwind Repository
The following github repository contains the sample data, queries and rules used in this demonstration.
IMPORTANT! If you are on MacOS, you may choose to follow the instructions in the git repo above and skip the remaining steps in this section. The repo will start a persisted instance of RDFox in a Docker container with the Northwind data store already loaded and configured.
By using this option, the only thing that changes for you when executing the steps in the demo is the curl commands that add rules: you will need to append the authentication option -u admin:admin before executing them.
Download RDFox
Request an RDFox license here. You will need a commercial or academic email.
Download the appropriate version of RDFox onto your machine.
Copy the license file RDFox.lic to the directory where the RDFox executable is located.
Launch RDFox
In a terminal, from the same directory above, execute ./RDFox sandbox on MacOS/Linux or RDFox.exe sandbox on Windows to launch RDFox.
MacOS Only
If you get a warning message saying that RDFox is not from an identified developer, click Cancel.
Go to System Preferences > Security and Privacy > General Tab and then click on Allow Anyway, as illustrated below, and run the sandbox command again.
If you get another warning message, choose Open to start the RDFox shell.
If everything goes fine, you should get the following message in the terminal: A new server connection was opened as role ‘guest’ and stored with name ‘sc1’.
Expose RDFox REST API
In the Shell, execute the following to expose the RDFox REST API, which includes a SPARQL over HTTP endpoint.
endpoint start
MacOS only
If you get the following message, choose Allow.
You should get the following message: The REST endpoint was successfully started at port number/service name 12110 with XX threads.
Warning! Do not close the terminal window as that would stop the RDFox server. Also, any of the commands to add rules in this demo must be executed in a separate terminal window.
At this point you should be able to navigate to the RDFox web console at http://localhost:12110/console/
Create the Northwind Data Store
On the Console UI, click on + Create data store and name it “Northwind”.
Cancel the Import Content popup as we need to create a graph before importing the data.
Execute the following query on the RDFox web console to create the dataGraph where we are going to store the data and rules.
PREFIX kggraph: <http://www.mysparql.com/resource/northwind/graph/>
CREATE GRAPH kggraph:dataGraph
Import The Northwind Sample Data
From the … menu, choose Add content.
Select dataGraph from the drop-down and then select the northwind.nt file under the northwind/data directory in your local clone, or download it from the GitHub repo.
You should get a confirmation message saying that 30780 facts were added to the data store.
Now, go to the beginning of this article for the instructions on how to create rules and run the SPARQL queries.
Once you are done with this demonstration, you can stop the RDFox Server by executing the command quit in the original terminal window.
References