Northwind Use Case on Amazon Neptune

mbarbieri

Follow

Published in

Level Up Coding

11 min readApr 16, 2024

--

Northwind is a well-known e-commerce database which is largely used for training purposes across various database platforms. Refer to w3schools.com for more info.

In this demonstration we are going to use the RDF Knowledge Graph version of the Northwind database to provide hands-on experience on Amazon Neptune by executing use case stories (SPARQL queries) on a Jupyter Notebook. It’s a practical, learn-as-you-go experience.

EKGF Guiding Principals

The Enterprise Knowledge Graph Forum (EKGF) is now part of the Object Management Group (OMG).
The EKGF was established to define best practice and mature the marketplace for EKG adoption and provides 10 guiding principles which are intended to provide guidelines for the development and deployment of an Enterprise Knowledge Graph (EKG). The principles emphasise shared meaning and content reuse that are the cornerstone of operating in complex and interconnected environments.

The Northwind Use Case demonstrates Principle 7, which says:

All artefacts around and information in the EKG are linked to defined and prioritised use cases. Nothing in the EKG exists without a known business justification and purpose.

Setting up the Demo

For instructions on how to set up this demonstration to execute the queries on a Neptune Notebook yourself, please refer to the “Resources” section in the end of this article.

The Northwind Notebook has been planned to be added to the Amazon Neptune Jupyter Notebooks sample applications, so you may get it by default next time you start a Jupyter Notebook on AWS.

Northwind Model

Here’s a simplified version of the Northwind schema (RDF Graph Database)

SPARQL

This demonstration covers a great deal of the syntax and semantics of the SPARQL query language, including FILTER, UNION, LIMIT, OFFSET, GROUP BY, ORDER BY, DISTINCT, OPTIONAL, BIND, BOUND, MINUS, FILTER NOT EXISTS, INSERT, DELETE, DESCRIBE, CONSTRUCT, REGEX, CONTAINS, HAVING, as well as String Matching and Manipulation, Aggregation Functions, Subqueries, and Property Paths, among others.

Northwind Use Case

When documenting your use cases, identify key stakeholders, outline primary and secondary business outcomes for short and long terms, document relevant personas, and capture the concepts involved in the use case and stories.

The Neptune Jupyter Notebook used in this demonstration contains a template for documenting use cases.

Stories

Crafting effective user stories is important for a comprehensive understanding of your project. Consider this template: ‘As a <Role>, I want <Desired Outcome> in order to achieve <Business Objective>. This structured approach ensures clear communication of user needs, desired outcomes, and the ultimate business goal.

The following are the Stories of the Northwind Use Case:

Create a concise report listing all the employees in the company

Full story:
As a <Human Resources Manager>,
I want to <create a concise report listing all the employees in the company>
in order to <learn their respective positions within the organization>

Main Concepts:
<Human Resources Manager>, <Employee>

Query:
Given a Human Resources Manager persona, WHEN they want to create a report with all employees in the company, THEN the system should execute a SPARQL query to retrieve values for the rdfs:label, foaf:title, foaf:lastName, and foaf:firstName properties of each Employee.

Amazon Neptune associates every triple with a named graph. The default graph is defined as the union of all named graphs.
If you submit a SPARQL query without explicitly specifying a graph via the GRAPH keyword or constructs such as FROM NAMED, Neptune always considers all triples in your DB instance. Triples that appear in more than one graph are returned only once.
In this demonstration, we will skip the template for the remaining stories and provide only brief descriptions for simplicity.

As a Human Resources Manager, I want to know all the employees located in the USA.

Note that the same filter can be applied directly as follows:

As a Human Resources Manager, I want to know if the company has employees in the UK.

As a Sales Manager, I want to be able to search companies by name.

Note that the query above shows two ways of implementing the filter. You can comment out the first filter and uncomment the second one to verify its result.

As a Sales Manager, I want to create a basic report showing products supplied by companies located in the USA.

As a Sales Manager, I want to create a basic report showing customers who placed at least one order.

As a Marketing Manager, I want to create a basic report showing customers who never placed an order.

The same result can be obtained by using MINUS.

NOT EXISTS and MINUS in SPARQL represent two ways of thinking about negation and they will be explored in more details in a future article.

As a Marketing Manager, I want to search products by name or a combination of identification number and price.

As a Marketing Manager, I want to know which products are in a given price range.

As a Marketing Manager, I want to create a list of all suppliers located in Japan or Italy.

As a Marketing Manager, I want to create a report containing all suppliers.

As a Marketing Manager, I want to create a report containing all suppliers that have a fax number.

Note: Fax was a machine from the 90s able to scan and transmit a document over the phone line :-)

As a Marketing Manager, I want to create a report containing all suppliers that don’t have a fax number.

The same result can be obtained by using the NOT EXISTS filter below.

As a Marketing Manager, I want to create a report of products grouped by category and sorted by unit price descending.

As a Marketing Manager, I want to create a report with all countries I buy from.

As a Data Steward, I want to generate an identification code for each of our employees.

As a Sales Manager, I want to create a report with the top 5 largest quantity of a product sold in a single order.

As a Sales Manager, I want to retrieve the second page of a report with the top largest quantity of a product sold in a single order.

As a Sales Manager, I want to know the total number of suppliers.

As a Sales Manager, I want to know the number of countries I buy from.

As a Sales Manager, I want to know the top 5 most sold products.

As a Sales Manager, I want to know the top 5 largest orders shipped to the USA.

As a Sales Manager, I want to know the orders over 10K shipped to the USA.

As a Sales Manager, I want to know the top 5 supplier representatives by number of products sold.

As a Sales Manager, I want to know the contact details of suppliers, customers and employees to send out Xmas cards.

As a Sales Manager, I want to know all products that belong to the Seafood category an their quantity in stock.

As a Sales Manager, I want to calculate the average number of orders processed per year.

As a Sales Representative, I want to be able to insert a new customer.

Checking if new customer has been added successfully.

As a Sales Representative, I want to be able to update an existing customer.

Step 1: Insert

Step 2: Update (Delete/Insert)

Checking if existing customer has been updated successfully.

As a Sales Representative, I want to be able to delete an existing customer.

Checking if existing customer has been deleted successfully.

Recommendation Stories

The next queries focus on recommendations.

As a Sales Manager, I want to know which products were bought together in the same order.

Query: Customers who bought product-61 also bought which products in the same order and how many times?

As a Sales Manager, I want to know which products were bought together across all orders.

Query: Customers who bought product-61 also bought which products across all orders and how many times?

As a Sales Manager, I want to know how many times two given products where bought by the same customer.

Query: How many times products 2 and 61 where bought by the same customer.

For didactical purposes, the following are the 6 steps taken to develop the Property Path query above.

Step 1: Graph pattern traversing Product, OrderDetail, Order and Customer nodes, which are represented by the order, orderDetail, product and customer bound variables in the query.

# List order items that contain product-61
SELECT *
WHERE {
    ?orderDetail :hasProduct ?product ;
                 :belongsToOrder ?order .
    ?order :hasCustomer ?customer .
    FILTER (?product = :product-61)
}

Note the naming convention where bound variable names are declared in CamelCase, whereas the Class names, as seen in the diagram, are declared in PascalCase.

Step 2: Invert the hasProduct path expression to match the following direction: product → orderDetail → order → customer

# The same result is returned
SELECT *
WHERE {
    ?product ^:hasProduct ?orderDetail . # Invert direction
    ?orderDetail :belongsToOrder ?order .
    ?order :hasCustomer ?customer .
    FILTER (?product = :product-61)
}

Step 3: Use sequence path to omit the binding of the ?orderDetail variable.

SELECT *
WHERE {
    ?product ^:hasProduct/:belongsToOrder ?order .
    ?order :hasCustomer ?customer .
    FILTER (?product = :product-61)
}

Step 4: Use sequence path to omit the binding of the ?order variable.

# All customers that bought product-61
# Distinct eliminates duplicates in case the same customer bought a product more than once
SELECT DISTINCT *
WHERE {
    ?product ^:hasProduct/:belongsToOrder/:hasCustomer ?customer .
    FILTER (?product = :product-61)
}
ORDER BY ?product

Step 5: Bind the filter value to the subject variable directly.

SELECT DISTINCT *
WHERE {
    :product-61 ^:hasProduct/:belongsToOrder/:hasCustomer ?customer .
}
ORDER BY ?product

Step 6: And finally, omit the binding of the ?customer variable and invert the full path back to product.

SELECT (COUNT (1) AS ?Count)
WHERE {
    :product-2  ^:hasProduct/:belongsToOrder/:hasCustomer/
    ^(^:hasProduct/:belongsToOrder/:hasCustomer) :product-61
}

Graph Visualisation

As a Data Engineer, I want to visualise a graph representation of a given Order.

Note the panel on the right, which contains details on the node selected: order-10370.

Zooming in you you can see that the order-10370 has employee, customer, shipper and two order details, which contain products with their respective categories and suppliers.

Resources

As stated previously, the Northwind Notebook has been planned to be added to the Amazon Neptune Jupyter Notebooks sample applications under the following location:
01-Neptune-Database > 03-Sample-Applications > 07-Northwind-Use-Case

You can download the resources used in this demonstration from the following locations in the Northwind GitHub repository:

After setting up a Neptune DB instance and an S3 bucket on AWS, you just need to upload the the Jupyter Notebook mentioned above into a new Neptune Notebook, then upload the Northwind N-Triple file to an S3 bucket and use the magic command %load in the notebook itself to load the data into the RDF Graph Database. For more information on how to create an instance of Neptune and set up a Notebook, please refer to the Neptune user guide.