Feature Proposals for SPARQL 1.2

mbarbieri
Level Up Coding
Published in
9 min readFeb 16, 2021

--

Photo by Sander Weeteling on Unsplash

The aim of this article is to introduce the Northwind and AdventureWorks sample databases to the community as a valuable resource to work on development of feature proposals for new versions of SPARQL, as well as introduce a preliminary list of proposed features.

These databases can be easily set up on SQL Server by following the instructions in the “Setting up the sample databases on SQL Server section of the Northwind SQL vs SPARQL article.

The SPARQL query language for RDF is currently in version 1.1. The SPARQL 1.2 Community Group is a forum for discussion and refinement of SPARQL 1.1 and has been working on new feature proposals for version 1.2.

The AdventureWorks database is available in SQL Server only.

This article contains examples for the following proposed features:

  • Correlated Subquery
  • Window Functions
  • Ranking Functions
  • Aggregate Functions
  • Temporary Tables
  • Select Top with Ties

The queries shown in this article can be found on github.

You may choose to skip the environment setup if you don’t want to execute the queries yourself, but only browse the results in the screenshots.

Proposed Features

Correlated Subquery

In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

Select the 3 most recent orders of each customer.

USE Northwind;
SELECT
cst.CustomerID,
cst.City,
cpp.OrderID,
cpp.OrderDate
FROM
Customer AS cst
CROSS APPLY
(
SELECT TOP 3
ord.OrderID, ord.OrderDate, ord.CustomerID
FROM
[Order] AS ord
WHERE
ord.CustomerID = cst.CustomerID
ORDER BY
ord.OrderDate DESC
) AS cpp
ORDER BY
cst.CustomerID,
cst.City,
cpp.OrderDate DESC

Note that some queries using correlated subquery can be rewritten to use window functions and vice-versa.

Window Functions

The SELECT — OVER clause determines the partitioning and ordering of a rowset before the associated window function is applied. It defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

The following example replaces the previous correlated subquery with a window function.

Select the 3 most recent orders of each customer

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
cst.CustomerID,
cst.City,
ord.OrderID,
ord.OrderDate,
ROW_NUMBER() OVER(PARTITION BY cst.CustomerID ORDER BY ord.OrderDate DESC) AS [RowNumber]
FROM Customer AS cst
INNER JOIN [Order] AS ord
ON cst.CustomerID = ord.CustomerID
) ptt
WHERE
ptt.[RowNumber] <= 3

Refer to the Ranking Functions section for more information on ROW_NUMBER.

The following are two more examples on window functions. Note that they could be rewritten as correlated subqueries, as well.

Top 5 most expensive products in each product category

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
ctg.CategoryName,
prd.ProductName,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 5
ORDER BY
ptt.CategoryName,
ptt.RowNumber

Percentage of a product in relation to the total amount of all products bought in the same Order.

USE Northwind;
SELECT
ord.OrderID,
ord.ProductID,
ord.Quantity,
SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) AS Total,
CAST(1. * ord.Quantity / SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) * 100 AS DECIMAL(5,2)) AS "PercByProduct"
FROM
OrderDetail ord
WHERE
ord.OrderID IN(10248,10249, 10250)

Ranking Functions

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

ROW_NUMBER: Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

The following example calculates a row number for the salespeople in Adventure Works based on their year-to-date sales ranking.

USE AdventureWorks2017; 
SELECT
ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName,
LastName,
ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM
Sales.vSalesPerson
WHERE
TerritoryName IS NOT NULL
AND SalesYTD <> 0

RANK: Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question. ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5).

The following example ranks the products in inventory the specified inventory locations according to their quantities. The result set is partitioned by LocationID and logically ordered by Quantity. Notice that lines 1 and 2 (products 494 and 495) have the same quantity and therfore, both have a rank value of one. Line 3 (product 493) has rank value of 3 (highlighted), creating a gap from the previous rank value.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID

DENSE_RANK: This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.

This example ranks the products in inventory, by the specified inventory locations, according to their quantities. DENSE_RANK partitions the result set by LocationID and logically orders the result set by Quantity. Notice that lines 1 and 2 (products 494 and 495) have the same quantity and therfore, both have a rank value of one. Line 3 (product 493) has rank value of 2 (highlighted), creating no gaps from the previous rank value.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,DENSE_RANK() OVER
(PARTITION BY
i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID

NTILE: Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

The following example divides rows into four groups of employees based on their year-to-date sales. Because the total number of rows is not divisible by the number of groups, the first two groups have four rows and the remaining groups have three rows each.

USE AdventureWorks2017;   
SELECT p.FirstName, p.LastName
,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile
,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD
, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0

The following is an example on the Northwind database that uses all the Ranking Functions explained previouly.

Top 3 most expensive products in each product category

USE Northwind;
SELECT
ptt.*
FROM
(
SELECT
ctg.CategoryName,
prd.ProductName,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber],
RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RANK],
DENSE_RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [DENSE_RANK],
NTILE(6) OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [NTILE]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3
ORDER BY
ptt.CategoryName,
ptt.RowNumber

Update using Window Functions

The following is an example of common update scenario using window functions.

Apply a 10% discount on the top 3 most expensive products in each product category.

USE Northwind;
UPDATE
Product
SET
UnitPrice = UnitPrice * 0.9
WHERE
ProductID IN
(
SELECT
ptt.ProductID
FROM
(
SELECT
prd.ProductID,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3
)

Temporary Tables

Temporary Tables can be used in SQL to store a dataset that goes under many calculation steps before being committed to a physical table on the database. It holds intermediate results that can be consumed multiple times at different stages of a long SQL query, within the same session or transaction that created it. Currently, there is no such a feature available in SPARQL.

The following query uses a temp table to save the list of products to be updated from the previous example.

Apply a 10% discount on the top 3 most expensive products in each product category.

USE Northwind;
# Save products to temporary table
SELECT

ptt.ProductID,
ptt.UnitPrice
INTO
#ProdDiscount
FROM
(
SELECT
prd.ProductID,
prd.UnitPrice,
ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
) ptt
WHERE
ptt.[RowNumber] <= 3 ;
# Update products in temporary table
UPDATE

Product
SET
UnitPrice = UnitPrice * 0.9
WHERE
ProductID IN (SELECT ProductID FROM #ProdDiscount)

Check the updated records.

USE Northwind;
SELECT
prd.ProductID,
prd.UnitPrice
FROM
Product prd
INNER JOIN #ProdDiscount tpr
ON prd.ProductID = tpr.ProductID
ORDER BY
prd.ProductID

SELECT TOP WITH TIES

Returns two or more rows that tie for last place in limited results set. You must use this argument with the ORDER BY clause. WITH TIES might cause more rows to be returned than the value specified in expression. For example, if expression is set to 5 but two additional rows match the values of the ORDER BY columns in row 5, the result set will contain 7 rows.

Top 5 Supplier Representative by number of products sold.

SELECT 
TOP 5
WITH TIES -- returns rows that tie for last place
spl.ContactName,
COUNT(prd.ProductID) as ProductCount
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
INNER JOIN Supplier spl
ON prd.SupplierID = spl.SupplierID
GROUP BY
spl.SupplierID,
spl.ContactName
ORDER BY
ProductCount DESC

SPARQL cannot return rows that tie for last place.

SELECT 
?supplierContactName
(COUNT(?product) as ?productCount)
WHERE
{
?product a :Product ;
:hasSupplier ?supplier .
?supplier a :Supplier ;
:contactName ?supplierContactName .
}
GROUP BY
?supplierContactName
ORDER BY
DESC(?productCount)
LIMIT 5

References

Refer to the following W3C github page for a discussion on SQL Window Function proposal to be included in SPARQL 1.2.

Level Up Coding

Thanks for being a part of our community! Before you go:

🚀👉 Join the Level Up talent collective and find an amazing job

--

--