Feature Proposals for SPARQL 1.2

Published in

Level Up Coding

9 min readFeb 16, 2021

The aim of this article is to introduce the Northwind and AdventureWorks sample databases to the community as a valuable resource to work on development of feature proposals for new versions of SPARQL, as well as introduce a preliminary list of proposed features.

These databases can be easily set up on SQL Server by following the instructions in the “Setting up the sample databases on SQL Server” section of the Northwind SQL vs SPARQL article.

The SPARQL query language for RDF is currently in version 1.1. The SPARQL 1.2 Community Group is a forum for discussion and refinement of SPARQL 1.1 and has been working on new feature proposals for version 1.2.

The AdventureWorks database is available in SQL Server only.

This article contains examples for the following proposed features:

Correlated Subquery
Window Functions
Ranking Functions
Aggregate Functions
Temporary Tables
Select Top with Ties

The queries shown in this article can be found on github.

You may choose to skip the environment setup if you don’t want to execute the queries yourself, but only browse the results in the screenshots.

Proposed Features

Correlated Subquery

In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query.

Select the 3 most recent orders of each customer.

USE Northwind;
SELECT 
    cst.CustomerID, 
    cst.City,
    cpp.OrderID, 
    cpp.OrderDate   
FROM 
    Customer AS cst
CROSS APPLY 
(
    SELECT TOP 3 
        ord.OrderID, ord.OrderDate, ord.CustomerID
    FROM 
        [Order] AS ord
    WHERE 
        ord.CustomerID = cst.CustomerID 
    ORDER BY 
        ord.OrderDate DESC
) AS cpp
ORDER BY 
    cst.CustomerID, 
    cst.City,
    cpp.OrderDate DESC

Note that some queries using correlated subquery can be rewritten to use window functions and vice-versa.

Window Functions

The SELECT — OVER clause determines the partitioning and ordering of a rowset before the associated window function is applied. It defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

The following example replaces the previous correlated subquery with a window function.

Select the 3 most recent orders of each customer

USE Northwind;
SELECT 
    ptt.*
FROM
(
    SELECT
    cst.CustomerID, 
    cst.City,
    ord.OrderID, 
    ord.OrderDate, 
    ROW_NUMBER() OVER(PARTITION BY cst.CustomerID ORDER BY ord.OrderDate DESC) AS [RowNumber]
    FROM Customer AS cst
    INNER JOIN [Order] AS ord 
    ON cst.CustomerID = ord.CustomerID
) ptt
WHERE 
    ptt.[RowNumber] <= 3

Refer to the Ranking Functions section for more information on ROW_NUMBER.

The following are two more examples on window functions. Note that they could be rewritten as correlated subqueries, as well.

Top 5 most expensive products in each product category

USE Northwind;
SELECT 
    ptt.*
FROM
(
    SELECT
        ctg.CategoryName,
        prd.ProductName, 
        prd.UnitPrice,
        ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber] 
    FROM 
        Product prd
        INNER JOIN Category ctg 
        ON prd.CategoryID = ctg.CategoryID  
) ptt
WHERE 
    ptt.[RowNumber] <= 5
ORDER BY
    ptt.CategoryName,
    ptt.RowNumber

Percentage of a product in relation to the total amount of all products bought in the same Order.

USE Northwind;
SELECT 
    ord.OrderID, 
    ord.ProductID, 
    ord.Quantity,  
    SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) AS Total,  
    CAST(1. * ord.Quantity / SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) * 100 AS DECIMAL(5,2)) AS "PercByProduct"  
FROM 
    OrderDetail ord 
WHERE 
    ord.OrderID IN(10248,10249, 10250)

Ranking Functions

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

ROW_NUMBER: Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

The following example calculates a row number for the salespeople in Adventure Works based on their year-to-date sales ranking.

USE AdventureWorks2017; 
SELECT 
    ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row, 
    FirstName, 
    LastName, 
    ROUND(SalesYTD,2,1) AS "Sales YTD" 
FROM 
    Sales.vSalesPerson 
WHERE 
    TerritoryName IS NOT NULL 
    AND SalesYTD <> 0

RANK: Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question. ROW_NUMBER and RANK are similar. ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). RANK provides the same numeric value for ties (for example 1, 2, 2, 4, 5).

The following example ranks the products in inventory the specified inventory locations according to their quantities. The result set is partitioned by LocationID and logically ordered by Quantity. Notice that lines 1 and 2 (products 494 and 495) have the same quantity and therfore, both have a rank value of one. Line 3 (product 493) has rank value of 3 (highlighted), creating a gap from the previous rank value.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
    ,RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
ORDER BY i.LocationID

DENSE_RANK: This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.

This example ranks the products in inventory, by the specified inventory locations, according to their quantities. DENSE_RANK partitions the result set by LocationID and logically orders the result set by Quantity. Notice that lines 1 and 2 (products 494 and 495) have the same quantity and therfore, both have a rank value of one. Line 3 (product 493) has rank value of 2 (highlighted), creating no gaps from the previous rank value.

USE AdventureWorks2017;  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
    ,DENSE_RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
ORDER BY i.LocationID

NTILE: Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

The following example divides rows into four groups of employees based on their year-to-date sales. Because the total number of rows is not divisible by the number of groups, the first two groups have four rows and the remaining groups have three rows each.

USE AdventureWorks2017;   
SELECT p.FirstName, p.LastName  
    ,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile  
    ,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD  
    , a.PostalCode  
FROM Sales.SalesPerson AS s   
INNER JOIN Person.Person AS p   
    ON s.BusinessEntityID = p.BusinessEntityID  
INNER JOIN Person.Address AS a   
    ON a.AddressID = p.BusinessEntityID  
WHERE TerritoryID IS NOT NULL   
    AND SalesYTD <> 0

The following is an example on the Northwind database that uses all the Ranking Functions explained previouly.

Top 3 most expensive products in each product category

USE Northwind;
SELECT 
    ptt.*
FROM
(
    SELECT
        ctg.CategoryName,
        prd.ProductName, 
        prd.UnitPrice,
        ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber], 
        RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RANK],
        DENSE_RANK() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [DENSE_RANK],
        NTILE(6) OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [NTILE]
    FROM 
        Product prd
        INNER JOIN Category ctg 
        ON prd.CategoryID = ctg.CategoryID  
) ptt
WHERE 
    ptt.[RowNumber] <= 3
ORDER BY
    ptt.CategoryName,
    ptt.RowNumber

Update using Window Functions

The following is an example of common update scenario using window functions.

Apply a 10% discount on the top 3 most expensive products in each product category.

USE Northwind;
UPDATE 
    Product
SET 
    UnitPrice = UnitPrice * 0.9
WHERE 
    ProductID IN
    (
        SELECT 
            ptt.ProductID
        FROM
        (
            SELECT
                prd.ProductID,
                ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber] 
            FROM 
                Product prd
                INNER JOIN Category ctg 
                ON prd.CategoryID = ctg.CategoryID  
        ) ptt
        WHERE 
            ptt.[RowNumber] <= 3
    )

Temporary Tables

Temporary Tables can be used in SQL to store a dataset that goes under many calculation steps before being committed to a physical table on the database. It holds intermediate results that can be consumed multiple times at different stages of a long SQL query, within the same session or transaction that created it. Currently, there is no such a feature available in SPARQL.

The following query uses a temp table to save the list of products to be updated from the previous example.

Apply a 10% discount on the top 3 most expensive products in each product category.

USE Northwind;
# Save products to temporary table
SELECT 
    ptt.ProductID,
    ptt.UnitPrice
INTO    
    #ProdDiscount
FROM
(
    SELECT
        prd.ProductID,
        prd.UnitPrice,
        ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber] 
    FROM 
        Product prd
        INNER JOIN Category ctg 
        ON prd.CategoryID = ctg.CategoryID  
) ptt
WHERE 
    ptt.[RowNumber] <= 3 ;
# Update products in temporary table
UPDATE 
    Product
SET 
    UnitPrice = UnitPrice * 0.9
WHERE 
    ProductID IN (SELECT ProductID FROM #ProdDiscount)

Check the updated records.

USE Northwind;
SELECT
    prd.ProductID,
    prd.UnitPrice
FROM 
    Product prd
    INNER JOIN #ProdDiscount tpr
    ON prd.ProductID = tpr.ProductID
ORDER BY
    prd.ProductID

SELECT TOP WITH TIES

Returns two or more rows that tie for last place in limited results set. You must use this argument with the ORDER BY clause. WITH TIES might cause more rows to be returned than the value specified in expression. For example, if expression is set to 5 but two additional rows match the values of the ORDER BY columns in row 5, the result set will contain 7 rows.

Top 5 Supplier Representative by number of products sold.

SELECT 
    TOP 5 
    WITH TIES -- returns rows that tie for last place 
    spl.ContactName,
    COUNT(prd.ProductID) as ProductCount
FROM
    Product prd 
    INNER JOIN Category ctg
    ON prd.CategoryID = ctg.CategoryID
    INNER JOIN Supplier spl
    ON prd.SupplierID = spl.SupplierID
GROUP BY
    spl.SupplierID,
    spl.ContactName
ORDER BY    
    ProductCount DESC

SPARQL cannot return rows that tie for last place.

SELECT 
  ?supplierContactName 
  (COUNT(?product) as ?productCount)
WHERE
{ 
  ?product a  :Product ;
              :hasSupplier ?supplier .
  ?supplier a :Supplier ;
              :contactName ?supplierContactName .
} 
GROUP BY 
  ?supplierContactName
ORDER BY
  DESC(?productCount)
LIMIT 5

References

agnos.ai: Enterprise Knowledge Graph Consultants

We are an international partnership of knowledge graph architects, methodology specialists, software engineers, and…

agnos.ai

SPARQL Query Language for RDF

RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the…

www.w3.org

SPARQL 1.2 Community Group

The SPARQL 1.2 Community Group is a forum for discussion and refinement of SPARQL 1.1. It will document features found…

www.w3.org

Support window functions · Issue #47 · w3c/sparql-12

SPARQL should add support for window functions. This would increase expressivity and address some existing use cases…

github.com

Subqueries (SQL Server) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

docs.microsoft.com

Ranking Functions (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

docs.microsoft.com

OVER Clause (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

docs.microsoft.com

Aggregate Functions (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

docs.microsoft.com

JOIN LATERAL or Correlated Subquery · Issue #100 · w3c/sparql-12

Why? GraphQL is a hot topic amongst developers and tool vendors. There are several implementations of GraphQL over RDF…

github.com

Refer to the following W3C github page for a discussion on SQL Window Function proposal to be included in SPARQL 1.2.

Level Up Coding

Thanks for being a part of our community! Before you go:

👏 Clap for the story and follow the author 👉
📰 View more content in the Level Up Coding publication
💰 Free coding interview course ⇒ View Course
🔔 Follow us: Twitter | LinkedIn | Newsletter

🚀👉 Join the Level Up talent collective and find an amazing job

Feature Proposals for SPARQL 1.2

Proposed Features

Correlated Subquery

Window Functions

Ranking Functions

Update using Window Functions

Temporary Tables

SELECT TOP WITH TIES

References

agnos.ai: Enterprise Knowledge Graph Consultants

We are an international partnership of knowledge graph architects, methodology specialists, software engineers, and…

SPARQL Query Language for RDF

RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the…

SPARQL 1.2 Community Group

The SPARQL 1.2 Community Group is a forum for discussion and refinement of SPARQL 1.1. It will document features found…

Support window functions · Issue #47 · w3c/sparql-12

SPARQL should add support for window functions. This would increase expressivity and address some existing use cases…

Subqueries (SQL Server) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

Ranking Functions (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

OVER Clause (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

Aggregate Functions (Transact-SQL) — SQL Server

Applies to: SQL Server (all supported versions) Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics…

JOIN LATERAL or Correlated Subquery · Issue #100 · w3c/sparql-12

Why? GraphQL is a hot topic amongst developers and tool vendors. There are several implementations of GraphQL over RDF…

Level Up Coding

Written by mbarbieri