Optimizing Stream Queries by Reusing Aggregated Results

Kush Bisen

Optimizing Stream Queries by Reusing Aggregated Results

Kush Bisen

UGent - imec

RDF Stream Processing

RDF Stream

Windows in RDF Stream Processing

Processing with Windows

RSP-Query Language (RSPQL)

Querying Average Temperature

          PREFIX ex: <http://rsp.js/>
          REGISTER RStream  AS
          SELECT (AVG(?temperature) AS ?temp)
          FROM NAMED WINDOW ex:w1 ON STREAM ex:stream [RANGE 30min STEP 15min]
          WHERE { 
            WINDOW ex:w1 {  
              ?sensor :hasTemperature ?temperature .
            } 
          }
      

Use Case

Use Case

Reusing Stream Queries

In distributed and decentralized systems,
Different RSP Agents may have different queries running over the same or different streams.

The results of these queries can be reused to optimize the execution of other queries.

Finding Similarities in Queries

Query containment
Results of one query are always a subset of the results for another query

Query Isomorphism
Queries are structurally identical, and will produce the same results.

Finding Similarities in Queries

Given a set of RSP Agents and their queries, if a parent query is introduced to the system.

Query Containment relationships between the parent query and the child queries are used to determine the reusability of the child query results.

Joining Query Results

Joining Query Results - Same Window Semantics

Combining Results from different Window Semantics

Joining Query Results - Different Window Semantics

Combining Results from different Window Semantics

Merge Results from different Window Semantics

Approximating the Results

So to get dependable correct results. We should just query the original data source?

Chunking Strategies as a Solution

Chunking Strategies

Chunking Strategies as a Solution

Chunking Strategies

What about Aggregation Functions?

New Query can be registered into the RSP Agent with the chunked window semantics.

The query registered can have the same aggregation function as the parent query.

What about Aggregation Functions?

SPARQL 1.1 Functions Derivable Aggregation
COUNT, SUM AVG
AVG, COUNT SUM
MIN, MAX RANGE (compute seperately, not SPARQL 1.1)

Including Extension Functions?

SPARQL Functions Derivable Aggregation
COUNT, SUM AVG
AVG, COUNT SUM
MIN, MAX RANGE (compute seperately)
SUM, COUNT, SUM OF SQUARES, SQRT VAR, STDEV
COUNT(DISTINCT), COUNT(*) COUNT_DUPLICATES

Streaming Query Hive

Streaming Query Hive is a Microservice which combines Multiple Streaming Queries to solve for a single parent query.
It is currently in development.

Streaming Query Hive

Questions?

Link to Repository