SPARQL Builder Metadata Specification (Version Sep. 2015)

Editor

The collaborative SPARQL Builder Development group, Metadata specification team.

    • Norio KOBAYASHI (ACCC, RIKEN)
    • Atsuko YAMAGUCHI (DBCLS, ROIS)
    • Kouji KOZAKI (Univ. Osaka)
    • Kai LENZ (ACCC, RIKEN)
    • Hogyan WU (DBCLS, ROIS)
    • Yasunori YAMAMOTO (DBCLS, ROIS)

Overview

The SPARQL Builder Metadata Specification defines the data schema for RDF metadata that describes the structure of the RDF data in a SPARQL endpoint. The RDF metadata file is generated in advance by a software module called the “crawler”, which extracts the metadata from SPARQL endpoints. The metadata is then used to construct a class graph (classes, properties, and the domains and ranges of properties) by executing SPARQL queries that impose only a low load, even on a SPARQL endpoint holding large data.

The metadata is also useful when writing a SPARQL query, since it concisely describes the RDF graph structure of the corresponding SPARQL endpoint. For this reason the metadata should be written according to standardized specifications, but no standardized vocabulary supports the data characteristic of SPARQL Builder, such as relationships between classes. We therefore created a specification for the metadata description by adding an original vocabulary with the namespace prefix “sbm:” to the existing specifications SPARQL 1.1 Service Description (http://www.w3.org/TR/sparql11-service-description/) and VoID (http://www.w3.org/TR/void/).

This version extends the previous version (May 2014) to support crawling each graph published by a SPARQL endpoint.

Public Domain Mark
This work (SPARQL Builder Metadata Specification, by SPARQL Builder Project Team: Norio KOBAYASHI, Atsuko YAMAGUCHI, Kouji KOZAKI, Kai LENZ, Hogyan WU, Yasunori YAMAMOTO as of April 2016) is free of known copyright restrictions.

Prefixes

The prefixes used in this document are the following.

  • rdf:    http://www.w3.org/1999/02/22-rdf-syntax-ns#
  • rdfs:  http://www.w3.org/2000/01/rdf-schema#
  • sd:     http://www.w3.org/ns/sparql-service-description#
  • void:  http://rdfs.org/ns/void#
  • sbm:  http://sparqlbuilder.org/2015/09/rdf-metadata-schema#

 

Metadata Schema

The metadata schema is shown in the following figure.

[Figure: SBM_201509, diagram of the metadata schema]

Following SPARQL 1.1 Service Description, a dataset included in a SPARQL endpoint is described as sd:Dataset. To describe the detailed data structure of the RDF dataset, we employ the property partitions and class partitions defined in VoID and, as our extension, introduce statistical indicators in the form of categories for properties, classes and endpoints.

Statistical indicators

Property Category

This category is defined for each user property used on a SPARQL endpoint, that is, every property except those defined in RDF Schema 1.1 such as rdf:type and rdfs:subClassOf. We say that a triple is class decidable when the classes of its subject and object are explicitly defined using rdfs:domain and rdfs:range, and/or can be extracted from the classes of the subject and object instances. The property category indicates how comprehensively the triples of each property are class decidable. The intuitive semantics of the property categories are as follows:

  • Property category 1 (Complete): all triples are class decidable.
  • Property category 2 (Complete by inference): all triples are class decidable, but the domain and range classes of the property are not explicitly declared.
  • Property category 3 (Partial): some but not all triples are class decidable.
  • Property category 4 (None): no triples are class decidable.
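
The four property categories can be read as a small decision procedure. The following Python sketch is illustrative only: the `PropertyStats` structure and its field names are hypothetical, and it assumes that category 1 additionally requires the domain and range to be explicitly declared, which the list implies only by contrast with category 2.

```python
from dataclasses import dataclass


@dataclass
class PropertyStats:
    """Counts a crawler might collect for one user property (hypothetical structure)."""
    total_triples: int            # all triples that use the property
    class_decidable_triples: int  # triples whose subject and object classes are known
    domain_range_declared: bool   # rdfs:domain and rdfs:range are explicitly declared


def property_category(stats: PropertyStats) -> int:
    """Return the property category (1 to 4) for one user property."""
    if stats.total_triples <= 0:
        raise ValueError("the property must be used in at least one triple")
    if not 0 <= stats.class_decidable_triples <= stats.total_triples:
        raise ValueError("class_decidable_triples must lie between 0 and total_triples")
    if stats.class_decidable_triples == 0:
        return 4  # None
    if stats.class_decidable_triples < stats.total_triples:
        return 3  # Partial
    # Every triple is class decidable. Assumption: an explicit declaration
    # separates category 1 (Complete) from category 2 (Complete by inference).
    return 1 if stats.domain_range_declared else 2
```

For example, a property whose 100 triples are all class decidable but which has no declared domain or range falls into category 2 under this reading.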

Class Category

This category is defined for each dataset on a SPARQL endpoint. We define a junk class as a class that is used neither as a declared domain or range class nor as the class of an instance appearing as the subject or object of a triple with a user property.

  • Class category 1 (Complete): no junk classes exist.
  • Class category 2 (Partial): some but not all classes are junk classes.
  • Class category 3 (None): all classes are junk classes.
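
As a rough illustration, the class category can be computed from two sets: all classes found in the dataset, and the classes that are actually used (as a declared domain or range, or as the class of a subject or object instance in a triple with a user property). The function and parameter names below are hypothetical, and how the two sets are collected is left open.

```python
from typing import Iterable


def class_category(all_classes: Iterable[str], used_classes: Iterable[str]) -> int:
    """Return the class category (1 to 3) of a dataset.

    all_classes:  every class found in the dataset.
    used_classes: classes that are not junk, i.e. declared as a domain or range,
                  or typing a subject/object of a triple with a user property.
    """
    classes = set(all_classes)
    if not classes:
        raise ValueError("the dataset must contain at least one class")
    junk = classes - set(used_classes)
    if not junk:
        return 1  # Complete: no junk classes
    if junk == classes:
        return 3  # None: every class is a junk class
    return 2      # Partial: some but not all classes are junk
```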

Endpoint Category

This category describes the coverage of triples and classes that are not junk on a SPARQL endpoint.

  • Endpoint category 1 (Complete): the following two conditions are satisfied: (1) the property category of every user property is 1 or 2; (2) the class category is 1.
  • Endpoint category 3 (None): the class category is 3.
  • Endpoint category 2 (Partial): the endpoint category is neither 1 nor 3.
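
The endpoint category follows mechanically from the property categories and the class category. A minimal Python sketch, with hypothetical function and parameter names, might look like this. Note that an endpoint with no user properties satisfies condition (1) vacuously here, which is an assumption rather than something the definition states.

```python
from typing import Iterable


def endpoint_category(property_categories: Iterable[int], class_cat: int) -> int:
    """Return the endpoint category (1 to 3).

    property_categories: the property category (1 to 4) of every user property.
    class_cat:           the class category (1 to 3) of the dataset.
    """
    if class_cat not in (1, 2, 3):
        raise ValueError("class category must be 1, 2 or 3")
    if class_cat == 3:
        return 3  # None
    # Condition (1): every user property is category 1 or 2.
    # Condition (2): the class category is 1.
    if class_cat == 1 and all(c in (1, 2) for c in property_categories):
        return 1  # Complete
    return 2      # Partial: neither 1 nor 3
```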

Inferred property structure as class relationships

One of the major functions of the crawler is to infer the domain and/or range classes of a property by extracting the subject and/or object classes of the triples that use the property. Such a relationship between subject and object classes is here called a class relationship. A class relationship is declared as part of a property partition, together with its subject and object classes, object datatype, and the numbers of triples, distinct subjects and distinct objects.
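
To make the idea concrete, the following Python sketch aggregates class relationships from an in-memory list of triples. It is not the crawler's implementation, which works through SPARQL queries against an endpoint; the function name, the input shapes and the result keys are all hypothetical, and literal objects with datatypes are left out for brevity.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple

Triple = Tuple[str, str, str]     # (subject, property, object)
RelationKey = Tuple[str, str, str]  # (property, subject class, object class)


def infer_class_relations(
    triples: Iterable[Triple],
    types: Mapping[str, Set[str]],
) -> Dict[RelationKey, Dict[str, int]]:
    """Group triples by (property, subject class, object class) and count them.

    types maps each instance to the set of classes it belongs to.
    Triples whose subject or object has no known class contribute nothing.
    """
    acc: Dict[RelationKey, dict] = {}
    for s, p, o in triples:
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                entry = acc.setdefault(
                    (p, subject_class, object_class),
                    {"triples": 0, "subjects": set(), "objects": set()},
                )
                entry["triples"] += 1
                entry["subjects"].add(s)
                entry["objects"].add(o)
    # Reduce the collected instance sets to the counts the metadata records.
    return {
        key: {
            "triples": e["triples"],
            "distinct_subjects": len(e["subjects"]),
            "distinct_objects": len(e["objects"]),
        }
        for key, e in acc.items()
    }
```

Given three `ex:encodes` triples linking two `ex:Gene` instances to two `ex:Protein` instances, the sketch yields a single class relationship with 3 triples, 2 distinct subjects and 2 distinct objects.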

 

SPARQL Builder Metadata (sbm) vocabulary

Prefixes

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix sbm: <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> .

Classes

<http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> a owl:Ontology ;
          dc:title "The RDF Metadata Schema vocabulary for SPARQL Builder" .

sbm:ClassRelation a rdfs:Class ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "ClassRelation" ;
        rdfs:comment "A relationship between subject and object classes." .

sbm:CrawlLog a rdfs:Class ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "CrawlLog" ;
        rdfs:comment "A log of crawling on the dataset." .

Properties

sbm:endpointCategory a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "endpointCategory" ;
        rdfs:comment "Endpoint category of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:classCategory a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "classCategory" ;
        rdfs:comment "Class category of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:propertyCategory a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "propertyCategory" ;
        rdfs:comment "Property category of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:searchableTriples a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "searchableTriples" ;
        rdfs:comment "Number of searchable triples of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:classRelation a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "classRelation" ;
        rdfs:comment "Describe an instance of ClassRelation." ;
        rdfs:domain sd:Dataset ;
        rdfs:range sbm:ClassRelation .

sbm:datatypes a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "datatypes" ;
        rdfs:comment "Number of datatypes of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:subjectClass a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "subjectClass" ;
        rdfs:comment "The class of a subject instance of ClassRelation." ;
        rdfs:domain sbm:ClassRelation ;
        rdfs:range rdfs:Class .

sbm:objectClass a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "objectClass" ;
        rdfs:comment "The class of an object instance of ClassRelation." ;
        rdfs:domain sbm:ClassRelation ;
        rdfs:range rdfs:Class .

sbm:objectDatatype a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "objectDatatype" ;
        rdfs:comment "The datatype of an object literal of ClassRelation." ;
        rdfs:domain sbm:ClassRelation ;
        rdfs:range rdfs:Datatype .

sbm:sample a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "sample" ;
        rdfs:comment "Description of sample triples of the classRelation." ;
        rdfs:domain sbm:ClassRelation ;
        rdfs:range xsd:string .

sbm:subjectClasses a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "subjectClasses" ;
        rdfs:comment "Number of subject classes of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:objectClasses a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "objectClasses" ;
        rdfs:comment "Number of object classes of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:objectDatatypes a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "objectDatatypes" ;
        rdfs:comment "Number of object datatypes of the dataset." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:propertyCategorySubset a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "propertyCategorySubset" ;
        rdfs:comment "Describe sub-datasets of properties associated with given property category." ;
        rdfs:domain sd:Dataset ;
        rdfs:range sd:Dataset .

sbm:endpointAccesses a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "endpointAccesses" ;
        rdfs:comment "Number of accesses during crawling over the dataset of the endpoint." ;
        rdfs:domain sd:Dataset ;
        rdfs:range xsd:long .

sbm:crawlLog a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "crawlLog" ;
        rdfs:comment "Describe an instance of CrawlLog." ;
        rdfs:domain sd:Dataset ;
        rdfs:range sbm:CrawlLog .

sbm:crawlStartTime a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "crawlStartTime" ;
        rdfs:comment "The datetime when the crawling started." ;
        rdfs:domain sbm:CrawlLog ;
        rdfs:range xsd:dateTime .

sbm:crawlEndTime a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "crawlEndTime" ;
        rdfs:comment "The datetime when the crawling finished." ;
        rdfs:domain sbm:CrawlLog ;
        rdfs:range xsd:dateTime .

sbm:metadataGraphURI a rdf:Property ;
        rdfs:isDefinedBy <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> ;
        rdfs:label "metadataGraphURI" ;
        rdfs:comment "Specify a graphURI in the SPARQL Endpoint publishing SBM metadata. The graphURI is defined for each crawled SPARQL endpoint and its graph." ;
        rdfs:domain void:Dataset, sd:Dataset ;
        rdfs:range xsd:anyURI .
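
As a usage illustration of the dataset-level properties, the Python sketch below renders a small Turtle description of one dataset. It is only a sketch: the function name and the example IRI are hypothetical, only three of the properties declared above are emitted (sbm:endpointCategory, sbm:classCategory and sbm:searchableTriples), and the output is not claimed to match what the crawler actually writes.

```python
def dataset_indicators_turtle(
    dataset_iri: str,
    endpoint_category: int,
    class_category: int,
    searchable_triples: int,
) -> str:
    """Render dataset-level statistical indicators as a Turtle document."""
    if any(ch in dataset_iri for ch in '<>" '):
        raise ValueError("dataset_iri contains characters not allowed in an IRI")
    lines = [
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .",
        "@prefix sbm: <http://www.sparqlbuilder.org/2015/09/rdf-metadata-schema#> .",
        "",
        f"<{dataset_iri}> a sd:Dataset ;",
        # The vocabulary declares xsd:long as the range of all three properties.
        f'    sbm:endpointCategory "{int(endpoint_category)}"^^xsd:long ;',
        f'    sbm:classCategory "{int(class_category)}"^^xsd:long ;',
        f'    sbm:searchableTriples "{int(searchable_triples)}"^^xsd:long .',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset IRI and values, for illustration only.
    print(dataset_indicators_turtle("http://example.org/dataset", 2, 2, 12345))
```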