This page describes a version of :BaseKB that is no longer supported. Currently supported products are described here.

About :BaseKB

This document explains the transformation that converts Freebase into :BaseKB. If you knew everything about RDF, SPARQL and Freebase, this would be sufficient for you to use :BaseKB.

As our knowledge isn't perfect, this is just the first document of many describing the use of :BaseKB.

RDF Namespaces

The following RDF Namespaces are used in this document. These are quite different from the Freebase namespaces that will be discussed later.

@prefix basekb: <http://rdf.basekb.com/ns/> .
@prefix public: <http://rdf.basekb.com/public/> .
@prefix fbase: <http://rdf.freebase.com/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

Statements in :BaseKB

The subject of a statement (triple) in :BaseKB is always a unique mid identifier (e.g. basekb:m.0838f).

This principle holds true for the data we publish, although you're free to violate it through inference (owl:inverseOf) or by adding more data.

Predicates coined by Freebase (say basekb:m.02hqc86, known as /book/book/genre) are also represented with mid identifiers. We occasionally use predicates from well-known namespaces such as rdf:type and rdfs:label, and we have also coined a few predicates of our own in the public: namespace.

When we ingest a statement from Freebase for which the object is a Freebase concept, we will also express the concept using the unique mid. However, there are three circumstances under which we'll use an identifier other than a mid in the object of a statement:

  1. When the predicate is public:knownAs and the object is a human-memorable identifier like basekb:en.water.
  2. When the type of the object in Freebase is /type/uri; in this case the object is almost always an ordinary (not Linked Data) URI outside of :BaseKB and Freebase.
  3. When :BaseKB has inferred a connection between a Freebase object and a major Linked Data namespace such as dbpedia:.

Freebase Namespaces in :BaseKB

:BaseKB uses a different mechanism than the official Freebase RDFization to represent keys and namespaces.

Freebase has its own mechanism for representing keys and namespaces. When Freebase expresses these in RDF, it writes:

    fbase:en.water fbase:type.object.key [
        fbase:type.key.namespace fbase:wikipedia.en ;
        fbase:type.value.value "Oxygen_dihydride"
    ] . 

This creates a blank node and generates a total of three triples.

We decided to use a more efficient representation, where we use the namespace itself as a predicate to state that "the ?object is a key for the ?subject in the ?predicate namespace." :BaseKB represents the memorable names like so:

    basekb:m.0gt9  public:knownAs basekb:wikipedia.en .
    basekb:m.0838f public:knownAs basekb:en.water .

and uses the mid identifiers to express a key with a single statement:

    basekb:m.0838f basekb:m.0gt9 "Oxygen_dihydride" .
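The collapse from the blank-node form to a single statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples of strings, and the KNOWN_AS table and compact_key helper are hypothetical names invented for this example, not part of :BaseKB or basekb-tools.

```python
# Hypothetical lookup table from readable names to mids, mirroring the
# public:knownAs statements shown above.
KNOWN_AS = {
    "basekb:wikipedia.en": "basekb:m.0gt9",
    "basekb:en.water": "basekb:m.0838f",
}


def compact_key(subject_name, namespace_name, value, known_as=KNOWN_AS):
    """Return the single triple that states a key, using mids throughout."""
    subject_mid = known_as[subject_name]
    namespace_mid = known_as[namespace_name]
    # The mid of the namespace becomes the predicate; the key is the object.
    return (subject_mid, namespace_mid, value)


triple = compact_key("basekb:en.water", "basekb:wikipedia.en", "Oxygen_dihydride")
print(triple)
```

One statement replaces the three that the blank-node form needs, at the cost of looking up the mid of the namespace first.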

You could fetch all of the Wikipedia names for water with this SPARQL query:

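    PREFIX basekb: <http://rdf.basekb.com/ns/>
    PREFIX public: <http://rdf.basekb.com/public/>

    SELECT ?key WHERE {
        ?subject public:knownAs basekb:en.water .      # resolve the concept to its mid
        ?ns public:knownAs basekb:wikipedia.en .       # resolve the namespace to its mid
        ?subject ?ns ?key .                            # the namespace mid is the predicate
    }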
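To make the join in this query concrete, here is a rough Python sketch that evaluates the same three patterns over an in-memory list of triples. The data is just the example from this section, not a real :BaseKB extract, and the resolve and keys_for helpers are hypothetical names made up for the illustration; a real deployment would send the query to a SPARQL endpoint instead.

```python
# Triples as (subject, predicate, object) tuples of strings.
TRIPLES = [
    ("basekb:m.0gt9", "public:knownAs", "basekb:wikipedia.en"),
    ("basekb:m.0838f", "public:knownAs", "basekb:en.water"),
    ("basekb:m.0838f", "basekb:m.0gt9", "Oxygen_dihydride"),
]


def resolve(name, triples):
    """Return every mid whose public:knownAs value is the readable name."""
    return {s for s, p, o in triples if p == "public:knownAs" and o == name}


def keys_for(subject_name, namespace_name, triples):
    """Mimic the query: join the two name lookups with the key statements."""
    subjects = resolve(subject_name, triples)
    namespaces = resolve(namespace_name, triples)
    return sorted(o for s, p, o in triples if s in subjects and p in namespaces)


print(keys_for("basekb:en.water", "basekb:wikipedia.en", TRIPLES))
```

The two lookups correspond to the first two patterns of the query, and the final filter corresponds to the pattern in which the namespace mid appears in the predicate position.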

public:gravity

With millions of concepts in :BaseKB you will sometimes find that important concepts are hard to find in a forest of similar but different things.

:BaseKB infers a predicate we call gravity that indicates the importance of a concept in our collective human awareness. Ideally, gravity would measure the frequency with which a concept occurs in general discourse, but practically, we compute gravity with network algorithms.

Because importance is a subjective thing, we expect to improve our heuristic for gravity in the future. The one property of gravity that will not change is that in a given release of :BaseKB, things that we think are important have a greater gravity score than things we think are unimportant.

Freebase and :BaseKB Identifiers

:BaseKB identifiers are derived from Freebase identifiers.

There are two types of Freebase identifiers: machine identifiers like /m/0cfv1 and other identifiers like /en/water. Machine identifiers (or mids) are unique identifiers for concepts; other identifiers have names that are convenient to remember but are not unique.

It is easy to get correct results with Freebase data if you always use the mid as a unique name for a concept. As such, :BaseKB uses the mid identifier to derive a unique name for all concepts and always uses this unique identifier in statements.

Freebase maps its identifiers to RDF by dropping the leading / character, replacing each remaining / character with a . character, and prepending http://rdf.freebase.com/ns/. For example, /en/water becomes http://rdf.freebase.com/ns/en.water. :BaseKB does the same, except that we prepend http://rdf.basekb.com/ns/ instead, which gives us the choice of publishing :BaseKB as Linked Data.
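The identifier mapping is simple enough to sketch in Python. The sketch assumes that the leading / of the Freebase identifier is absorbed into the namespace URI, which is consistent with the prefixes declared at the top of this document; the to_rdf_uri function name is hypothetical.

```python
FREEBASE_NS = "http://rdf.freebase.com/ns/"
BASEKB_NS = "http://rdf.basekb.com/ns/"


def to_rdf_uri(freebase_id, namespace=BASEKB_NS):
    """Map a Freebase identifier such as /m/0cfv1 to an RDF URI."""
    if not freebase_id.startswith("/") or len(freebase_id) < 2:
        raise ValueError("expected an identifier like /m/0cfv1")
    # Drop the leading slash, then turn the remaining slashes into dots.
    local_name = freebase_id[1:].replace("/", ".")
    return namespace + local_name


print(to_rdf_uri("/m/0cfv1"))
print(to_rdf_uri("/en/water", FREEBASE_NS))
```

Passing FREEBASE_NS yields the URI that the official Freebase RDFization would use, while the default yields the :BaseKB form of the same identifier.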

Grounding with basekb-tools

basekb-tools makes queries readable

Correct operation of :BaseKB requires that the unique name assumption holds. Every object in :BaseKB has a unique, but unreadable, identifier.

The MQL language provided by Freebase solves this problem by resolving names using data from the Freebase/:BaseKB graph. We solve this problem with basekb-tools, a package that implements the same name resolution behavior as MQL for SPARQL 1.1.

Although this grounding takes effort, a lack of grounding has been a fatal problem with previous RDFizations of Freebase. By grounding all Freebase identifiers to mids, we find that queries and inference give predictable and correct answers.

Grounding with :knownAs

public:knownAs maps readable identifiers to unique identifiers

If you're not using basekb-tools, you can still resolve names using a mechanism used in the EA 1 release. Identifiers can be looked up like so:

    PREFIX basekb: <http://rdf.basekb.com/ns/>
    PREFIX public: <http://rdf.basekb.com/public/>
    SELECT ?mid WHERE {
        ?mid public:knownAs basekb:book.book.genre
    }

:BaseKB expresses :knownAs only for schema objects and names in the /en/ namespace; more complete name resolution is attained by basekb-tools.

In the future we expect to make :knownAs an owl:FunctionalProperty, that is, to publish at most a single readable name for any object. This means that fewer names will resolve, so we advise you to use basekb-tools if at all possible.

Special predicates

:BaseKB handles a few predicates differently from others:

  1. fbase:type.object.type is expressed as rdf:type
  2. fbase:type.type.instance is ignored
  3. fbase:type.permission.controls is reversed to basekb:base.basekb.thing.controlled_by and is ignored when the subject is fbase:boot.all_permission.
  4. fbase:dataworld.gardening_hint.replaced_by is reversed to basekb:basekb.thing.replaces.
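The four rules above can be read as a small per-triple rewrite. The following Python sketch is an assumption about how such a step might look, not the actual :BaseKB pipeline: the transform function is a hypothetical name, triples are plain string tuples, and the sketch deliberately leaves out the grounding of subjects and objects to mids. The identifiers fbase:m.aaa and fbase:m.bbb are made-up placeholders.

```python
def transform(triple):
    """Apply the special-predicate rules; return None when a triple is dropped."""
    s, p, o = triple
    if p == "fbase:type.object.type":
        return (s, "rdf:type", o)                       # rule 1: renamed
    if p == "fbase:type.type.instance":
        return None                                     # rule 2: ignored
    if p == "fbase:type.permission.controls":
        if s == "fbase:boot.all_permission":
            return None                                 # rule 3: ignored for this subject
        return (o, "basekb:base.basekb.thing.controlled_by", s)  # rule 3: reversed
    if p == "fbase:dataworld.gardening_hint.replaced_by":
        return (o, "basekb:basekb.thing.replaces", s)   # rule 4: reversed
    return triple                                       # all other predicates pass through


# Placeholder identifiers, invented for this example.
print(transform(("fbase:m.aaa", "fbase:type.object.type", "fbase:m.bbb")))
print(transform(("fbase:m.aaa", "fbase:dataworld.gardening_hint.replaced_by", "fbase:m.bbb")))
```

Reversing a predicate swaps subject and object, so the thing that was the object of the Freebase statement becomes the subject of the :BaseKB statement.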

:BaseKB also independently infers a few useful properties such as rdfs:label and public:gravity.

rdfs:label

Freebase represents human-readable labels with predicates such as basekb:type.object.name and basekb:common.topic.alias. :BaseKB passes these predicates through exactly as it does other predicates.

We're unsatisfied with the way /type/object/name is used in Freebase; specifically, it gives the same human-readable name (e.g. "Manchester") to similar but different things such as "Manchester, England" and "Manchester, New Hampshire." Other objects lack a label entirely.

Although we make no guarantee that rdfs:label will be unique, all concepts in :BaseKB have a label. :BaseKB uses heuristics to infer labels that are less ambiguous than /type/object/name and that we think are more usable in user interfaces.

You'll find at most one rdfs:label for a concept in a given language. Currently, :BaseKB always generates an English-language label for all subjects, but we may remove these labels for subjects that are not concepts in the future.

rdfs:comment

:BaseKB extracts the text descriptions of Freebase concepts and connects them to the concepts with the rdfs:comment predicate.

:BaseKB is a product of Ontology2. :BaseKB contains data from Freebase.