This document explains the transformation that converts Freebase into :BaseKB. If you knew everything about RDF, SPARQL and Freebase, this document alone would be enough for you to use :BaseKB.
As our knowledge isn't perfect, this is just the first document of many describing the use of :BaseKB.
The following RDF Namespaces are used in this document. These are quite different from the Freebase namespaces that will be discussed later.
@prefix basekb: <http://rdf.basekb.com/ns/> .
@prefix public: <http://rdf.basekb.com/public/> .
@prefix fbase: <http://rdf.freebase.com/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
The subject of a statement (triple) in :BaseKB is always a unique mid identifier (e.g. basekb:m.0838f).
This principle holds true for the data we publish, although you're free to violate it through
inference (owl:inverseOf) or by adding more data.
Predicates coined by
Freebase (such as basekb:m.02hqc86, known as /book/book/genre) are also
represented with mid identifiers. We occasionally use predicates from well-known
namespaces such as rdf:type and rdfs:label, and we have also
coined a few predicates of our own in the public: namespace.
When we ingest a statement from Freebase whose object is a Freebase concept, we also express the concept using its unique mid. However, there are three circumstances under which we use an identifier other than a mid in the object of a statement:
when the predicate is public:knownAs, the object is a human-memorable identifier like basekb:en.water;
when the Freebase value has type /type/uri, the object is almost always an ordinary (not Linked Data) URI outside of :BaseKB and Freebase;
when the object is a DBpedia resource, it is written in the dbpedia: namespace.
:BaseKB represents keys and namespaces differently from the official Freebase RDFization.
Freebase has its own mechanism for representing keys and namespaces. When Freebase expresses these in RDF, it writes:
fbase:en.water fbase:type.object.key [
fbase:type.key.namespace fbase:wikipedia.en ;
fbase:type.value.value "Oxygen_dihydride"
] .
This creates a blank node and generates a total of three triples.
We decided to use a more efficient representation. In it, the namespace itself serves
as a predicate to state that "the ?object is a key for the ?subject
in the ?predicate namespace." :BaseKB represents the memorable
names as follows:
basekb:m.0gt9 public:knownAs basekb:wikipedia.en .
basekb:m.0838f public:knownAs basekb:en.water .
and uses the mid identifiers to express a key with a single statement:
basekb:m.0838f basekb:m.0gt9 "Oxygen_dihydride" .
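The rewrite from the three-triple blank-node form to the single statement can be sketched in Python. This is a minimal illustration under stated assumptions, not the converter :BaseKB actually uses. Triples are plain string tuples, blank nodes are strings that begin with "_:", and literals are not distinguished from IRIs. `collapse_keys` and its `mid_of` lookup table are hypothetical. The two mids in the table are the ones given above.

```python
# Minimal sketch, not the :BaseKB converter. Assumptions: triples are
# (subject, predicate, object) string tuples, blank nodes start with "_:",
# and literals are not distinguished from IRIs.
Triple = tuple[str, str, str]


def collapse_keys(triples: list[Triple], mid_of: dict[str, str]) -> list[Triple]:
    """Turn each Freebase key blank node into one :BaseKB key statement.

    mid_of is a hypothetical table mapping readable ids such as
    "en.water" to mids such as "m.0838f". Non-key triples are not returned.
    """
    # Collect the properties attached to each blank node.
    bnode_props: dict[str, dict[str, str]] = {}
    for s, p, o in triples:
        if s.startswith("_:"):
            bnode_props.setdefault(s, {})[p] = o

    collapsed: list[Triple] = []
    for s, p, o in triples:
        if p == "fbase:type.object.key" and o in bnode_props:
            props = bnode_props[o]
            namespace = props["fbase:type.key.namespace"].removeprefix("fbase:")
            key = props["fbase:type.value.value"]
            subject = s.removeprefix("fbase:")
            # The namespace's mid becomes the predicate; the key is the object.
            collapsed.append(
                (f"basekb:{mid_of[subject]}", f"basekb:{mid_of[namespace]}", key)
            )
    return collapsed


freebase = [
    ("fbase:en.water", "fbase:type.object.key", "_:k1"),
    ("_:k1", "fbase:type.key.namespace", "fbase:wikipedia.en"),
    ("_:k1", "fbase:type.value.value", "Oxygen_dihydride"),
]
mids = {"en.water": "m.0838f", "wikipedia.en": "m.0gt9"}
print(collapse_keys(freebase, mids))
# [('basekb:m.0838f', 'basekb:m.0gt9', 'Oxygen_dihydride')]
```

Three triples and a blank node collapse into one triple. The price is that the namespace must already have a mid to serve as the predicate.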
You could fetch all of the Wikipedia names for water with this SPARQL query:
SELECT ?key {
?subject public:knownAs basekb:en.water .
?ns public:knownAs basekb:wikipedia.en .
?subject ?ns ?key .
}
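The query joins three triple patterns. The Python sketch below evaluates the same join over a small in-memory list of triples so the logic is easy to follow. It is only an illustration: in practice you would run the SPARQL query against a triple store loaded with :BaseKB. `wikipedia_keys` is a hypothetical helper, and the sample data are the three statements shown above.

```python
Triple = tuple[str, str, str]


def wikipedia_keys(triples: list[Triple], name: str, ns_name: str) -> list[str]:
    """Evaluate the three patterns of the SPARQL query above, in order."""
    # ?subject public:knownAs <name>
    subjects = {s for s, p, o in triples if p == "public:knownAs" and o == name}
    # ?ns public:knownAs <ns_name>
    namespaces = {s for s, p, o in triples if p == "public:knownAs" and o == ns_name}
    # ?subject ?ns ?key
    return [o for s, p, o in triples if s in subjects and p in namespaces]


data = [
    ("basekb:m.0gt9", "public:knownAs", "basekb:wikipedia.en"),
    ("basekb:m.0838f", "public:knownAs", "basekb:en.water"),
    ("basekb:m.0838f", "basekb:m.0gt9", "Oxygen_dihydride"),
]
print(wikipedia_keys(data, "basekb:en.water", "basekb:wikipedia.en"))
# ['Oxygen_dihydride']
```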
With millions of concepts in :BaseKB, you will sometimes find that important concepts are hard to find in a forest of similar but less important things.
:BaseKB infers a predicate we call gravity (public:gravity) that indicates the importance of a concept in our collective human awareness. Ideally, gravity would measure how often a concept occurs in general discourse, but in practice we compute gravity with network algorithms.
Because importance is a subjective thing, we expect to improve our heuristic for gravity in the future. The one property of gravity that will not change is that in a given release of :BaseKB, things that we think are important have a greater gravity score than things we think are unimportant.
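This document does not say which network algorithms produce gravity. To show the general idea only, the sketch below uses PageRank, a well-known link-analysis algorithm, and treats links between concepts as directed edges. Whether :BaseKB uses anything like PageRank is an assumption, and the graph is hypothetical. As with gravity, only the ordering of the scores matters: better-connected nodes rank higher.

```python
# Illustration only: PageRank, a standard link-analysis algorithm, stands in
# for the unspecified "network algorithms" behind gravity.
def pagerank(
    edges: list[tuple[str, str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Score the nodes of a directed graph; the scores sum to 1."""
    nodes = {node for edge in edges for node in edge}
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[node] or list(nodes)
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three concepts link to "hub", which links back to "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
scores = pagerank(edges)
print(max(scores, key=scores.get))  # hub
```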
:BaseKB identifiers are derived from Freebase identifiers.
There are two types of Freebase identifiers: machine identifiers
like /m/0cfv1 and other identifiers like /en/water. Machine identifiers (or mids)
are unique identifiers for concepts; other identifiers have names that
are convenient to remember but are not unique.
It is easy to get correct results with Freebase data if you always use the mid as a unique name for a concept. As such, :BaseKB uses the mid identifier to derive a unique name for all concepts and always uses this unique identifier in statements.
Freebase maps its identifiers to RDF by dropping the leading /
character, replacing each remaining / with a . character and prepending http://rdf.freebase.com/ns/ to
the result. :BaseKB does the same, except that we prepend http://rdf.basekb.com/ns/;
this gives us the option of publishing :BaseKB as Linked Data.
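The mapping rule fits in a few lines of Python. This is a sketch that assumes the input is already a well-formed Freebase path. It does not handle any escaping that unusual characters in keys might need. The function name `to_uri` is hypothetical.

```python
FREEBASE_NS = "http://rdf.freebase.com/ns/"
BASEKB_NS = "http://rdf.basekb.com/ns/"


def to_uri(freebase_id: str, base: str = BASEKB_NS) -> str:
    """Map a Freebase id such as "/m/0838f" to an RDF URI under `base`."""
    if not freebase_id.startswith("/"):
        raise ValueError(f"not a Freebase id: {freebase_id!r}")
    # Drop the leading slash, then turn each remaining slash into a dot.
    return base + freebase_id[1:].replace("/", ".")


print(to_uri("/m/0838f"))  # http://rdf.basekb.com/ns/m.0838f
print(to_uri("/book/book/genre", FREEBASE_NS))  # http://rdf.freebase.com/ns/book.book.genre
```

Because the two namespaces differ only in the base, the same function serves both Freebase and :BaseKB.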
Correct operation of :BaseKB requires that the unique name assumption holds. Every object in :BaseKB has a unique, but unreadable, identifier.
The MQL language provided by Freebase solves this problem by resolving names using data from the Freebase/:BaseKB graph. We solve this problem with basekb-tools, a package that implements the same name resolution behavior as MQL for SPARQL 1.1.
Although this grounding takes effort, the lack of it has been a fatal problem with previous RDFizations of Freebase. By grounding all Freebase identifiers to mids, we find that queries and inference give predictable and correct answers.
If you're not using basekb-tools, you can still resolve names using a mechanism from the EA 1 release. Identifiers can be looked up like so:
SELECT ?mid {
?mid public:knownAs basekb:book.book.genre
}
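The same lookup over an in-memory list of triples is sketched below. The sample triple for /book/book/genre uses the mid given earlier in this document. `resolve` is a hypothetical helper. It fails loudly on zero or several matches, which is our own choice: a name that does not resolve to exactly one mid cannot then silently undermine the unique name assumption.

```python
Triple = tuple[str, str, str]


def resolve(triples: list[Triple], name: str) -> str:
    """Return the single mid whose public:knownAs is `name`.

    Raises LookupError if the name resolves to no mid or to several.
    """
    mids = {s for s, p, o in triples if p == "public:knownAs" and o == name}
    if len(mids) != 1:
        raise LookupError(f"{name} resolved to {len(mids)} mids")
    return mids.pop()


data = [
    ("basekb:m.02hqc86", "public:knownAs", "basekb:book.book.genre"),
    ("basekb:m.0838f", "public:knownAs", "basekb:en.water"),
]
print(resolve(data, "basekb:book.book.genre"))  # basekb:m.02hqc86
```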
:BaseKB expresses public:knownAs only for schema objects and names in the /en/ namespace;
basekb-tools provides more complete name resolution.
In the future we expect to
make public:knownAs an owl:FunctionalProperty, that is, to publish
at most a single readable name for any object. This means that fewer names will resolve,
so we advise you to use basekb-tools if at all possible.
:BaseKB handles a few predicates differently from others:
fbase:type.object.type is expressed as rdf:type
fbase:type.type.instance is ignored
fbase:type.permission.controls is reversed to basekb:base.basekb.thing.controlled_by and
is ignored when the subject is fbase:boot.all_permission.
fbase:dataworld.gardening_hint.replaced_by is reversed to basekb:basekb.thing.replaces.
:BaseKB also independently infers a few useful properties such as rdfs:label and
public:gravity.
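The special-case predicate rules listed above can be sketched as a per-triple Python function. This sketch makes a simplifying assumption: subjects and objects are left as written. Real :BaseKB output would also rewrite them to basekb: mids. The function name and the sample identifiers (fbase:m.x, fbase:m.t and so on) are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical sketch of the special-case predicate rules. Assumption:
# identifiers are left as written; real :BaseKB output would also rewrite
# fbase: subjects and objects to basekb: mids.
Triple = tuple[str, str, str]


def transform(triple: Triple) -> Optional[Triple]:
    """Apply the special-case rules; return None when a triple is dropped."""
    s, p, o = triple
    if p == "fbase:type.object.type":
        return (s, "rdf:type", o)
    if p == "fbase:type.type.instance":
        return None  # ignored
    if p == "fbase:type.permission.controls":
        if s == "fbase:boot.all_permission":
            return None  # ignored for this subject
        return (o, "basekb:base.basekb.thing.controlled_by", s)  # reversed
    if p == "fbase:dataworld.gardening_hint.replaced_by":
        return (o, "basekb:basekb.thing.replaces", s)  # reversed
    return triple  # in this sketch, every other predicate passes through unchanged


freebase = [
    ("fbase:m.x", "fbase:type.object.type", "fbase:m.t"),
    ("fbase:m.t", "fbase:type.type.instance", "fbase:m.x"),
    ("fbase:m.old", "fbase:dataworld.gardening_hint.replaced_by", "fbase:m.new"),
]
converted = [t for t in map(transform, freebase) if t is not None]
print(converted)
# [('fbase:m.x', 'rdf:type', 'fbase:m.t'), ('fbase:m.new', 'basekb:basekb.thing.replaces', 'fbase:m.old')]
```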
Freebase represents human-readable labels with predicates such as basekb:type.object.name
and basekb:common.topic.alias. :BaseKB passes these predicates through exactly as
it does other predicates.
We're dissatisfied with the way /type/object/name is used in Freebase,
specifically in that it gives the same human-readable name (e.g.
"Manchester") to similar but different things such as "Manchester, England"
and "Manchester, New Hampshire." Other objects lack a label entirely.
Although we make no guarantee that rdfs:label will be unique,
all concepts in :BaseKB have a label.
:BaseKB uses heuristics to infer labels that are less ambiguous than
/type/object/name and that we think are more usable in
user interfaces.
You'll find at most one rdfs:label for a concept in a given
language. Currently, :BaseKB always generates an English-language label
for all subjects, but we may remove these labels for subjects that are not
true concepts in the future.
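The at-most-one-label-per-language guarantee is easy to check on a data dump. A minimal Python sketch follows. It assumes you have already extracted each rdfs:label statement as a (subject, language tag) pair. The extraction step is not shown, and the sample subjects are hypothetical placeholders.

```python
from collections import Counter
from collections.abc import Iterable


def duplicate_labels(labels: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (subject, language) pairs that carry more than one rdfs:label."""
    counts = Counter(labels)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Hypothetical (subject, language) pairs, one per rdfs:label statement.
labels = [
    ("basekb:m.x", "en"),
    ("basekb:m.x", "fr"),
    ("basekb:m.y", "en"),
]
print(duplicate_labels(labels))  # []
```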
:BaseKB extracts the text descriptions of Freebase concepts and connects them
to the concepts with the rdfs:comment predicate.
:BaseKB is a product of Ontology2. :BaseKB contains data from Freebase.