Using :BaseKB

:BaseKB is RDF data that can be used in many ways.

A popular way to use it is to install :BaseKB into a triple store that supports SPARQL. SPARQL lets you write queries in a language similar to SQL. This kind of RDF database can be used much like an RDBMS (e.g., MySQL) to build web and other applications.
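For application code, here's a minimal Python sketch of sending a SELECT query to a SPARQL endpoint over the standard SPARQL 1.1 Protocol and reading the JSON results. It assumes a local Virtuoso instance at http://localhost:8890/sparql (Virtuoso's usual default; other triple stores use other URLs), and the helper names query_url, bindings and select are our own, not part of any :BaseKB tool.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: a local Virtuoso instance on its usual default port.
# Change this URL for other triple stores.
ENDPOINT = "http://localhost:8890/sparql"


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL carrying the query string."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def bindings(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one {variable: value} dict per row."""
    return [
        {var: term["value"] for var, term in row.items()}
        for row in results["results"]["bindings"]
    ]


def select(endpoint: str, query: str) -> List[Dict[str, str]]:
    """Run a SELECT query over HTTP GET and return its rows."""
    request = urllib.request.Request(
        query_url(endpoint, query),
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return bindings(json.load(response))
```

For example, select(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10") would return up to ten rows as dictionaries. A query sent this way goes straight to the triple store, so it presumably needs its own PREFIX declarations and true identifiers such as basekb:m.01xpjyz, since readable names are resolved by basekb-tools.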

The physical form of :BaseKB is also suitable for parallel processing and use in a streaming mode to conserve memory.
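As a minimal sketch of both properties, the Python below streams each gzip-compressed N-Triples shard one line at a time, so memory use stays flat, and spreads the shards across worker processes. It assumes the shards have been extracted to disk with names like baseKB/triples0742.nt.gz, as described under Physical Form; the function names are illustrative.

```python
import gzip
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable


def count_triples(shard: str) -> int:
    """Stream one gzip-compressed N-Triples shard and count its triples.

    Only one line is held in memory at a time, however large the shard is.
    """
    count = 0
    with gzip.open(shard, "rt", encoding="utf-8") as lines:
        for line in lines:
            stripped = line.strip()
            # N-Triples allows blank lines and '#' comment lines; skip them.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


def count_all(
    shards: Iterable[str],
    executor_factory: Callable[[], Executor] = ProcessPoolExecutor,
) -> int:
    """Count triples across many shards, one worker task per shard."""
    with executor_factory() as pool:
        return sum(pool.map(count_triples, shards))
```

Something like count_all(sorted(glob.glob("baseKB/triples*.nt.gz"))) would then count every triple. On platforms that start worker processes by spawning, call it from under an if __name__ == "__main__": guard.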

basekb-tools contains a system that resolves names in queries the way Freebase does, which helps you write queries that are both correct and readable. basekb-tools also contains a test suite that confirms you have a correctly assembled stack.

Installing :BaseKB

OpenLink Virtuoso has been used in the development of :BaseKB, so we can give you specific instructions for installing :BaseKB into it. We've also received reports of successful use with OWLIM-SE, AllegroGraph and BigData.

In general, you should use the bulk loader that comes with your triple store, tuned for maximum performance. If you have success with a given product, contact us so we can add it to our documentation.

Installing basekb-tools

basekb-tools makes it easy to write queries and comes with a test suite that confirms the correct operation of :BaseKB with your tools.

basekb-tools depends on Java, the Jena framework and Maven; Maven automatically downloads the other dependencies, then builds and tests the system.

Your first SPARQL query

Here's a sample SPARQL query to get you started; it produces a list of the 25 "most important" airports in the world.

select ?code ?name ?item {
   graph public:baseKB {
      ?item a basekb:aviation.airport .
      ?item rdfs:label ?name .
      ?item public:gravity ?gravity .
      ?item basekb:authority.iata ?code .
      filter(lang(?name)='en')
   }
} order by desc(?gravity) limit 25

Note a few features here:

  • basekb-tools rewrites readable identifiers such as basekb:aviation.airport to true identifiers such as basekb:m.01xpjyz.
  • The predicate public:gravity points to a numeric score that measures the subjective importance of concepts.
  • The object of the basekb:authority.iata predicate is the name of the airport in the /authority/iata namespace.
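To illustrate the idea behind the first point, here's a toy Python sketch of that kind of rewriting. It is not how basekb-tools is implemented: the real tool resolves names against :BaseKB itself, while this sketch uses a hypothetical one-entry lookup table built from the example above and leaves any name it doesn't know unchanged.

```python
import re
from typing import Mapping

# Hypothetical lookup table. Only this one mapping comes from the
# documentation; the real basekb-tools resolves names against :BaseKB.
EXAMPLE_NAMES = {"aviation.airport": "m.01xpjyz"}

# A basekb: prefixed name made of dot-separated segments. A trailing
# statement-ending "." is not captured as part of the name.
_NAME = re.compile(r"\bbasekb:([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)")


def rewrite_names(query: str, names: Mapping[str, str] = EXAMPLE_NAMES) -> str:
    """Replace readable basekb: names with true ids, leaving unknown names as-is."""

    def swap(match: re.Match) -> str:
        readable = match.group(1)
        return "basekb:" + names.get(readable, readable)

    return _NAME.sub(swap, query)
```

Applied to the line ?item a basekb:aviation.airport . this yields ?item a basekb:m.01xpjyz . while basekb:authority.iata passes through untouched, because the toy table has no entry for it.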

The :BaseKB fundamentals documentation explains the construction of :BaseKB and how :BaseKB is different from the RDF data published by Freebase.

License and Editions

All editions of :BaseKB are freely available under a CC-BY/3.0 license. We cannot guarantee that :BaseKB will be suitable for any specific purpose; by using :BaseKB, you hold Ontology2 harmless for any damages or liability that may be entailed.

:BaseKB contains data from Freebase. Users of :BaseKB must give attribution to both :BaseKB and Freebase.

Hardware Requirements

To install :BaseKB, you need a computer running a 64-bit operating system, and you'll have a much easier time installing and using :BaseKB if that computer is a powerful one.

In particular, if you wish to load :BaseKB into a triple store, you'll get the best results on a computer with 24 GB, or preferably 32 GB, of RAM. Although it's difficult to find a laptop with more than 8 GB of RAM, it is inexpensive to add a large amount of RAM to a recent desktop computer. Many people will find that a desktop computer makes a suitable workstation once they buy memory from a reputable vendor and max out its RAM.

If you're working with Amazon Web Services, the minimum configuration we recommend is the m2.2xlarge instance, with 34.2 GB of RAM and 4 CPU cores.

Physical Form

:BaseKB is packaged as a tar archive that contains 1024 gzip-compressed shards, each an N-Triples file with a name like baseKB/triples0742.nt.gz.
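Because the archive is a plain tar of gzip files, you don't necessarily have to unpack it. This minimal Python sketch, using only the standard library, reads the tar as a forward-only stream and decompresses each shard on the fly; the function name iter_archive_lines is hypothetical.

```python
import gzip
import tarfile
from typing import Iterator, Tuple


def iter_archive_lines(archive_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (member name, N-Triples line) pairs straight from the tar archive.

    Mode "r|*" reads the tar as a forward-only stream (with transparent
    decompression if the tar itself happens to be compressed), so the
    shards never need to be extracted to disk.
    """
    with tarfile.open(archive_path, mode="r|*") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzipped N-Triples shard.
            if not (member.isfile() and member.name.endswith(".nt.gz")):
                continue
            raw = archive.extractfile(member)
            # Decompress the shard while reading and decode it as UTF-8 text.
            with gzip.open(raw, "rt", encoding="utf-8") as lines:
                for line in lines:
                    yield member.name, line.rstrip("\n")
```

In stream mode each member must be consumed in archive order, which suits a single sequential pass; for parallel work, extracting the shards first is simpler.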

All triples that share a common subject are grouped together, so it's possible to process partial pieces of :BaseKB. For instance, for testing purposes, it's possible to load just a single shard into a triple store.
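Since a subject's triples are adjacent, one streaming pass can reassemble a record per subject without holding a whole shard in memory. Here's a minimal Python sketch, assuming the grouping holds in each shard's line order. It uses itertools.groupby, which only merges consecutive lines with the same key, and takes the subject to be the first whitespace-delimited token of each N-Triples line.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def subject_of(line: str) -> str:
    """Return the subject term of an N-Triples line.

    Subjects are IRIs (<...>) or blank nodes (_:...), neither of which may
    contain unescaped whitespace, so the first token is the subject.
    """
    return line.split(None, 1)[0]


def records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, lines) groups from a stream of N-Triples lines.

    Relies on all triples that share a subject being adjacent.
    """
    # Drop blank lines and '#' comments before grouping.
    useful = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#"))
    for subject, group in groupby(useful, key=subject_of):
        yield subject, list(group)
```

Fed the lines of one shard (for example, from gzip.open(path, "rt", encoding="utf-8")), records yields each subject together with all of its triples.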

:BaseKB Lite is about 2.7 GB after gzip compression and contains approximately 150 million RDF triples.

A Word About Inference

Although the Freebase schema is superficially similar to an RDFS schema, the meaning of its terms is somewhat different. A naive translation into RDFS and OWL would be useful as documentation, but RDFS inference over :BaseKB would draw incorrect conclusions.

As it turns out, Freebase materializes most of the correct RDFS conclusions that one could derive from it, so :BaseKB can be used profitably without inference. We recommend that you get :BaseKB working for you without inference before you enable inference.

We think it would be particularly effective to use RDFS rules to connect :BaseKB vocabulary with standard vocabularies. For instance, RDFS statements like

basekb:people.person rdfs:subClassOf foaf:Person .
basekb:book.written_work.author rdfs:subPropertyOf foaf:maker .

can bring :BaseKB into alignment with standard vocabularies.
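To show what alignment statements of this kind would add, here's a minimal Python sketch that applies just two RDFS entailment rules, rdfs7 (subPropertyOf) and rdfs9 (subClassOf), to a handful of triples until nothing new appears. Because basekb:book.written_work.author relates a work to its author, the sketch aligns it with foaf:maker through rdfs:subPropertyOf. The identifiers ex:book1 and ex:writer1 are hypothetical, and this is not a general RDFS reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply only rules rdfs7 and rdfs9 until no new triples are derived."""
    closed = set(triples)
    while True:
        sub_class = {(s, o) for s, p, o in closed if p == SUBCLASS}
        sub_prop = {(s, o) for s, p, o in closed if p == SUBPROP}
        derived: Set[Triple] = set()
        for s, p, o in closed:
            # rdfs7: (s p o) and (p subPropertyOf q) entail (s q o).
            derived |= {(s, q, o) for sp, q in sub_prop if sp == p}
            # rdfs9: (s type c) and (c subClassOf d) entail (s type d).
            if p == RDF_TYPE:
                derived |= {(s, RDF_TYPE, d) for c, d in sub_class if c == o}
        new = derived - closed
        if not new:
            return closed
        closed |= new
```

Starting from ex:book1 basekb:book.written_work.author ex:writer1 and ex:writer1 rdf:type basekb:people.person, plus the two alignment statements, the closure adds ex:book1 foaf:maker ex:writer1 and ex:writer1 rdf:type foaf:Person.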

Inference over :BaseKB is a fascinating topic for research. We think that SPIN and RIF should be just as interesting as OWL and RDFS.

:BaseKB is a product of Ontology2. :BaseKB contains data from Freebase.