The technology behind :BaseKB

In the years since we developed the Infovore framework that created :BaseKB, we have extended its ability to convert Freebase data
into technology that converts any data to RDF and faces down the four challenges of Big Data:

Volume
HDFS and MapReduce made it possible to store and process data on the scale of petabytes.
Velocity
New frameworks such as Apache Storm, Spark and Flink can reach 100x the raw speed of the original MapReduce and can process and react to events in real time.
Variety
Data arrives in thousands of machine-readable formats, together with full-text documents in formats like PDF. Our technology understands and reconciles the different syntax and semantics used in structured and unstructured documents.
Veracity
What you do with your data doesn't matter if your data is wrong. Every stage of our mapping process can be manually overridden to get 100% accuracy for critical cases. You can construct test cases that ensure the data has certain qualities, starting with our built-in library of validators.

Contact Ontology2 to get your data under control.

:BaseKB Gold Ultimate

What is :BaseKB Gold Ultimate?

Freebase is a database that contains upwards of a billion facts about 40 million topics; 4 million topics from Wikipedia are there, as well as many more people, places and creative works. Freebase was based on a proprietary database named graphd, and could be queried with the proprietary MQL language.

Our research proved that the graph model used in Freebase could be mapped to RDF in a straightforward way and then queried with the SPARQL query language. At least ten databases are known to support data sets of this scale, so this is a competitive market where products are improving rapidly.

Google is in the process of shutting down the Freebase service, which went read-only on March 31, 2015. We captured the last RDF data dump from Freebase, published April 19, 2015, and used the Infovore framework to process it into a quality RDF knowledge base that is compatible with standard tools and gives correct answers when the complete data set is installed in a SPARQL database. (This is not true of the dump published by Freebase, which contains hundreds of millions of superfluous, repetitive, ill-formed, uninteresting, incorrect and occasionally harmful facts.) Anything you can do in MQL, and more, can be done by writing SPARQL queries.
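As an illustration of the MQL-to-SPARQL claim, here is a minimal Python sketch of how a simple MQL lookup such as "list some topics of type /film/film with their names" might be expressed. Several details are assumptions rather than facts from this page: the namespace http://rdf.basekb.com/ns/, the dotted form film.film for the Freebase type, the use of rdf:type and rdfs:label for types and names, and the local endpoint URL. Check them against the data you actually load.

```python
from urllib.parse import urlencode

# Assumption: the namespace prefix used for :BaseKB identifiers.
BASEKB_NS = "http://rdf.basekb.com/ns/"


def build_type_query(freebase_type: str, limit: int = 5) -> str:
    """Build a SPARQL query listing topics of one type with English labels.

    freebase_type is the dotted form of a Freebase type id, for example
    "film.film" for the MQL type "/film/film" (an assumed convention).
    """
    return (
        f"PREFIX ns: <{BASEKB_NS}>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?topic ?name WHERE {\n"
        f"  ?topic a ns:{freebase_type} ;\n"
        "         rdfs:label ?name .\n"
        '  FILTER(lang(?name) = "en")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = build_type_query("film.film", limit=5)
# Hypothetical endpoint: substitute the address of your own SPARQL database.
url = build_request_url("http://localhost:8890/sparql", query)
print(query)
```

The sketch only builds the query text and the request URL; sending the request and parsing the results is left to whichever HTTP client or SPARQL library you prefer.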

Freebase will be shutting down its MQL API in the coming months, so users of that API need to find a replacement on short notice. :BaseKB Gold Ultimate, together with an industry-standard SPARQL database, is the fastest way to satisfy this need and is the only complete and correct rendition of Freebase available to the public. Thanks to this project, the Freebase database permanently outlives the original service.

Live on Amazon Web Services

Until now, working with billion-triple data sets meant either relying on underpowered and unreliable SPARQL endpoints, or wrangling special hardware and software, waiting hours for data to load, and possibly repeating that cycle many times because of compatibility problems. Today, you can be writing SPARQL queries in minutes with the cloud edition of :BaseKB Gold Ultimate.

Our automatic packaging system delivers 1.2 billion triples, containing :BaseKB and :SubjectiveEye, on top of OpenLink Virtuoso Open Source Edition 7.2.1 running on a powerful r3.2xlarge instance in the AWS cloud in most popular availability zones. One-click setup and a low hourly rate make the cloud edition a great fit for evaluation and product development. Join our mailing list for community support.

Download from Amazon S3

A complete dump of :BaseKB Gold Ultimate is stored in the us-east-1 region of Amazon Web Services at the following location:

s3://basekb-now/2015-04-19-00-00/

This directory contains 210 files that are roughly 80 MB each and add up to 16.84 GB. The files consist of facts in N-Triples format, compressed with gzip and compatible with standard RDF tools. They can be loaded into any sufficiently capable triple store and queried with SPARQL, or processed with scalable batch tools such as Infovore.
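Because the files are gzip-compressed N-Triples, they can also be streamed one statement per line without a triple store. The following Python sketch shows one way to do that using only the standard library. It is a simplified line splitter, not a full N-Triples parser, and the sample statements use made-up example.org identifiers rather than real :BaseKB data.

```python
import gzip
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def iter_triples(lines: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from N-Triples text.

    Simplification: subject and predicate are assumed to contain no
    whitespace (true for IRIs and blank nodes); the object is kept as
    raw text, including any quotes, language tag or datatype.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"statement does not end with '.': {line!r}")
        body = line[:-1].rstrip()
        subject, predicate, obj = body.split(None, 2)
        yield subject, predicate, obj


def count_predicates(path: str) -> Counter:
    """Count how often each predicate occurs in one gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return Counter(p for _, p, _ in iter_triples(handle))


# Self-contained demo on in-memory data (hypothetical example.org IRIs).
sample = (
    '<http://example.org/a> <http://example.org/p> "hello world"@en .\n'
    '<http://example.org/a> <http://example.org/q> <http://example.org/b> .\n'
)
compressed = gzip.compress(sample.encode("utf-8"))
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as handle:
    triples = list(iter_triples(handle))
print(len(triples))
```

For a downloaded file, count_predicates("some-file.gz") gives a quick profile of which properties it contains; the file name here is a placeholder.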

You can download this data on a requester-pays basis; this requires an Amazon Web Services account. The download is free within us-east-1; for transfers to other locations you will pay AWS data transfer costs, which are always below 10 cents/GB. Join our mailing list for community support.
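A requester-pays download needs the requester-pays flag on every request. The sketch below shows one possible way to fetch the dump with the third-party boto3 library, along with the upper bound on transfer cost implied by the figures above (16.84 GB at under 10 cents/GB). The function names and the destination directory are illustrative choices, and AWS credentials are assumed to be configured already.

```python
import os
from typing import Tuple
from urllib.parse import urlparse

DUMP_URI = "s3://basekb-now/2015-04-19-00-00/"


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def max_transfer_cost_usd(size_gb: float, rate_per_gb: float = 0.10) -> float:
    """Upper bound on transfer cost, using the 10 cents/GB ceiling."""
    return size_gb * rate_per_gb


def download_all(uri: str, dest_dir: str) -> None:
    """Download every object under the prefix, paying as the requester."""
    import boto3  # third-party; imported here so the helpers above work without it

    bucket, prefix = parse_s3_uri(uri)
    s3 = boto3.client("s3", region_name="us-east-1")
    os.makedirs(dest_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket, Prefix=prefix, RequestPayer="requester"
    )
    for page in pages:
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip directory placeholder keys
            target = os.path.join(dest_dir, os.path.basename(key))
            s3.download_file(
                bucket, key, target, ExtraArgs={"RequestPayer": "requester"}
            )


bucket, prefix = parse_s3_uri(DUMP_URI)
print(bucket, prefix)
print(round(max_transfer_cost_usd(16.84), 2))
```

Calling download_all(DUMP_URI, "basekb") would start the actual transfer; it is deliberately not invoked here. Running it from a machine inside us-east-1 avoids the transfer charge.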

 

 

:BaseKB is a product of Ontology2. See our privacy policy and terms of use. :BaseKB contains data from Freebase.