In the years since we developed the Infovore framework that created :BaseKB, we've built on its
ability to convert Freebase data to create a system that converts any data to RDF and faces down the four challenges of Big Data:
Contact Ontology2 to get your data under control.
Freebase is a database that contains upwards of a billion facts about 40 million topics; 4 million topics from Wikipedia are there, as well as many more people, places and creative works. Freebase was based on a proprietary database named graphd, and could be queried with the proprietary MQL language.
Our research proved that the graph model used in Freebase could be mapped to RDF in a straightforward way and then queried with the SPARQL query language. At least ten databases are known to support data sets of this scale, so this is a competitive market where products are improving rapidly.
Google is in the process of shutting down the Freebase service, which went read-only on March 31, 2015. We captured the last RDF data dump from Freebase, published April 19, 2015, and used the Infovore framework to process it into a quality RDF knowledge base that is compatible with standard tools and gives correct answers when the complete data set is installed in a SPARQL database. (This is not true of the dump published by Freebase, which contains hundreds of millions of superfluous, repetitive, ill-formed, uninteresting, incorrect and occasionally harmful facts.) Anything you can do in MQL, and more, can be done by writing SPARQL queries.
Freebase will be shutting down its MQL API in the coming months, so MQL users need to find a replacement on short notice. :BaseKB Gold Ultimate, together with an industry-standard SPARQL database, is the fastest way to satisfy this need and is the only complete and correct rendition of Freebase available to the public. Thanks to this project, the Freebase database permanently outlives the original service.
Until now, working with billion-triple data sets meant either relying on underpowered and unreliable SPARQL endpoints, or wrangling special hardware and software, waiting hours for data to load, and possibly repeating that cycle many times because of compatibility problems. Today, you can be writing SPARQL queries in minutes with the cloud edition of :BaseKB Gold Ultimate.
Our automatic packaging system delivers 1.2 billion triples, containing :BaseKB and :SubjectiveEye, on top of OpenLink Virtuoso Open Source Edition 7.2.1 running on a powerful r3.2xlarge instance in the AWS cloud in most popular availability zones. One-click setup and a low hourly rate make the cloud edition a great fit for evaluation and product development. Join our mailing list for community support.
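As a rough illustration of what a first query against the cloud edition might look like, here is a minimal Python sketch that uses only the standard library and the standard SPARQL protocol. The endpoint URL is an assumption (port 8890 and the /sparql path are common Virtuoso defaults, but your instance may differ), the function names are ours, and the query is deliberately generic so that it does not depend on any particular :BaseKB vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a common Virtuoso default; replace localhost with your instance's host.
ENDPOINT = "http://localhost:8890/sparql"

# A vocabulary-neutral smoke test: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_bindings(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = document["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=60):
    """Send the query and return one dict per result row."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a running instance should return up to five rows, each mapping `s`, `p` and `o` to string values. The `LIMIT` clause matters here: an unbounded pattern over 1.2 billion triples is not something you want to send by accident.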
A complete dump of :BaseKB Gold Ultimate is stored in the us-east-1 zone of Amazon Web Services at the following location:
This directory contains 210 files that are roughly 80 MB each and add up to 16.84 GB. These files consist of N-Triples facts, compressed with gzip, and are compatible with standard RDF tools. They can be loaded into any sufficiently capable triple store and queried with SPARQL, and can also be used with scalable batch tools such as Infovore.
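Because the files are plain gzipped N-Triples, they can also be inspected without a triple store. The following is a minimal sketch, assuming the files have already been downloaded into a local directory and end in `.gz`; the function names and the glob pattern are illustrative, and the line splitter is a simplification that relies on N-Triples subjects and predicates never containing whitespace, not a full parser.

```python
import gzip
from collections import Counter
from pathlib import Path


def iter_triples(path):
    """Yield (subject, predicate, object) strings from one gzipped N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Only the object can contain whitespace (inside a literal),
            # so split at most twice.
            parts = line.split(None, 2)
            if len(parts) != 3:
                continue  # not a well-formed triple line
            subject, predicate, rest = parts
            # Drop the "." that terminates every N-Triples statement.
            obj = rest.rstrip()
            if obj.endswith("."):
                obj = obj[:-1].rstrip()
            yield subject, predicate, obj


def predicate_counts(directory, pattern="*.gz"):
    """Count how often each predicate occurs across all matching files."""
    counts = Counter()
    for path in sorted(Path(directory).glob(pattern)):
        for _, predicate, _ in iter_triples(path):
            counts[predicate] += 1
    return counts
```

Since each file is streamed line by line, memory use stays flat regardless of file size, and `predicate_counts(directory).most_common(20)` gives a quick profile of the data before committing to a full database load.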
You can download this data on a requester-pays basis; this requires an Amazon Web Services account. The download is free within the us-east-1 zone; to other locations you will pay AWS data transfer costs, which are always below 10 cents/GB. Join our mailing list for community support.
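For readers who script their downloads, the sketch below shows one way a requester-pays transfer might be done with the third-party boto3 library, along with the worst-case transfer cost implied by the figures above. The bucket and prefix are deliberately left as parameters because they must come from the location given on this page; the function names are ours, and AWS credentials are assumed to be configured in the usual way.

```python
from pathlib import Path

# Figures quoted in the text: total dump size and the upper bound on transfer cost.
DUMP_SIZE_GB = 16.84
MAX_RATE_USD_PER_GB = 0.10


def max_transfer_cost_usd(size_gb=DUMP_SIZE_GB, rate=MAX_RATE_USD_PER_GB):
    """Upper bound on the transfer cost when downloading outside us-east-1."""
    return size_gb * rate


def download_dump(bucket, prefix, dest_dir):
    """Download every object under bucket/prefix, accepting requester-pays charges."""
    import boto3  # third-party; imported here so the cost helper works without it

    s3 = boto3.client("s3", region_name="us-east-1")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    # RequestPayer must be sent on both the listing and each download.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, RequestPayer="requester"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip directory placeholder objects
            target = dest / Path(key).name
            s3.download_file(bucket, key, str(target),
                             ExtraArgs={"RequestPayer": "requester"})
            downloaded.append(target)
    return downloaded
```

With the numbers above, `max_transfer_cost_usd()` comes to about 1.68 US dollars, so even the most expensive case is a small one-time charge. Running the download from an instance in us-east-1 avoids the transfer cost altogether.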