Google Published 25 Million Free Datasets

Here’s what you need to know about the largest data repository in the world

Google recently released Dataset Search, a free tool for searching 25 million publicly available datasets.

Search Tool Features:

The tool provides:

  • Filters that limit results by license (free or paid), format (CSV, images, etc.), and update time
  • Descriptions of each dataset’s contents
  • Author citations
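As a rough illustration of how such filters could work over indexed metadata, here is a minimal Python sketch. The `DatasetRecord` class and `filter_records` function are hypothetical and not part of any Google API; the field names loosely mirror schema.org properties such as `isAccessibleForFree`, `encodingFormat`, and `dateModified`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """Hypothetical index entry; fields loosely mirror schema.org Dataset properties."""
    name: str
    creator: str                  # author credited in the citation
    is_accessible_for_free: bool  # cf. schema.org isAccessibleForFree
    encoding_format: str          # cf. schema.org encodingFormat, e.g. "text/csv"
    date_modified: date           # cf. schema.org dateModified


def filter_records(records, *, free_only=False, encoding_format=None, updated_since=None):
    """Keep only records that pass every filter that was set (None means 'no filter')."""
    kept = []
    for rec in records:
        # License filter: drop paid datasets when only free ones are wanted.
        if free_only and not rec.is_accessible_for_free:
            continue
        # Format filter: exact match on the declared media type.
        if encoding_format is not None and rec.encoding_format != encoding_format:
            continue
        # Update-time filter: drop datasets last modified before the cutoff.
        if updated_since is not None and rec.date_modified < updated_since:
            continue
        kept.append(rec)
    return kept
```

Each filter is optional, so calling `filter_records(records)` with no keyword arguments returns every record unchanged.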

Google’s dataset aggregation methodology differs from that of other dataset repositories, such as Amazon’s open data registry. Unlike repositories that curate and host the datasets themselves, Google neither curates the 25 million datasets nor provides direct access to them.

Schema.org Standards

Instead, Google relies on the dataset publishers to use the open standards of schema.org to describe their dataset’s metadata. Google then indexes and makes that metadata searchable across publishers.
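A toy version of this pipeline might pull schema.org `Dataset` objects out of each publisher’s pages and build a keyword index over their metadata. The sketch below assumes the metadata is embedded as JSON-LD in `<script type="application/ld+json">` tags (one of several supported encodings), and its function names are hypothetical; it is an illustration of the idea, not a description of how Google’s indexer actually works.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append rather than assign.
        if self._in_jsonld:
            self.blocks[-1] += data


def _is_dataset(item):
    """True if a JSON-LD node declares @type Dataset (as a string or in a list)."""
    types = item.get("@type")
    if isinstance(types, str):
        types = [types]
    return isinstance(types, list) and "Dataset" in types


def extract_datasets(html):
    """Return every schema.org Dataset object found in the page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    datasets = []
    for raw in parser.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        items = doc if isinstance(doc, list) else [doc]
        datasets.extend(i for i in items if isinstance(i, dict) and _is_dataset(i))
    return datasets


def build_index(pages):
    """Map each lowercase word in a dataset's name/description to the URLs that declare it."""
    index = {}
    for url, html in pages.items():
        for ds in extract_datasets(html):
            text = f"{ds.get('name', '')} {ds.get('description', '')}".lower()
            for word in set(re.findall(r"[a-z0-9]+", text)):
                index.setdefault(word, set()).add(url)
    return index
```

Only the metadata is indexed; the dataset files stay on the publisher’s server. The sketch ignores `@graph` containers and RDFa or Microdata markup, which a real crawler would also need to handle.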

What Is Schema.org?

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.


Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata, and JSON-LD. These vocabularies cover entities, relationships between entities, and actions, and can easily be extended through a well-documented extension model. Over ten million sites use Schema.org to mark up their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex, and others already use these vocabularies to power rich, extensible experiences.

Since publishers host the datasets themselves, any publisher that conforms to schema.org standards, including for-profit ones, can have its datasets indexed by Google. In my anecdotal experience, about half of the datasets in the search results came from for-profit aggregators, and the share was even higher when searching for market-related datasets.

Other popular dataset publishers on the platform include government agencies and research institutions. Google claims that US government agencies alone have published over 2 million datasets.

According to Google, most of the datasets are related to “geosciences, biology, and agriculture.”

To publish your own datasets, you can simply describe them using schema.org’s open standards. The number of publicly available datasets is likely to keep growing as more publishers adopt the standard.
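For example, a publisher might embed markup like the following in the page that hosts a dataset. This minimal Python sketch builds a schema.org `Dataset` description with a CSV `DataDownload` and wraps it in a JSON-LD script tag. The function name, its parameters, and the example values are hypothetical, and Google’s own publishing guidelines may recommend additional properties.

```python
import json


def dataset_jsonld(name, description, url, license_url, creator_name, csv_url):
    """Build a schema.org Dataset description wrapped in a JSON-LD <script> tag."""
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": {"@type": "Organization", "name": creator_name},
        # The data itself stays on the publisher's server; only its location is described.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }
    body = json.dumps(metadata, indent=2)
    # Escape "</" as "<\/" (still valid JSON) so text such as "</script>" in a
    # description cannot close the surrounding HTML tag early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

Usage: `print(dataset_jsonld("Rainfall 2020", "Daily rainfall measurements", "https://example.org/rain", "https://creativecommons.org/licenses/by/4.0/", "Example Org", "https://example.org/rain.csv"))` prints a tag that can be pasted into the dataset’s landing page.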

NOTE: At this time, Google does not provide an API for searching or downloading the free datasets.

More information about the release is available on Google’s blog.
