Google Dataset API

Improve Your Research With Google Dataset API

There are tens of millions of datasets on the web, with content ranging from university studies and government records to results of scientific experiments and business reports. Google Dataset Search is a valuable source of datasets from all over the Web. Datasets are mostly used for research projects, training machine learning algorithms, and data visualization, but how can you easily scrape large volumes of data for such purposes?

In this article, we will show you a solution for automatically monitoring datasets and pulling data from Google Dataset Search. We will also describe a few use cases for integrating API solutions into your research.

What is Google Dataset API?

The name Google Dataset Search speaks for itself – it is a search engine for datasets. It allows users to search for information in thousands of repositories across the Internet using simple keywords.

So how is all the data collected? In brief, Dataset Search relies on Google's web crawling to find pages that contain dataset metadata and to extract the corresponding triples. To share metadata, dataset providers add schema.org markup, or markup based on similar open standards, to their web pages, and their number is increasing every day.
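
To make the markup step concrete, here is a minimal Python sketch of how a provider might describe a dataset with the schema.org Dataset type and embed it in a page as JSON-LD. The function name and the sample values are hypothetical, and the four properties shown are only a small subset of what schema.org defines; this illustrates the idea and is not a description of how any particular repository publishes its metadata.

```python
import json


def dataset_jsonld(name, description, url, license_url):
    """Return a <script> tag embedding schema.org Dataset metadata as JSON-LD."""
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    payload = json.dumps(metadata, indent=2)
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(dataset_jsonld(
        "Example water quality readings",
        "Hypothetical sensor readings collected at 15 minute intervals.",
        "https://example.org/datasets/water-quality",
        "https://creativecommons.org/licenses/by/4.0/",
    ))
```

A crawler that finds a block like this on a page can read the dataset's name, description, and license without parsing the page's visible content.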

When dealing with substantial datasets, automating data extraction will save you a considerable amount of time and effort. One of the most convenient ways to collect large amounts of data is by using an API.

DataForSEO offers a powerful Google Dataset API that lets you access Google’s dataset search data, draw research insights from it, and build on top of it.

Now let’s talk about Google Dataset API endpoints and how to use them.

Get datasets by keyword for research

Using the Dataset Search endpoint, you get the top 20 results of the Google Dataset Search engine. To make a request, you must specify a keyword. You can also refine the search by adding filters.

Here are some useful filters that you can add to your POST request:

  • last_updated
  • file_formats
  • usage_rights
  • is_free
  • topics

You can find the possible values of these filters and details on other parameters in our documentation.

Example request:

[
  {
    "keyword": "water quality",
    "last_updated": "1m",
    "file_formats": [
      "archive",
      "image"
    ],
    "usage_rights": "noncommercial",
    "is_free": true,
    "topics": [
      "natural_sciences",
      "geo"
    ]
  }
]
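
As a rough illustration of how a request body like the one above could be sent and its response condensed, here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than facts stated in this article: the host name `api.dataforseo.com`, HTTP Basic authentication with account credentials, and the helper names `build_request`, `summarize`, and `fetch`. The URL path mirrors the `path` array of the sample response, and the field names read by `summarize` are the ones that appear in that sample.

```python
import base64
import json
import urllib.request

# Assumption: the host is not given in the article; the path mirrors the
# "path" array of the sample response.
API_URL = "https://api.dataforseo.com/v3/serp/google/dataset_search/live/advanced"


def build_request(login, password, task):
    """Build (but do not send) a POST request carrying one task."""
    # The body is a JSON array of task objects, as in the example request.
    body = json.dumps([task]).encode("utf-8")
    # Assumption: HTTP Basic authentication with account credentials.
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def summarize(response):
    """Flatten tasks -> result -> items into a list of small dicts."""
    rows = []
    for task in response.get("tasks") or []:
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                rows.append({
                    "rank": item.get("rank_absolute"),
                    "title": item.get("title"),
                    # "formats" and "licenses" can be null, so fall back to [].
                    "formats": [f.get("format") for f in item.get("formats") or []],
                    "licenses": [lic.get("title") for lic in item.get("licenses") or []],
                })
    return rows


def fetch(login, password, task):
    """Send the request and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(build_request(login, password, task)) as resp:
        return json.load(resp)
```

Calling `summarize(fetch("login", "password", {"keyword": "water quality", "is_free": True}))` would then yield one small dictionary per dataset, which is easier to load into a spreadsheet or data frame than the full nested response. Building the request separately from sending it keeps the payload logic testable without network access.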

Example response:

{
  "version": "0.1.20221214",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "5.0885 sec.",
  "cost": 0.002,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "01161535-4426-0139-0000-eca053a01ae6",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "5.0310 sec.",
      "cost": 0.002,
      "result_count": 1,
      "path": [
        "v3",
        "serp",
        "google",
        "dataset_search",
        "live",
        "advanced"
      ],
      "data": {
        "api": "serp",
        "function": "live",
        "se": "google",
        "se_type": "dataset_search",
        "keyword": "water quality",
        "last_updated": "1m",
        "file_formats": [
          "archive",
          "image"
        ],
        "usage_rights": "noncommercial",
        "is_free": true,
        "topics": [
          "natural_sciences",
          "geo"
        ],
        "device": "desktop",
        "os": "windows"
      },
      "result": [
        {
          "keyword": "water quality",
          "se_domain": "datasetsearch.research.google.com",
          "language_code": "en",
          "check_url": "https://datasetsearch.research.google.com/search?query=water%20quality&hl=en&filters=WyJbXCJ1cGRhdGVkX2RhdGVcIixbXCIxbVwiXV0iLCJbXCJmaWxlX2Zvcm1hdF9jbGFzc1wiLFtcIjdcIixcIjVcIl1dIiwiW1wibGljZW5zZV9jbGFzc1wiLFtcIm5vbmNvbW1lcmNpYWxcIl1dIiwiW1wiaXNfYWNjZXNzaWJsZV9mb3JfZnJlZVwiLFtdXSIsIltcImZpZWxkX29mX3N0dWR5XCIsW1wibmF0dXJhbF9zY2llbmNlc1wiLFwiZ2VvXCJdXSJd",
          "datetime": "2023-01-16 13:35:35 +00:00",
          "spell": null,
          "item_types": [
            "dataset"
          ],
          "se_results_count": 11,
          "items_count": 11,
          "items": [
            {
              "type": "dataset",
              "rank_group": 1,
              "rank_absolute": 1,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFteHkyemdtNg==",
              "title": "Logan River Observatory: Right Hand Fork above confluence with Logan River Aquatic Site (RHF_CONF_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-27 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Right Hand Fork above confluence with Logan River"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on Right Hand Fork above confluence with Logan Rive r(RHF_CONF_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 2,
              "rank_absolute": 2,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFuMDQ3X3B6aA==",
              "title": "Lake Simcoe Monitoring",
              "image_url": null,
              "scholarly_citations_count": 31,
              "links": [
                {
                  "type": "link_element",
                  "title": "canada.ca",
                  "description": null,
                  "url": "http://open.canada.ca/",
                  "domain": "open.canada.ca"
                },
                {
                  "type": "link_element",
                  "title": "arctic-sdi.org",
                  "description": null,
                  "url": "http://catalogue.arctic-sdi.org/",
                  "domain": "catalogue.arctic-sdi.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "Government of Ontario",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "pdf",
                  "size": null
                },
                {
                  "type": "formats_element",
                  "format": "html",
                  "size": null
                },
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": null,
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Open Government Licence - Canada 2.0",
                  "url": "https://open.canada.ca/en/open-government-licence-canada",
                  "domain": "open.canada.ca"
                }
              ],
              "updated_date": "2022-12-30 02:00:00 +00:00",
              "area_covered": null,
              "period_covered": {
                "start_date": "1980-01-01 03:00:00 +00:00",
                "end_date": "2021-12-31 02:00:00 +00:00",
                "displayed_date": "Jan 1, 1980 - Dec 31, 2021"
              },
              "dataset_description": {
                "text": "The Lake Simcoe lake monitoring program provides measurements of chemical and physical water quality limits such as total phosphorus, nitrogen, chlorophyll a, pH, alkalinity, conductivity, dissolved organic and inorganic carbon, silica, other ions, water transparency, temperature and dissolved oxygen. Samples are collected biweekly during the spring, summer and fall. *[pH]: potential of hydrogen\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 3,
              "rank_absolute": 3,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFteHh6MjkwaA==",
              "title": "Logan River Observatory: Logan River at Wood Camp Bridge (LR_WCB_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://dataone.org/",
                  "domain": "dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-26 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Logan River at Wood Camp Bridge"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on the Logan River at the Wood Camp Bridge (LR_WCB_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 4,
              "rank_absolute": 4,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFtZnQzajlnbQ==",
              "title": "Earth Challenge 2020 Plastics: Raw Data",
              "image_url": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTEZy0oMWhWvUyIPRL9Hsj3bz362JQgK3NRH7qNJs505n14FXyI",
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "kaggle.com",
                  "description": null,
                  "url": "http://www.kaggle.com/",
                  "domain": "www.kaggle.com"
                }
              ],
              "dataset_providers": null,
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": 32351259
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Jonathan K.",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "CC0 1.0 Universal Public Domain Dedication",
                  "url": "https://creativecommons.org/publicdomain/zero/1.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-18 02:00:00 +00:00",
              "area_covered": null,
              "period_covered": null,
              "dataset_description": {
                "text": "The Earth Challenge 2020 app collects data on macroplastic pollution, or plastic pollution visible to the naked eye. Volunteers take a picture to document plastic pollution found in the environment, and indicate whether they have recycled it, left it, or thrown it away. In addition, volunteers can classify these images in accordance with a standardized classification schema.\n\nThis is the raw version of this data set. A version that leverages the OGC SensorThings standard is forthcoming.\n\nData are partially validated. Images are flagged for adult or racy content, and adult or racy images are removed. Data will be validated as images of plastic pollution when an updated data set is published that includes volunteer classifications.\n\nIf you would like to contribute data to Earth Challenge 2020, please visit the project website.  \n\nThis is the raw version of this data set. A version that leverages the OGC SensorThings standard is forthcoming.\n\nSource Description: Source Text\n\nSource Dataset Image: Image Source\n",
                "links": [
                  {
                    "type": "link_element",
                    "title": "Source Text",
                    "description": null,
                    "url": "https://www.google.com/url?q=https%3A%2F%2Fearthchallenge2020.earthday.org%2Fdatasets%2Fd5bb4e8642544bbd9a79e9346ed4dd78_0%3Fgeometry%3D34.096%252C-40.901%252C-16.881%252C61.320&source=datasetsearch",
                    "domain": null
                  },
                  {
                    "type": "link_element",
                    "title": "Image Source",
                    "description": null,
                    "url": "https://www.google.com/url?q=https%3A%2F%2Fwww.swissinfo.ch%2Feng%2Frace-for-water-odyssey_using-drones-to-hunt-for-the-oceans--plastic-pollution%2F41379106&source=datasetsearch",
                    "domain": null
                  }
                ]
              }
            },
            {
              "type": "dataset",
              "rank_group": 5,
              "rank_absolute": 5,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFteHkzMzR0ZA==",
              "title": "Logan River Observatory: Logan River Above Wood Camp Aquatic Site (LR_WC_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://dataone.org/",
                  "domain": "dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-27 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Logan River Above Wood Camp"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on the Logan River Above Wood Camp (LR_WC_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 6,
              "rank_absolute": 6,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFyNGtscHd5dw==",
              "title": "Logan River Observatory: Temple Fork above confluence with Logan River Aquatic Site (TF_CONF_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-24 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Temple Fork above confluence with Logan River"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on Temple Fork above confluence with Logan River (TF_CONF_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 7,
              "rank_absolute": 7,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFwYzBiYzZyeg==",
              "title": "Logan River Observatory: Temple Fork below Sawmill Spring Aquatic Site (TF_SAWM_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": 8
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-27 02:00:00 +00:00",
              "area_covered": [
                "Temple Fork below Sawmill Spring",
                "North America",
                "Rocky Mountains",
                "Wasatch Range"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on Temple Fork below Sawmill Spring (TF_SAWM_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 8,
              "rank_absolute": 8,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFwYzBmM2Q4bA==",
              "title": "Logan River Observatory: Dewitt Springs above confluence with Logan River Aquatic Site (DS_CONF_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-27 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Dewitt Springs above confluence with Logan River"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on Dewitt Springs above confluence with Logan River (DS_CONF_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 9,
              "rank_absolute": 9,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFteHh6YjAxeQ==",
              "title": "Logan River Observatory: Spawn Creek above confluence with Temple Fork Aquatic Site (SPC_CONF_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-23 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Spawn Creek above confluence with Temple Fork"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on Spawn Creek above confluence with Temple Fork (SPC_CONF_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 10,
              "rank_absolute": 10,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFwYzA4cmhqeg==",
              "title": "Logan River Observatory: South Logan Benson Canal at Benson Irrigation Company Flume, 2300 North 600 West Aquatic Site (SLB_600W_CNL) Quality Controlled Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-27 02:00:00 +00:00",
              "area_covered": [
                "Logan",
                "North America",
                "Rocky Mountains",
                "South Logan Benson Canal at Benson Irrigation Company Flume",
                "2300 North 600 West"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains quality control level 1 (QC1) data for all of the variables measured for the aquatic site on the South Logan Benson Canal at Benson Irrigation Company Flume, 2300 North 600 West (SLB_600W_CNL). Each file contains all available QC1 data for a specific variable. Files will be updated as new data become available, but no more than once daily. These data have passed QA/QC procedures such as sensor calibration and visual inspection and removal of obvious errors. These data are approved by Technicians as the best available version of the data. See published script for correction steps specific to this data series. Each file header contains detailed metadata for site information, variable and method information, source information, and qualifiers referenced in the data. This site is currently operated as part of the Logan River Observatory.\n",
                "links": null
              }
            },
            {
              "type": "dataset",
              "rank_group": 11,
              "rank_absolute": 11,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFyNGtqbnMzeA==",
              "title": "Logan River Observatory: Logan River at Dewitt Springs Campground Aquatic Site (LR_DSC_A) Raw Data",
              "image_url": null,
              "scholarly_citations_count": null,
              "links": [
                {
                  "type": "link_element",
                  "title": "hydroshare.org",
                  "description": null,
                  "url": "http://www.hydroshare.org/",
                  "domain": "www.hydroshare.org"
                },
                {
                  "type": "link_element",
                  "title": "dataone.org",
                  "description": null,
                  "url": "http://search.dataone.org/",
                  "domain": "search.dataone.org"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "HydroShare",
                  "url": null,
                  "domain": null
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "zip",
                  "size": null
                }
              ],
              "authors": [
                {
                  "type": "authors_element",
                  "name": "Logan River Observatory",
                  "url": null,
                  "domain": null
                }
              ],
              "licenses": [
                {
                  "type": "licenses_element",
                  "title": "Attribution 4.0 (CC BY 4.0)",
                  "url": "https://creativecommons.org/licenses/by/4.0/",
                  "domain": "creativecommons.org"
                }
              ],
              "updated_date": "2022-12-26 02:00:00 +00:00",
              "area_covered": [
                "North America",
                "Rocky Mountains",
                "Wasatch Range",
                "Logan River at Dewitt Springs Campground"
              ],
              "period_covered": null,
              "dataset_description": {
                "text": "This dataset contains raw data for all of the variables measured for the aquatic site on th eLogan River at Dewitt Springs Campground (LR_DSC_A). Each file contains a calendar year of data. The file for the current year is updated on a daily basis. The data values were collected by a variety of sensors at 15 minute intervals. The file header contains detailed metadata for the site and the variable and method of each column. This site is currently operated as part of the Logan River Observatory. Prior to 2018 this site was operated as part of the iUTAH GAMUT Network.\n",
                "links": null
              }
            }
          ]
        }
      ]
    }
  ]
}
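
The nested structure of the response above (tasks, then result, then items) can be flattened into simple records for further analysis. Below is a minimal Python sketch of that idea. The helper name `extract_datasets` and the trimmed sample payload are illustrative assumptions; the field names are taken from the response shown above.

```python
import json


def extract_datasets(response: dict) -> list[dict]:
    """Flatten a Dataset Search response into one record per dataset item."""
    records = []
    for task in response.get("tasks") or []:
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                if item.get("type") != "dataset":
                    continue
                records.append({
                    "dataset_id": item.get("dataset_id"),
                    "title": item.get("title"),
                    "rank": item.get("rank_absolute"),
                    "updated_date": item.get("updated_date"),
                    # Nested lists may be null in the response, so fall back to [].
                    "formats": [f.get("format") for f in item.get("formats") or []],
                    "licenses": [lic.get("title") for lic in item.get("licenses") or []],
                })
    return records


# Trimmed-down payload with the same shape as the response above (illustrative).
sample = json.loads("""
{
  "tasks": [{
    "result": [{
      "items": [{
        "type": "dataset",
        "rank_absolute": 11,
        "dataset_id": "L2cvMTFyNGtqbnMzeA==",
        "title": "Logan River Observatory: Logan River at Dewitt Springs Campground Aquatic Site (LR_DSC_A) Raw Data",
        "formats": [{"type": "formats_element", "format": "zip", "size": null}],
        "licenses": [{"type": "licenses_element", "title": "Attribution 4.0 (CC BY 4.0)"}],
        "updated_date": "2022-12-26 02:00:00 +00:00"
      }]
    }]
  }]
}
""")

for record in extract_datasets(sample):
    print(record["rank"], record["dataset_id"], record["formats"])
```

Because fields such as "authors" and "licenses" can be null, the sketch treats every nested list as optional instead of assuming it is present.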

Get data on a specific dataset by ID

The Dataset Info endpoint is based on the same Google Dataset Search engine. The difference is that the results are extracted from the individual dataset page rather than from the SERP. This means you can retrieve information about a particular dataset, including its contents, providers, licenses, and description.

To make a request, you need to specify the dataset ID. There are two ways to find it:
1. If your research is based on Google Dataset Search API results, you can find the "dataset_id" field in each "dataset" item.

In the snippet below, the dataset ID is L2cvMTFxcGdsbDMwMQ==.

"items": [
            {
              "type": "dataset",
              "rank_group": 1,
              "rank_absolute": 1,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFxcGdsbDMwMQ==",
              "title": "Water Quality Data",
              "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRM49hCs8HKygZqHe5K9VDnBr1WN8JF1ZGdQMmtNdVFYu-D1Nao",
              "scholarly_citations_count": 2,
              {...}

2. You can also find it in the dataset URL. For example, here is a link to the “Water Quality Data” dataset opened from a search query:
https://datasetsearch.research.google.com/search?hl=en&query=water%20quality&docid=L2cvMTFxcGdsbDMwMQ%3D%3D&filters=bm9uZQ%3D%3D

In this case, the dataset ID is the value of the docid parameter: L2cvMTFxcGdsbDMwMQ. The trailing %3D%3D is the URL-encoded form of “==”.

As you may have noticed, every ID in the API response ends with “==”. Our system accepts the dataset ID in the request either way, with or without the “==” ending.
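
To illustrate the second approach, here is a small Python sketch that reads the ID from the docid query parameter of a Dataset Search URL using only the standard library. The helper name `dataset_id_from_url` is hypothetical and not part of any API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def dataset_id_from_url(url: str) -> Optional[str]:
    """Return the decoded `docid` query parameter of a Dataset Search URL."""
    query = parse_qs(urlparse(url).query)  # parse_qs also decodes %3D into "="
    values = query.get("docid")
    return values[0] if values else None


url = (
    "https://datasetsearch.research.google.com/search"
    "?hl=en&query=water%20quality&docid=L2cvMTFxcGdsbDMwMQ%3D%3D"
    "&filters=bm9uZQ%3D%3D"
)
dataset_id = dataset_id_from_url(url)
print(dataset_id)              # L2cvMTFxcGdsbDMwMQ==
print(dataset_id.rstrip("="))  # L2cvMTFxcGdsbDMwMQ
```

Either printed form can be used as the "dataset_id" value, since the ID is accepted with or without the “==” ending.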

Example request:

[
  {
    "dataset_id": "L2cvMTFxcGdsbDMwMQ=="
  }
]

Example response:

{
  "version": "0.1.20221214",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "4.0817 sec.",
  "cost": 0.002,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "01161600-4426-0139-0000-3357191881e9",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "4.0241 sec.",
      "cost": 0.002,
      "result_count": 1,
      "path": [
        "v3",
        "serp",
        "google",
        "dataset_info",
        "live",
        "advanced"
      ],
      "data": {
        "api": "serp",
        "function": "live",
        "se": "google",
        "se_type": "dataset_info",
        "dataset_id": "L2cvMTFxcGdsbDMwMQ==",
        "device": "desktop",
        "os": "windows"
      },
      "result": [
        {
          "keyword": "L2cvMTFxcGdsbDMwMQ==",
          "se_domain": "datasetsearch.research.google.com",
          "language_code": "en",
          "check_url": "https://datasetsearch.research.google.com/search?docid=L2cvMTFxcGdsbDMwMQ%3D%3D&hl=en",
          "datetime": "2023-01-16 14:00:30 +00:00",
          "spell": null,
          "item_types": [
            "dataset"
          ],
          "se_results_count": 1,
          "items_count": 1,
          "items": [
            {
              "type": "dataset",
              "rank_group": 1,
              "rank_absolute": 1,
              "position": "left",
              "xpath": null,
              "dataset_id": "L2cvMTFxcGdsbDMwMQ==",
              "title": "Water Quality Data",
              "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRM49hCs8HKygZqHe5K9VDnBr1WN8JF1ZGdQMmtNdVFYu-D1Nao",
              "scholarly_citations_count": 2,
              "links": [
                {
                  "type": "link_element",
                  "title": "ca.gov",
                  "description": null,
                  "url": "http://data.ca.gov/",
                  "domain": "data.ca.gov"
                },
                {
                  "type": "link_element",
                  "title": "ca.gov",
                  "description": null,
                  "url": "http://data.cnra.ca.gov/",
                  "domain": "data.cnra.ca.gov"
                }
              ],
              "dataset_providers": [
                {
                  "type": "dataset_providers_element",
                  "title": "California Department of Water Resources",
                  "url": "http://www.water.ca.gov/",
                  "domain": "www.water.ca.gov"
                }
              ],
              "formats": [
                {
                  "type": "formats_element",
                  "format": "csv",
                  "size": null
                }
              ],
              "authors": null,
              "licenses": null,
              "updated_date": "2022-12-16 02:00:00 +00:00",
              "area_covered": null,
              "period_covered": null,
              "dataset_description": {
                "text": "The California Department of Water Resources (DWR) discrete “grab” water quality dataset contains DWR-collected, current and historical, chemical and physical parameters found in drinking water, groundwater, and surface waters throughout the state.\n",
                "links": null
              }
            }
          ]
        }
      ]
    }
  ]
}
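
The request above could be prepared in Python roughly as follows. This is a sketch under stated assumptions: the path segments are copied from the "path" array in the response, while the host name, the use of HTTP Basic authentication, and the placeholder credentials are assumptions that should be checked against the official API documentation. The helper name `build_dataset_info_request` is hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: the API host. The path segments come from the "path" array above.
API_HOST = "https://api.dataforseo.com"
PATH = ["v3", "serp", "google", "dataset_info", "live", "advanced"]


def build_dataset_info_request(dataset_id: str, login: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a single dataset ID."""
    payload = [{"dataset_id": dataset_id}]  # same body as the example request
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    return urllib.request.Request(
        url=API_HOST + "/" + "/".join(PATH),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",  # assumption: HTTP Basic auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials, for illustration only.
request = build_dataset_info_request("L2cvMTFxcGdsbDMwMQ==", "login", "password")
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs real credentials:
# with urllib.request.urlopen(request) as resp:
#     response = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any credits.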

How to use Google Dataset API in practice?

All kinds of people search for datasets, from students looking for data on their senior project topic to business analysts and data scientists developing new tools. However, they have one thing in common: in most cases, they use datasets for research.

Below we will cover some Google Dataset API use cases for dealing with a large number of datasets.

Research Projects

Conducting research always involves collecting a huge amount of data. A scientific approach to research builds on examining previous studies on the topic, so it is important that datasets are widely available and easy to cite and access. But when it comes to large-scale analysis, you may spend months manually selecting each dataset you need.

With Google Dataset API, you can build an automated dataset monitoring system that speeds up the whole data analysis process. Moreover, the ability to specify additional search parameters such as topics, file formats, and last updated date makes it easier to filter out unwanted data. It can be a solid foundation for developing data-driven tools.

For example, you can develop a marketing research tool by analyzing and integrating datasets on market research, review platforms, customer behavior, and the influence of economic policy on business in different countries. Our API will return filtered datasets that business analysts can use to draw valuable insights.
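
The dataset monitoring idea described above can be sketched in a few lines of Python. The sketch assumes that you keep a mapping of dataset IDs to the last "updated_date" you saw and compare it with fresh API results. The function name `find_updated` and the saved state are hypothetical; the two items are trimmed from the Dataset Search response shown earlier.

```python
from typing import Optional


def find_updated(state: dict[str, Optional[str]], items: list[dict]) -> list[dict]:
    """Return items that are new or whose "updated_date" changed since the last run.

    The state mapping is updated in place so it can be saved for the next run.
    """
    changed = []
    for item in items:
        dataset_id = item.get("dataset_id")
        if dataset_id is None:
            continue
        updated = item.get("updated_date")
        if state.get(dataset_id) != updated:
            changed.append(item)
            state[dataset_id] = updated
    return changed


# Hypothetical state saved by an earlier run: dataset_id -> last seen "updated_date".
state = {
    "L2cvMTFwYzA4cmhqeg==": "2022-12-20 02:00:00 +00:00",  # made-up older value
    "L2cvMTFyNGtqbnMzeA==": "2022-12-26 02:00:00 +00:00",
}
# Two items trimmed from the Dataset Search response shown earlier.
items = [
    {"dataset_id": "L2cvMTFwYzA4cmhqeg==", "updated_date": "2022-12-27 02:00:00 +00:00"},
    {"dataset_id": "L2cvMTFyNGtqbnMzeA==", "updated_date": "2022-12-26 02:00:00 +00:00"},
]
changed = find_updated(state, items)
print([item["dataset_id"] for item in changed])  # ['L2cvMTFwYzA4cmhqeg==']
```

In a real system the state would be stored between runs, for example in a file or a database, and the changed items would trigger a download or a notification.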

Machine Learning

Datasets are an integral part of the field of Machine Learning (ML). Major advances in this field can result from progress in learning algorithms, computer hardware, and the availability of high-quality training datasets.

High-quality datasets are difficult and costly to produce, even for unsupervised ML algorithms. For supervised and semi-supervised learning, training datasets also need to be labeled, which takes a great deal of time and makes production considerably harder.

This is where Google Dataset API can help: you can build a collection of datasets containing tasks and labeled data for any type of ML algorithm. Many standard datasets are available on the Web for free, and for more advanced needs, you can always search for commercial datasets. This way, you will save the money and time required to develop datasets from scratch. In addition, you will be able to study how high-quality datasets are compiled, draw conclusions, and develop your own models.

Let’s take a look at an example of applying data to water quality research. The Institute of Electrical and Electronics Engineers (IEEE) provides datasets on deep learning studies, ML algorithm comparisons, and training datasets for ML used in scientific research. On Google Dataset Search, you can find the “Dataset for Assessing Water Quality for Drinking and Irrigation Purposes using Machine Learning Models”, which can be used to train and test ML models that assess water quality by physico-chemical parameters. That is a vital step in expanding access to potable water for both scientific and commercial purposes.

The cost of using Google Dataset API

Four factors determine the cost of collecting data with Google Dataset API: the number of results you want to collect, the endpoint you use, the task execution method, and the priority.

Using the Google Dataset Search API, you are billed for each SERP containing up to 20 results. You can set the number of results with the depth parameter in the POST request.
As for the Google Dataset Info API, you are charged for every result. To calculate the cost of one result, multiply the price of a SERP by 3. In this case, the price depends only on the task execution method and priority.

The API has two main methods of delivering results: Standard and Live. The Standard method supports two task priorities: Normal and High. The chosen method and priority determine the task execution time, which ranges from instant results to a guaranteed turnaround time of up to 45 minutes.

Now let’s describe the cost of using the endpoints.

Google Dataset Search

Method and priority | Price per 20 results | Price per 1M results
Live | $0.002 | $100
Standard Normal | $0.0006 | $30
Standard High | $0.0012 | $60

Note that our system processes results in batches of 20, so we recommend setting the depth in multiples of 20. If you specify "depth": 21, you will be charged for 40 results.

Google Dataset Info

Method and priority | Price per 1 result | Price per 1M results
Live | $0.006 | $6000
Standard Normal | $0.0018 | $1800
Standard High | $0.0036 | $3600
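
The pricing rules above boil down to simple arithmetic, which the following Python sketch works through. The prices are copied from the tables in this section; the function names and the mode labels ("live", "standard_normal", "standard_high") are hypothetical and are not API parameters.

```python
from decimal import Decimal

# Price of one SERP (up to 20 results), copied from the Google Dataset Search table.
SERP_PRICE = {
    "live": Decimal("0.002"),
    "standard_normal": Decimal("0.0006"),
    "standard_high": Decimal("0.0012"),
}
RESULTS_PER_SERP = 20
INFO_MULTIPLIER = 3  # one Dataset Info result costs three times the SERP price


def search_cost(depth: int, mode: str) -> Decimal:
    """Cost of one Dataset Search task; depth is rounded up to whole SERPs."""
    serps = (depth + RESULTS_PER_SERP - 1) // RESULTS_PER_SERP  # ceiling division
    return serps * SERP_PRICE[mode]


def info_cost(results: int, mode: str) -> Decimal:
    """Cost of retrieving the given number of Dataset Info results."""
    return results * INFO_MULTIPLIER * SERP_PRICE[mode]


print(search_cost(20, "live"))          # 0.002
print(search_cost(21, "live"))          # 0.004 (billed as 40 results)
print(info_cost(1, "standard_normal"))  # 0.0018
```

`Decimal` is used instead of floating-point numbers so that small prices add up exactly.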

Conclusion

Scientists, governments, companies, and many others publish millions of datasets online. Google Dataset Search extracts dataset metadata from Web pages to make datasets discoverable. It is a valuable source for research, but collecting its data automatically is not straightforward, and doing it manually is time-consuming and labor-intensive when you are dealing with a large number of datasets.

DataForSEO developed a solution: with Google Dataset API, you can pull vast volumes of data from Google Dataset Search. Integrating this API into your system will improve your research with minimal investment and give you an opportunity to build products on top of it.

Access up-to-date data sources for your research with rapid API results, reasonable pricing, and 24/7 support. Register to try our Google Dataset API for free!

Anna Plakhtii