Geoconnex API

Feedback on requests and responses

Introduction

Geoconnex is a decentralized metadata catalog that incorporates information about data collected by any organization publishing water data, including the spatial, temporal, and topical subjects of that data.

We are developing an API to allow programmatic access to this catalog:

  • for potential new data publishing organizations to enrich their own metadata

  • for data analysts and tool developers to discover water metadata, like a specialized Google search for water data.

Introduction

What kind of water data do you work with?

How do you normally find water data?

What, if any, challenges do you have finding water data from particular or unfamiliar sources?

Introduction

Outline

  • User Story - Data Publisher
  • User Story - Data Analyst/Tool Developer
  • Endpoints, requests, and responses

User Story - Water Data Publisher

  • “As a water data publisher…
  • I want to know all locations from all organizations where data is currently being collected that are near my own monitoring locations
  • So that I know whether to submit reference features to geoconnex.us, and so I know which reference features to tag my own data with when I publish it.”

User Story - Water Data Analyst/Tool Developer

  • “As a water information tool/product developer…
  • I want to know all locations from all organizations where data types relevant to my questions are collected, and where that data can be accessed
  • So that I can build my tool or conduct my analysis using as much relevant data as possible.”

Top-level Endpoints

https://api.geoconnex.us

  • Services to discover and filter water datasets by:

    • Space/Geometry
    • Organization/data provider
    • Site Type (e.g. stream, well, dam)
    • Parameter/Variable/Observed Property
    • Measurement/Modeling Method
    • Temporal resolution
    • Temporal coverage
    • Feature of Interest

Top-level Endpoints

https://api.geoconnex.us…

  • /processes [1]
  • /locationTypes [2]
  • /FeatureCollections [3]
  • /Features/{catalogingFeatureType} [4]
  • /providers [5]
  • /observedProperties [6]
  • /methodTypes [7]
  • /methods [8]
  • /timeSpacing [9]

example codelist: methods

| id | provider | name | description | methodType | url |
|---|---|---|---|---|---|
| noaa-ahps | ahps | ahps | NOAA Advanced Hydrological Prediction Service | forecast | https://water.weather.gov/ahps/about/about.php |
| usgs-streamflow | nwis | WSP-2175 | Streamflow measurement and computation | in-situ observation | https://pubs.usgs.gov/wsp/wsp2175/wsp2175.pdf |
| usgs-groundwater_level-tape | nwis | GWPD1 | Measuring water levels by use of a graduated steel tape | in-situ observation | https://pubs.usgs.gov/tm/1a1/pdf/GWPD1.pdf |
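
To make the codelist concrete, here is a minimal Python sketch that treats the example rows as records keyed by the column names and filters them by methodType. It does not call the API: the response format of /methods is not fixed yet, so the list-of-objects shape and the helper name `methods_by_type` are assumptions for illustration only.

```python
# Example rows copied from the methods codelist; a real client would fetch
# them from /methods, whose exact response shape is an assumption here.
METHODS = [
    {"id": "noaa-ahps", "provider": "ahps", "name": "ahps",
     "description": "NOAA Advanced Hydrological Prediction Service",
     "methodType": "forecast",
     "url": "https://water.weather.gov/ahps/about/about.php"},
    {"id": "usgs-streamflow", "provider": "nwis", "name": "WSP-2175",
     "description": "Streamflow measurement and computation",
     "methodType": "in-situ observation",
     "url": "https://pubs.usgs.gov/wsp/wsp2175/wsp2175.pdf"},
    {"id": "usgs-groundwater_level-tape", "provider": "nwis", "name": "GWPD1",
     "description": "Measuring water levels by use of a graduated steel tape",
     "methodType": "in-situ observation",
     "url": "https://pubs.usgs.gov/tm/1a1/pdf/GWPD1.pdf"},
]


def methods_by_type(methods, method_type):
    """Return the ids of all methods whose methodType matches (hypothetical helper)."""
    return [m["id"] for m in methods if m["methodType"] == method_type]


print(methods_by_type(METHODS, "in-situ observation"))
```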

Processes

Data discovery tools will follow the OGC-API Processes standard, and thus we refer to them as processes. The following tools are proposed:

| process name | description |
|---|---|
| referenceMyLocations | POST a geospatial dataset of your own locations, find candidate geoconnex reference locations that may correspond to them already |
| findFeatures [10] | GET or POST query parameters to find features of relevance |
| navigateFeatures [11] | GET all hydrologically relevant monitoring features for a given longitude and latitude or feature identifier |
| findDatasets [12] | POST an array of feature identifiers and an array of query parameters to find relevant dataset metadata |

/processes Data publisher use case

  • As a data publisher, I would use referenceMyLocations to see whether any locations already in the database are likely the same site or real-world object as the sites I want to publish data about. I could then add links to them in the metadata I publish, so that others know I have data about the same site.

referenceMyLocations inputs:

  • input data: a geospatial dataset with at least an ID field

  • locationType: from /locationTypes codelist (e.g. “gage”, “well”, “dam”)

/processes Data publisher use case example

POST to /processes/referenceMyLocations

Input:

  • locationType: “gage”

  • input data:

/processes Data publisher use case example

POST to /processes/referenceMyLocations

Output:

  • spatial dataset with fields: uri, name, input_id, match_distance_m
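
The presentation does not say how candidate matches are found. As one plausible reading of the output fields, the Python sketch below pairs each input location with its nearest reference location by great-circle distance and reports that distance as match_distance_m. The nearest-neighbour rule, the 500 m cutoff, the lat/lon record layout, and the example URIs are all assumptions, not the planned implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reference_my_locations(inputs, references, max_distance_m=500):
    """Match each input to its nearest reference location within a cutoff.

    Returns rows with the fields uri, name, input_id, match_distance_m.
    The cutoff is an assumed tuning parameter, not part of the proposal.
    """
    rows = []
    for loc in inputs:
        best = min(
            references,
            key=lambda ref: haversine_m(loc["lat"], loc["lon"], ref["lat"], ref["lon"]),
        )
        distance = haversine_m(loc["lat"], loc["lon"], best["lat"], best["lon"])
        if distance <= max_distance_m:
            rows.append({
                "uri": best["uri"],
                "name": best["name"],
                "input_id": loc["id"],
                "match_distance_m": round(distance, 1),
            })
    return rows


# Hypothetical reference locations and publisher sites, for illustration only.
references = [
    {"uri": "https://geoconnex.us/foo", "name": "Foo gage", "lat": 35.45, "lon": -105.14},
    {"uri": "https://geoconnex.us/bar", "name": "Bar gage", "lat": 36.0, "lon": -106.0},
]
my_sites = [
    {"id": "my-1", "lat": 35.45, "lon": -105.14},
    {"id": "my-2", "lat": 40.0, "lon": -100.0},
]
matches = reference_my_locations(my_sites, references)
print(matches)
```

Sites with no reference location inside the cutoff (here `my-2`) are left out of the result, which would be the signal to consider submitting a new reference feature.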

/processes Data user imagined workflow

  1. Use findFeatures to discover locations within a bounding box, within a radius around a point, within an arbitrary spatial polygon, or relevant to a reference feature (e.g. a river or aquifer) or a cataloging feature (e.g. an administrative boundary or HUC)

    • receive a geospatial dataset, including an id attribute populated with geoconnex http URIs (e.g. https://geoconnex.us/usgs/monitoring-location/1000001)
  2. If desired, use navigateFeatures to find all sites downstream and/or upstream of a site identified by a geoconnex http URI or a latitude/longitude

  3. Use findDatasets with a list of URIs and query parameters for provider, observedProperty, period of record, etc., to receive links to datasets that are relevant to your question and are about the locations found by the findFeatures and/or navigateFeatures calls
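
The three steps can be chained as in the Python sketch below. It keeps the network layer out of the picture: `call_process(name, inputs)` is a hypothetical caller-supplied function that executes one process and returns its parsed result, and the stub `fake_call_process` returns canned answers so the example runs offline. The field names "id" (findFeatures output) and "uri" (navigateFeatures output) follow the slides; everything else about the response shapes is an assumption.

```python
def discover_datasets(call_process, bbox, upstream=None, **dataset_filters):
    """Chain findFeatures -> (optional) navigateFeatures -> findDatasets."""
    # Step 1: locations inside the bounding box; each carries its URI in "id".
    features = call_process("findFeatures", {"bbox": bbox})
    uris = [feature["id"] for feature in features]

    # Step 2 (optional): add sites upstream of each location found in step 1.
    if upstream is not None:
        for uri in list(uris):
            related = call_process("navigateFeatures", {"uri": uri, "upstream": upstream})
            for site in related:
                if site["uri"] not in uris:
                    uris.append(site["uri"])

    # Step 3: dataset metadata about every URI collected so far.
    return call_process("findDatasets", {"uri": uris, **dataset_filters})


def fake_call_process(name, inputs):
    """Stand-in transport with canned answers, so the sketch runs offline."""
    if name == "findFeatures":
        return [{"id": "https://geoconnex.us/foo"}]
    if name == "navigateFeatures":
        return [{"uri": "https://geoconnex.us/bar"}]
    return [{"about_uri": uri, "provider": inputs.get("providers")} for uri in inputs["uri"]]


# bbox order follows the slides: xmin, xmax, ymin, ymax.
datasets = discover_datasets(
    fake_call_process,
    bbox=[-106.0, -105.0, 35.0, 36.0],
    upstream="tributaries",
    providers=["nwis"],
)
print([d["about_uri"] for d in datasets])
```

Passing the transport in as a function keeps the workflow logic independent of how the processes end up being exposed (GET vs POST, URL layout), which is still open for feedback.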

/processes/findFeatures use case

  • As a data user, I would use findFeatures to discover all real-world features that organizations of interest have published metadata about, filtered to only my area of interest

/processes/findFeatures inputs

| input | number of elements | example or description |
|---|---|---|
| locationTypes | min: 0, max: inf | ["gages", "wells"] |
| providers | min: 0, max: inf | NULL [13], or ["nwis","rise"] |
| catalogingFeatures | min: 1, max: inf | {"hu02":"14", "state":["CA","AZ"]} |
| hydrologicFeature | min: 0, max: inf | ["https://geoconnex.us/ref/mainstems/1"] |
| polygon | min: 0, max: 1 | any multipolygon or polygon geometry |
| bbox | min: 0, max: 1 | xmin, xmax, ymin, ymax |
| radius | min: 0, max: 1 | lat, lon, distance in km |
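
A client could check these element counts before sending a request. The Python sketch below encodes the min/max column of the table and reports any violations. How "number of elements" is counted is an assumption: list-like inputs count their items, catalogingFeatures counts its keys, and polygon, bbox, and radius count as one element when present. The function names are hypothetical.

```python
# (min, max) element counts taken from the inputs table; None = no upper bound.
CARDINALITY = {
    "locationTypes": (0, None),
    "providers": (0, None),
    "catalogingFeatures": (1, None),
    "hydrologicFeature": (0, None),
    "polygon": (0, 1),
    "bbox": (0, 1),
    "radius": (0, 1),
}
# Inputs that hold a single geometry or tuple rather than a collection.
SINGLE_VALUED = {"polygon", "bbox", "radius"}


def count_elements(name, value):
    """Count elements under the assumed rules described above."""
    if value is None:
        return 0
    if name in SINGLE_VALUED:
        return 1
    return len(value)


def validate_find_features(inputs):
    """Return a list of problems; an empty list means the inputs pass."""
    problems = [f"unknown input: {key}" for key in inputs if key not in CARDINALITY]
    for name, (lowest, highest) in CARDINALITY.items():
        n = count_elements(name, inputs.get(name))
        if n < lowest:
            problems.append(f"{name}: needs at least {lowest}, got {n}")
        if highest is not None and n > highest:
            problems.append(f"{name}: allows at most {highest}, got {n}")
    return problems


ok = {
    "locationTypes": ["gages", "wells"],
    "catalogingFeatures": {"hu02": "14", "state": ["CA", "AZ"]},
}
print(validate_find_features(ok))
print(validate_find_features({"bbox": [-106.0, -105.0, 35.0, 36.0]}))
```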

/processes/findFeatures output

geospatial dataset of all locations meeting input field criteria, with fields:

/processes/navigateFeatures use case

  • As a data user, I would use navigateFeatures to discover all real-world features that organizations of interest have published metadata about and that are hydrologically related to (e.g. upstream or downstream of) a point I am interested in.

/processes/navigateFeatures inputs and outputs

inputs

  • uri or pointLocation (lat/lon)
  • upstream: null, tributaries, or mainstem
  • downstream: null, diversions, or mainstem
  • distance: integer (km)
  • locationType: text [14]
  • provider: text [15]
  • catalogingFeatures: array of uri [16]

outputs

  • mainstem geospatial feature
  • tributaries geospatial feature collection
  • diversions geospatial feature collection
  • relevant geospatial dataset, with attributes:
    • uri: https://geoconnex.us/foo

    • hydro_relation:

      • “upstream mainstem/tributary” or “downstream mainstem/diversion”

      • distance_km: integer

    • all attributes as from findFeatures

/processes/navigateFeatures example inputs

inputs:

  • lat: 35.45
  • lon: -105.14
  • upstream: tributaries
  • downstream: mainstem
  • distance: 3000
  • catalogingFeatures: https://geoconnex.us/ref/states/35
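
Since navigateFeatures is described as a GET process, the example inputs could travel as a query string. The Python sketch below builds such a URL with the standard library. The path `/processes/navigateFeatures` under https://api.geoconnex.us and the use of plain query parameters are assumptions; the final API may follow the OGC API Processes execute pattern instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint layout; only the base host and process name come from the slides.
BASE = "https://api.geoconnex.us/processes/navigateFeatures"

params = {
    "lat": 35.45,
    "lon": -105.14,
    "upstream": "tributaries",
    "downstream": "mainstem",
    "distance": 3000,  # km
    "catalogingFeatures": "https://geoconnex.us/ref/states/35",
}

# urlencode percent-encodes the URI value so it is safe inside a query string.
url = f"{BASE}?{urlencode(params)}"
print(url)

# Round trip: a server would decode the same values back out of the query.
decoded = parse_qs(urlsplit(url).query)
print(decoded["catalogingFeatures"][0])
```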

/processes/navigateFeatures example outputs

outputs:

/processes/findDatasets use case

  • As a data user, I would use findDatasets to discover all datasets about topics I am interested in that are relevant to a set of sites in the geoconnex system that I am already interested in.

  • Inputs: a list of geoconnex URIs from findFeatures and/or navigateFeatures, and query parameters to filter datasets relevant to my topic of interest

  • Outputs: a table or array of dataset metadata, including which geoconnex URI each dataset is relevant to.

/processes/findDatasets inputs and outputs

inputs

  • uri: array (many URIs)
  • providers: null (all) or array of 1 or more providers
  • observedProperties: null (all) or array of 1 or more observed property codes
  • methodTypes or methods [17]
  • timeSpacing: the maximum (lowest resolution) timespacing desired
  • minYear: the earliest year you want returned datasets to have coverage over
  • maxYear: the latest year you want returned datasets to have coverage over

outputs

A table of dataset metadata with the following fields:

  • url: url for where each dataset lives
  • about_uri: the URI for the geoconnex location each dataset is about
  • provider: code from /providers
  • observedProperty: code from /observedProperties
  • methodType: code from /methodTypes
  • method: code from /methods
  • timeSpacing: code from /timeSpacing
  • minYear: integer
  • maxYear: integer
  • conformsTo: a url to a website describing the API, data model, or data dictionary relevant to the specific dataset
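
To show how the timeSpacing, minYear, and maxYear inputs might act on these output records, here is a hedged Python sketch of the filtering. The interpretation is an assumption: a dataset is kept when its period of record overlaps the requested years and its spacing is no coarser than the requested timeSpacing. Spacing codes without a duration (such as "unknown" or "discrete") are dropped whenever a timeSpacing filter is given. The records and their URLs are made up for illustration.

```python
# Durations for the spacing codes that have one; the code strings come from
# the /timeSpacing footnote, the second counts are ordinary conversions.
SPACING_SECONDS = {
    "1 second": 1,
    "15 minute": 900,
    "1 day": 86_400,
    "1 year": 31_536_000,
}


def filter_datasets(records, min_year=None, max_year=None, time_spacing=None):
    """Keep records matching the assumed year-overlap and spacing rules."""
    kept = []
    for rec in records:
        # Period of record must overlap the requested [min_year, max_year].
        if min_year is not None and rec["maxYear"] < min_year:
            continue
        if max_year is not None and rec["minYear"] > max_year:
            continue
        # Spacing must be known and no coarser than the requested maximum.
        if time_spacing is not None:
            seconds = SPACING_SECONDS.get(rec["timeSpacing"])
            if seconds is None or seconds > SPACING_SECONDS[time_spacing]:
                continue
        kept.append(rec)
    return kept


# Hypothetical dataset metadata rows using the output field names.
records = [
    {"url": "https://example.org/ds1", "about_uri": "https://geoconnex.us/foo",
     "timeSpacing": "15 minute", "minYear": 1990, "maxYear": 2020},
    {"url": "https://example.org/ds2", "about_uri": "https://geoconnex.us/foo",
     "timeSpacing": "1 year", "minYear": 1950, "maxYear": 1980},
    {"url": "https://example.org/ds3", "about_uri": "https://geoconnex.us/foo",
     "timeSpacing": "discrete", "minYear": 2000, "maxYear": 2010},
]
daily_or_finer = filter_datasets(records, min_year=1985, time_spacing="1 day")
print([r["url"] for r in daily_or_finer])
```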

THANK YOU

Feel free to continue to peruse this presentation

Send any additional feedback to konda@lincolninst.edu

Footnotes

  1. an array of API endpoints in the OGC-API Processes standard for more specific queries that may use the below as query parameters

  2. a nested JSON array of the types of locations available and a definition for each (e.g. [{"id":"dam", "name":"dam", "description": "a structure creating an impounded body of water on a stream"},{"name":"stream", "description": "a flowing body of water on the surface"}])

  3. types of hydrologic (eg river, aquifer) and cataloging (e.g. HUC, catchment, county, state, municipality) features that data may be about or relevant to

  4. an array of the names and identifiers of the reference features of a given type

  5. an array of identifiers, names, and URLs for data publication systems and their parent organizations (eg, nwis, National Water Information System, waterdata.usgs.gov, usgs.gov)

  6. an array of identifiers, names, and provider identifiers for observed properties (also often known as parameters, variables, or data types), e.g. {"id":"inflow_lake", "name":"Lake/Reservoir Inflow", "provider":"nwis"}.

  7. an array of identifiers, names, and definitions for broad categories of methods (e.g. {"id":"obs", "name":"in-situ observation", "definition":"observation from an in-situ sensor or sample from a site visit"}). Also includes “remote sensing”, “estimation”, “simulation model”, “forecast”, “statistical summary”.

  8. an array of ids, names, descriptions, and links to documentation for specific data production methods or sensors, e.g. {"id":"noaa-ahps", "provider":"noaa", "name":"ahps", "description":"NOAA Advanced Hydrologic Prediction Service River Forecast Model", "methodType":"forecast", "url": "https://water.weather.gov/ahps/about/about.php"}

  9. an array of time spacings of datasets available from sites eg [“unknown”,“intermittent”,“discrete”,“event”,“1 second”, “15 minute”, “1 day”, “1 year”]

  10. similar to NWIS Site Service

  11. similar to NLDI or upstream/downstream EPA RAD/WATERS

  12. similar to whatNWISdata function from dataRetrieval

  13. retrieves from all providers

  14. to pre-filter results by location type

  15. to pre-filter results by provider

  16. useful if you want to restrict hydrologic navigation to certain non-hydrologic boundaries

  17. to avoid conflicts between specifying a certain in-situ method and the methodType “forecast”, for example