OPEN DATA PRODUCT SPECIFICATION
4.0 Release Candidate Version
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The specification is shared under the Apache 2.0 license. Development of the specification takes place under the umbrella of the Linux Foundation.
Version source:
ODPS YAML Schema:
Editors:
Participate:
HowTo - ODPS Knowledge Base
The odps‑examples repository serves as a comprehensive knowledge base for using the Open Data Product Specification (ODPS), combining clear FAQs with real-world YAML examples to guide users through practical implementation. Each FAQ entry tackles a specific task—like defining metadata, setting pricing tiers, assigning SLAs, configuring data quality rules, managing access roles, or enabling AI/agent consumption—accompanied by executable YAML snippets and full .yml files.
The content is modular, code‑first, and designed for easy reuse, enabling teams to discover, govern, monetize, validate, and automate data products using standardized, machine-readable metadata. Users are encouraged to contribute new scenarios or enhancements via GitHub issues.
Introduction
The Open Data Product Specification is a vendor-neutral, open-source, machine-readable data product metadata model. It defines the objects and attributes as well as the structure of digital data products. The work is based on existing standards (schema.org), best practices and emerging concepts like Data Mesh. The reasoning is that we reuse and proudly copy instead of reinventing the wheel. More detailed information about the origin can be found on the Open Data Product Specification homepage.
The ODPS 4.0 specification supports referencing mechanisms that improve modularity, reduce duplication, and ease governance. Users can reference internal components, such as SLA, dataQuality, dataAccess, and paymentGateways, from other parts of the product definition using JSON Reference syntax ($ref). In addition, any of these components can be defined and maintained in external YAML files and included via URL-based references. This makes it possible to reuse standardized SLA profiles, DQ rules, or access definitions across multiple data products, helping teams manage changes consistently, reduce errors, and align with enterprise-level governance practices.
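The internal referencing described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the specification: it assumes the YAML has already been loaded into a plain dictionary, treats the part after `#` as a JSON Pointer (RFC 6901), and does not fetch URL-based references or guard against circular references. The helper names, the plan layout and the pointer path `#/product/SLA/declarative/default` are hypothetical and simply follow the literal structure of the sample dictionary.

```python
from typing import Any


def resolve_pointer(document: dict, ref: str) -> Any:
    """Follow an internal reference such as '#/product/SLA/declarative/default'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only internal references are handled here: {ref!r}")
    node: Any = document
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: '~1' stands for '/', '~0' stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def expand_refs(node: Any, document: dict) -> Any:
    """Return a copy of node with every {'$ref': ...} object replaced by its target."""
    if isinstance(node, dict):
        if isinstance(node.get("$ref"), str):
            return expand_refs(resolve_pointer(document, node["$ref"]), document)
        return {key: expand_refs(value, document) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_refs(item, document) for item in node]
    return node


# Hypothetical document: one SLA profile referenced from one pricing plan.
doc = {
    "product": {
        "SLA": {
            "declarative": {
                "default": {
                    "dimensions": [
                        {"dimension": "uptime", "objective": 90, "unit": "percent"}
                    ]
                }
            }
        },
        "pricingPlans": {
            "en": [
                {"name": "Basic", "SLA": {"$ref": "#/product/SLA/declarative/default"}}
            ]
        },
    }
}

expanded = expand_refs(doc, doc)
plan = expanded["product"]["pricingPlans"]["en"][0]
print(plan["SLA"]["dimensions"][0]["objective"])
```

Expanding references at load time keeps downstream tooling simple, at the cost of losing the information that two plans share one profile; a tool that needs that link could keep the `$ref` strings and resolve them lazily instead.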
You can reference a Data Contract as a URL or define it as an inline element in ODPS. Both the Data Contract Specification (DCS) and the Open Data Contract Standard (ODCS) are supported.
Specification aims and aspects
Specification aims:
- enable interoperability between organizations, data platforms, marketplaces, and tools,
- reduce data product metadata conversions and errors between systems and organizations,
- increase the speed of designing, testing, and implementing data products,
- speed up tools development around data product design, development and management,
- enable creation of automated data product deployment with standard methods (DataOps),
- support flexible data product pricing with plan-specific DQ, SLA, and Access (with references),
- enable an Everything as Code approach for SLA and Data Quality monitoring.
Note! In "Open Data Product" the focus is on the latter words, and the prefix 'open' refers to the openness of the standard. Any connotations to open data (a different thing) are neither intended nor desirable.
The specification has been designed with four major aspects of the data product in mind: 1) technical (infrastructure & access), 2) business (pricing & plans), 3) legal (licensing & IPR), and 4) ethical (privacy & mydata). The four aspects are described in 9 objects, which contain attributes and elements.
If you see something missing, described inaccurately or plain wrong, or you want to comment on the specification, raise an issue on GitHub.
Document structure
LEFT COLUMN: Navigation
The left column is navigation which enables fluent and easy movement around the specification.
MIDDLE COLUMN: Principles and components
The middle column contains detailed information about the included components and related options. This is the theory part.
Note! Mandatory elements and attributes are listed separately in the definition tables. This enables the user to construct a minimum viable specification more easily and quickly. Ready-made definitions provided by https://schema.org are applied whenever possible instead of reinventing the wheel.
RIGHT COLUMN: Examples
The right column contains YAML-formatted examples of how the specification is used. Other output formats may be added in the future on request. YAML can easily be converted to JSON if needed.
Example of YAML formatted snippet from the Open Data Product Specification:
dataQuality:
  declarative:
    - dimension: accuracy
      displaytitle:
        - en: Data Accuracy (percent)
Document level attributes
Here is the list of attributes which can occur at the document root level. In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL. Optional attributes are listed in their own table and examples are given in the right column.
It is RECOMMENDED that the root ODPS document be named: dataproduct.json or dataproduct.yaml.
Mandatory attributes
Example of document level attribute usage and structure:
schema: https://opendataproducts.org/v4.0/schema/odps.yaml
version: 4.0
product:
  details:
    en:
      name: Pets of the year
      productID: 123456are
      visibility: private
      status: draft
      type: dataset
    fi:
      name: Vuoden suosituimmat lemmikit
      productID: 123456are
      visibility: private
      status: draft
      type: dataset
Element name | Type | Options | Description |
---|---|---|---|
schema | URL | Valid URL. See more from RFC 3986. | REQUIRED Defines the URL of Schema. Used often for validation purposes. |
version | string | This is the version of ODPS, for example dev or 2.2 | REQUIRED Defines the ODPS version. |
product | element | root element | REQUIRED Root element to tie all together. |
details | element | product business details | REQUIRED Binds together business details in different languages. |
en | element | ISO 639-1 defined 2-letter codes | REQUIRED - NOTE! This is a dynamic element! This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product details in multiple languages simply by adding similar sets like the example; just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
name | string | max length 256 chars | REQUIRED The name of the product. |
productID | string | max length 256 chars | REQUIRED Product identifier. |
visibility | string | one of: private, invitation, organisation, dataspace, public | REQUIRED The publicity level, i.e. who can see this product. Private - just the creator. Invitation - visible only to parties explicitly invited. Organisation - visible to all in your organisation. Dataspace - visible to all existing members of the data space. Public - visible to all publicly. |
status | string | one of: announcement, draft, development, testing, acceptance, production, sunset, retired | REQUIRED The status of the product. The lifecycle model is discussed in detail here (link). |
type | string | one of: raw data, derived data, dataset, reports, analytic view, 3D visualisation, algorithm, decision support, automated decision-making, data-enhanced product, data-driven service, data-enabled performance, bi-directional. | REQUIRED The type of the product. Options are derived from examples and lists found from academic literature. |
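The mandatory attributes and their options in the table above lend themselves to a simple automated check. The following Python sketch assumes the document has already been parsed into a dictionary; the function name `check_mandatory` and the error message wording are hypothetical, and the language check only verifies the two-letter shape, not membership in ISO 639-1. It is not a substitute for validating against the official schema.

```python
import re
from typing import List

# Options copied from the mandatory attributes table.
ENUMS = {
    "visibility": {"private", "invitation", "organisation", "dataspace", "public"},
    "status": {"announcement", "draft", "development", "testing", "acceptance",
               "production", "sunset", "retired"},
    "type": {"raw data", "derived data", "dataset", "reports", "analytic view",
             "3D visualisation", "algorithm", "decision support",
             "automated decision-making", "data-enhanced product",
             "data-driven service", "data-enabled performance", "bi-directional"},
}
REQUIRED_DETAILS = ("name", "productID", "visibility", "status", "type")


def check_mandatory(doc: dict) -> List[str]:
    """Return a list of problems found in the mandatory part of a document."""
    errors: List[str] = []
    for key in ("schema", "version", "product"):
        if key not in doc:
            errors.append(f"missing root attribute: {key}")
    details = doc.get("product", {}).get("details", {})
    if not details:
        errors.append("product.details needs at least one language element")
    for lang, attrs in details.items():
        # Shape check only: two lowercase letters.
        if not re.fullmatch(r"[a-z]{2}", lang):
            errors.append(f"{lang}: not a 2-letter language code")
        for key in REQUIRED_DETAILS:
            if key not in attrs:
                errors.append(f"{lang}: missing {key}")
        for key, allowed in ENUMS.items():
            if key in attrs and attrs[key] not in allowed:
                errors.append(f"{lang}: {key}={attrs[key]!r} is not an allowed option")
        for key in ("name", "productID"):
            if len(str(attrs.get(key, ""))) > 256:
                errors.append(f"{lang}: {key} exceeds 256 characters")
    return errors


doc = {
    "schema": "https://opendataproducts.org/v4.0/schema/odps.yaml",
    "version": "4.0",
    "product": {
        "details": {
            "en": {
                "name": "Pets of the year",
                "productID": "123456are",
                "visibility": "private",
                "status": "draft",
                "type": "dataset",
            }
        }
    },
}
print(check_mandatory(doc))
```

Returning a list of messages rather than raising on the first problem lets a catalog or CI job report everything that is wrong with a document in one pass.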
Optional attributes
The RecommendedDataProducts OBJECT contains an array of data products, which offers a means to attach related data products to the data product at hand. The recommended data product may come from the same marketplace/catalog or from an external one. The object extends reach and promotion, especially when a data product is treated as an independent entity, much as described in Data Mesh. Also, when a data product is published in a marketplace, the object offers a means to promote data products beyond those of the given data marketplace. In short, this object is mainly for discovery and reach purposes.
The RecommendedUseCases OBJECT is an array which offers a method to attach useful use cases to the data product. Use cases are informative for the data customer and exemplify how the data product can create value.
Example of document level attribute usage and structure:
schema: https://opendataproducts.org/v4.0/schema/odps.yaml
version: 4.0
product:
  contract:
    id: 02323M123
    type: ODCS
    contractVersion: 2.2.2
    contractURL: https://demo.datamesh-manager.com/demo834016807886/dataproducts/9bd53b1b-b51e-41a8-a757-4d33b4cde460
  details:
    en:
      name: Pets of the year
      productID: 123456are
      valueProposition: Design a customised petstore using a data product that describes
        pets with their habits, preferences and characteristics.
      description: This is an example of a Petstore product.
      productSeries: Lovely pets data products
      visibility: private
      status: draft
      productVersion: '0.1.0'
      versionNotes: New version with additional details such as more accurate pet details
      issues: The current issues include incorrect information in the dog breeds. The
        resolution for these problems is planned for the next update, scheduled
        to be released on July 15th, 2023.
      categories:
        - pets
      standards:
        - ISO 24631-6
      tags:
        - pet
      brandSlogan: Passion for the data monetization
      type: dataset
      contentSample: https://download.com/pets.json
      logoURL: https://data-product-business.github.io/open-data-product-spec/images/logo-dps-ebd5a97d.png
      OutputFileFormats:
        - JSON
        - XML
        - CSV
        - ZIP
        - PDF
      useCases:
        - useCase:
            useCaseTitle: Build attractive and lucrative petstore!
            useCaseDescription: Use case description how successful petstore chain was established in Abu Dhabi
            useCaseURL: https://marketplace.com/usecase1
      recommendedDataProducts:
        - https://marketplace.com/dataproduct.json
        - https://marketplace.com/dataproduct-another.json
Element name | Type | Options | Description |
---|---|---|---|
contract | element | - | Binds together data contract details. You can use both URL and inline (YAML) for data contract content. |
id | string | - | UUID of the data contract |
type | string | one of: ODCS, DCS | Defines the standard used in data contract. Currently supported options: ODCS and DCS. |
contractVersion | string | - | Version of the standard used to define the Data Contract. Type attribute defines the standard/specification. NOTE! This is not the possible iterated version of the data contract itself. |
contractURL | URL | Valid URL. See more from RFC 3986. | URL pointing to data contract in data contract management service or alike. Optionally you can use spec to add data contract details as YAML inline element. |
spec | string | YAML | Inline YAML element to add data contract details instead of using URL. |
created | date | Use ISO 8601 | When product was created. |
updated | date | Use ISO 8601 | When product was last updated. |
valueProposition | string | text content, max length 512 chars | This is the product's value proposition. Often one or two sentences and crystallizes the value for the customer. |
description | string | - | The description of the product. Text only. |
productSeries | string | - | A group of products in the product mix which are associated with each other and they can be obtained for the same type of customers or they are marketable for the same type of market place. |
categories | array | - | Comma separated array of categories. |
standards | array | - | Comma separated array of standards related e.g. to data content or quality, such as ISO 8000 or ISO 19131. |
tags | array | - | Comma separated array of tags. |
productVersion | string | The versioning according to SemVer | The version of the data product. Applies for ODPS metadata as well. |
versionNotes | string | - | Additional information about the version. |
issues | string | - | There may be errors in the data product that require corrections. These issues will be briefly described to users, along with information about when the fixes will be implemented. |
contentSample | URL | Valid URL. See more from RFC 3986. | Sample content of the data product, for example JSON/XML output. This sample should match the actual data product output and give the data consumer an idea of what to expect. If the data product is a pure service, for example a dashboard or an algorithm, consider providing a preview version or similar. |
logoURL | URL | Valid URL | Valid URL of the logo. See more from RFC 3986. |
outputFileFormats | string | - | Output file formats for data product |
brandSlogan | string | - | Brand-related slogan, such as Nike's "Just Do It". |
useCases | element | array | Contains list of related use cases with description information and link to details. NOTE! These examples are expected to use same language as defined previously in the data product details content binding element. |
useCaseTitle | string | string | Title of the usecase. |
useCaseDescription | string | - | Brief description of the usecase. |
useCaseURL | URL | Valid URL, RFC 3986 | Valid URL of the more detailed usecase description. |
recommendedDataProducts | array | Array of valid URLs (RFC 3986) | Data products recommended for use alongside this data product, or even as a replacement (for comparison). The URL provided MUST reference a description of a data product following this same standard. |
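A few of the optional attributes in the table carry format constraints that are easy to check mechanically. The sketch below is a hedged illustration in Python: `check_optional` is a hypothetical helper, the SemVer pattern is a simplified approximation of the SemVer grammar, and it assumes that `created` and `updated` are given as calendar dates in `YYYY-MM-DD` form and sit alongside the other details attributes.

```python
import re
from datetime import date
from typing import List

# Simplified SemVer shape: MAJOR.MINOR.PATCH with an optional suffix.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)([-+][0-9A-Za-z.+-]+)?"
)


def check_optional(details: dict) -> List[str]:
    """Check a handful of optional attributes of one language element."""
    errors: List[str] = []
    value_proposition = details.get("valueProposition")
    if value_proposition is not None and len(value_proposition) > 512:
        errors.append("valueProposition exceeds 512 characters")
    version = details.get("productVersion")
    if version is not None and not SEMVER.fullmatch(str(version)):
        errors.append(f"productVersion {version!r} does not look like SemVer")
    for key in ("created", "updated"):
        value = details.get(key)
        if value is not None:
            try:
                date.fromisoformat(str(value))
            except ValueError:
                errors.append(f"{key} {value!r} is not an ISO 8601 date")
    return errors


details = {
    "valueProposition": "Design a customised petstore using a data product.",
    "productVersion": "0.1.0",
    "created": "2023-07-15",
}
print(check_optional(details))
```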
Data SLA
Template structure of SLA array component:
SLA:
  declarative:
    - dimension: selected dimension
      displaytitle:
      description:
      objective:
      unit:
  executable:
    - dimension: selected dimension
      type:
      version:
      reference:
      spec:
Data Service Level Agreement (SLA) Object contains attributes which define the desired and promised quality of the data product.
A Data Service Level Agreement (SLA) is a contractual agreement between a data service provider and its customers that defines the expected level of service quality, performance, and availability for the data services provided. SLAs outline specific metrics, targets, and responsibilities that both parties agree to adhere to, ensuring accountability and transparency in the delivery of data services.
Defining Data SLAs in a machine-readable format enhances automation, facilitates monitoring, enables real-time compliance tracking, and supports seamless integration with monitoring and alerting systems.
Structure notes: The SLA object is divided into 2 parts: declarative and executable.
- Declarative part defines the dimensions and aimed/intended SLA levels in defined unit.
- Executable part contains the machine-readable "as code" rules to validate SLA dimensions. The code inside the spec element is intended to be injected as is into supporting SLA monitoring platforms, in their defined format and structure.
The SLA object is general in nature and should be enough for common (80%) use cases. Note that you can make extensions to the standard with "x-" mechanism in order to fulfill any industry specific needs. The "Specification extensions" section provides details on how to use this feature.
In case standardized options are not enough: you can make extensions to the standard with the "x-" mechanism in order to fulfill any industry-specific needs. A suggestive example below:
SLA:
  declarative:
    - x-dimension: custom
      displaytitle:
        - en: Custom SLA
      objective: 99
      unit: percent
    - dimension: responseTime
      objective: 200
      unit: milliseconds
The SLA can be defined with 11 standardized dimensions, with decoupled Everything as Code monitoring.
Example of SLA component usage:
SLA:
  declarative:
    - dimension: uptime
      displaytitle:
        - en: Uptime
      objective: 99
      unit: percent
    - dimension: responseTime
      objective: 200
      unit: milliseconds
SLA Dimension | Description |
---|---|
latency | Minimal amount of time before getting any response. |
uptime | Uptime is a measure of system reliability, expressed as the percentage of time a machine, typically a computer, has been working and available. See more https://uptime.is/. |
responseTime | Amount of time to process an external request. |
errorRate | Maximum tolerated errors in data, percentage. |
endOfSupport | The date at which your product will not have support anymore. |
endOfLife | The date at which your product will not be available anymore. No support, no access. |
updateFrequency | How often data is updated. |
timeToDetect | How fast can you detect a problem? |
timeToNotify | Once you see a problem, how much time do you need to notify your users? |
timeToRepair | How long do you need to fix the issue once it is detected? |
emailResponseTime | How long do you need to respond to email support requests? |
No mandatory attributes at the moment. Optional attributes are listed in their own table and an example is given in the right column.
Optional attributes and elements
Example of SLA component usage:
SLA:
  declarative:
    default:
      name:
        en: The Basic SLA
      description:
        en: The basic SLA package
      dimensions:
        - dimension: uptime
          displaytitle:
            en: Uptime
          objective: 90
          unit: percent
        - dimension: responseTime
          objective: 200
          unit: milliseconds
        - dimension: updateFrequency
          objective: 30
          unit: minutes
    premium:
      name:
        en: The Premium SLA
      description:
        en: The Premium SLA package
      dimensions:
        - dimension: uptime
          displaytitle:
            en: Uptime
          objective: 99
          unit: percent
        - dimension: responseTime
          objective: 100
          unit: milliseconds
        - dimension: updateFrequency
          objective: 5
          unit: minutes
  executable:
    - dimension: uptime
      type: prometheus
      reference: 'https://prometheus.io/docs/prometheus/latest/querying/basics/'
      spec: |-
        avg_over_time(
          (
            sum without() (up{job="prometheus"})
            or
            (0 * sum_over_time(up{job="prometheus"}[7d]))
          )[7d:5m]
        )
    - dimension: responseTime
      type: prometheus
      reference: 'https://prometheus.io/docs/prometheus/latest/querying/basics/'
      spec: |-
        rate(http_server_requests_seconds_sum[$__rate_interval]) /
        rate(http_server_requests_seconds_count[$__rate_interval])
  support:
    phoneNumber: '+971508976456'
    phoneServiceHours: Mon-Fri 8am-4pm (GMT)
    email: support@opendataproducts.org
    emailServiceHours: Mon-Fri 8am-4pm (GMT)
    documentationURL: ''
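Because the executable part keeps monitoring rules as plain code per dimension, a tool can pick out the rules meant for one monitoring system and hand them over unchanged. The Python sketch below assumes the SLA object has been loaded into a dictionary; `specs_by_type` is a hypothetical helper, and the second rule with type `custom` is an invented placeholder used only to show the filtering.

```python
from typing import Dict


def specs_by_type(sla: dict, monitor_type: str) -> Dict[str, str]:
    """Map dimension name to spec for executable rules of one monitoring type."""
    wanted = monitor_type.lower()
    return {
        rule["dimension"]: rule["spec"]
        for rule in sla.get("executable", [])
        if str(rule.get("type", "")).lower() == wanted and "spec" in rule
    }


sla = {
    "executable": [
        {
            "dimension": "responseTime",
            "type": "prometheus",
            # Spec text taken from the example above.
            "spec": (
                "rate(http_server_requests_seconds_sum[$__rate_interval]) /\n"
                "rate(http_server_requests_seconds_count[$__rate_interval])"
            ),
        },
        {
            # Hypothetical rule for an in-house system, not from the specification.
            "dimension": "uptime",
            "type": "custom",
            "spec": "check_uptime --window 7d",
        },
    ]
}

prometheus_rules = specs_by_type(sla, "Prometheus")
for dimension, spec in prometheus_rules.items():
    print(dimension)
    print(spec)
```

The type comparison is case-insensitive here because the example writes `prometheus` while the attribute description writes `Prometheus`; whether a real tool should be this lenient is a design choice.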
Example of SLA external profiles usage:
SLA: # the below file contains the same content as above
  $ref: 'https://example.org/slas/all-packages.yaml'
Example of SLA external profiles for each profile usage:
SLA:
  declarative:
    default:
      $ref: 'https://example.org/slas/basic.yaml'
    premium:
      $ref: 'https://example.org/slas/premium.yaml'
Element name | Type | Options | Description |
---|---|---|---|
SLA | element | - | Binds the SLA-related elements and attributes together. |
$ref | filepath or valid URL | - | Define the SLA in external file for reuse purposes, example $ref: 'https://example.org/slas/all-packages.yaml' See example. This makes it easy to keep related profiles (e.g. default, premium, gold) together, apply versioning and validation once, and publish all variants from a single repo or source. The same pattern can be used in individual SLA profiles instead of doing it inline. See example. This gives finer control if each SLA is owned or updated by a different team, but increases the number of files to track and host. |
default | object | - | This object must always be present and named exactly default if SLA object is used. It acts as the fallback or baseline SLA profile. Users are free to define additional named profiles such as premium , gold , etc., in parallel to the default. In the example above, both default and premium profiles are included. These variants can be referenced from pricing plans or other objects. Example reference usage: SLA: $ref: '#/Product/SLA/default' |
dimensions | array | - | Contains one or more SLA dimension objects. Each defines a measurable SLA metric such as uptime or responseTime. |
dimension | attribute | string, one of: latency, uptime, responseTime, errorRate, endOfSupport, endOfLife, updateFrequency, timeToDetect, timeToNotify, timeToRepair, emailResponseTime | Defines the SLA dimension. |
objective | attribute | integer | Target level to be achieved for the dimension (e.g., 99). |
unit | attribute | Options: percent, milliseconds, seconds, minutes, days, weeks, months, years, never, date, null | Measurement unit for the SLA objective. If "date" is used, format should be dd/mm/yyyy. |
displaytitle | array | - | Dimension title to be shown in UIs. Localized per language. |
description | array | - | Description of the SLA package or specific dimension, localized per language. |
executable | element | - | Grouping element for SLA monitoring logic. Monitoring definitions are provided as code in the spec field for each dimension. |
type | attribute | string | Name of the monitoring system (e.g., Prometheus). Must support SLA-as-code. |
spec | element | YAML/URL/string | Monitoring logic for the defined dimension. Can be inline YAML, plain string, or URL to code file. |
reference | URL | Valid URL | Link to documentation about the monitoring system used. |
support | element | - | Describes how users can get help with usage, billing, or other issues. |
phoneNumber | string | valid phone number | The phone number for support (e.g., +971508976456). Should follow E.164 format. |
phoneServiceHours | string | - | Description of phone support hours, e.g., "Mon–Fri 8am–4pm (GMT)". |
email | string | valid email address | Email address for support requests. Must follow RFC 2822. |
emailServiceHours | string | - | Description of email support hours. |
documentationURL | URL | Valid URL | Link to documentation describing the support process or SLA handling. |
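The rules in the table above (a mandatory `default` profile, a fixed list of dimensions and units, integer objectives) can be checked with a short script. The following Python sketch is illustrative only: `check_sla` is a hypothetical helper, it assumes the profile layout used in the example (profiles under `declarative`, each with a `dimensions` array), skips profiles that are given as `$ref`, and ignores entries that use the "x-" extension mechanism.

```python
from typing import List

# Options copied from the dimension and unit rows of the table.
SLA_DIMENSIONS = {
    "latency", "uptime", "responseTime", "errorRate", "endOfSupport", "endOfLife",
    "updateFrequency", "timeToDetect", "timeToNotify", "timeToRepair",
    "emailResponseTime",
}
SLA_UNITS = {
    "percent", "milliseconds", "seconds", "minutes", "days", "weeks", "months",
    "years", "never", "date", "null",
}


def check_sla(sla: dict) -> List[str]:
    """Return a list of problems found in the declarative part of an SLA object."""
    errors: List[str] = []
    profiles = sla.get("declarative", {})
    if "default" not in profiles:
        errors.append("declarative: a profile named 'default' is required")
    for name, profile in profiles.items():
        if "$ref" in profile:
            continue  # externally defined profile, not checked here
        for index, entry in enumerate(profile.get("dimensions", [])):
            where = f"{name}.dimensions[{index}]"
            if "dimension" not in entry and any(k.startswith("x-") for k in entry):
                continue  # "x-" extension entry, outside the standard lists
            if entry.get("dimension") not in SLA_DIMENSIONS:
                errors.append(f"{where}: unknown dimension {entry.get('dimension')!r}")
            objective = entry.get("objective")
            if objective is not None and (
                isinstance(objective, bool) or not isinstance(objective, int)
            ):
                errors.append(f"{where}: objective should be an integer")
            unit = entry.get("unit")
            unit = "null" if unit is None else unit  # a missing or YAML null unit
            if unit not in SLA_UNITS:
                errors.append(f"{where}: unknown unit {unit!r}")
    return errors


sla = {
    "declarative": {
        "default": {
            "dimensions": [
                {"dimension": "uptime", "objective": 90, "unit": "percent"},
                {"dimension": "responseTime", "objective": 200, "unit": "milliseconds"},
            ]
        }
    }
}
print(check_sla(sla))
```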
Data Quality
Template structure of Data Quality array component:
dataQuality:
  declarative:
    default:
      dimensions:
        - dimension: accuracy
          displaytitle:
            en: Data Accuracy (percent)
          objective: 90
          unit: percentage
        - dimension: completeness
          displaytitle:
            - en: Data Completeness (percent)
          objective: 90
          unit: percentage
  executable:
    - dimension: selected dimension
      type:
      version:
      reference:
      spec:
Data quality is essential for one main reason: You give customers the best experience when you make decisions using accurate data. A great customer experience leads to happy customers, brand loyalty, and higher revenue for your business. Information is only valuable if it is of high quality.
By adhering to defined quality characteristics, organizations can maximize the value of their data assets, improve decision-making, enhance operational efficiency, and maintain trust and confidence in their data-driven processes and systems. ODPS is compatible with the EDM Council data quality model.
The dataQuality component in ODPS provides a structured and machine-readable way to declare and monitor the quality characteristics of a data product. It helps align technical validation with business expectations and supports both human understanding and automated tooling.
Structure Overview:
The dataQuality object consists of two parts:
- declarative: Captures target levels for defined quality dimensions like accuracy, completeness, or timeliness. These values represent your intended or promised quality levels.
- executable: Allows integration with supported Everything as Code tools (e.g., SodaCL, DQOps, MonteCarlo) to define verifiable and executable rules that check whether those targets are met in practice.
This structure ensures that both expectations and enforcement logic are documented and machine-actionable in the same place.
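As an illustration of how declared targets can be compared with observed values, the Python sketch below lists the dimensions of one profile whose measured value falls short of the objective. It is a simplification under stated assumptions: `unmet_objectives` is a hypothetical helper, the measured values are invented, and it assumes percentage-style dimensions where a higher value is better. In practice the measurements would come from the tool named in the executable part.

```python
from typing import Dict, List


def unmet_objectives(profile: dict, measured: Dict[str, float]) -> List[str]:
    """Return the dimensions whose measured value is below the declared objective."""
    unmet: List[str] = []
    for entry in profile.get("dimensions", []):
        name = entry["dimension"]
        # Dimensions without a measurement are skipped, not reported.
        if name in measured and measured[name] < entry["objective"]:
            unmet.append(name)
    return unmet


profile = {
    "dimensions": [
        {"dimension": "accuracy", "objective": 90, "unit": "percentage"},
        {"dimension": "completeness", "objective": 90, "unit": "percentage"},
    ]
}
measured = {"accuracy": 93.5, "completeness": 88.0}  # hypothetical measurements

print(unmet_objectives(profile, measured))
```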
Referencing Capability
One of the key features of ODPS is the ability to reuse named data quality profiles via references. For example, quality profiles such as default, premium, or gold can be defined once under dataQuality.declarative and referenced elsewhere in the YAML, such as in SLA definitions, pricing plans, or tiered service offerings.
Benefits of Referencing:
- DRY Principle: Avoid repetition. Define once, reference many times.
- Clarity: Consumers of your data product can easily see which quality profile is associated with a particular service tier or access plan.
- Scalability: You can support multiple audiences or markets with varying quality expectations.
- Auditability: Clearly link machine-readable checks to business commitments.
referencing examples:
$ref: '#/Product/dataQuality/default'
...
$ref: '#/Product/dataQuality/premium'
Referencing Examples:
To reference a defined quality profile from another part of your YAML (e.g., pricing plan): $ref: '#/Product/dataQuality/default'. Or reference a named premium quality package: $ref: '#/Product/dataQuality/premium'
Use this component to clearly communicate both intentions and verifiable guarantees about data quality—whether you're reporting to stakeholders, building trust with customers, or enabling automated validation through modern DQ tools.
The Role of default
The default quality profile is mandatory whenever the dataQuality object is used. It acts as the baseline definition, ensuring there is always a clear and predictable quality configuration, even when no referencing is used.
You should use the default profile when:
- You want to describe core quality expectations for the product.
- You don’t yet need pricing or SLA-specific variations.
- You want to ensure future compatibility with advanced features such as AI agents, data marketplaces, or automated governance.
This makes the default profile both a minimum requirement and a best practice for clarity and interoperability.
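The baseline role of the default profile can be sketched as a small lookup in Python. This is one possible reading rather than behavior defined by the specification: `select_profile` is a hypothetical helper that returns the requested profile when it exists and otherwise falls back to default, and it rejects a declarative part that has no default at all.

```python
def select_profile(declarative: dict, name: str = "default") -> dict:
    """Return the named quality profile, falling back to the default profile."""
    if "default" not in declarative:
        raise KeyError("the 'default' profile is mandatory")
    return declarative.get(name, declarative["default"])


declarative = {
    "default": {"dimensions": [{"dimension": "accuracy", "objective": 90}]},
    "premium": {"dimensions": [{"dimension": "accuracy", "objective": 98}]},
}

print(select_profile(declarative, "premium")["dimensions"][0]["objective"])
print(select_profile(declarative, "gold")["dimensions"][0]["objective"])
```

Silently falling back is convenient for consumers but can hide a misspelled profile name, so a stricter tool might log or reject unknown names instead.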
The dataQuality object is general in nature and should be enough for common (80%) use cases. You can make extensions to the standard with the "x-" mechanism in order to fulfill any industry-specific needs.
ODPS offers 8 standardized options to define and measure data quality with Everything as Code monitoring.
Data Quality Dimension | Description |
---|---|
accuracy | The measurement of the veracity of data to its authoritative source |
completeness | Data is required to be populated with a value (aka not null, not nullable). Completeness checks if all necessary data attributes are present in the dataset. |
conformity | Data content must align with required standards, syntax (format, type, range), or permissible domain values. Conformity assesses how closely data adheres to standards, whether internal, external, or industry-wide. |
consistency | Data should retain consistent content across data stores. Consistency ensures that data values, formats, and definitions in one group match those in another group. |
coverage | All records are contained in a data store or data source. Coverage relates to the extent and availability of data present but absent from a dataset. |
timeliness | The data must represent current conditions; the data is available and can be used when needed. |
validity | Validity refers to the extent to which the data accurately and appropriately represents the real-world object or concept it is supposed to describe. |
uniqueness | Uniqueness means each record and attribute should be one-of-a-kind, aiming for a single, unique data entry |
Optional attributes and elements
Example of Data Quality component with some of the data quality dimensions:
dataQuality:
  declarative:
    default:
      displaytitle:
        en: The Basic Data Quality
      description:
        en: The basic quality package
      dimensions:
        - dimension: accuracy
          displaytitle:
            en: Data Accuracy (percent)
          description:
            en: >
              Data Accuracy ensures the data product reflects the real-world
              entities or events it represents, minimizing errors and providing
              reliable insights.
          objective: 90
          unit: percentage
        - dimension: completeness
          displaytitle:
            - en: Data Completeness (percent)
          objective: 90
          unit: percentage
    premium:
      displaytitle:
        en: The Premium Data Quality
      description:
        en: The Premium quality package
      dimensions:
        - dimension: accuracy
          displaytitle:
            en: Data Accuracy (percent)
          description:
            en: >
              Data Accuracy ensures the data product reflects the real-world
              entities or events it represents, minimizing errors and providing
              reliable insights.
          objective: 98
          unit: percentage
        - dimension: completeness
          displaytitle:
            en: Data Completeness (percent)
          objective: 99
          unit: percentage
  executable:
    - dimension: accuracy
      type: SodaCL
      reference: https://docs.soda.io/soda-cl/soda-cl-overview.html
      spec:
        - require_unique(member_id)
        - require_range(age_band, 18, 100)
    - dimension: completeness
      type: DQOps
      version: 1.6.0
      reference: https://dqops.com/docs/dqo-concepts/running-data-quality-checks/
      spec:
        columns:
          target_column:
            profiling_checks:
              nulls:
                profile_nulls_percent:
                  warning:
                    max_percent: 8.0
                  error:
                    max_percent: 10.0
                  fatal:
                    max_percent: 11.0
Example of DQ external profiles usage:
dataQuality: # the below file contains the same content as above
  $ref: 'https://example.org/DQ/all-packages.yaml'
Example of DQ external profiles for each profile usage:
dataQuality:
  declarative:
    default:
      $ref: 'https://example.org/DQ/basic.yaml'
    premium:
      $ref: 'https://example.org/DQ/premium.yaml'
Element name | Type | Options | Description |
---|---|---|---|
dataQuality | element | - | Contains array of data quality dimensions with optional computational monitoring object. Under this element Data Quality is divided into declarative and executable parts. |
$ref | filepath or valid URL | - | Define the Data quality profiles in external file for reuse purposes, example $ref: 'https://example.org/DQ/all-packages.yaml' See example. This makes it easy to keep related profiles (e.g. default, premium, gold) together, apply versioning and validation once, and publish all variants from a single repo or source. The same pattern can be used in individual data quality profiles instead of doing it inline. See example. This gives finer control if each Data quality is owned or updated by a different team, but increases the number of files to track and host. |
default | object | - | This object must always be present and named exactly default if dataQuality object is used. It acts as the fallback or primary data quality profile. Users are free to define additional named profiles such as premium , gold , etc., in parallel to the default. In the example, default and premium are both included. These variants can be referred to from other components like pricing plans. Example reference usage: dataQuality: $ref: '#/dataQuality/default' |
dimension | attribute | string, one of: accuracy, completeness, conformity, consistency, coverage, timeliness, validity, or uniqueness. | Defines the data quality dimension. |
objective | attribute | integer | Defines the target value for the data quality dimension |
unit | attribute | string. One of: percentage, number | Defines the unit used in stating the target quality level. |
executable | element | - | Grouping element that collects together data quality monitoring rules. You can define monitoring patterns as code under this element for each dimension. The actual as-code part is defined in the spec element. |
displaytitle | array | - | Dimension title to be shown in various UIs. Array contains title(s) in desired language(s). |
en | attribute | ISO 639-1 defined 2-letter codes | Binds the text elements together in a given language. Multilanguage support is implemented by duplicating content under other ISO language codes. |
description | array | - | Describe the dimension so that it can be used for example in info boxes in UI. Array contains descriptions in desired language(s). |
type | attribute | string, one of: SodaCL, Montecarlo, DQOps, Custom | Data Quality Monitoring as code system name. Use one of the predefined options only. With Custom type you can use your in-house solution. |
version | attribute | string | The version of DQ monitoring tool used. |
reference | URL | Valid URL | Provide URL pointing to the reference documentation. |
spec | element | YAML/URL/string | The content for Data Quality monitoring expressed as code. Accepted as inline YAML, a valid URL pointing to YAML, or a plain string if type is Custom. |
If you see something missing, described inaccurately or plain wrong, or you want to comment on the specification, raise an issue in GitHub
Data Pricing Plans
Template structure of Data Pricing Plans array component:
pricingPlans:
  en:
    - name: Premium subscription 1 year
      unit: recurring
      priceCurrency: EUR
      price: 50.00
      billingDuration: year
    - name: Premium subscription 1 month
      unit: recurring
      priceCurrency: EUR
      price: 8.00
      billingDuration: month
If the standardized options are not enough, you can extend the standard with the "x-" mechanism to fulfill industry-specific needs. A suggestive example follows:
pricingPlans:
  en:
    - name: Premium subscription 1 year
      unit: recurring
      priceCurrency: EUR
      price: 50.00
      billingDuration: year
    - x-name: Extension plan
      unit: custom
      priceCurrency: EUR
      price: 50.00
      billingDuration: year
Standardized data pricing plans are crucial for transparency, scalability, and customer trust. They ensure that customers can easily understand and compare costs, fostering trust and reducing disputes. For providers, standardized pricing streamlines operations, supports scalability, and simplifies market comparison, allowing for effective competitive positioning. Additionally, it aids in better financial planning and forecasting for both the provider and the customer, ensuring predictable revenue and informed decision-making. Overall, standardized pricing is essential for the sustainable growth and success of data products.
Pricing is the process whereby a business sets the price at which it will sell its products and services. The pricing object contains pricing plan metadata that can be used, for example, to display the plans in a marketplace. If needed, the standard metadata is converted to the marketplace's internal format. We encourage all data product owners to enforce usage of this standard to foster global interoperability.
The 12 pricing plans enabled by ODPS are meticulously defined through an in-depth analysis of pricing models applied across more than 300 data products. We continuously monitor the evolution of pricing plans in the data economy, striving to provide the most comprehensive and up-to-date list of standardized pricing options - yet some gaps might exist.
The Pricing object is general in nature and should be enough for common (80%) use cases. You can make extensions to the standard with "x-" mechanism in order to fulfill any industry specific needs. The "Specification extensions" section provides details on how to use this feature.
Includes the "Pricing Plans as Code" feature. You can define the necessary actions (CRUP) to set up and use the selected payment gateway, initiating the purchase process. CRUP stands for:
- Create (create pricing plan),
- Retire (delete pricing plan),
- Update (update existing pricing plan) and,
- Purchase (generate or get link to ignite purchase process in the gateway).
With this feature, you can translate the pricing plans defined in the declarative part into executable code within payment gateways.
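Supported payment gateways: Stripe, Checkout, and Custom (an in-house solution), as listed for the type attribute in the table below.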
ODPS supports 12 standardized pricing models
The unit attribute defines the pricing plan. Its options are fixed unless extended with the "x-" mechanism.
Pricing plan | Description |
---|---|
Recurring time period based | In the simplest terms, recurring payments (also known as subscription payments, automatic payments, or recurring billing) take place when customers authorize a merchant to charge them repeatedly for goods or services on a prearranged schedule. |
One time payments plans | One Time Fee Revenue Model is a business model that generates revenue through a single payment for perpetual product use or service access. The One Time Fee Revenue Model is a fundamental concept in the world of small businesses and entrepreneurship. |
Pay-as-you-go plans | The Pay As You Go Plan is a flexible alternative to a monthly plan. Instead of paying a recurring monthly charge, you buy credits as needed. |
Revenue sharing plans | Revenue sharing is a performance-based income model that involves sharing business profits or losses among participating partners. Revenue sharing is a profit-sharing system that ensures all parties involved are compensated for their contribution to the business. |
Data volume plan | Volume pricing is a pricing strategy in which an item's price per unit decreases as the purchase quantity increases. |
Trial | A free trial pricing strategy offers target customers a chance to try your product for free for a limited time. It is a sales promotion technique that uses the product to market itself. |
Dynamic pricing | Dynamic pricing is a pricing strategy that applies variable prices instead of fixed prices. Instead of deciding on a set price for a season, retailers can update their prices multiple times per day to capitalize on the ever-changing market. |
Pay what you want plans | Also known as PWYW pricing, this is a pricing strategy in which buyers pay the price they choose for a particular product, commodity, or service. The chosen price may sometimes be zero. To guide the buyer, the provider can set a suggested price and a minimum price. |
Freemium | A type of business model that offers basic features of a product or service to users at no cost and charges a premium for supplemental or advanced features. |
Open data | Access to open data is expected to be free of cost, but in some cases it is also possible to collect fees to cover costs of the service. |
Value-based | Value-based pricing is a strategy of setting prices primarily based on a consumer's perceived value of a product or service. Value-based pricing is customer-focused, meaning companies base their pricing on how much the customer believes a product is worth. It is often worth providing the customer with a value simulator that shows the expected value gains, and possibly setting the price based on that. Pricing would be customized per customer. |
On Request | Access to the data product is given only on request. Often the provider expects to meet the customer first. Conditions, pricing, and other terms are agreed in those discussions. |
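A minimal sketch of how a few of these models might look in YAML, using only attributes described in this section's attribute table (unit, price, priceCurrency, billingDuration, minPrice, valueSimulator, offering). The plan names, amounts, and the example.org URL are invented for illustration, and the unit values are written in lowercase to match the document's own examples. Treat the fragment as a sketch under those assumptions rather than normative content.

```yaml
pricingPlans:
  declarative:
    en:
      # Revenue sharing: priceCurrency is "percentage" and price is the share.
      - name: Partner revenue share
        unit: revenue-sharing
        priceCurrency: percentage
        price: 15
      # Data volume: the price applies to each 1 GB of served data.
      - name: Volume plan
        unit: data-volume
        priceCurrency: EUR
        price: 0.25
        billingDuration: month
      # Pay what you want: minPrice sets the floor price.
      - name: Supporter plan
        unit: pay-what-you-want
        priceCurrency: EUR
        minPrice: 2.00
      # Value-based: minPrice acts as a monthly base fee,
      # valueSimulator links to a (hypothetical) simulator.
      - name: Energy savings plan
        unit: value-based
        priceCurrency: EUR
        minPrice: 100.00
        billingDuration: month
        valueSimulator: https://example.org/value-simulator
        offering:
          - Lower heating costs, use the simulator to estimate your value
      # Trial: the trial duration is stated in the offering.
      - name: Free trial
        unit: trial
        priceCurrency: EUR
        price: 0
        offering:
          - Free access for 14 days
```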
Pricing Plans optional attributes and elements
Example of Pricing component usage in English:
pricingPlans:
  declarative:
    en:
      - name: Standard API subscription 1 month
        priceCurrency: EUR
        price: 50
        billingDuration: month
        unit: recurring
        maxTransactionQuantity: 200000
        offering:
          - High Quality Events data
          - High amount of transactions
          - Billed monthly
        paymentGateway:
          $ref: '#/paymentGateways/default'
        dataQuality:
          $ref: '#/dataQuality/default'
        SLA:
          $ref: '#/SLA/default'
        access:
          $ref: '#/dataAccess/API'
      - name: Premium MCP 1 month
        priceCurrency: EUR
        price: 500
        billingDuration: month
        unit: recurring
        maxTransactionQuantity: 0
        offering:
          - High Quality Events data
          - High amount of transactions
          - Billed monthly
        paymentGateway:
          $ref: '#/paymentGateways/agent'
        dataQuality:
          $ref: '#/dataQuality/premium'
        SLA:
          $ref: '#/SLA/premium'
        access:
          $ref: '#/dataAccess/agent'
  executable:
    - name: Premium subscription 1 year
      type: Stripe
      reference: urls to Stripe docs
      create:
        spec:
          - cmd: stripe products create \
            params: '--name="Premium subscription 1 year"'
          - cmd: stripe prices create \
            params: '--currency=eur --unit-amount=50 -d "recurring[interval]"=month -d "product_data[name]"="Premium subscription 1 year"'
      read:
        spec:
          - cmd: stripe products get \
            params: '--name="Premium subscription 1 year"'
      update:
        spec:
          - cmd: stripe products update \
            params: '--name="Premium subscription 1 year"'
      delete:
        spec:
          - cmd: stripe products delete \
            params: '--name="Premium subscription 1 year"'
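Several optional attributes from the table below do not appear in the example above. The following fragment is a hedged sketch of how usage limits, overage fees, VAT, and validity dates might be combined in one plan. All values, including the VAT rate and the dates, are invented for illustration, and the timestamps assume UTC written with a trailing Z.

```yaml
pricingPlans:
  declarative:
    en:
      - name: Standard API subscription 1 month
        unit: recurring
        priceCurrency: EUR
        price: 50
        billingDuration: month
        # Monthly request limit; 0 would mean unlimited use.
        maxTransactionQuantity: 200000
        # Fee for each transaction beyond maxTransactionQuantity.
        additionalPrice: 0.001
        # The price above excludes VAT; the rate is an illustrative value.
        valueAddedTaxIncluded: false
        valueAddedTaxPercentage: 24
        # Validity window, quoted so the values stay strings.
        validFrom: '2025-01-01T00:00:00.000Z'
        validTo: '2025-12-31T23:59:59.999Z'
        notes: Prices exclude VAT.
```

Under this reading, a month with 250000 transactions would cost 50 + 50000 × 0.001 = 100 EUR before VAT.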
Element name | Type | Options | Description |
---|---|---|---|
pricingPlans | element | - | Binds the pricing plans related elements and attributes together |
en | element | ISO 639-1 defined 2-letter codes | NOTE! This is a dynamic element! This element binds together other product pricing plan attributes and expresses the language used. In the example this is "en", which indicates that pricing plan details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product pricing plan details in multiple languages simply by adding similar sets like the example - just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
declarative | element | - | Grouping element which collects together data product pricing plans with business details |
priceCurrency | string | Use standard formats: ISO 4217 currency format e.g. "USD"; Ticker symbol for cryptocurrencies e.g. "BTC" | The primary currency used in pricing. Platforms are assumed to use this as primary currency if currency conversions are used to display product pricing in different locations for various currencies. If the unit is revenue-sharing, then this attribute value MUST be percentage. |
price | string | - | The offer price of a product, or of a price component, or the revenue-sharing percentage. If the unit of pricing is revenue-sharing, then this price attribute value is a percentage value. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. With data-volume the price is for each 1GB of data. |
billingDuration | string | options: instant, day, week, month, year | Specifies for how long this price (or price component) will be billed. Can be used, for example, to model the contractual duration of a subscription or payment plan. |
unit | string | One of: One-time-payment, Pay-per-use, Recurring, Revenue-sharing, Data-volume, Pay-what-you-want, Freemium, Open-data, Value-based, On-request, Trial | One-time-payment is for single purchases; further purchases are not intended to continue under the same agreement. Pay-per-use is intended for continuous usage, and the price set is for each successful usage action. Recurring is intended for continuous time period plans. Revenue-sharing is a performance-based income model. An effective revenue sharing deal structure is offering your expertise to a business owner to help them grow their business. In return, you get paid a percentage of the revenue as a royalty fee. Freemium is for free access. Use this option also for open data. Data-volume is for data amount based pricing in which the customer pays based on the served data amount. The price is always for 1GB of data. Pay-what-you-want is a pricing system where buyers pay any desired amount for a given commodity, sometimes including zero. In some cases, a minimum (floor) price may be set, and/or a suggested price may be indicated as guidance for the buyer. The buyer can also select an amount higher than the standard price for the commodity. If the floor price is set, use the minPrice attribute. Open-data is an explicit pricing plan category for open data. By default open data should be free, but in some cases it can have a price. Value-based is a value-based selling unit. Present the outcome of your story with solid data and a measurable impact with the help of the offering attribute. Example: “We can lower the energy bill in heating by $8-13/square meter in a year. Try out simulator to calculate your value!”. Use the optional valueSimulator attribute to provide a link (URL) to the value simulator you have created. To set a base fee for a value-based plan, you can for example define a monthly (billingDuration) plan and give the base fee in the minPrice attribute. On-request is for plans in which the customer is given access to the data product after contacting the provider. Use the provider contact information as the means to contact the data product provider for access permission requests. If Trial is used, the trial duration should be defined in the offering part. Read more about the pricing plans in the ODPS wiki |
maxTransactionQuantity | Integer | Integer | The maximum transaction quantity for the given billing duration. Use this to define for example monthly (or any other period) request limit to the data product. Note! If you want to set unlimited use, value must be 0 (zero). |
offering | string | array | The element that contains pricing plan content as array of strings. Think of this as the list of what is included in the pricing plan and what you offer in return to the price asked. Use the language defined in the plan |
minPrice | string | - | The lowest price if the price is a range. If dynamic pricing is used with this product, this is the lowest price allowed. In dynamic pricing businesses are able to change prices based on algorithms that take into account competitor pricing, supply and demand, and other external factors in the market. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
maxPrice | string | - | The highest price if the price is a range. If dynamic pricing is used with this product, this is the highest price allowed. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
valueAddedTaxIncluded | boolean | true/false | Specifies whether the applicable value-added tax (VAT) is included in the price specification or not. |
valueAddedTaxPercentage | Integer | Number percentage value, range 0-100 | The value-added tax (VAT) percentage that applies to the price. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
validFrom | DateTime | A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. | The date when the item becomes valid. |
validTo | DateTime | A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. | The date after which the item is no longer valid. |
additionalPrice | string | - | This is used to define fees for usage which exceeds the defined max transaction quantity. This value is for each additional transaction. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
maxDataQuantity | Integer | - | The maximum amount of data transferred during the billing duration. Unit is GB. |
valueSimulator | url | valid url | Intended to be used with value-based pricing plan. Provide url to value simulator in which customer can see the value in various cases. In the simulator customer might be able to input own variables to match their exact case and see the gained value. |
notes | string | - | Optional free-text explanatory note associated with the pricing plan. Often used for disclaimers, internal remarks, or elaborating on intent. |
paymentGateway | reference | path to paymentGateways | Reference to an available payment gateway definition under #/paymentGateways/... |
dataQuality | reference | path to dataQuality | Reference to a defined data quality package from #/dataQuality/... |
SLA | reference | path to SLA | Reference to a defined SLA specification under #/SLA/... |
access | reference | path to dataAccess | Reference to a data access method under #/dataAccess/... |
executable | element | - | Grouping element which collects together pricing plans payment gateway management features. You can define the needed action (CRUP) to setup and use the gateway to ignite purchase process. CRUP stands for: Create, Retire, Update, and Purchase. The actual as code part is added with spec element. |
type | attribute | string, one of: Stripe, Checkout, Custom | Payment gateway system name. Use one of the predefined options only. With Custom type you can use your in-house solution. |
version | attribute | string | The version of the payment gateway tool used. |
reference | URL | Valid URL | Provide URL pointing to the reference documentation |
create | element | - | Contains the as code part to create pricing plan in the payment gateway |
retire | element | - | Contains the as code part to retire (delete) pricing plan in the payment gateway |
update | element | - | Contains the as code part to update pricing plan in the payment gateway |
purchase | element | - | Contains the as code part to create or get pricing plan purchase igniting link in the payment gateway |
spec | element | YAML/URL/string | The as-code content for the pricing plan. The content is intended to be in a form that can be injected as is into the payment gateway system defined by type. The content depends on the system used, and the reference attribute is expected to provide more information. Note! By default the rules must be provided as valid YAML, either as an inline element (YAML) or as a valid URL (filesystem or online) pointing to a valid YAML content file. String content is allowed and used only if the type attribute value is Custom. In the Custom case your string can of course be YAML too. |
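The table names four CRUP actions: create, retire, update, and purchase. The fragment below is a hedged sketch of an executable entry that uses exactly those four elements. It assumes type Custom, so that each spec can be a plain string, and billing-tool is a hypothetical in-house command line tool, not a real product. The version string and documentation URL are likewise placeholders.

```yaml
pricingPlans:
  executable:
    - name: Premium subscription 1 year
      type: Custom        # in-house gateway, so spec may be a plain string
      version: "1.0"      # placeholder version of the hypothetical tool
      reference: https://example.org/billing/docs
      create:
        spec: 'billing-tool plan create --name "Premium subscription 1 year"'
      retire:
        spec: 'billing-tool plan retire --name "Premium subscription 1 year"'
      update:
        spec: 'billing-tool plan update --name "Premium subscription 1 year"'
      purchase:
        spec: 'billing-tool plan purchase-link --name "Premium subscription 1 year"'
```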
Data Licensing
Data product licensing is essential for ensuring legal compliance, protecting intellectual property rights, enabling monetization and commercialization, facilitating control and governance, managing liability and indemnification, promoting standardization and interoperability, and fostering transparency and trust in data exchange and collaboration. By establishing clear and enforceable licensing agreements, data producers and consumers can effectively leverage data assets while minimizing legal risks and maximizing the value of data products.
The data product may be exploited, e.g. by licensing its use and exploitation to third parties. A machine-readable license is included in the specification for this purpose. It can be used to conclude various agreements regarding data protection, processing and intellectual property rights (IPR). Data can be protected by one or more intellectual property rights. The principle is that when a third party (Data User) exploits the data, it must have a license or other right from the Data Holder to do so.
Optional attributes and elements
Example of License Object usage:
license:
  en:
    scope:
      definition: The purpose of this license is to determine the terms and conditions
        applicable to the licensing of the data product, whereby Data Holder grants
        Data User the right to use the data.
      restrictions: Data User agrees not to, directly or indirectly, participate in
        the unauthorized use, disclosure or conversion of any confidential information.
      geographicalArea:
        - EU
        - US
      permanent: False
      exclusive: False
      rights:
        - Reproduction
        - Display
        - Distribution
        - Adaptation
        - Reselling
        - Sublicensing
        - Transferring
    termination:
      noticePeriod: 90
      terminationConditions: After the expiry of the right
        of use, the product and its derivatives must be removed.
      continuityConditions: Expired license will automatically continue without written
        cancellation (termination) by Data Holder
    governance:
      ownership: Mindmote Oy, a company specializing in pet industry insights, owns
        the license to its proprietary data product 'Pets of the Year'.
      damages: During the term of license, except for the force majeure or the Data
        Holder's reasons, Data User is required to follow strictly in accordance with
        the license. If Data User wants to terminate the license early, it needs to
        pay a certain amount of liquidated damages.
      confidentiality: Data User undertakes to maintain confidentiality as regards all
        information of a technical (such as, by way of a non-limiting example, drawings,
        tables, documentation, formulas and correspondence) and commercial nature (including
        contractual conditions, prices, payment conditions) gained during the performance
        of this license.
      applicableLaws: This license shall be interpreted, construed and enforced in accordance
        with the law of Finland, including Copyright Act 404/1961.
      warranties: Data Holder makes no warranties, express or implied, guarantees or
        conditions with respect to your use of the data product. To the extent permitted
        under local law, Data Holder disclaims all liability for any damages or losses,
        including direct, consequential, special, indirect, incidental or punitive,
        resulting from Data User's use of the data product.
      audit: Data Holder will reasonably cooperate with Data Users by providing available
        additional information about the data product. Both parties will bear their
        own audit-related costs.
      forceMajeure: Both parties may suspend their contractual obligations when fulfillment
        becomes impossible or excessively costly due to unforeseeable events beyond
        their control, such as strikes, fires, wars, and other force majeure events.
Example of License as ref usage:
license:
  $ref: 'https://example.org/licenses/default'
Element name | Type | Options | Description |
---|---|---|---|
license | element | - | Binds the licensing related elements and attributes together. |
$ref | URI | Valid URI | Points to the license text, in a local file or online. Use either this reference or the elements and attributes listed below to define the license. |
en | attribute | ISO 639-1 defined 2-letter codes | This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use Arabic details, then name the element "ar". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product details in multiple languages simply by adding similar sets like the example - just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
scope | element | - | Extent, range, coverage, area or space of the license. |
definition | string | text content, max length 512 chars | Background and purpose of the license. |
restrictions | string | text content, max length 512 chars | Restrictions of the license. |
geographicalArea | string | ISO 3166-1 alpha-2 codes | License right restricted to the geographical area. |
permanent | boolean | true/false | License with no expiration date. |
exclusive | boolean | true/false | The exclusive license holder is given complete control over the use of the data product, and no other person or organization is allowed to use it during the term of the license agreement. |
rights | array | Options: Reproduction (rights to reproduce), Display (disclose data to others), Distribution (right to distribute), Adaptation (right to create derivative works), Reselling (right to resell), Transferring (transferable data license), Sublicensing (license grant may include a right to sublicense) | Rights granted by the license. The texts in brackets describe the rights. |
termination | element | - | Licence termination and continuity related conditions. |
noticePeriod | integer | unit is days | The notice period is the time that a data product provider or consumer must give before ending the contract. This time window allows both sides to make the necessary preparations, guaranteeing an unhindered transfer. |
terminationConditions | string | text content, max length 512 chars | Cancellation conditions of the license. |
continuityConditions | string | text content, max length 512 chars | Continuity conditions of the license. |
governance | element | - | Governance is the approach taken to ensure that the agreed outcomes are being fulfilled. |
ownership | string | text content, max length 512 chars | Data product licensing ownership. |
audit | string | text content, max length 512 chars | License auditing terms. |
warranties | string | text content, max length 512 chars | License warranties. |
damages | string | text content, max length 512 chars | Damages refers to the sum of money (i.e. indemnifications) for a breach of some duty or violation of license right. |
confidentiality | string | text content, max length 512 chars | Restrictions and requirements imposed on the Data User regarding e.g. the use and disclosure of the Data Holder's confidential information. |
applicableLaws | string | text content, max length 512 chars | Applicable laws, i.e. local acts, decrees, or laws. |
forceMajeure | string | text content, max length 512 chars | Force majeure is a clause that is included in contracts to remove liability for unforeseeable and unavoidable catastrophes that interrupt the expected course of events and prevent participants from fulfilling obligations. These clauses generally cover both natural disasters and catastrophes created by humans. |
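The table describes multilanguage support as adding a parallel set of elements under another ISO 639-1 code. The fragment below is a minimal sketch of that pattern for the license scope, with "en" and "fr" blocks. The shortened English text and the French wording are illustrative only and are not taken from the specification. The rights values are left untranslated on the assumption that they are fixed option names.

```yaml
license:
  en:
    scope:
      definition: The purpose of this license is to determine the terms and conditions
        applicable to the licensing of the data product.
      permanent: false
      exclusive: false
      rights:
        - Reproduction
        - Display
  fr:
    scope:
      # Illustrative French rendering of the English definition above.
      definition: "Cette licence a pour objet de définir les conditions applicables
        à la concession de licence du produit de données."
      permanent: false
      exclusive: false
      rights:
        - Reproduction
        - Display
```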
Data Access
The dataAccess object defines how users, or machines, can technically access the data product. It allows publishers to describe multiple, named access methods tailored to different consumer needs: from simple file downloads and APIs to AI agent integration via protocols like MCP.
Each entry under dataAccess (such as default, API, or agent) represents a distinct access interface with its own metadata, authentication requirements, and documentation references. This structure makes it possible to:
- Offer flexible access modes for various user personas (analysts, developers, AI agents, etc.)
- Support multilingual UI presentation through localized name and description fields
- Clearly declare security expectations using authenticationMethod
- Link to both machine-readable specs (specsURL) and human-readable guides (documentationURL)
- Promote reusability by referencing these interfaces throughout the ODPS YAML using $ref
Including an AI agent-specific access interface (outputPorttype: AI) supports MCP-based agent interactions, aligning your product with AI-native data delivery patterns.
Referencing Examples
For example, in the access section of a pricing plan, you can reuse any method defined under dataAccess like this: $ref: '#/Product/dataAccess/default'
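As a hedged sketch of this reuse, the fragment below defines two access methods and points two pricing plans at them. It assumes that dataAccess and pricingPlans sit at the root of the fragment, so the pointers start at '#/dataAccess/...'. If both are nested under a parent element, the pointer needs that prefix, as in the path quoted above. Plan names, prices, and the example.org URLs are invented for illustration.

```yaml
dataAccess:
  default:
    name:
      - en: Access to zipped package
    outputPorttype: file
    format: zip
    accessURL: https://example.org/downloads/package.zip
  API:
    name:
      - en: Access to API
    outputPorttype: API
    authenticationMethod: OAuth
    format: JSON
    accessURL: https://example.org/api/v1/data
pricingPlans:
  declarative:
    en:
      - name: File download
        unit: one-time-payment
        priceCurrency: EUR
        price: 20
        access:
          $ref: '#/dataAccess/default'   # reuses the file download method
      - name: API subscription 1 month
        unit: recurring
        priceCurrency: EUR
        price: 50
        billingDuration: month
        access:
          $ref: '#/dataAccess/API'       # reuses the API method
```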
Optional attributes and elements
Example of Data Access object usage:
dataAccess:
  default:
    name:
      - en: Access to zipped package
    description:
      - en: Latest Dataset and Resources
    outputPorttype: file
    format: zip
    accessURL: url to file as zip
  dataonly:
    name:
      - en: Access to latest dataset
    description:
      - en: Latest Dataset
    outputPorttype: file
    format: CSV
    accessURL: url to file as CSV
  API:
    name:
      - en: Access to API
    description:
      - en: API Access to the Latest Dataset
    outputPorttype: API
    authenticationMethod: OAuth
    specification: OAS
    format: JSON
    accessURL: >-
      https://data.cms.gov/data-api/v1/dataset/2/data
    specsURL: >-
      https://data.cms.gov/provr-enrollment/api-docs
    documentationURL: >-
      https://data.cms.gov/provr-enrollment/docs
  agent:
    name:
      - en: AI Agent access to the data product
    description:
      - en: Provides AI agents access to the data product via MCP server. MCP interface for structured data access and agent interaction.
    outputPorttype: AI
    authenticationMethod: Token
    specification: MCP 2025-03-26
    format: MCP
    specsURL: https://urbanpulse.ai/llms.txt
    documentationURL: https://urbanpulse.ai/llms-full.txt
Example of Data Access external profiles usage:
dataAccess: # the below file contains the same content as above
  $ref: 'https://example.org/dataAccess/all-packages.yaml'
Example of external Data Access usage for each profile:
dataAccess:
  default:
    $ref: 'https://example.org/dataAccess/basic.yaml'
  API:
    $ref: 'https://example.org/dataAccess/api.yaml'
  agent:
    $ref: 'https://example.org/dataAccess/agent.yaml'
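Assuming a resolver that follows plain JSON Reference semantics, where the object holding $ref is replaced by the referenced document, a per-profile file would contain just the body of that profile. The sketch below shows what the hypothetical file at https://example.org/dataAccess/api.yaml might hold, reusing the API profile from the inline example. This file layout is an assumption; the text here does not prescribe it.

```yaml
# Hypothetical contents of https://example.org/dataAccess/api.yaml
name:
  - en: Access to API
description:
  - en: API Access to the Latest Dataset
outputPorttype: API
authenticationMethod: OAuth
specification: OAS
format: JSON
accessURL: https://data.cms.gov/data-api/v1/dataset/2/data
specsURL: https://data.cms.gov/provr-enrollment/api-docs
documentationURL: https://data.cms.gov/provr-enrollment/docs
```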
Element name | Type | Options | Description |
---|---|---|---|
dataAccess | object | - | Root-level object containing named access configurations. Each key (e.g., default, API, agent) defines an access method that can be reused across the ODPS YAML. |
$ref | filepath or valid URL | - | Define the Data Access in external file for reuse purposes, example $ref: 'https://example.org/dataAccess/all-packages.yaml' See example. This makes it easy to keep related profiles (e.g. default, API, agent) together, apply versioning and validation once, and publish all variants from a single repo or source. The same pattern can be used in individual Data Access profiles instead of doing it inline. See example. This gives finer control if each Data Access is owned or updated by a different team, but increases the number of files to track and host. |
default | object | - | This object defines the default access interface and must always be present if the dataAccess object is used. The name default is fixed and used as the fallback or primary access method. In the example, you will see additional user-defined access methods (dataonly, API, agent) demonstrating how various access interfaces can be added beyond the required default. Example reference usage: access: $ref: '#/dataAccess/default' |
name | object | ISO 639-1 language codes (e.g., en) | Multilingual name for the access interface. Can be shown in UIs. |
description | object | ISO 639-1 language codes (e.g., en) | Multilingual description for the access interface. Supports user understanding. |
outputPorttype | string | file, API, SQL, AI, gRPC, sFTP, etc. | Describes the technical method for delivering data (e.g., file for file downloads, API for web services). |
format | string | JSON, XML, CSV, Excel, zip, plain text, GraphQL, MCP | Specifies the data format made available through this access channel. |
authenticationMethod | string | OAuth, Token, API key, HTTP Basic, none | Security model required to access the data. |
specification | string | OAS, RAML, Slate, MCP | Defines the type of API or protocol specification used to describe access (e.g., OpenAPI, RAML, or a custom protocol like MCP). |
specsURL | URL | Valid URL | Points to the machine-readable technical documentation (e.g., OpenAPI YAML). |
accessURL | URL | Valid URL | The direct access point to retrieve the data – can be for example an API endpoint or a file link. |
documentationURL | URL | Valid URL | A human-readable documentation or guide for access setup, authentication steps, or onboarding. |
hashType | string | SHA-1, SHA-2, SHA-256, MD5, etc. | (Optional) Defines hash algorithm used when providing file integrity verification. |
checksum | string | any string | (Optional) File hash/checksum value, useful for verifying data integrity after download. |
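The hashType and checksum attributes do not appear in the example above. The fragment below is a minimal sketch of a file-based access method that publishes a checksum for integrity verification. The download URL is a placeholder, and the checksum shown is the well-known SHA-256 digest of empty input, used here only as a stand-in for the digest of the real file.

```yaml
dataAccess:
  default:
    name:
      - en: Access to zipped package
    description:
      - en: Latest Dataset and Resources
    outputPorttype: file
    format: zip
    authenticationMethod: none
    accessURL: https://example.org/downloads/package.zip
    # Hash algorithm used to compute the checksum below.
    hashType: SHA-256
    # Placeholder digest (SHA-256 of empty input); replace with the file's digest.
    checksum: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

A consumer would then recompute the SHA-256 digest of the downloaded file and compare it with the checksum value before using the data.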
Data Holder
DataHolder Object describes the Organization legally allowed to create, develop and publish data products.
Data holder means "a legal person, public body, international organisation, or a natural person who is not a data subject with respect to the specific data in question, which, in accordance with applicable Union or national law, has the right to grant access to or to share certain personal data or non-personal data." (Data Governance Act)
The data holder might not be the original IPR owner of the data used, but has the rights to operate with it. The contract or other agreement between the provider and a possible data owner is outside the scope of the standard, both as metadata and in terms of licensing.
If you see something missing, described inaccurately or plain wrong, or you want to comment on the specification, raise an issue in GitHub.
Optional attributes and elements
Example of Holder component with some of the voluntary attributes:
dataHolder:
  en:
    legalName: MindMote Oy
    businessId: 12243434-12
    email: contact@mindmote.fi
    taxID: "12243434-12"
    vatID: "12243434-12"
    logoURL: "https://mindmote.fi/logo.png"
    description: "Digital Economy services and tools"
    URL: "https://mindmote.fi"
    telephone: "+35845 0232 2323"
    streetAddress: "Koulukatu 1"
    postalCode: "33100"
    addressRegion: "Pirkanmaa"
    addressLocality: "Tampere"
    addressCountry: "Finland"
    aggregateRating: ""
    ratingCount: "1245"
    slogan: ""
    parentOrganization: ""
Element name | Type | Options | Description |
---|---|---|---|
dataHolder | element | - | Binds the provider related business elements and attributes together |
en | attribute | ISO 639-1 defined 2-letter codes | This element binds together the other attributes and expresses the language used. In the example this is "en", which indicates that the details are in English. To provide the details in French, name the element "fr". The element name follows the language codes listed in the ISO 639-1 standard. You can provide details in multiple languages by adding similar sets as in the example and changing the binding element name to the matching language code. The pattern for multilanguage support was adopted from de facto UI translation practices: the attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
legalName | string | text content, max length 256 chars | REQUIRED The official name of the organization, e.g. the registered company name. |
businessID | string | - | The business identifier code of the company. Often this is given to the company by an authorized public sector organization managing the register of businesses. |
contactName | string | - | Contact person name |
email | string | As defined in RFC 5322 | Email to be used in contacting the organization. |
taxID | string | - | The Tax / Fiscal ID of the organization or person, e.g. the TIN in the US or the CIF/NIF in Spain. |
vatID | string | - | The Value-added Tax ID of the organization or person. |
businessDomain | string | - | In a data mesh architecture, data (or data product) ownership and management are distributed across self-contained business domains. |
logoURL | URL | Valid URL. See more from RFC 3986. | The URL pointing to organisation logo. |
description | string | Max length 512 chars | The introduction to the organization. Often contains information of what the organisation does and focuses on. |
URL | URL | Valid URL. See more from RFC 3986. | The URL of the organization's website. |
telephone | string | Valid telephone number | The telephone number. Use E.164 standard. |
streetAddress | string | - | The street address. For example, 1600 Amphitheatre Pkwy. |
postalCode | string | - | The postal code. For example, 94043. |
addressRegion | string | - | The region in which the locality is, and which is in the country. For example, California or another appropriate first-level Administrative division |
addressLocality | string | - | The locality in which the street address is, and which is in the region. For example, Mountain View. |
addressCountry | string | two-letter ISO 3166-1 alpha-2 country code | The country. |
aggregateRating | string | - | The average rating based on multiple ratings or reviews. |
ratingCount | integer | - | The number of ratings and reviews used in calculating the aggregateRating. |
slogan | string | Max length 256 chars | The slogan of the organization. This is often related to showing the brand. |
parentOrganization | string | - | The larger organization that this organization is a subOrganization of, if any. |
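The language-code pattern described for the en element can be sketched as follows, assuming the same organization is described in both English and Finnish. The Finnish text is an illustrative translation, only a few of the optional attributes are shown, and the country is given as a two-letter code as the table specifies.

```yaml
dataHolder:
  en:
    legalName: MindMote Oy
    description: "Digital Economy services and tools"
    URL: "https://mindmote.fi"
    addressCountry: "FI"          # ISO 3166-1 alpha-2 code
  fi:                             # second language block, named by its ISO 639-1 code
    legalName: MindMote Oy
    description: "Digitaalisen talouden palvelut ja työkalut"   # illustrative translation
    URL: "https://mindmote.fi"
    addressCountry: "FI"
```

A UI can then pick the block that matches the consumer's language and fall back to another one when a translation is missing.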
Payment Gateways
The paymentGateways object in ODPS defines how transactions are handled when pricing plans require financial exchanges, whether from humans or AI agents. It allows you to describe and configure multiple named gateway setups (e.g., default, Agent, PremiumStripe) that can be referenced from pricing plans.
Each gateway definition provides a structured way to:
- Describe the payment system (e.g., Stripe, Axio, Custom)
- Point to relevant documentation (reference)
- Include executable or declarative configuration under spec
This component enables flexible monetization strategies, including differentiated billing models for human users vs. machine users, and support for both traditional and agent-native payment protocols.
Referencing Payment Gateways:
Named gateway definitions (e.g., default, Agent) can be reused across pricing plans using $ref. This ensures consistent payment logic, minimizes duplication, and allows you to tie multiple plans to a single gateway configuration.
Benefits of Referencing:
- Centralization: Maintain payment logic in one place and link to it from many pricing plans.
- Consistency: Ensure all monetized components use the same gateway logic, version, and authentication flow.
- Flexibility: Easily create alternative gateways for specific user segments (e.g., AI agents vs. humans).
- Transparency: Documented specs help consumers understand how billing works—and let machines integrate autonomously.
Referencing examples:
$ref: '#/paymentGateways/default'
...
$ref: '#/paymentGateways/agent'
The Role of default:
Whenever the paymentGateways object is used, a gateway named default is expected and required. It serves as the primary or fallback payment method and enables compatibility even in simpler configurations that only need one payment integration.
Use the default gateway when:
- You want to offer a single, unified payment method (e.g., Stripe for API consumption)
- You’re not using tiered monetization or segmentation
- You want to future-proof your YAML to support pricing plan evolution
Other named gateways (e.g., Agent, premiumStripe) can be added freely as needed.
Optional attributes and elements
Example of Payment Gateways object usage:
paymentGateways:
  default:
    description:
      en: API consumption payment gateway for humans
    type: Stripe
    version: 1
    reference: 'https://docs.stripe.com/'
    spec: |
      // Replace this with your actual implementation or link
      stripe.createCheckoutSession({
        amount: 100, // in cents
        currency: 'usd',
        success_url: 'https://your-platform.com/success',
        cancel_url: 'https://your-platform.com/cancel'
      });
  Agent:
    description:
      en: Payment gateway for AI agents
    type: Axio
    version: 1
    reference: 'https://www.x402.org/'
    spec: |
      paymentMiddleware("0xYourAddress", {"/your-endpoint": "$0.01"});
Example of Payment Gateways external profiles usage:
paymentGateways: # the below file contains the same content as above
  $ref: 'https://example.org/gateways/all-packages.yaml'
Example of using an external Payment Gateway file for each profile:
paymentGateways:
  default:
    $ref: 'https://example.org/gateways/basic.yaml'
  premium:
    $ref: 'https://example.org/gateways/premium.yaml'
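For orientation, a per-profile file such as basic.yaml might contain only the attributes of that one gateway. This layout is an assumption, inferred from the note that the all-packages file holds the same content as the inline example; the file contents below are hypothetical.

```yaml
# Hypothetical contents of https://example.org/gateways/basic.yaml
# Assumed layout: the body of a single gateway profile, without the profile name.
description:
  en: API consumption payment gateway for humans
type: Stripe
version: 1
reference: 'https://docs.stripe.com/'
spec: |
  // Replace this with your actual implementation or link
```

Keeping the profile name in the referencing document rather than in the file lets the same file be reused under different names in different data products.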
Element name | Type | Options | Description |
---|---|---|---|
paymentGateways | object | - | Object containing all defined payment gateway configurations. Each key is a named reference (e.g. default , agent ) that can be reused in pricing plans. |
$ref | filepath or valid URL | - | Define the Payment Gateway profiles in external file for reuse purposes, example $ref: 'https://example.org/gateways/all-packages.yaml' See example. This makes it easy to keep related profiles (e.g. default, API, agent) together, apply versioning and validation once, and publish all variants from a single repo or source. The same pattern can be used in individual Payment Gateway profiles instead of doing it inline. See example. This gives finer control if each Payment Gateway is owned or updated by a different team, but increases the number of files to track and host. |
default | object | - | The named default payment gateway. If you use the paymentGateways object, default is always defined and is expected to be the first option. It can be referenced using $ref: '#/paymentGateways/default' . In the example, agent is a freely named gateway that can be referenced using $ref: '#/paymentGateways/agent' . You can follow the same pattern to create more gateways. |
description | object | string ISO 639-1 | Multilingual descriptions of the payment gateway. Enables UI rendering in multiple languages. |
type | string | Stripe, Axio, Checkout, Custom | Defines the payment system or protocol used. Use predefined values. Custom allows custom or internal solutions. |
version | string | Free-form version label | Indicates the version of the payment gateway specification, SDK, or integration logic. |
reference | URL | Valid URL | Points to documentation or developer reference for the payment gateway system. |
spec | string / YAML / URL | - | Contains the executable or declarative logic for the payment gateway integration. Can be inline YAML, stringified logic, or link to external file. |
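Since spec may also be a link to an external file, a gateway of type Custom could be sketched as below. The description, the version label and both example.org URLs are hypothetical values chosen for illustration.

```yaml
paymentGateways:
  default:
    description:
      en: Internal invoicing gateway
    type: Custom                                            # custom or internal solution
    version: '2.1'                                          # free-form version label
    reference: 'https://example.org/billing/docs'           # placeholder URL
    spec: 'https://example.org/billing/gateway-spec.yaml'   # link instead of inline logic
```

Linking keeps the product definition short and lets the integration logic be versioned separately, at the cost of one more file to host.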
Specification extensions
While the Open Data Product Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
The extension properties are implemented as patterned fields that are always prefixed by "x-". The extensions may or may not be supported by the available tooling, but the tooling may be extended as well to add the requested support (if the tools are internal or open source). The Open Data Product Initiative Technical Steering Committee does not officially approve external extensions; they are fully independent. Popular extensions, however, are natural candidates for future additions to the standard.
We encourage you to let us know of useful extensions so that we can consider them in future releases: raise an issue in GitHub.
Example of extension usage:
product:
  name: Pets of the year
  productID: 123456are
  description: ''
  x-internal-id: foobar123
Element name | Type | Options | Description |
---|---|---|---|
^x- | any | - | Allows extensions to the Open Data Product Schema. The field name MUST begin with x-, for example, x-internal-id. The value can be null, a primitive, an array or an object. Can have any valid JSON format value. |
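Because an extension value can be null, a primitive, an array or an object, the pattern can be sketched with one field of each kind. All extension names and values below other than x-internal-id are hypothetical.

```yaml
product:
  name: Pets of the year
  productID: 123456are
  x-internal-id: foobar123        # primitive (string)
  x-cost-center: null             # null value
  x-review-board:                 # array
    - data-office
    - legal
  x-lineage:                      # object
    system: warehouse-a
    refreshed: nightly
```

Tooling that does not recognize a given x- field is expected to ignore it, so extensions of this kind should carry only information the core attributes cannot express.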
Hello world example
Example of complete working Data Product specification instance:
---
schema: 'https://opendataproducts.org/v4.0/schema/odps.yaml'
version: 4.0
product:
contract:
id: 02323M123
type: ODCS
contractVersion: 2.2.2
contractURL: 'https://datamesh-manager.com/urbanltd/dataproducts/9bd530'
details:
en:
name: UrbanPulse Events
productID: urbanpulse-events-001
valueProposition: >-
Enable smarter city experiences by providing structured, real-time
public event data ready for integration into travel apps, tourism
platforms, and smart city services.
description: >-
UrbanPulse Events is a SmartCity data product that aggregates and
structures public event information — concerts, exhibitions, festivals,
sports events — making it accessible through APIs and dashboards for
internal and future external use.
productSeries: SmartCity Living Data Products
visibility: internal
status: production
productVersion: 0.1.0
versionNotes: >-
Initial internal release with basic event metadata structure and shadow
pricing model implemented.
issues: >-
Current limitations include manual ingestion of some event sources and
partial metadata for smaller events. These will be addressed in the next
update with automated feeds and metadata enrichment.
categories:
- city-events
- tourism
- smartcity
standards:
- ODPS 3.1
tags:
- smartcity
- events
- tourism
- public-data
brandSlogan: Turning City Buzz into Business Value
type: dataset
pricingPlans:
declarative:
en:
- name: Basic Reader
priceCurrency: USD
price: 0
billingDuration: month
unit: recurring
maxTransactionQuantity: 100
offering:
- Standard access to event metadata
- Up to 100 SQL queries per month
- Shared SLA (best-effort availability)
- No prioritization in case of peaks
notes: >-
Shadow pricing only for internal visibility. No actual billing
applied.
paymentGateway:
$ref: '#/paymentGateways/default'
dataQuality:
$ref: '#/dataQuality/default'
SLA:
$ref: '#/SLA/default'
access:
$ref: '#/dataAccess/API'
- name: Extended User
priceCurrency: USD
price: 300
billingDuration: month
unit: recurring
maxTransactionQuantity: 1000
offering:
- Prioritized SQL access during high-demand periods
- Up to 1000 SQL queries per month
- Faster response times
- Moderate rate limits
notes: >-
Shadow pricing estimate based on infrastructure and operational cost
models.
paymentGateway:
$ref: '#/paymentGateways/default'
dataQuality:
$ref: '#/dataQuality/default'
SLA:
$ref: '#/SLA/default'
access:
$ref: '#/dataAccess/API'
- name: High Volume Access
priceCurrency: USD
price: 2000
billingDuration: month
unit: recurring
maxTransactionQuantity: 500000
offering:
- Dedicated API channel for bulk usage
- 'Up to 500,000 SQL queries per month'
- Guaranteed SLA for availability and response time
notes: Shadow pricing for strategic high-usage internal consumers.
paymentGateway:
$ref: '#/paymentGateways/agent'
dataQuality:
$ref: '#/dataQuality/premium'
SLA:
$ref: '#/SLA/premium'
access:
$ref: '#/dataAccess/Agent'
SLA:
declarative:
default:
name:
en: The Basic SLA
description:
en: The basic SLA package
dimensions:
- dimension: uptime
displaytitle:
en: Uptime
objective: 90
unit: percent
- dimension: responseTime
objective: 200
unit: milliseconds
- dimension: updateFrequency
objective: 30
unit: minutes
premium:
name:
en: The Premium SLA
description:
en: The Premium SLA package
dimensions:
- dimension: uptime
displaytitle:
en: Uptime
objective: 99
unit: percent
- dimension: responseTime
objective: 100
unit: milliseconds
- dimension: updateFrequency
objective: 5
unit: minutes
support:
phoneNumber: '+971508976456'
phoneServiceHours: Mon–Fri 8am–4pm (GMT)
email: support@opendataproducts.org
emailServiceHours: Mon–Fri 8am–4pm (GMT)
documentationURL: ''
dataQuality:
declarative:
default:
displaytitle:
en: The Basic Data Quality
description:
en: The basic quality package
dimensions:
- dimension: accuracy
displaytitle:
en: Data Accuracy (percent)
description:
en: >
Data Accuracy ensures the data product reflects the real-world
entities or events it represents, minimizing errors and providing
reliable insights.
objective: 90
unit: percentage
- dimension: completeness
displaytitle:
en: Data Completeness (percent)
objective: 90
unit: percentage
premium:
displaytitle:
en: The Premium Data Quality
description:
en: The premium quality package
dimensions:
- dimension: accuracy
displaytitle:
en: Data Accuracy (percent)
description:
en: >
Data Accuracy ensures the data product reflects the real-world
entities or events it represents, minimizing errors and providing
reliable insights.
objective: 98
unit: percentage
- dimension: completeness
displaytitle:
en: Data Completeness (percent)
objective: 99
unit: percentage
dataAccess:
default:
name:
en: Access to zipped package
description:
en: Latest Dataset and Resources
outputPorttype: file
format: zip
accessURL: url to file as zip
dataonly:
name:
en: Access to latest dataset
description:
en: Latest Dataset
outputPorttype: file
format: CSV
accessURL: url to file as CSV
API:
outputPorttype: API
authenticationMethod: OAuth
specification: OAS
format: JSON
accessURL: 'https://data.cms.gov/data-api/v1/dataset/2/data'
specsURL: 'https://data.cms.gov/provr-enrollment/api-docs'
documentationURL: 'https://data.cms.gov/provr-enrollment/docs'
Agent:
outputPorttype: AI
description:
en: MCP interface for structured data access and agent interaction.
authenticationMethod: Token
specification: MCP 2025-03-26
format: MCP
specsURL: 'https://urbanpulse.ai/llms.txt'
documentationURL: 'https://urbanpulse.ai/llms-full.txt'
paymentGateways:
default:
description:
en: Stripe-based API payment gateway
type: Stripe
version: 1
reference: 'https://docs.stripe.com/'
spec: |
stripe.createCheckoutSession({
amount: 4999,
currency: 'usd',
success_url: 'https://your-platform.com/success',
cancel_url: 'https://your-platform.com/cancel'
});
agent:
description:
en: Payment gateway for AI agents
type: Axio
version: 1
reference: 'https://www.x402.org/'
spec: |
paymentMiddleware("0xYourAddress", {"/mcp-access": "$0.01"});
license:
en:
scope:
definition: >-
The purpose of this license is to determine the terms and conditions
applicable to the licensing of the data product, whereby Data Holder
grants Data User the right to use the data.
restrictions: >-
Data User agrees not to, directly or indirectly, participate in the
unauthorized use, disclosure or conversion of any confidential
information.
geographicalArea:
- EU
- US
permanent: false
exclusive: false
rights:
- Reproduction
- Display
- Distribution
- Adaptation
- Reselling
- Sublicensing
- Transferring
termination:
noticePeriod: 90
terminationConditions: >-
After the expiry of the right of use, the product and its derivatives
must be removed.
continuityConditions: >-
Expired license will automatically continue without written
cancellation (termination) by Data Holder
governance:
ownership: >-
Mindmote Oy, a company specializing in pet industry insights, owns the
license to its proprietary data product 'Pets of the Year'.
damages: >-
During the term of license, except for the force majeure or the Data
Holder's reasons, Data User is required to follow strictly in
accordance with the license. If Data User wants to terminate the
license early, it needs to pay a certain amount of liquidated damages.
confidentiality: >-
Data User undertakes to maintain confidentiality as regards all
information of a technical (such as, by way of a non-limiting example,
drawings, tables, documentation, formulas and correspondence) and
commercial nature (including contractual conditions, prices, payment
conditions) gained during the performance of this license.
applicableLaws: >-
This license shall be interpreted, construed and enforced in
accordance with the law of Finland, including Copyright Act 404/1961.
warranties: >-
Data Holder makes no warranties, express or implied, guarantees or
conditions with respect to your use of the data product. To the extent
permitted under local law, Data Holder disclaims all liability for any
damages or losses, including direct, consequential, special, indirect,
incidental or punitive, resulting from Data User use of the data
product.
audit: >-
Data Holder will reasonably cooperate with Data Users by providing
available additional information about the data product. Both parties
will bear their own audit-related costs.
forceMajeure: >-
Both parties may suspend their contractual obligations when
fulfillment becomes impossible or excessively costly due to
unforeseeable events beyond their control, such as strikes, fires,
wars, and other force majeure events.
dataHolder:
en:
legalName: MindMote Oy
businessId: 12243434-12
email: contact@mindmote.fi
taxID: 12243434-12
vatID: 12243434-12
logoURL: 'https://mindmote.fi/logo.png'
description: Digital Economy services and tools
URL: 'https://mindmote.fi'
telephone: +35845 0232 2323
streetAddress: Koulukatu 1
postalCode: '33100'
addressRegion: Pirkanmaa
addressLocality: Tampere
addressCountry: Finland
aggregateRating: ''
ratingCount: '1245'
slogan: ''
parentOrganization: ''
The example above is a complete machine-readable instance of a data product. It describes an imaginary data product, UrbanPulse Events, which contains derived data about events in a given city. The product has 3 pricing plans, all of which use a recurring monthly subscription model. Note! Not all voluntary attributes are used in the example and multilingualism has not been fully applied.
Mandatory-only example
Example data product with just the mandatory elements and attributes. This is the minimal representation of data product metadata that is expected to be found in every data product following the ODPS standard. This bare minimum can be expanded with other elements and attributes defined in the specification. Extensions can also be used if local additions are needed.
Example of an Open Data Product Specification instance with mandatory-only elements and attributes:
schema: https://opendataproducts.org/v4.0/schema/odps.yaml
version: 4.0
product:
  en:
    name: Pets of the year
    productID: 123456are
    visibility: private
    status: draft
    type: derived data
Terms used
Here is a list of the terms used and what we mean by them. The meanings are mostly taken from existing knowledge, e.g. articles and other trusted sources. The source is always linked to the term. In some rare cases a term is defined for the purposes of this specification only.
Term | Description |
---|---|
Data point | A data point refers to a single, individual unit of data. It can be a number, a word, a measurement, or any other piece of information that is recorded and used for analysis. Some mix the data point with a dataset. In a dataset, each data point represents one observation or measurement. For example, in a dataset of temperatures recorded every day, each daily temperature is a data point. |
Data product | As a strategic resource for companies, data is considered an asset that, like any other material good, has a financial value and whose management generates costs. Data created, collected or used in individual business processes can be sold to other organisations as raw or processed data, so that it no longer serves as an enabler of products, but is the product itself. This leads to the paradigm that data assets can be monetised by exchanging and trading data between organisations as data products and services. There are multiple definitions for data product. In an article authored by Jian Pei (2020), data products "refer to data sets as products and information services derived from data sets." Simon O'Regan defines a data product as a product whose primary objective is to use data to facilitate an end goal. From the academic literature we have found several subtypes of data products: raw data, derived data, data sets, reports, analytic views, 3D visualisations, algorithms, decision support (dashboards) and automated decision-making (Netflix product recommendations or Spotify’s Discover Weekly would be common examples). Typically raw data, derived data and algorithms have technical users. Most often they tend to be internal products in an organisation. In the data mesh world, this quote from Zhamak Dehghani’s book is key to understanding the definition of data as a product: “Domain data teams must apply product thinking […] to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.” While many of the standard Product Development Rules apply (solve a customer need, learn from feedback, prioritise relentlessly, etc.), data has different characteristics compared to tangible products that prevent the direct transfer of established processes and rules of trading goods, especially in terms of pricing mechanisms. In trading data, there is less willingness to pay. For example, data buyers often do not recognise the potential value of data items because it cannot be fully disclosed prior to purchase (known as the ‘Arrow paradox’). In addition, there is often a lack of awareness that the creation, processing, storage and distribution of high-quality data is a major cost factor for the data provider. Another obstacle is the lack of trust and security, causing potential data providers to fear that competitors could benefit from disclosure of in-house data. One of the aims of this specification is to tackle the above-mentioned issues, which hinder the growth of the data ecosystem and market volatility. |
Data as a service | In computing, data as a service, or DaaS, is a term used to describe cloud-based software tools used for working with data, such as managing data in a data warehouse or analyzing data with business intelligence. It is enabled by software as a service (SaaS). DaaS, like all "as a service" (aaS) technology, builds on the concept that its data product can be provided to the user on demand, regardless of geographic or organizational separation between provider and consumer. According to Daniel Newman from Forbes (2017), DaaS is essentially a data stream that subscribers can access on demand. Some people use the term data product in a sense that also covers data commodities which have more service-like attributes than product attributes. In those cases we prefer to use the term data as a service and call the creation process data servitization. The term productizement is reserved for the process which creates data products as its end result. |
Data as a service business model | Data as a service as a business model is a concept when two or more organizations buy, sell, or trade machine-readable data in exchange for something of value. Data as a service is a general term that encompasses data-related services. Now DaaS service providers are replacing traditional data analytics services or happily clustering with existing services to offer more value-addition to customers. The DaaS providers are curating, aggregating, analyzing multi-source data in order to provide additional more valuable analytical data or information. Typically, DaaS business is based on subscriptions and customers pay for a package of services or definite services. |
Data pipeline | According to Aiswarya et al., the complex chain of interconnected activities or processes from data generation through data reception constitutes a data pipeline. In other words, data pipelines are the connected chain of processes where the output of one or more processes becomes an input for another. It is a piece of software that removes many manual steps from the workflow and permits a streamlined, automated flow of data from one node to another. Moreover, it automates the operations involved in the selection, extraction, transformation, aggregation, validation, and loading of data for further analysis and visualization. It offers end to end speed by removing errors and resisting bottlenecks or delay. Data pipelines can process multiple streams of data simultaneously. |
Infrastructure as Code | Infrastructure as Code (IaC) transforms infrastructure management by using code instead of manual processes. Configuration files capture infrastructure specifications, ensuring consistent environment provisioning. The "as code" paradigm extends beyond infrastructure to encompass quality control and data product processes. This approach, applied to the entire data pipeline, enhances repeatability, traceability, and scalability, fostering collaboration and systematic data management. |
DataOps | In DataOps, the focus lies in creating automated processes for releasing and updating data products throughout their lifecycle, from development to production. This automation spans the entire journey from development to transitioning into production. The objective is to enhance operational efficiency through automation, reduce errors, and enable faster release cycles for data products. |
Editors and contributors
This specification is openly developed and a lot of the work comes from the community. We list all community contributors as a sign of appreciation. The editors (as initial creators of the specification) are Jarkko Moilanen and Jussi Niilahti. Editors take the feedback and draft new candidate releases, which may become versions of the specification.
List of community contributors
The work around the specification would not be possible without enormous help from the community. Here is the list of contributors so far.
- Toni Luhti
- Topi Santakivi
- Antti Loukiala