OPEN DATA PRODUCT SPECIFICATION

Development Version

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

The specification is shared under the Apache 2.0 license.

VERSION DEV

Version source:

ODPS YAML Schema:

Editors:

Participate:

Introduction

The Open Data Product Specification is a vendor-neutral, open-source, machine-readable data product metadata model. It defines the objects and attributes as well as the structure of digital data products. The work is based on existing standards (schema.org), best practices, and emerging concepts like Data Mesh. The reasoning is that we reuse and proudly copy instead of reinventing the wheel. More detailed information about the origin can be found on the Open Data Product Specification homepage.

The Open Data Product Specification (ODPS) moves the data product metadata model towards a standalone model, which helps to decouple the data product from the systems often directly associated with it. With the help of ODPS, a data product can be presented and described to the customer as such, without any need for a marketplace or other systems.

This version signifies a step towards embracing the Everything as Code paradigm, but is still experimental. Both SLA and Data Quality now support "as code" monitoring.

Development of the standard is coordinated in the Open Data Product Initiative (ODPI), which was established in July 2022 to make it possible for the specification to grow and become institutionalized. The ODPI was taken under the wings of the open source chapter of Open Collective.

[Figure: ODPS features]

Specification aims and aspects

Specification aims:

Note! In "Open Data Product" the focus is on the latter words, and the prefix "open" refers to the openness of the standard. Any connotation of open data (a different thing) is not intentional, intended, or desirable.

The specification has been designed with four major aspects of the data product in mind: 1) technical (infrastructure & access), 2) business (pricing & plans), 3) legal (licensing & IPR), and 4) ethical (privacy & mydata). The four aspects are described in 7 elements, which contain attributes and other elements.

[Figure: ODPS model]

If you see something missing, described inaccurately or plain wrong, or you want to comment on the specification, raise an issue in GitHub.

Document structure

LEFT COLUMN: Navigation

The left column contains the navigation, which makes it easy to move around the specification.

MIDDLE COLUMN: Principles and components

The middle column contains detailed information about the included components and related options. This is the theory part.

Note! Mandatory elements and attributes are listed separately in the definition tables. This enables the user to construct a minimum viable specification more easily and quickly. Ready-made definitions provided by https://schema.org are applied whenever possible instead of reinventing the wheel.

RIGHT COLUMN: Examples

The right column contains YAML-formatted examples of how the specification is used. In the future, other output formats will be added on request. YAML can easily be converted to JSON if needed.

Example of YAML formatted snippet from the Open Data Product Specification:

monitoring:
  space: 
    https://monitoring.com
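
As a small illustration of the YAML-to-JSON conversion mentioned above, the same snippet can be written in JSON syntax. YAML 1.2 was designed as a superset of JSON, so the JSON form below should also be readable by a YAML 1.2 parser. This is only a sketch of the equivalence, not an additional output format defined by the specification.

```yaml
# JSON rendering of the monitoring snippet; also valid YAML 1.2 flow syntax
{
  "monitoring": {
    "space": "https://monitoring.com"
  }
}
```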

Document level attributes

Here is the list of attributes which can occur at the document root level. In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL. Optional attributes are listed in their own table and examples are given in the right column.

Mandatory attributes

Example of document level attribute usage and structure:

schema: https://opendataproducts.org/v3.0rc/schema/odps.yaml
version: 3.0
product:
  details:
    en:
      name: Pets of the year
      productID: 123456are
      visibility: private
      status: draft
      type: dataset
    fi:
      name: Vuoden suosituimmat lemmikit
      productID: 123456are
      visibility: private
      status: draft
      type: dataset
| Element name | Type | Options | Description |
|---|---|---|---|
| schema | URL | Valid URL. See more from RFC 3986. | REQUIRED. Defines the URL of the schema. Often used for validation purposes. |
| version | string | The version of ODPS, for example dev or 2.2 | REQUIRED. Defines the ODPS version. |
| product | element | root element | REQUIRED. Root element to tie all together. |
| details | element | product business details | REQUIRED. Binds together business details in different languages. |
| en | element | ISO 639-1 defined 2-letter codes | REQUIRED. NOTE! This is a dynamic element! This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product details in multiple languages simply by adding similar sets like the example; just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
| name | string | max length 256 chars | REQUIRED. The name of the product. |
| productID | string | max length 256 chars | REQUIRED. Product identifier. |
| visibility | one of | private, invitation, organisation, dataspace, public | REQUIRED. The publicity level, i.e. who can see this product. Private: just the creator. Invitation: visible only to parties explicitly invited. Organisation: visible to all in your organisation. Dataspace: visible to all existing members of the data space. Public: visible to all publicly. |
| status | one of | announcement, draft, development, testing, acceptance, production, sunset, retired | REQUIRED. The status of the product. The lifecycle model is discussed in detail here (link). |
| type | one of | raw data, derived data, dataset, reports, analytic view, 3D visualisation, algorithm, decision support, automated decision-making, data-enhanced product, data-driven service, data-enabled performance, bi-directional | REQUIRED. The type of the product. Options are derived from examples and lists found in academic literature. |

Optional attributes

The RecommendedDataProducts object contains an array of data products, which offers a means to attach related data products to the data product at hand. A recommended data product may come from the same marketplace or catalog or from an external one. The object extends reach and promotion, especially when a data product is treated as an independent entity, much as described in Data Mesh. When a data product is published in a marketplace, the object also offers a means to promote data products other than those from the given marketplace. In short, this object is mainly for discovery and reach purposes.

The RecommendedUseCases object is an array that offers a method to attach useful use cases to the data product. Use cases are informative for the data customer and exemplify how the data product can create value.

Example of document level attribute usage and structure:

schema: https://opendataproducts.org/v3.0rc/schema/odps.yaml
version: 3.0
product:
  details: 
    en:
      name: Pets of the year
      productID: 123456are
      valueProposition: Design a customised petstore using a data product that describes
        pets with their habits, preferences and characteristics.
      description: This is an example of a Petstore product.
      productSeries: Lovely pets data products
      visibility: private
      status: draft
      productVersion: '0.1.0'
      versionNotes: New version with additional details such as more accurate pet details
      issues: The current issues include incorrect information in the dog breeds. The
        resolution for these problems is planned for the next update, scheduled
        to be released on July 15th, 2023.
      categories:
      - pets
      standards:
      - ISO 24631-6
      tags:
      - pet
      brandSlogan: Passion for the data monetization
      type: dataset
      contentSample: https://download.com/pets.json
      logoURL: https://data-product-business.github.io/open-data-product-spec/images/logo-dps-ebd5a97d.png
      outputFileFormats:
      - JSON
      - XML
      - CSV
      - ZIP
      - PDF
      useCases:
      - useCase:
          useCaseTitle: Build attractive and lucrative petstore!
          useCaseDescription: Use case description how successful petstore chain was
            established in Abu Dhabi
          useCaseURL: https://marketplace.com/usecase1
      recommendedDataProducts:
      - https://marketplace.com/dataproduct.json
      - https://marketplace.com/dataproduct-another.json

| Element name | Type | Options | Description |
|---|---|---|---|
| created | date | Use ISO 8601 | When the product was created. |
| updated | date | Use ISO 8601 | When the product was last updated. |
| valueProposition | string | text content, max length 512 chars | The product's value proposition. Often one or two sentences that crystallize the value for the customer. |
| description | string | - | The description of the product. Text only. |
| productSeries | string | - | A group of products in the product mix that are associated with each other, can be obtained for the same type of customers, or are marketable for the same type of marketplace. |
| categories | array | - | Comma-separated array of categories. |
| standards | array | - | Comma-separated array of standards related e.g. to data content or quality, such as ISO 8000 or ISO 19131. |
| tags | array | - | Comma-separated array of tags. |
| productVersion | string | The versioning according to SemVer | The version of the data product. Applies to ODPS metadata as well. |
| versionNotes | string | - | Additional information about the version. |
| issues | string | - | There may be errors in the data product that require corrections. These issues will be briefly described to users, along with information about when the fixes will be implemented. |
| contentSample | URL | Valid URL. See more from RFC 3986. | Sample content of the data product, for example JSON/XML output. This sample should match the actual data product output and give the data consumer an idea of what to expect. If the data product is a pure service, for example a dashboard or an algorithm, consider providing a preview version or similar. |
| logoURL | URL | Valid URL | Valid URL of the logo. See more from RFC 3986. |
| outputFileFormats | string | - | Output file formats for the data product. |
| brandSlogan | string | - | Brand-related slogan, such as Nike's "Just Do It". |
| useCases | element | array | Contains a list of related use cases with description information and a link to details. NOTE! These examples are expected to use the same language as defined previously in the data product details content binding element. |
| useCaseTitle | string | string | Title of the use case. |
| useCaseDescription | string | string | Brief description of the use case. |
| useCaseURL | URL | Valid URL, RFC 3986 | Valid URL of the more detailed use case description. |
| recommendedDataProducts | array | Array of valid URLs (RFC 3986) | Data products recommended for use alongside this data product or even as a replacement (for comparison). The URL provided MUST reference a description of a data product following this same standard. |
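
The created and updated attributes are listed in the table but do not appear in the example. A minimal sketch of how they might be expressed is given below. The dates are illustrative, and placing the attributes inside the language binding element next to the other product details is an assumption, since the example does not show their location.

```yaml
product:
  details:
    en:
      name: Pets of the year
      productID: 123456are
      visibility: private
      status: draft
      type: dataset
      # Assumed placement; values are illustrative ISO 8601 strings
      created: '2023-05-01'            # calendar date
      updated: '2023-06-15T09:30:00Z'  # date and time in UTC
```

Quoting the values keeps them as plain strings, so a YAML parser does not convert them into native date objects.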

Data SLA

Data Service Level Agreement (SLA) Object contains attributes which define the desired and promised quality of the data product.

A Data Service Level Agreement (SLA) is a contractual agreement between a data service provider and its customers that defines the expected level of service quality, performance, and availability for the data services provided. SLAs outline specific metrics, targets, and responsibilities that both parties agree to adhere to, ensuring accountability and transparency in the delivery of data services.

Defining Data SLAs in a machine-readable format enhances automation, facilitates monitoring, enables real-time compliance tracking, and supports seamless integration with monitoring and alerting systems.

An SLA can be defined with 11 standardized dimensions, each of which supports Everything as Code monitoring:

| SLA Dimension | Description |
|---|---|
| latency | Minimal amount of time before getting any response. |
| uptime | Uptime is a measure of system reliability, expressed as the percentage of time a machine, typically a computer, has been working and available. See more https://uptime.is/. |
| responseTime | Amount of time to process an external request. |
| errorRate | Maximum tolerated errors in data, percentage. |
| endOfSupport | The date at which your product will not have support anymore. |
| endOfLife | The date at which your product will not be available anymore. No support, no access. |
| updateFrequency | How often data is updated. |
| timeToDetect | How fast can you detect a problem? |
| timeToNotify | Once you see a problem, how much time do you need to notify your users? |
| timeToRepair | How long do you need to fix the issue once it is detected? |
| emailResponseTime | How long do you need to respond to email support requests? |

Template structure of SLA array component:

  - dimension: selected dimension
    objective:
    unit:
    monitoring:
      type:
      reference:
      spec:

Each dimension has an objective value, a unit, and monitoring "as code" to verify the objective. In some cases monitoring is not feasible or possible to arrange. The type attribute indicates which monitoring system is used. The reference attribute contains a URL for reference documentation regarding the monitoring spec. The spec attribute contains the actual "as code" part as YAML or a string that can be executed in the selected monitoring system as is. Note! The "as code" part of the component is the initial step towards embracing the Everything as Code paradigm, but is still experimental.

The SLA object is general in nature and should be enough for common (80%) use cases. Note that you can make extensions to the standard with "x-" mechanism in order to fulfill any industry specific needs. The "Specification extensions" section provides details on how to use this feature.
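
A rough sketch of what an "x-" extension on an SLA dimension could look like is given below. The attribute name x-measurementWindow and its value are hypothetical and not defined by ODPS, and attaching the extension at the dimension level is an assumption. The "Specification extensions" section remains the authority on where and how extensions may be used.

```yaml
SLA:
  - dimension: uptime
    objective: 99
    unit: percent
    # Hypothetical extension attribute; name, value and placement are illustrative only
    x-measurementWindow: 30 days
```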

Basic email and phone support information can also be expressed inside the SLA component.

There are no mandatory attributes at the moment. Optional attributes are listed in their own table and an example is given in the right column.

Optional attributes and elements

Example of SLA component usage:


SLA:
  - dimension: latency
    displayTitle:
      - en: Latency
    objective: 100
    unit: milliseconds
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/querying/basics/ 
      spec:  # expressed as string or inline yaml
        myTimer.observeDuration();

  - dimension: uptime
    displayTitle:
      - en: Uptime
    objective: 99
    unit: percent
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/querying/basics/
      spec:  | # expressed as string or inline yaml
        avg_over_time(
          (
            sum without() (up{job="prometheus"})
              or
            (0 * sum_over_time(up{job="prometheus"}[7d]))
          )[7d:5m]
        )    

  - dimension: responseTime
    objective: 200
    unit: milliseconds
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/querying/basics/ 
      spec:  | # expressed as string or inline yaml
        rate(http_server_requests_seconds_sum[$__rate_interval]) / rate(http_server_requests_seconds_count[$__rate_interval])

  - dimension: updateFrequency
    objective: 30
    unit: minutes
    monitoring:
      type: prometheus 
      spec: | # expressed as string or inline yaml
        time() - max_over_time(timestamp(changes(table[5m]) > 0)[1d:1m])

  - dimension: errorRate
    objective: 0.1
    unit: percent

  - dimension: endOfSupport
    objective: 01/01/2025 # dd/mm/yyyy
    unit: date

  - dimension: endOfLife
    objective: 01/03/2025 # dd/mm/yyyy
    unit: date

  - dimension: timeToDetect
    objective: 60
    unit: minutes

  - dimension: timeToNotify
    objective: 120
    unit: minutes

  - dimension: timeToRepair
    objective: 24
    unit: hours

  - dimension: emailResponseTime
    objective: 12
    unit: hours

  support:
      phoneNumber: '+971508976456'
      phoneServiceHours: 'Mon-Fri 8am-4pm (GMT)'
      email: support@opendataproducts.org
      emailServiceHours: 'Mon-Fri 8am-4pm (GMT)'
      documentationURL: ''
| Element name | Type | Options | Description |
|---|---|---|---|
| SLA | element | - | Binds the SLA related elements and attributes together. |
| dimension | attribute | string, one of: latency, uptime, responseTime, errorRate, endOfSupport, endOfLife, updateFrequency, timeToDetect, timeToNotify, timeToRepair, emailResponseTime | Defines the SLA dimension. |
| unit | attribute | Options for unit are: milliseconds, seconds, minutes, days, weeks, months, years, never, date, null. | Name of the quality attribute indicating the timely interval. If a date is given, the format is dd/mm/yyyy. |
| monitoring | element | - | Contains the monitoring (computational "as code") structure to validate the target state for the selected SLA dimension. |
| displayTitle | array | - | Dimension title to be shown in various UIs. Keep it short and sweet. |
| en | attribute | ISO 639-1 defined 2-letter codes | This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product details in multiple languages simply by adding similar sets like the example; just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
| type | attribute | string | Monitoring system name, such as Prometheus. The systems enable an "as code" approach to monitoring the SLA. |
| spec | element | YAML or string | Contains the "as code" part for monitoring. Content is intended to be in a form that can be injected as is into the defined monitoring system. |
| reference | URL | Valid URL | Provide a URL for the reference documentation. |
| support | element | - | Describes how the customer can reach for help in case of difficulties in usage, billing, or otherwise. |
| phoneNumber | string | valid phone number | The support phone number. Use E.164. |
| phoneServiceHours | string | - | Describes the service hours the company provides. Often given at week level, e.g. Mon-Fri 8am-4pm. |
| email | string | valid email address | Email address for support requests. Use RFC 2822. |
| emailServiceHours | string | - | Describes the email service hours the company provides. Often given at week level, e.g. Mon-Fri 8am-4pm. |
| documentationURL | URL | Valid URL | URL to documentation. |
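
The spec attribute accepts inline YAML as well as a string, while the example only uses the string form. The sketch below shows the inline YAML form, assuming Prometheus alerting rule syntax as the target format. The rule group name, alert name, and the metrics data_rows_invalid_total and data_rows_processed_total are hypothetical and would need to match whatever the data product actually exports.

```yaml
SLA:
  - dimension: errorRate
    objective: 0.1
    unit: percent
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
      spec:  # inline YAML instead of a string
        groups:
          - name: data-product-sla        # hypothetical rule group name
            rules:
              - alert: ErrorRateAboveObjective   # hypothetical alert name
                # Share of invalid rows in percent; metric names are assumptions
                expr: 100 * sum(rate(data_rows_invalid_total[1h])) / sum(rate(data_rows_processed_total[1h])) > 0.1
                for: 10m
```

The threshold in the expression repeats the objective value so that the rule fires only when the stated objective is exceeded.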

Data Quality

Data quality is essential for one main reason: You give customers the best experience when you make decisions using accurate data. A great customer experience leads to happy customers, brand loyalty, and higher revenue for your business. Information is only valuable if it is of high quality.

By adhering to defined quality characteristics, organizations can maximize the value of their data assets, improve decision-making, enhance operational efficiency, and maintain trust and confidence in their data-driven processes and systems.

How can you assess your data quality? ODPS is compatible with the EDM Council data quality model.

ODPS offers 8 standardized dimensions to define and measure data quality, each of which supports Everything as Code monitoring:

| Data Quality Dimension | Description |
|---|---|
| accuracy | The measurement of the veracity of data to its authoritative source. |
| completeness | Data is required to be populated with a value (aka not null, not nullable). Completeness checks if all necessary data attributes are present in the dataset. |
| conformity | Data content must align with required standards, syntax (format, type, range), or permissible domain values. Conformity assesses how closely data adheres to standards, whether internal, external, or industry-wide. |
| consistency | Data should retain consistent content across data stores. Consistency ensures that data values, formats, and definitions in one group match those in another group. |
| coverage | All records are contained in a data store or data source. Coverage relates to the extent and availability of data present but absent from a dataset. |
| timeliness | The data must represent current conditions; the data is available and can be used when needed. |
| validity | Validity refers to the extent to which the data accurately and appropriately represents the real-world object or concept it is supposed to describe. |
| uniqueness | Uniqueness means each record and attribute should be one-of-a-kind, aiming for a single, unique data entry. |

Template structure of Data Quality array component:

  - dimension: selected dimension
    objective:
    unit:
    monitoring:
      type:
      reference:
      spec:

Each dimension has an objective value, a unit, and monitoring "as code" to verify the objective. In some cases monitoring is not feasible or possible to arrange. The type attribute indicates which monitoring system is used. The reference attribute contains a URL for reference documentation regarding the monitoring spec. The spec attribute contains the actual "as code" part as YAML or a string that can be executed in the selected monitoring system as is. See the template example.

The values of the QA attributes are given by the vendor. Whether to trust the values is the data consumer's choice. If possible, use automatic checking of data quality against the source and update the values accordingly.

The QA object is general in nature and should be enough for common (80%) use cases. Note that you can make extensions to the standard with "x-" mechanism in order to fulfill any industry specific needs. The "Specification extensions" section provides details on how to use this feature.

Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle. That is why integrity is not in the attributes, but accuracy and consistency as well as completeness are.

Note! The "as code" spec part of the component is the initial step towards embracing the Everything as Code paradigm, but is still experimental. We need more vendors supporting the approach. In the meantime, you can use custom solutions.

Optional attributes and elements

Example of Data Quality component with some of the data quality dimensions:


dataQuality:
  - dimension: accuracy
    displayTitle:
    - en: Data Accuracy (percent)
    - fi: Datan virheettömyys (prosenttia)
    objective: 98
    unit: percentage
    monitoring:
      type: SodaCL 
      reference: https://docs.soda.io/soda-cl/soda-cl-overview.html
      spec:
        - require_unique(member_id) 
        - require_range(age_band, 18, 100)

  - dimension: completeness
    displayTitle:
    - en: Data Completeness (percent)
    objective: 99.9
    unit: percentage
    monitoring:
      type: SodaCL 
      reference: https://docs.soda.io/soda-cl/soda-cl-overview.html
      spec:
        - for each column:
            name: [member_id, gender, age_band]
            checks:
              - not null:
                  fail: when > 0.1% # Fail if more than 0.1% of records are null

  - dimension: consistency
    displayTitle:
    - en: Data Consistency (percent)
    - fi: Datan johdonmukaisuus (prosenttia)
    objective: 98
    unit: percentage

  - dimension: timeliness
    objective: 100
    unit: percentage

  - dimension: validity
    objective: 98
    unit: percentage

  - dimension: uniqueness
    objective: 100
    unit: percentage

| Element name | Type | Options | Description |
|---|---|---|---|
| dataQuality | element | - | Contains an array of data quality dimensions with an optional computational monitoring object. Binds the data quality related elements and attributes together. |
| dimension | attribute | string, one of: accuracy, completeness, conformity, consistency, coverage, timeliness, validity, or uniqueness | Defines the data quality dimension. |
| objective | attribute | integer | Defines the target value for the data quality dimension. |
| unit | attribute | string. One of: percentage, number | Defines the unit used in stating the target quality level. |
| monitoring | element | - | Contains the monitoring (computational "as code") structure to validate the target state for the selected data quality dimension. |
| displayTitle | array | - | Dimension title to be shown in various UIs. The array contains a list of titles in the desired number of languages. |
| en | attribute | ISO 639-1 defined 2-letter codes | This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard. You can have product details in multiple languages simply by adding similar sets like the example; just change the binding element name to the matching language code. The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer, and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language |
| type | attribute | string | Monitoring system name, such as SodaCL or Montecarlo. The systems enable an "as code" approach to monitoring data quality. |
| reference | URL | Valid URL | Provide a URL for the reference documentation. |
| spec | element | YAML or string | Contains the "as code" part for monitoring. Content is intended to be in a form that can be injected as is into the defined monitoring system. Content depends on the system used, and the reference attribute is expected to provide more information. |
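
The example in this section covers six of the eight dimensions. A short sketch of the remaining two, conformity and coverage, is given below, following the same template structure. The objective values are illustrative, and monitoring is left out, as in the dimensions of the example that carry no "as code" part.

```yaml
dataQuality:
  - dimension: conformity
    objective: 95      # illustrative target
    unit: percentage

  - dimension: coverage
    objective: 90      # illustrative target
    unit: percentage
```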

Data Pricing Plans

Pricing is the process whereby a business sets the price at which it will sell its products and services. The Pricing object consists of mandatory and optional attributes. This element contains pricing plan data to be used, for example, when displaying the items in a marketplace. If needed, the standard metadata is converted to the marketplace's internal format. We encourage all data product owners to enforce usage of this standard.

12 standardized pricing models include:

| Pricing plan | Description |
|---|---|
| Recurring time period based | In the simplest terms, recurring payments (also known as subscription payments, automatic payments, or recurring billing) take place when customers authorize a merchant to charge them repeatedly for goods or services on a prearranged schedule. |
| One time payments plans | One Time Fee Revenue Model is a business model that generates revenue through a single payment for perpetual product use or service access. The One Time Fee Revenue Model is a fundamental concept in the world of small businesses and entrepreneurship. |
| Pay-as-you-go plans | The Pay As You Go Plan is a flexible alternative to a monthly plan. Instead of paying a recurring monthly charge, you buy credits as needed. |
| Revenue sharing plans | Revenue sharing is a performance-based income model that involves sharing business profits or losses among participating partners. Revenue sharing is a profit-sharing system that ensures all parties involved are compensated for their contribution to the business. |
| Data volume plan | Volume pricing is a pricing strategy in which an item's price per unit decreases as the purchase quantity increases. |
| Trial | A free trial pricing strategy offers target customers a chance to try your product for free for a limited time. It is a sales promotion technique that uses the product to market itself. |
| Dynamic pricing | Dynamic pricing is a pricing strategy that applies variable prices instead of fixed prices. Instead of deciding on a set price for a season, retailers can update their prices multiple times per day to capitalize on the ever-changing market. |
| Pay what you want plans | Also known as PWYW pricing, this is a pricing strategy in which buyers pay their desired price for a particular product, commodity, or service. The approach may sometimes lead to a price of zero. To guide the buyer, one can set a suggested price and a minimum price. |
| Freemium | A type of business model that offers basic features of a product or service to users at no cost and charges a premium for supplemental or advanced features. |
| Open data | Access to open data is expected to be free of cost, but in some cases it is also possible to collect fees to cover the costs of the service. |
| Value-based | Value-based pricing is a strategy of setting prices primarily based on a consumer's perceived value of a product or service. Value-based pricing is customer-focused, meaning companies base their pricing on how much the customer believes a product is worth. It is often worthwhile to provide the customer with a value simulator to see expected value gains and possibly set the price based on that. Pricing would be customized per customer. |
| On Request | Access to the data product is given only on request. Often the provider expects the customer to meet the provider first. Conditions, pricing, etc. are agreed in the discussions. |

Optional attributes and elements

Example of Pricing component usage in English:


pricingPlans:
  en:
  - name: Premium subscription 1 year
    priceCurrency: EUR
    price: 50.00
    billingDuration: year
    unit: recurring
    maxTransactionQuantity: unlimited
    offering:
      - High Quality Pets data
      - Unlimited transactions
      - Billed annually 
  - name: Premium Package Monthly
    priceCurrency: EUR
    price: 5.00
    billingDuration: month
    unit: recurring
    maxTransactionQuantity: unlimited
    offering:
      - High Quality Pets data
      - Unlimited transactions
      - Billed monthly 
  - name: Freemium Package
    priceCurrency: EUR
    price: 0.00
    billingDuration: month
    unit: recurring
    maxTransactionQuantity: 1000
    offering:
      - High Quality Pets data
      - Free to use, no cost at all!
      - Fair amount of transactions for testing and small business 
  - name: Revenue sharing
    priceCurrency: percentage
    price: 5.50
    billingDuration: month
    unit: revenue-sharing
    maxTransactionQuantity: 20000
    offering:
      - High Quality Pets data
      - No upfront fee
      - Billed monthly 
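
The example above covers recurring and revenue-sharing plans. A minimal sketch of two further unit options defined for this component, one-time payment and data volume, is given below. Plan names, prices, and offering texts are illustrative, and the lowercase spelling of the unit values follows the example above rather than a confirmed rule.

```yaml
pricingPlans:
  en:
  - name: Full dataset purchase
    priceCurrency: EUR
    price: 250.00            # illustrative price
    billingDuration: instant
    unit: one-time-payment
    offering:
      - High Quality Pets data
      - Single payment, no renewal
  - name: Volume based access
    priceCurrency: EUR
    price: 2.00              # illustrative; with data-volume the price is per 1 GB
    billingDuration: month
    unit: data-volume
    offering:
      - High Quality Pets data
      - Pay only for the data you transfer
```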

Element name
Type Options Description
pricingPlans element - Binds the pricing plans related elements and attributes together
en element ISO 639-1 defined 2-letter codes NOTE! This is a dynamic element! This element binds together other product pricing plan attributes and expresses the langugage used. In the example this is "en", which indicates that pricing plan details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows options (language codes) listed in ISO 639-1 standard.

You can have product pricing plan details in multiple languages simply by adding similar sets like the example - just change the binding element name to matching language code.

The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language
priceCurrency string Use standard formats: ISO 4217 currency format e.g. "USD"; Ticker symbol for cryptocurrencies e.g. "BTC" The primary currency used in pricing. Platforms are assumed to use this as primary currency if currency conversions are used to display product pricing in different locations for various currencies. If the unit is revenue-sharing, then this attribute value MUST be percentage.
price string - The offer price of a product, or of a price component, or revenue-sharing percentage.

If the unit of pricing is revenue-sharing, then this price attribute value is a percentage value.

Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols.

With data-volume the price is for each 1GB of data.
billingDuration string options: instant, day, week, month, year Specifies for how long this price (or price component) will be billed. Can be used, for example, to model the contractual duration of a subscription or payment plan.
unit string One of: One-time-payment, Pay-per-use, Recurring, Revenue-sharing, Data-volume, Pay-what-you-want, Freemium, Open-data, Value-based, On-request, Trial One-time-payment is for single purchases; further purchases are not intended to continue under the same agreement.

Pay-per-use is intended for continuous usage, and the price set is for each successful usage action.

Recurring is intended for continuous time period plans.

Revenue sharing is a performance-based income model. An effective revenue sharing deal structure is offering your expertise to a business owner to help them grow their business. In return, you get paid a percentage of the revenue as a royalty fee.

Freemium is for free access. Use this option also for open data.

Data-volume is for data amount based pricing in which customer pays based on the served data amount. The price is always for 1GB of data.

Pay-what-you-want is a pricing system where buyers pay any desired amount for a given commodity, sometimes including zero. In some cases, a minimum (floor) price may be set, and/or a suggested price may be indicated as guidance for the buyer. The buyer can also select an amount higher than the standard price for the commodity. If the floor price is set, use minPrice attribute.

Open-data is an explicit pricing plan category for open data. By default open data should be free, but in some cases it can have a price.

Value-based is a value-based selling unit. Present the outcome of your story with solid data and a measurable impact with the help of the offering attribute. Example: “We can lower the energy bill in heating by $8-13/square meter in a year. Try out simulator to calculate your value!”. Use the optional valueSimulator attribute to provide a link (URL) to the value simulator you have created. To set a base fee for a value-based plan, you can for example define a monthly (billingDuration) plan and set the base fee with the minPrice attribute.

On-request is for plans in which the customer is given access to the data product after contacting the provider. Use the provider contact information to give customers a means of contacting the data product provider to request access.

If the Trial unit is used, the trial duration should be defined in the offering part.
maxTransactionQuantity Integer Integer The maximum transaction quantity for the given billing duration. Use this to define for example monthly (or any other period) request limit to the data product. Note! If you want to set unlimited use, value must be 0 (zero).
offering string array The element that contains the pricing plan content as an array of strings. Think of this as the list of what is included in the pricing plan and what you offer in return for the price asked. Use the language defined in the plan.
minPrice string - The lowest price if the price is a range. If dynamic pricing is used with this product, this is the lowest price allowed. In dynamic pricing businesses are able to change prices based on algorithms that take into account competitor pricing, supply and demand, and other external factors in the market. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols.
maxPrice string - The highest price if the price is a range. If dynamic pricing is used with this product, this is the highest price allowed. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols.
valueAddedTaxIncluded boolean true/false Specifies whether the applicable value-added tax (VAT) is included in the price specification or not.
valueAddedTaxPercentage Integer Number percentage value, range 0-100 Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols.
validFrom DateTime A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. The date when the item becomes valid.
validTo DateTime A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. The date after which the item is no longer valid.
additionalPrice string - This is used to define fees for usage which exceeds the defined max transaction quantity. This value is for each additional transaction. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols.
maxDataQuantity Integer - The maximum amount of data transferred during the billing duration. Unit is GB.
valueSimulator url valid url Intended to be used with value-based pricing plan. Provide url to value simulator in which customer can see the value in various cases. In the simulator customer might be able to input own variables to match their exact case and see the gained value.
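
As a rough sketch of how some of the less common attributes in the table might combine, the fragment below shows a value-based plan with a base fee and a pay-per-use plan with a quota and an overage fee. The plan names, prices, VAT percentage and the simulator URL are hypothetical and chosen only for illustration. Unit values are written in lowercase on the assumption that they follow the style of the example above.

```yaml
pricingPlans:
  en:
  # Value-based plan: minPrice is used as the monthly base fee (assumed figure)
  - name: Value-based insights
    priceCurrency: EUR
    minPrice: '100.00'
    billingDuration: month
    unit: value-based
    valueSimulator: https://example.com/value-simulator   # hypothetical URL
    offering:
      - Estimate your expected gains with the value simulator
  # Pay-per-use plan: monthly quota plus a fee for each extra transaction
  - name: Pay per use
    priceCurrency: EUR
    price: '0.01'                  # price for each successful usage action
    billingDuration: month
    unit: pay-per-use
    maxTransactionQuantity: 10000  # monthly request limit
    additionalPrice: '0.02'        # fee per transaction above the limit
    valueAddedTaxIncluded: false
    valueAddedTaxPercentage: 24    # assumed VAT rate
    offering:
      - Pay only for successful requests
```

Prices are quoted so that they stay strings, which matches the type given for price, minPrice and additionalPrice in the table.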

If you see something missing, described inaccurately or plain wrong, or you want to comment on the specification, raise an issue in GitHub.

Data Licensing

Data product licensing is essential for ensuring legal compliance, protecting intellectual property rights, enabling monetization and commercialization, facilitating control and governance, managing liability and indemnification, promoting standardization and interoperability, and fostering transparency and trust in data exchange and collaboration. By establishing clear and enforceable licensing agreements, data producers and consumers can effectively leverage data assets while minimizing legal risks and maximizing the value of data products.

The data product may be exploited e.g. by licensing its use and exploitation to third parties. A machine-readable license is included in the specification for this purpose. It can be used to conclude various agreements regarding data protection, processing and intellectual property rights (IPR). Data can be protected by one or more intellectual property rights. The principle is that when a third party (Data User) exploits the data, it must have a license or other right from the Data Holder to exploit the data.

Optional attributes and elements

Example of License Object usage:


license:
  en:
    scope:
      definition: The purpose of this license is to determine the terms and conditions
        applicable to the licensing of the data product, whereby Data Holder grants
        Data User the right to use the data.
      restrictions: Data User agrees not to, directly or indirectly, participate in
        the unauthorized use, disclosure or conversion of any confidential information.
      geographicalArea:
      - EU
      - US
      permanent: false
      exclusive: false
      rights:
      - Reproduction
      - Display
      - Distribution
      - Adaptation
      - Reselling
      - Sublicensing
      - Transferring
    termination:
      noticePeriod: 90
      terminationConditions: After the expiry of the right
        of use, the product and its derivatives must be removed.
      continuityConditions: An expired license will automatically continue unless
        cancelled (terminated) in writing by Data Holder.
    governance:
      ownership: Mindmote Oy, a company specializing in pet industry insights, owns
        the license to its proprietary data product 'Pets of the Year'.
      damages: During the term of the license, except for force majeure or the Data
        Holder's reasons, Data User is required to act strictly in accordance with
        the license. If Data User wants to terminate the license early, it needs to
        pay a certain amount of liquidated damages.
      confidentiality: Data User undertakes to maintain confidentiality as regards all
        information of a technical (such as, by way of a non-limiting example, drawings,
        tables, documentation, formulas and correspondence) and commercial nature (including
        contractual conditions, prices, payment conditions) gained during the performance
        of this license.
      applicableLaws: This license shall be interpreted, construed and enforced in accordance
        with the law of Finland, including Copyright Act 404/1961.
      warranties: Data Holder makes no warranties, express or implied, guarantees or
        conditions with respect to your use of the data product. To the extent permitted
        under local law, Data Holder disclaims all liability for any damages or losses,
        including direct, consequential, special, indirect, incidental or punitive,
        resulting from Data User use of the data product.
      audit: Data Holder will reasonably cooperate with Data Users by providing available
        additional information about the data product. Both parties will bear their
        own audit-related costs.
      forceMajeure: Both parties may suspend their contractual obligations when fulfillment
        becomes impossible or excessively costly due to unforeseeable events beyond
        their control, such as strikes, fires, wars, and other force majeure events.

Element name
Type Options Description
license element - Binds the licensing related elements and attributes together.
en element ISO 639-1 defined 2-letter codes This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use Arabic details, then name the element "ar". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard.

You can have product details in multiple languages simply by adding similar sets like the example - just change the binding element name to matching language code.

The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language
scope element - Extent, range, coverage, area or space of the license.
definition string text content, max length 512 chars Background and purpose of the license.
restrictions string text content, max length 512 chars Restrictions of the license.
geographicalArea string ISO 3166-1 alpha-2 codes License right restricted to the geographical area.
permanent boolean true/false License with no expiration date.
exclusive boolean true/false The exclusive license holder is given complete control over the use of the data product, and no other person or organization is allowed to use it during the term of the license agreement.
rights array Options: Reproduction (rights to reproduce), Display (disclose data to others), Distribution (right to distribute), Adaptation (right for derivative work), Reselling (right to resell), Transferring (transferable data license), Sublicensing (license grant may include a right to sublicense) Rights granted by the license. The texts in brackets and italic are intended to describe rights.
termination element - License termination and continuity related conditions.
noticePeriod integer unit is days The notice period is the time that a data product provider or consumer must give before ending the contract. This time window allows both sides to make the necessary preparations, guaranteeing an unhindered transfer.
terminationConditions string text content, max length 512 chars Cancellation conditions of the license.
continuityConditions string text content, max length 512 chars Continuity conditions of the license.
governance element - Governance is the approach taken to ensure that the agreed outcomes are being fulfilled.
ownership string text content, max length 512 chars Data product licensing ownership.
audit string text content, max length 512 chars License auditing terms.
warranties string text content, max length 512 chars License warranties.
damages string text content, max length 512 chars Damages refers to the sum of money (i.e. indemnifications) for a breach of some duty or violation of license right.
confidentiality string text content, max length 512 chars Restrictions and requirements imposed on the Data User regarding e.g. the use and disclosure of the Data Holder's confidential information.
applicableLaws string text content, max length 512 chars Applicable laws, i.e. local acts, decrees or laws.
forceMajeure string text content, max length 512 chars Force majeure is a clause that is included in contracts to remove liability for unforeseeable and unavoidable catastrophes that interrupt the expected course of events and prevent participants from fulfilling obligations. These clauses generally cover both natural disasters and catastrophes created by humans.
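
To illustrate how the scope and termination attributes described above might be filled in for a narrower grant, here is a minimal sketch of a permanent, non-exclusive license limited to two countries. The wording of the terms, the country selection and the notice period are assumptions made for illustration only and are not legal advice.

```yaml
license:
  en:
    scope:
      definition: Data Holder grants Data User a non-exclusive right to use the
        data product for internal analytics.
      restrictions: Data User may not resell or sublicense the data.
      geographicalArea:        # ISO 3166-1 alpha-2 codes
      - FI
      - SE
      permanent: true          # no expiration date
      exclusive: false
      rights:
      - Reproduction
      - Display
    termination:
      noticePeriod: 30         # days
      terminationConditions: Either party may terminate the license by giving
        written notice.
```

Because all attributes are optional, the governance element is simply left out here.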

DataOps

DataOps is a process whereby a data product pipeline deployment method is defined. Usually the deployment script contains the logic of the individual steps as well as the code chaining the steps together.

DataOps OBJECT describes building, deploying, and running data product's code, and storing and giving access to data and metadata. This principle has been adopted from the Data Mesh.

Optional attributes and elements

Example of DataOps component usage:


dataOps:
  data:
    schemaLocationURL: http://192.168.10.1/schemas/2016/petshopML-2.3/schema/petstore.xsd

  lineage:
    dataLineageTool: Collibra
    dataLineageOutput: http://192.168.10.1/lineage.json

  infrastructure:
    platform: Azure
    region: West US 2 (Washington)
    storageTechnology: Azure SQL
    storageType: sql
    containerTool: helm

  build:
    format: yaml
    hashType: SHA-2
    checksum: 7b7444ab8f5832e9ae8f54834782af995d0a83b4a1d77a75833eda7e19b4c921
    signatureType: JWK
    scriptURL: http://192.168.10.1/rundatapipeline.yaml
    deploymentDocumentationURL: http://192.168.10.1/datapipeline

Element name
Type Options Description
dataOps element - Binds the dataOps related elements and attributes together.
infrastructure element - Infrastructure is a process whereby a data product pipeline deployment method is defined.
platform string any Platform infrastructure, such as AWS, GCP, Azure.
region string any Provide details of cloud region of AWS, Azure or alike. Examples for AWS: US West (Oregon), Canada (Central), US East (N. Virginia), US East (Ohio). Examples for Azure: Canada Central (Toronto), East US 2 (Virginia), West US 2 (Washington)
storageTechnology string any Describes the internal storage area technology, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, Azure SQL.
storageType string any Describes the internal storage type, such as files, sql, events, MQTT.
containerTool string any A name of the package manager, container or infrastructure as code tool.
format string any Type of script language.
schemaLocationURL URL Valid URL The URL of the data product schema, such as XSD, XML or JSON Schema.
scriptURL URL Valid URL The URL of the deployment script. Script can be used for implementing the data product. In a Data Mesh -model it can be used to define, for example, one or more outputs which take the data from source systems or other data products.
deploymentDocumentationURL URL Valid URL The URL of the deployment documentation.
dataLineageTool URL Valid URL A tool to view the data lineage.
dataLineageOutput URL Valid URL The URL of the data lineage output. Data lineage output shows the mapping of source data to target output on a metadata level.
hashType string One of: SHA-1, SHA-2, SHA-3 Type of secure hash algorithm for checksum.
checksum string any Script checksum.
signatureType string any A public-key cryptosystem, such as JWK, PKCS#12, or PEM.
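
For comparison with the Azure example above, the following sketch shows how the infrastructure element might look for a file-based product on AWS, using values that the table lists as examples. The containerTool value and the script URL are hypothetical.

```yaml
dataOps:
  infrastructure:
    platform: AWS
    region: US West (Oregon)
    storageTechnology: Amazon S3
    storageType: files
    containerTool: terraform     # assumed infrastructure as code tool
  build:
    format: yaml
    scriptURL: https://example.com/pipelines/deploy.yaml   # hypothetical URL
```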

Data Access

Data Access OBJECT describes the authorised ability to retrieve, edit, copy or transfer data from IT systems.

Optional attributes and elements

Example of Data Access component usage:


dataAccess:
  interface:
    outputPorttype: API
    authenticationMethod: OAuth
    specification: OAS
    format: GraphQL
    specsURL: http://192.168.10.1/petshop.json
    documentationURL: http://192.168.10.1/petshop

Element name
Type Options Description
dataAccess element - Binds the data access related elements and attributes together.
interface element - Reference to the ability to use data.
outputPorttype string any Type of data access, such as API, SQL, sFTP, gRPC.
hashType string any Type of secure hash algorithm, such as SHA-1, SHA-2, for checksum, when output is file(s).
checksum string any File checksum.
authenticationMethod string any Data access authentication method type, such as API key, HTTP Basic, OAuth, No authentication.
specification string any Type of the data access specification, such as OAS, RAML, Slate.
format string any Data access file format type, such as JSON, XML, GraphQL, plain text.
specsURL URL Valid URL The URL of the data access documentation, preferably in a machine-readable format, such as OpenAPI specs.
documentationURL URL Valid URL The URL of the separate data access documentation or guide. For example, it may contain instructions on how to create and manage API keys.
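
The example above does not use hashType and checksum, which apply when the output is delivered as files. A minimal sketch of such a case could look like the fragment below. Placing the two attributes under interface is an assumption based on their position in the table. The authentication value and the URL are hypothetical, and the checksum is only a placeholder (the SHA-256 digest of empty input).

```yaml
dataAccess:
  interface:
    outputPorttype: sFTP
    authenticationMethod: SSH key    # assumed value, the attribute accepts any string
    format: JSON
    hashType: SHA-2
    # Placeholder: SHA-256 of empty input, replace with the digest of the delivered file
    checksum: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    documentationURL: https://example.com/petshop/file-delivery   # hypothetical URL
```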

Data Holder

DataHolder Object describes the Organization legally allowed to create, develop and publish data products.

Data holder means "a legal person, public body, international organisation, or a natural person who is not a data subject with respect to the specific data in question, which, in accordance with applicable Union or national law, has the right to grant access to or to share certain personal data or non-personal data." (Data Governance Act)

The data holder might not be the original IPR owner of the data used, but has the right to operate with it. The contract or other agreement between the Provider and a possible data owner is not part of the standard, neither as metadata nor as part of the license.

Optional attributes and elements

Example of Holder component with some of the voluntary attributes:


dataHolder:
  en:
    legalName: MindMote Oy
    businessID: "12243434-12"
    email: contact@mindmote.fi
    taxID: "12243434-12"
    vatID: "12243434-12"
    logoURL: "https://mindmote.fi/logo.png"
    description: "Digital Economy services and tools"
    URL: "https://mindmote.fi"
    telephone: "+3584502322323"
    streetAddress: "Koulukatu 1"
    postalCode: "33100"
    addressRegion: "Pirkanmaa"
    addressLocality: "Tampere"
    addressCountry: "FI"
    aggregateRating: ""
    ratingCount: 1245
    slogan: ""
    parentOrganization: ""

Element name
Type Options Description
dataHolder element - Binds the provider related business elements and attributes together
en element ISO 639-1 defined 2-letter codes This element binds together other product attributes and expresses the language used. In the example this is "en", which indicates that product details are in English. If you would like to use French details, then name the element "fr". The naming of this element follows the options (language codes) listed in the ISO 639-1 standard.

You can have product details in multiple languages simply by adding similar sets like the example - just change the binding element name to matching language code.

The pattern to implement multilanguage support for data products was adopted from de facto UI translation practices. The attributes inside this element are commonly rendered in the UI for the consumer and providing a simple way to implement that was the driving reasoning. See for example JSON - Multi Language
legalName string text content, max length 256 chars REQUIRED The official name of the organization, e.g. the registered company name.
businessID string - The business identifier code of the company. Often this is given to the company by an authorized public sector organization that manages the register of businesses.
contactName string - Contact person name.
email string As defined in RFC 5322 Email to be used in contacting the organization.
taxID string - The Tax / Fiscal ID of the organization or person, e.g. the TIN in the US or the CIF/NIF in Spain.
vatID string - The Value-added Tax ID of the organization or person.
businessDomain string - In a data mesh architecture, data (or data product) ownership and management are distributed across self-contained business domains.
logoURL URL Valid URL. See more from RFC 3986. The URL pointing to organisation logo.
description string Max length 512 chars The introduction to the organization. Often contains information of what the organisation does and focuses on.
URL URL Valid URL. See more from RFC 3986. The URL of the organization's website.
telephone string Valid telephone number The telephone number. Use E.164 standard.
streetAddress string - The street address. For example, 1600 Amphitheatre Pkwy.
postalCode string - The postal code. For example, 94043.
addressRegion string - The region in which the locality is, and which is in the country. For example, California or another appropriate first-level administrative division.
addressLocality string - The locality in which the street address is, and which is in the region. For example, Mountain View.
addressCountry string two-letter ISO 3166-1 alpha-2 country code The country.
aggregateRating string - The average rating based on multiple ratings or reviews.
ratingCount integer - The number of ratings and reviews used in calculating the aggregateRating.
slogan string Max length 256 chars The slogan of the organization. This is often related to showing the brand.
parentOrganization string - The larger organization that this organization is a subOrganization of, if any.
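
The language binding described for the "en" element can be sketched as follows: the same holder is described once in English and once in French by repeating the attribute set under a second language code. Only a few attributes are shown, and the French description is an illustrative translation that does not come from the specification.

```yaml
dataHolder:
  en:
    legalName: MindMote Oy
    description: Digital Economy services and tools
    URL: https://mindmote.fi
    addressCountry: FI
  fr:
    legalName: MindMote Oy
    description: Services et outils pour l'économie numérique
    URL: https://mindmote.fi
    addressCountry: FI
```

Attributes that do not change between languages, such as the legal name and URL, are repeated in each set so that each language block can be rendered on its own.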

Specification extensions

While the Open Data Product Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.

Extension properties are implemented as patterned fields that are always prefixed by "x-". The extensions may or may not be supported by the available tooling, but the tooling may be extended to add the requested support (if the tools are internal or open source). The Open Data Product Initiative Technical Steering Committee does not officially approve external extensions - they are fully independent. Popular extensions, however, are natural candidates for future additions to the standard.

We encourage you to let us know of useful extensions so that we can consider them in future releases: raise an issue in GitHub.

Example of extension usage:


product:
  name: Pets of the year
  productID: 123456are
  description: ''
  x-internal-id: foobar123

Element name
Type Options Description
^x- any Allows extensions to the Open Data Product Schema. The field name MUST begin with x-, for example, x-internal-id. The value can be null, a primitive, an array or an object. Can have any valid JSON format value.
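
Since an extension value can be null, a primitive, an array or an object, a short sketch of each shape may help. Apart from x-internal-id, which comes from the example above, all extension names and values below are hypothetical.

```yaml
product:
  name: Pets of the year
  productID: 123456are
  x-internal-id: foobar123     # primitive (string)
  x-archived-at: null          # null
  x-review-tags:               # array
  - finance-approved
  - legal-approved
  x-cost-center:               # object
    code: CC-1001
    owner: data-platform-team
```

Tooling that does not recognise these fields is expected to be able to ignore them, because the "x-" prefix marks them as outside the core schema.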

Hello world example

You'll find a complete machine-readable example of a data product in the right column. It is an imaginary data product, Pets of the year, which contains derived data about the most common pets in the world. The product has 4 pricing plans, which are mostly based on a recurring subscription model. Note! Not all voluntary attributes are used in the example and multilingualism has not been fully applied.

Example of complete working Data Product specification instance:


---
schema: https://opendataproducts.org/v3.0rc/schema/odps.yaml
version: 3.0
product:
  en:
    name: Pets of the year
    productID: 123456are
    valueProposition: Design a customised petstore using a data product that describes
      pets with their habits, preferences and characteristics.
    description: This is an example of a Petstore product.
    productSeries: Lovely pets data products
    visibility: private
    status: draft
    version: '0.1'
    categories:
    - pets
    standards:
    - ISO 24631-6
    tags:
    - pet
    brandSlogan: Passion for the data monetization
    type: derived data
    logoURL: https://data-product-business.github.io/open-data-product-spec/images/logo-dps-ebd5a97d.png
    OutputFileFormats:
    - json
    - xml
    - csv
    - zip
    useCases:
    - useCase:
        useCaseTitle: Build attractive and lucrative petstore!
        useCaseDescription: Use case description of how a successful petstore chain
          was established in Abu Dhabi
        useCaseURL: https://marketplace.com/usecase1
  recommendedDataProducts:
  - https://marketplace.com/dataproduct.json
  - https://marketplace.com/dataproduct-another.json
  pricingPlans:
    en:
    - name: Premium subscription 1 year
      priceCurrency: EUR
      price: '50.00'
      billingDuration: year
      unit: recurring
      maxTransactionQuantity: 0   # 0 = unlimited
      offering:
        - item 1
    - name: Premium Package Monthly
      priceCurrency: EUR
      price: '5.00'
      billingDuration: month
      unit: recurring
      maxTransactionQuantity: 10000
      offering:
        - item 1
    - name: Freemium Package
      priceCurrency: EUR
      price: '0.00'
      billingDuration: month
      unit: recurring
      maxTransactionQuantity: 1000
      offering:
        - item 1
    - name: Revenue sharing
      priceCurrency: percentage
      price: '5.50'
      billingDuration: month
      unit: revenue-sharing
      maxTransactionQuantity: 20000
      offering:
        - item 1
  dataOps:
    data:
      schemaLocationURL: http://192.168.10.1/schemas/2016/petshopML-2.3/schema/petstore.xsd

    lineage:
      dataLineageTool: Collibra
      dataLineageOutput: http://192.168.10.1/lineage.json

    infrastructure:
      platform: Azure
      region: West US 2 (Washington)
      storageTechnology: Azure SQL
      storageType: sql
      containerTool: helm

    build:
      format: yaml
      hashType: SHA-2
      checksum: 7b7444ab8f5832e9ae8f54834782af995d0a83b4a1d77a75833eda7e19b4c921
      signatureType: JWK
      scriptURL: http://192.168.10.1/rundatapipeline.yaml
      deploymentDocumentationURL: http://192.168.10.1/datapipeline

  dataAccess:
    interface:
      outputPorttype: API
      authenticationMethod: OAuth
      specification: OAS
      format: JSON
      documentationURL: https://swagger.com/petstore.json

  SLA:
  - dimension: latency
    displaytitle:
      - en: Latency
    objective: 100
    unit: milliseconds
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/querying/basics/ 
      spec:  
        myTimer.observeDuration();

  - dimension: uptime
    displaytitle:
      - en: Uptime
    objective: 99
    unit: percent

  - dimension: responseTime
    objective: 200
    unit: milliseconds
    monitoring:
      type: prometheus
      reference: https://prometheus.io/docs/prometheus/latest/querying/basics/ 
      spec:  
       rate(http_server_requests_seconds_sum[$__rate_interval]) / rate(http_server_requests_seconds_count[$__rate_interval])

  - dimension: errorRate
    objective: 0.1
    unit: percent

  - dimension: endOfSupport
    objective: 01/01/2025 # dd/mm/yyyy
    unit: date

  - dimension: endOfLife
    objective: 01/03/2025 # dd/mm/yyyy
    unit: date

  - dimension: updateFrequency
    objective: 7
    unit: days

  - dimension: timeToDetect
    objective: 60
    unit: minutes

  - dimension: timeToNotify
    objective: 120
    unit: minutes

  - dimension: timeToRepair
    objective: 24
    unit: hours

  - dimension: emailResponseTime
    objective: 12
    unit: hours

  support:
    phoneNumber: '+971508976456'
    phoneServiceHours: 'Mon-Fri 8am-4pm (GMT)'
    email: support@opendataproducts.org
    emailServiceHours: 'Mon-Fri 8am-4pm (GMT)'
    documentationURL: ''

  dataQuality:
  - dimension: accuracy
    displaytitle:
    - en: Data Accuracy (percent)
    - fi: Datan virheettömyys (prosenttia)
    objective: 98
    unit: percentage
    monitoring:
      type: SodaCL 
      reference: https://docs.soda.io/soda-cl/soda-cl-overview.html
      spec:
        - require_unique(member_id) 
        - require_range(age_band, 18, 100)

  - dimension: completeness
    displaytitle:
    - en: Data Completeness
    objective: 99.9
    unit: percentage
    monitoring:
      type: SodaCL 
      reference: https://docs.soda.io/soda-cl/soda-cl-overview.html
      spec:
        - for each column:
            name: [member_id, gender, age_band]
            checks:
              - not null:
                  fail: when > 0.1% # Fail if more than 0.1% of records are null

  - dimension: consistency
    objective: 98
    unit: percentage

  - dimension: timeliness
    objective: 100
    unit: percentage

  - dimension: validity
    objective: 98
    unit: percentage

  - dimension: uniqueness
    objective: 100
    unit: percentage

  license:
    en:
      scope:
        definition: The purpose of this license is to determine the terms and conditions
          applicable to the licensing of the data product, whereby Data Holder grants
          Data User the right to use the data.
        restrictions: Data User agrees not to, directly or indirectly, participate in
          the unauthorized use, disclosure or conversion of any confidential information.
        geographicalArea:
        - EU
        - US
        permanent: false
        exclusive: false
        rights:
        - Reproduction
        - Display
        - Distribution
        - Adaptation
        - Reselling
        - Sublicensing
        - Transferring
      termination:
        terminationConditions: Cancellation before 30 days. After the expiry of the
          right of use, the product and its derivatives must be removed.
        continuityConditions: Expired license will automatically continued without written
          cancellation (termination) by Data Holder
      governance:
        ownership: Mindmote Oy, a company specializing in pet industry insights, owns
          the license to its proprietary data product 'Pets of the Year'.
        damages: During the term of license, except for force majeure or the Data
          Holder's reasons, Data User is required to act strictly in accordance with
          the license. If Data User wants to terminate the license early, it needs to
          pay a certain amount of liquidated damages.
        confidentiality: Data User undertakes to maintain confidentiality as regards
          all information of a technical (such as, by way of a non-limiting example,
          drawings, tables, documentation, formulas and correspondence) and commercial
          nature (including contractual conditions, prices, payment conditions) gained
          during the performance of this license.
        applicableLaws: This license shall be interpreted, construed and enforced in
          accordance with the law of Finland, including Copyright Act 404/1961.
        warranties: Data Holder makes no warranties, express or implied, guarantees
          or conditions with respect to your use of the data product. To the extent
          permitted under local law, Data Holder disclaims all liability for any damages
          or losses, including direct, consequential, special, indirect, incidental
          or punitive, resulting from Data User's use of the data product.
        audit: Data Holder will reasonably cooperate with Data Users by providing available
          additional information about the data product. Both parties will bear their
          own audit-related costs.
        forceMajeure: Both parties may suspend their contractual obligations when fulfillment
          becomes impossible or excessively costly due to unforeseeable events beyond
          their control, such as strikes, fires, wars, and other force majeure events.
  dataHolder:
    en:
      taxID: 12243434-12
      vatID: 12243434-12
      businessDomain: Data Product Business
      logoURL: https://mindmote.fi/logo.png
      description: Digital Economy services and tools
      URL: https://mindmote.fi
      telephone: "+358 45 232 2323"
      streetAddress: Koulukatu 1
      postalCode: '33100'
      addressRegion: Pirkanmaa
      addressLocality: Tampere
      addressCountry: Finland
      aggregateRating: ''
      ratingCount: 1245
      slogan: ''
      parentOrganization: ''


Mandatory-only example

Example data product with just the mandatory elements and attributes. This is the minimal representation of data product metadata that is expected to be found in every data product following the ODPS standard. This bare minimum can be expanded with other elements and attributes defined in the specification. Extensions can also be used if local additions are needed.

Example of an Open Data Product Specification instance with mandatory-only elements and attributes:


schema: https://opendataproducts.org/v3.0rc/schema/odps.yaml
version: 3.0
product:
  en:
    name: Pets of the year
    productID: 123456are
    visibility: private
    status: draft
    type: derived data
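
As a sketch of how this bare minimum can be expanded, the instance below adds one optional element, dataHolder, to the mandatory-only example. The attribute names and values are reused from the full example in this specification. The placement of dataHolder directly under product is an assumption inferred from the indentation of the full example, so check the element definitions before relying on it.

```yaml
# Illustrative sketch only: the mandatory-only instance expanded with one optional element.
schema: https://opendataproducts.org/v3.0rc/schema/odps.yaml
version: 3.0
product:
  en:
    name: Pets of the year
    productID: 123456are
    visibility: private
    status: draft
    type: derived data
  # Assumption: dataHolder nests under product, mirroring the full example's indentation.
  dataHolder:
    en:
      businessDomain: Data Product Business
      description: Digital Economy services and tools
      URL: https://mindmote.fi
      addressCountry: Finland
```

The mandatory block is left unchanged, so a consumer that reads only the mandatory attributes can still process the expanded instance.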

Terms used

Here is a list of the terms used and what we mean by them. The meanings are mostly taken from existing knowledge, e.g. articles and other trusted sources. The source is always linked to the term. In some rare cases, a term is defined for the purposes of this specification only.

Term
Description
Data point A data point refers to a single, individual unit of data. It can be a number, a word, a measurement, or any other piece of information that is recorded and used for analysis. Some confuse a data point with a dataset. In a dataset, each data point represents one observation or measurement. For example, in a dataset of temperatures recorded every day, each daily temperature is a data point.
Data product As a strategic resource for companies, data is considered an asset that, like any other material good, has a financial value and whose management generates costs. Data created, collected or used in individual business processes can be sold to other organisations as raw or processed data, so that it no longer serves as an enabler of products, but is the product itself. This leads to the paradigm that data assets can be monetised by exchanging and trading data between organisations as data products and services.

There are multiple definitions of a data product. In an article authored by Jian Pei (2020), data products "refer to data sets as products and information services derived from data sets." Simon O'Regan defines a data product as a product whose primary objective is to use data to facilitate an end goal. From the academic literature we have found several subtypes of data products: raw data, derived data, data sets, reports, analytic views, 3D visualisations, algorithms, decision support (dashboards) and automated decision-making (Netflix product recommendations or Spotify’s Discover Weekly would be common examples).

Typically raw data, derived data and algorithms have technical users. Most often they tend to be internal products in an organisation.

If we dive into the data mesh world, this quote from Zhamak Dehghani’s book is key to understanding the definition of data as a product: “Domain data teams must apply product thinking […] to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.”

While many of the standard Product Development Rules apply — solve a customer need, learn from feedback, prioritise relentlessly, etc. — data has different characteristics compared to tangible products that prevent the direct transfer of established processes and rules of trading goods, especially in terms of pricing mechanisms.

In trading data, there is less willingness to pay. For example, data buyers often do not recognise the potential value of data items because it cannot be fully disclosed prior to purchase (known as the ‘Arrow paradox’).

In addition, there is often a lack of awareness that the creation, processing, storage and distribution of high-quality data is a major cost factor for the data provider. Another obstacle is the lack of trust and security, which causes potential data providers to fear that competitors could benefit from the disclosure of in-house data.

One of the aims of this specification is to tackle the above-mentioned issues, which hinder the growth of the data ecosystem and market volatility.
Data as a service In computing, data as a service, or DaaS, is a term used to describe cloud-based software tools used for working with data, such as managing data in a data warehouse or analyzing data with business intelligence. It is enabled by software as a service (SaaS). DaaS, like all "as a service" (aaS) technology, builds on the concept that its data product can be provided to the user on demand, regardless of geographic or organizational separation between provider and consumer.

According to Daniel Newman from Forbes (2017), DaaS is essentially a data stream that subscribers can access on demand.

Some people use the term data product in a sense that also covers data commodities which have more service-like attributes than product attributes. In those cases we prefer to use the term data as a service and call the creation process data servitization. The term productizement is reserved for the process which creates data products as its end result.
Data as a service business model Data as a service as a business model is a concept in which two or more organizations buy, sell, or trade machine-readable data in exchange for something of value. Data as a service is a general term that encompasses data-related services. DaaS service providers are now replacing traditional data analytics services or clustering with existing services to offer more added value to customers. DaaS providers curate, aggregate and analyze multi-source data in order to provide additional, more valuable analytical data or information.

Typically, DaaS business is based on subscriptions and customers pay for a package of services or definite services.
Data pipeline According to Aiswarya et al. the complex chain of interconnected activities or processes from data generation through data reception constitutes a data pipeline. In other words, data pipelines are the connected chain of processes where the output of one or more processes becomes an input for another. It is a piece of software that removes many manual steps from the workflow and permits a streamlined, automated flow of data from one node to another. Moreover, it automates the operations involved in the selection, extraction, transformation, aggregation, validation, and loading of data for further analysis and visualization. It offers end-to-end speed by removing errors and resisting bottlenecks or delay. Data pipelines can process multiple streams of data simultaneously.
Infrastructure as Code Infrastructure as Code (IaC) transforms infrastructure management by using code instead of manual processes. Configuration files capture infrastructure specifications, ensuring consistent environment provisioning. The "as code" paradigm extends beyond infrastructure to encompass quality control and data product processes. This approach, applied to the entire data pipeline, enhances repeatability, traceability, and scalability, fostering collaboration and systematic data management.
DataOps In DataOps, the focus lies in creating automated processes for releasing and updating data products throughout their lifecycle, from development to production. The objective is to enhance operational efficiency through automation, reduce errors, and enable faster release cycles for data products.

Editors and contributors

This specification is openly developed and a lot of the work comes from the community. We list all community contributors as a sign of appreciation. The editors (as initial creators of the specification) are Jarkko Moilanen and Jussi Niilahti. Editors take the feedback and draft new candidate releases, which may become the versions of the specification.

List of community contributors

The work around the specification would not be possible without enormous help from the community. Here is the list of contributors so far.