OPEN DATA PRODUCT SPECIFICATION 1.0
Note! You are looking at the old version. Latest production version is 2.0:
https://open-data-product-initiative.github.io/open-data-product-spec-2.0/
Version 1.0
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The specification is shared under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
Published 4th Feb 2022
Introduction
The Open Data Product Specification is a vendor-neutral, open-source, machine-readable data product metadata model. It defines the objects and attributes as well as the structure of digital data products. The work is based on existing standards (schema.org), best practices, and emerging concepts like Data Mesh. The reasoning is that we reuse and proudly copy instead of reinventing the wheel. More detailed information about the origin can be found on the Open Data Product Specification homepage.
The specification has been designed with four major aspects of the data product in mind: 1) technical (infrastructure & access), 2) business (pricing & plans), 3) legal (licensing & IPR), and 4) ethical (privacy & mydata). The four aspects are described in 5 elements, which contain attributes and other elements.
The specification aims to:
- enable interoperability between organizations, data platforms, marketplaces, and tools,
- reduce data product metadata conversions and errors between systems and organizations,
- increase the speed of designing, testing, and implementing data products,
- speed up tool development around data product design, development, and management,
- enable the creation of automated data product deployment with standard methods (DataOps).
Note! In "Open Data Product" the focus is on the latter two words; the prefix "open" refers to the openness of the standard. Any connotation of open data (a different thing) is unintended and undesirable.
Document structure
LEFT COLUMN: Navigation
The left column contains the navigation, which enables easy movement around the specification.
MIDDLE COLUMN: Principles and components
The middle column contains detailed information about the included components and related options. This is the theory part.
Note! Mandatory elements and attributes are listed separately in the definition tables. This enables the user to construct a minimum viable specification more easily and quickly. Ready-made definitions from https://schema.org are applied whenever possible instead of reinventing the wheel.
RIGHT COLUMN: Examples
The right column contains JSON-formatted examples of how the specification is used. Other output formats may be added in the future on request.
Example of a JSON-formatted snippet from the Open Data Product Specification:
"monitoring": {
"url": "https://monitoring.com"
}
Document level attributes
Here is the list of attributes that can occur at the document root level. In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL. Optional attributes are listed in their own table and examples are given in the right column.
Mandatory attributes
Example of document level attribute usage and structure:
"Product": {
"name": "Pets of the year",
"productID": "123456are",
"visibility": "private",
"status": "draft",
"type": "dataset"
}
Element name | Type | Options | Description |
---|---|---|---|
name | string | max length 256 chars | REQUIRED The name of the product. |
productID | string | max length 256 chars | REQUIRED Product identifier. |
visibility | one of | one of: private, organisation, public | REQUIRED The publicity level, i.e. who can see this product. Private - just the creator. Organisation - visible to all in your organisation. Public - visible to everyone. |
status | one of | one of: announcement, draft, development, testing, acceptance, production, sunset, retired | REQUIRED The status of the product. The lifecycle model is discussed in detail here (link). |
type | one of | Options: raw data, derived data, dataset, reports, analytic view, 3D visualisation, algorithm, decision support, automated decision-making, data-enhanced product, data-driven service, data-enabled performance, bi-directional. | REQUIRED The type of the product. Options are derived from examples and lists found in academic literature. |
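The mandatory attributes above lend themselves to a mechanical check. The following is a minimal sketch, not part of the specification: the function name `check_product` and the wording of the messages are assumptions made for illustration, while the allowed values are copied from the table above.

```python
# Allowed values copied from the mandatory attributes table above.
VISIBILITY = {"private", "organisation", "public"}
STATUS = {"announcement", "draft", "development", "testing",
          "acceptance", "production", "sunset", "retired"}
TYPES = {"raw data", "derived data", "dataset", "reports", "analytic view",
         "3D visualisation", "algorithm", "decision support",
         "automated decision-making", "data-enhanced product",
         "data-driven service", "data-enabled performance", "bi-directional"}


def check_product(product: dict) -> list[str]:
    """Return a list of problems found in a "Product" object (empty if none).

    Hypothetical helper for illustration; the specification defines no API.
    """
    problems = []
    # name and productID: REQUIRED strings, max length 256 chars.
    for key in ("name", "productID"):
        value = product.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} is REQUIRED and must be a non-empty string")
        elif len(value) > 256:
            problems.append(f"{key} exceeds 256 characters")
    # visibility, status, type: REQUIRED, each one of a fixed set of options.
    for key, allowed in (("visibility", VISIBILITY),
                         ("status", STATUS),
                         ("type", TYPES)):
        value = product.get(key)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{key} is REQUIRED and must be one of the listed options")
    return problems


example = {
    "name": "Pets of the year",
    "productID": "123456are",
    "visibility": "private",
    "status": "draft",
    "type": "dataset",
}
print(check_product(example))
```

An empty list means that all five mandatory attributes are present and valid; each missing or invalid attribute adds one message.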
Optional attributes
Example of document level attribute usage and structure:
"Product": {
"name": "Pets of the year",
"productID": "123456are",
"description": "",
"visibility": "private",
"status": "draft",
"version": "0.1",
"categories": ["pets"],
"tags": ["pet"],
"brandSlogan": "Passion for the data monetization",
"type": "dataset",
"logoURL": "https://data-product-business.github.io/open-data-product-spec/images/logo-dps-ebd5a97d.png"
}
Element name | Type | Options | Description |
---|---|---|---|
valueProposition | string | text content, max length 512 chars | The product's value proposition. Often one or two sentences that crystallize the value for the customer. |
description | string | - | The description of the product. Text only. |
categories | array | - | Comma-separated array of categories. |
tags | array | - | Comma-separated array of tags. |
version | string | The versioning scheme is major.minor. Examples: 1.0, 2.1, 3.15 | The version of the product. |
logoURL | URL | Valid URL | Valid URL of the logo. See RFC 3986 for details. |
brandSlogan | string | - | Brand-related slogan, such as Nike's "Just Do It". |
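The optional attributes can be checked in the same spirit. This is a rough sketch under stated assumptions: `check_optional` is a hypothetical helper, the version pattern assumes plain ASCII digits on both sides of the dot, and the URL test (a scheme and a host are present) is much looser than full RFC 3986 validation.

```python
import re
from urllib.parse import urlsplit

# major.minor, for example 1.0, 2.1, or 3.15 (ASCII digits assumed).
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+")


def check_optional(product: dict) -> list[str]:
    """Return problems found in the optional document-level attributes."""
    problems = []

    version = product.get("version")
    if version is not None:
        if not isinstance(version, str) or not VERSION_RE.fullmatch(version):
            problems.append("version must follow the major.minor scheme")

    proposition = product.get("valueProposition")
    if proposition is not None:
        if not isinstance(proposition, str) or len(proposition) > 512:
            problems.append("valueProposition must be a string of at most 512 chars")

    logo = product.get("logoURL")
    if logo is not None:
        # Loose check only: a scheme and a network location must be present.
        ok = isinstance(logo, str)
        if ok:
            parts = urlsplit(logo)
            ok = bool(parts.scheme and parts.netloc)
        if not ok:
            problems.append("logoURL must be a valid URL")

    for key in ("categories", "tags"):
        if key in product and not isinstance(product[key], list):
            problems.append(f"{key} must be an array")

    return problems


print(check_optional({"version": "0.1", "tags": ["pet"],
                      "logoURL": "https://example.com/logo.png"}))
```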
Data Pricing
Pricing is the process whereby a business sets the price at which it will sell its products and services. The Pricing OBJECT consists of mandatory and optional attributes. This element contains data about pricing plans, to be used for example when displaying the items in a marketplace. If needed, the standard metadata is converted to the marketplace's internal format. We encourage all data product owners to enforce usage of this standard.
Mandatory attributes are listed in a separate table and marked with bolded names and an asterisk (*). Next to the mandatory attributes is an example.
The same logic applies to the optional attributes as well. Optional attributes are listed in their own table and an example is given in the right column.
Supported pricing models include:
- recurring time period based (day, week, month, year) plans
- one-time payment plans
- pay-as-you-go plans
- revenue sharing plans
- data volume plans
- dynamic pricing (high and low limits for automated pricing)
- pay-what-you-want plans
Mandatory attributes and elements
Example of Pricing component usage with mandatory elements and attributes:
"pricing": [
{
"name": "Premium Package Monthly",
"priceCurrency": "EUR",
"price": "5.00",
"billingDuration": "month",
"unit": "recurring",
"maxTransactionQuantity": 10000
},
{
"name": "Freemium Package",
"priceCurrency": "EUR",
"price": "0.00",
"billingDuration": "month",
"unit": "recurring",
"maxTransactionQuantity": 1000
},
{
"name": "Revenue sharing",
"priceCurrency": "percentage",
"price": "5.50",
"billingDuration": "month",
"unit": "revenue-sharing",
"maxTransactionQuantity": 20000
},
{
"name": "Premium subscription 1 year",
"priceCurrency": "EUR",
"price": "50.00",
"billingDuration": "year",
"unit": "recurring",
"maxTransactionQuantity": 0
}
]
Element name | Type | Options | Description |
---|---|---|---|
name | string | max length 256 chars | REQUIRED The name of the plan/offering. |
priceCurrency | string | Use standard formats: ISO 4217 currency format e.g. "USD"; Ticker symbol for cryptocurrencies e.g. "BTC" | REQUIRED The primary currency used in pricing. Platforms are assumed to use this as primary currency if currency conversions are used to display product pricing in different locations for various currencies. If the unit is revenue-sharing, then this attribute value MUST be percentage. |
price | string | - | REQUIRED The offer price of a product, or of a price component, or the revenue-sharing percentage. If the unit of pricing is revenue-sharing, then this price attribute value is a percentage value. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. With data-volume the price is for each 1 GB of data. |
billingDuration | string | options: instant, day, week, month, year | REQUIRED Specifies for how long this price (or price component) will be billed. Can be used, for example, to model the contractual duration of a subscription or payment plan. |
unit | string | One of: one-time-payment, pay-per-use, recurring, revenue-sharing, data-volume, pay-what-you-want | REQUIRED One-time-payment is for single purchases; further purchases are not intended to continue under the same agreement. Pay-per-use is intended for continuous usage, and the price set is for each successful usage action. Recurring is intended for continuous time period plans. Revenue-sharing is a performance-based income model. An effective revenue sharing deal structure is offering your expertise to a business owner to help them grow their business. In return, you get paid a percentage of the revenue as a royalty fee. Data-volume is for data amount based pricing in which the customer pays based on the amount of data served. The price is always for 1 GB of data. Pay-what-you-want is a pricing system where buyers pay any desired amount for a given commodity, sometimes including zero. In some cases, a minimum (floor) price may be set, and/or a suggested price may be indicated as guidance for the buyer. The buyer can also select an amount higher than the standard price for the commodity. If a floor price is set, use the minPrice attribute. |
maxTransactionQuantity | Integer | Integer | REQUIRED The maximum transaction quantity for the given billing duration. Use this to define, for example, a monthly (or any other period) request limit for the data product. Note! If you want to set unlimited use, the value must be 0 (zero). |
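The rules in the table above (decimal point as '.', ASCII digits only, percentage currency for revenue sharing, 0 for unlimited use) can be sketched as a small validator. The function name `check_plan` is hypothetical, and the price pattern rests on one assumption not stated in the specification: at least one digit is required before an optional decimal part.

```python
import re

# ASCII digits U+0030..U+0039 only, with '.' as the decimal point.
# [0-9] is used on purpose: \d would also accept other Unicode digits.
PRICE_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?")

UNITS = {"one-time-payment", "pay-per-use", "recurring",
         "revenue-sharing", "data-volume", "pay-what-you-want"}
BILLING_DURATIONS = {"instant", "day", "week", "month", "year"}


def check_plan(plan: dict) -> list[str]:
    """Return problems found in one pricing plan (empty list if none)."""
    problems = []

    name = plan.get("name")
    if not isinstance(name, str) or not 0 < len(name) <= 256:
        problems.append("name is REQUIRED, max length 256 chars")

    currency = plan.get("priceCurrency")
    if not isinstance(currency, str) or not currency:
        problems.append("priceCurrency is REQUIRED")

    price = plan.get("price")
    if not isinstance(price, str) or not PRICE_RE.fullmatch(price):
        problems.append("price must be a string such as '5.00'")

    duration = plan.get("billingDuration")
    if not isinstance(duration, str) or duration not in BILLING_DURATIONS:
        problems.append("billingDuration must be one of the listed options")

    unit = plan.get("unit")
    if not isinstance(unit, str) or unit not in UNITS:
        problems.append("unit must be one of the listed options")
    elif unit == "revenue-sharing" and currency != "percentage":
        problems.append("priceCurrency MUST be percentage for revenue-sharing")

    quantity = plan.get("maxTransactionQuantity")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 0:
        problems.append("maxTransactionQuantity must be an integer (0 = unlimited)")

    return problems


plan = {"name": "Premium Package Monthly", "priceCurrency": "EUR",
        "price": "5.00", "billingDuration": "month",
        "unit": "recurring", "maxTransactionQuantity": 10000}
print(check_plan(plan))
```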
Optional attributes and elements
Example of Pricing component usage with some of the optional elements and attributes:
"pricing": [
{
"name": "Premium subscription 1 year",
"priceCurrency": "EUR",
"price": "10.00",
"minPrice": "5.00",
"maxPrice": "15.00",
"additionalPrice": "0.02"
},
{
"name": "Premium Package",
"priceCurrency": "EUR",
"price": "10.00",
"maxPrice": "20.00",
"valueAddedTaxIncluded": false
}
]
Element name | Type | Options | Description |
---|---|---|---|
minPrice | string | - | The lowest price if the price is a range. If dynamic pricing is used with this product, this is the lowest price allowed. In dynamic pricing, businesses are able to change prices based on algorithms that take into account competitor pricing, supply and demand, and other external factors in the market. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
maxPrice | string | - | The highest price if the price is a range. If dynamic pricing is used with this product, this is the highest price allowed. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
valueAddedTaxIncluded | boolean | true/false | Specifies whether the applicable value-added tax (VAT) is included in the price specification or not. |
valueAddedTaxPercentage | Integer | Number percentage value, range 0-100 | The value-added tax (VAT) percentage. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
validFrom | DateTime | A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. | The date when the item becomes valid. |
validTo | DateTime | A combination of date and time in ISO 8601 format yyyy-MM-dd'T'HH:mm:ss.SSSZ. | The date after which the item is no longer valid. |
additionalPrice | string | - | This is used to define fees for usage which exceeds the defined max transaction quantity. This value is for each additional transaction. Use '.' (Unicode 'FULL STOP' (U+002E)) rather than ',' to indicate a decimal point. Avoid using these symbols as a readability separator. Use values from 0123456789 (Unicode 'DIGIT ZERO' (U+0030) to 'DIGIT NINE' (U+0039)) rather than superficially similar Unicode symbols. |
maxDataQuantity | Integer | - | The maximum amount of data transferred during the billing duration. Unit is GB. |
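To illustrate how additionalPrice interacts with maxTransactionQuantity, here is a minimal sketch of a billing calculation. It is an assumption about how a platform might combine the two attributes, not a rule from the specification: the function name `period_cost` is hypothetical, the plan is assumed to define additionalPrice, and the total is taken to be the base price plus a per-transaction fee for usage above the limit.

```python
from decimal import Decimal


def period_cost(plan: dict, transactions: int) -> Decimal:
    """Cost of one billing period for a plan, including overage fees.

    Assumes total = price + overage * additionalPrice, where overage is
    the number of transactions above maxTransactionQuantity.
    """
    base = Decimal(plan["price"])  # prices are strings, so Decimal parses them exactly
    limit = plan["maxTransactionQuantity"]
    # A limit of 0 means unlimited use, so no overage can occur.
    if limit == 0 or transactions <= limit:
        return base
    overage = transactions - limit
    return base + overage * Decimal(plan["additionalPrice"])


plan = {
    "name": "Premium Package",
    "priceCurrency": "EUR",
    "price": "10.00",
    "maxTransactionQuantity": 1000,
    "additionalPrice": "0.02",
}
print(period_cost(plan, 800))   # within the limit: base price only
print(period_cost(plan, 1500))  # 500 extra transactions at 0.02 each
```

Decimal is used instead of float because the specification carries prices as decimal strings, and binary floating point cannot represent values such as 0.02 exactly.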
Data Pipeline
Data Pipeline is a process whereby the deployment method of a data product pipeline is defined. Usually the deployment script contains the logic of the individual steps as well as the code chaining the steps together.
The Data Pipeline OBJECT defines building, deploying, and running the data product's code, and storing and giving access to data and metadata. This principle has been adopted from Data Mesh.
Example of Data Pipeline component usage:
"dataPipeline": {
"infrastructure": {
"platform": "Azure",
"storageTechnology": "Azure SQL",
"storageType": "sql",
"containerTool": "helm",
"format": "yaml",
"status": "development",
"scriptURL": "http://192.168.10.1/rundatapipeline.yml",
"deploymentDocumentationURL": "http://192.168.10.1/datapipeline",
"hashType": "SHA-2",
"checksum": "7b7444ab8f5832e9ae8f54834782af995d0a83b4a1d77a75833eda7e19b4c921"
},
"dataAccess": {
"outputPorttype": "API",
"authenticationMethod": "OAuth",
"specification": "OAS",
"format": "GraphQL",
"specURL": "https://192.168.10.1/petstore.json",
"documentationURL": "http://192.168.10.1/petshop"
}
}
Element name | Type | Options | Description |
---|---|---|---|
infrastructure | element | - | Infrastructure is a process whereby a data product pipeline deployment method is defined. |
platform | string | any | Platform infrastructure, such as AWS, GCP, Azure. |
storageTechnology | string | any | Describes the internal storage area technology, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, Azure SQL. |
storageType | string | any | Describes the internal storage type, such as files, sql, events, MQTT. |
containerTool | string | any | A name of the package manager, container or infrastructure as code tool. |
format | string | any | Type of script language. |
status | string | Options: announcement, draft, development, testing, acceptance, production, sunset, retired. | Development status. |
scriptURL | URL | Valid URL | The URL of the deployment script. |
deploymentDocumentationURL | URL | Valid URL | The URL of the deployment documentation. |
hashType | string | One of: SHA-1, SHA-2, SHA-3 | Type of secure hash algorithm for checksum. |
checksum | string | any | Script checksum. |
dataAccess | element | - | Reference to the ability to use data. |
outputPorttype | string | any | Type of data access, such as API, SQL, sFTP, gRPC. |
authenticationMethod | string | any | Data access authentication method type, such as API key, HTTP Basic, OAuth, No authentication. |
specification | string | any | Type of the data access specification, such as OAS, RAML, Slate. |
format | string | any | Data access file format type, such as JSON, XML, GraphQL, plain text. |
specsURL | URL | Valid URL | The URL of the specification. |
documentationURL | URL | Valid URL | The URL of the data access documentation. |
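The hashType and checksum attributes allow a consumer to verify the deployment script before running it. The sketch below shows one way this could work. The mapping from hashType to a concrete algorithm is an assumption: the specification names only the families SHA-1, SHA-2, and SHA-3, and 256-bit variants are picked here because the example checksum is 64 hexadecimal characters long. The function name `script_matches_checksum` is hypothetical.

```python
import hashlib

# Assumed mapping from the hashType options to concrete algorithms.
ALGORITHMS = {
    "SHA-1": hashlib.sha1,
    "SHA-2": hashlib.sha256,    # assumption: SHA-2 means SHA-256
    "SHA-3": hashlib.sha3_256,  # assumption: SHA-3 means SHA3-256
}


def script_matches_checksum(script: bytes, infrastructure: dict) -> bool:
    """Check the script bytes against the checksum in "infrastructure"."""
    hasher = ALGORITHMS[infrastructure["hashType"]]
    digest = hasher(script).hexdigest()
    # Hex digests are compared case-insensitively.
    return digest == infrastructure["checksum"].lower()


# Hypothetical script content; in practice the bytes would be fetched from scriptURL.
script = b"steps:\n  - deploy\n"
infrastructure = {
    "hashType": "SHA-2",
    "checksum": hashlib.sha256(script).hexdigest(),
}
print(script_matches_checksum(script, infrastructure))
print(script_matches_checksum(script + b"tampered", infrastructure))
```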
Data SLA
The Data Service Level Agreement (SLA) Object contains attributes which define the desired and promised quality of the data product.
There are no mandatory attributes at the moment. Optional attributes are listed in their own table and an example is given in the right column.
Optional attributes and elements
Example of Quality component usage:
"SLA": {
"updateFrequency":
{
"unit": "hours",
"value": 1
},
"uptime":
{
"unit": "percentage",
"value": 99
},
"responseTime":
{
"unit": "milliseconds",
"value": 200
},
"nullValues":
{
"unit": "percentage",
"value": 0.01
},
"support":
{
"company":
{
"phoneNumber": "",
"phoneServiceHours": "",
"chatURL":"",
"chatServiceHours": "",
"chatResponseTime": "",
"email": "",
"emailServiceHours": "",
"emailResponseTime": "",
"documentationURL": "",
"guidesURL": ""
},
"community":
{
"stackoverflowURL": "",
"forumURL": "",
"slackURL": "",
"twitterURL": ""
}
},
"observability":
{
"logsURL": "https://logs.com",
"dashboardURL": "https://dashboard.com",
"uptimeURL": "https://uptime.com"
}
}
Element name | Type | Options | Description |
---|---|---|---|
updateFrequency | element | Options for unit are: milliseconds, seconds, minutes, days, weeks, months, years, never, null. Value attribute is Integer. | Name of the quality attribute indicating the interval at which data is updated. |
uptime | element | Options for unit are: percentage, string, null. The value attribute can be integer or string "best effort". | Uptime is the amount of time that a service is online, available, and operational. Guaranteed uptime is expressed as an SLA level and is generally the most important metric for measuring the quality of a hosting provider. An SLA level of 99.99%, for example, equates to 52 minutes and 36 seconds of downtime per year. In this context, uptime is the SLA level. |
responseTime | element | Unit options are: milliseconds, seconds, null. Value can be integer or null. | Response time is the total amount of time it takes to respond to a request for service. |
nullValues | element | Unit is percentage. Value can be integer or null. | Null values is the percentage of null values in the content. Data scientists often use this as a data quality attribute. |
support | element | - | Support element describes how the customer can reach for help in case of difficulties in usage, billing, or otherwise. Support can be based on company provided support and community driven support. |
phoneNumber | string | - | The support phone number. |
phoneServiceHours | string | - | Describes the phone service hours the company provides. Often given at week level, e.g. Mon-Fri 8am - 4pm. |
chatURL | URL | Valid URL | The URL of the chat service to use. Service hours and response time are defined in other attributes. |
chatServiceHours | string | - | Describes the chat service hours the company provides. Often given at week level, e.g. Mon-Fri 8am - 4pm. |
chatResponseTime | string | - | Describes the targeted maximum delay in responding to chat support requests. This does not normally guarantee a resolution to the problem. |
email | string | - | Email address for support requests. |
emailServiceHours | string | - | Describes the email service hours the company provides. Often given at week level, e.g. Mon-Fri 8am - 4pm. |
emailResponseTime | string | - | Describes the targeted maximum delay in responding to email support requests. This does not normally guarantee a resolution to the problem. |
documentationURL | URL | - | URL to the documentation of the product. |
guidesURL | URL | Valid URL | URL to the guides offering more information and examples about how to use the data product. Guides might be platform specific. |
community | element | - | Element that contains community-based support function information. |
stackoverflowURL | URL | Valid URL | URL to the Stack Overflow. Could be for example list of resolved issues related to the product. |
forumURL | URL | Valid URL | URL to the community forum in which product related support requests can be raised. |
slackURL | URL | Valid URL | URL to the Slack workspace in which product related support requests can be raised. |
twitterURL | URL | Valid URL | URL to the Twitter account for which product related support requests can be raised. |
observability | element | - | Observability is a superset of monitoring. It provides not only high-level overviews of the system’s health but also highly granular insights into the implicit failure modes of the system. In addition, an observable system furnishes ample context about its inner workings, unlocking the ability to uncover deeper, systemic issues. |
logsURL | URL | Valid URL | URL to service which offers access to event logs including errors, response times, call information. |
dashboardURL | URL | Valid URL | URL to dashboard application which visualizes product behaviour. This service should support at least part of the given product quality indicators. |
uptimeURL | URL | Valid URL | URL to service which shows uptime statistics as well as other statistical information. This service should support at least part of the given product quality indicators. |
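The uptime description above states that an SLA level of 99.99% equates to 52 minutes and 36 seconds of downtime per year. The arithmetic can be sketched as follows, assuming a year of 365.25 days (the assumption under which the stated figure holds); the function name `downtime_per_year` is hypothetical.

```python
def downtime_per_year(uptime_percent: float,
                      days_per_year: float = 365.25) -> tuple[int, int]:
    """Return the allowed yearly downtime as (minutes, seconds)."""
    seconds_per_year = days_per_year * 24 * 60 * 60
    # The share of the year that may be spent offline.
    downtime_seconds = (100.0 - uptime_percent) / 100.0 * seconds_per_year
    # Round to whole seconds, then split into minutes and seconds.
    return divmod(round(downtime_seconds), 60)


print(downtime_per_year(99.99))  # (52, 36)
```

With the value 99 used in the SLA example, the same calculation allows roughly 3.65 days of downtime per year.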
Data Licensing
The data product may be exploited, e.g. by licensing its use and exploitation to third parties. A machine-readable license is included in the specification for this purpose. It can be used to conclude various agreements regarding data protection, processing, and intellectual property rights (IPR). Data can be protected by one or more intellectual property rights. The principle is that when a third party (Data User) exploits the data, it must have a license or other right from the Data Holder to exploit the data.
Example of License Object usage:
"license": {
"scope": {
"definition": "The purpose of this license is to determine the terms and conditions applicable to the licensing of the data product, whereby Data Holder grants Data User the right to use the data.",
"language": "en-us",
"permanent": false,
"terminationConditions": "Cancellation before 30 days. After the expiry of the right of use, the product and its derivatives must be removed.",
"continuityConditions": "An expired license will automatically be continued unless cancelled (terminated) in writing by Data Holder.",
"restrictions": "Data User agrees not to, directly or indirectly, participate in the unauthorized use, disclosure or conversion of any confidential information.",
"geographicalArea": [
"EU",
"US"
],
"modificationRight": true,
"resellingRight": true
},
"governance": {
"containsPersonalData": true,
"dpaURL": "http://192.168.10.1/dpaconditions",
"audit": "Data Holder will reasonably cooperate with Data User by providing available additional information concerning the data product. Each party will bear its own costs with respect to the audit procedures.",
"warranties": "Data Holder makes no warranties, express or implied, guarantees or conditions with respect to your use of the data product. To the extent permitted under local law, Data Holder disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from Data User use of the data product.",
"forceMajeure": "Each party may suspend the fulfilment of its contractual obligations, when the said fulfilment is impossible or objectively too costly due to an unforeseeable impediment independent from the parties, such as for example: strike, boycott, lockout, fire, war (declared or not), civil war, riots and revolutions, requisitions, embargo, power blackouts, extraordinary breakage of machinery, delays in the delivery of components or raw materials.",
"damages": "During the term of license, except for the force majeure or the Data Holders reasons, Data User is required to follow strictly in accordance with the license. If Data User wants to terminate the license early, it needs to pay a certain amount of liquidated damages.",
"confidentiality": "Data User undertakes to maintain confidentiality as regards all information of a technical (such as, by way of a non-limiting example, drawings, tables, documentation, formulas and correspondence) and commercial nature (including contractual conditions, prices, payment conditions) gained during the performance of this license.",
"applicableLaws": "This license shall be interpreted, construed and enforced in accordance with the law of Finland, Incl. Copyright Act 404/1961."
}
}
Element name | Type | Options | Description |
---|---|---|---|
scope | element | - | Extent, range, coverage, area or space of the license. |
definition | string | text content, max length 512 chars | Background and purpose of the license. |
language | string | ISO 639-1 standard language codes | License language. |
permanent | boolean | true/false | License with no expiration date. |
terminationConditions | string | text content, max length 512 chars | Cancellation conditions of the license. |
continuityConditions | string | text content, max length 512 chars | Continuity conditions of the license. |
restrictions | string | text content, max length 512 chars | Restrictions of the license. |
geographicalArea | string | ISO 3166-1 alpha-2 codes | License right restricted to the geographical area. |
modificationRights | boolean | true/false | Data modification rights. |
resellingRights | boolean | true/false | Reselling rights. |
governance | element | - | Governance is the approach taken to ensure that the agreed outcomes are being fulfilled. |
containsPersonalData | boolean | true/false | Data contains personal data. |
dpaURL | URL | valid URL | The URL of the Data Processing Agreement (DPA). |
audit | string | text content, max length 512 chars | License auditing terms. |
warranties | string | text content, max length 512 chars | License warranties. |
forceMajeure | string | text content, max length 512 chars | Force Majeure |
damages | string | text content, max length 512 chars | Damages refers to the sum of money (i.e. indemnifications) for a breach of some duty or violation of license right. |
confidentiality | string | text content, max length 512 chars | Restrictions and requirements imposed on the Data User regarding e.g. the use and disclosure of the Data Holder's confidential information. |
applicableLaws | string | text content, max length 512 chars | Applicable laws, i.e. local acts, decrees, or laws. |
Data Holder
The DataHolder Object describes the organization legally allowed to create, develop, and publish data products.
Data holder means "a legal person, public body, international organisation, or a natural person who is not a data subject with respect to the specific data in question, which, in accordance with applicable Union or national law, has the right to grant access to or to share certain personal data or non-personal data." (Data Governance Act)
The data holder might not be the original IPR owner of the data used, but has the right to operate with it. The contract or other agreement between the data holder and a possible data owner is not part of the standard, either as metadata or as a licence.
Mandatory attributes are listed in a separate table and marked with REQUIRED text. Next to the mandatory attributes is an example.
The same logic applies to the optional attributes as well. Optional attributes are listed in their own table and an example is given in the right column.
Mandatory attributes and elements
Example of Data Holder component usage with mandatory attributes:
"dataHolder":
{
"legalName":"MindMote Oy",
"businessID":"12243434-12",
"email":"contact@mindmote.fi"
}
Element name | Type | Options | Description |
---|---|---|---|
legalName | string | text content, max length 256 chars | REQUIRED The official name of the organization, e.g. the registered company name. |
businessID | string | - | REQUIRED The business identifier code of the company. Often this is given to the company by the authorized public sector organization managing the register of businesses. |
email | string | As defined in RFC 5322 | REQUIRED Email address to be used in contacting the organization. |
Optional attributes and elements
Example of Data Holder component usage with some of the optional attributes:
"dataHolder":
{
"taxID": "12243434-12",
"vatID": "12243434-12",
"logoURL": "https://mindmote.fi/logo.png",
"description": "Digital Economy services and tools",
"URL": "https://mindmote.fi",
"telephone": "+35845 0232 2323",
"streetAddress": "Koulukatu 1",
"postalCode": "33100",
"addressRegion": "Pirkanmaa",
"addressLocality": "Tampere",
"addressCountry": "FI",
"aggregateRating": "",
"ratingCount": 1245,
"slogan": "",
"parentOrganization": ""
}
Element name | Type | Options | Description |
---|---|---|---|
taxID | string | - | The Tax / Fiscal ID of the organization or person, e.g. the TIN in the US or the CIF/NIF in Spain. |
vatID | string | - | The Value-added Tax ID of the organization or person. |
logoURL | URL | Valid URL. See more from RFC 3986. | The URL pointing to organisation logo. |
description | string | Max length 512 chars | The introduction to the organization. Often contains information of what the organisation does and focuses on. |
URL | URL | Valid URL. See more from RFC 3986. | The URL of the organization's website. |
telephone | string | - | The telephone number. |
streetAddress | string | - | The street address. For example, 1600 Amphitheatre Pkwy. |
postalCode | string | - | The postal code. For example, 94043. |
addressRegion | string | - | The region in which the locality is, and which is in the country. For example, California or another appropriate first-level administrative division. |
addressLocality | string | - | The locality in which the street address is, and which is in the region. For example, Mountain View. |
addressCountry | string | two-letter ISO 3166-1 alpha-2 country code | The country. |
aggregateRating | string | - | The average rating based on multiple ratings or reviews. |
ratingCount | integer | - | The number of ratings and reviews used in calculating the aggregateRating. |
slogan | string | Max length 256 chars | The slogan of the organization. This is often related to showing the brand. |
parentOrganization | string | - | The larger organization that this organization is a subOrganization of, if any. |
Specification extensions
While the Open Data Product Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
Extension properties are implemented as patterned fields that are always prefixed by "x-". The extensions may or may not be supported by the available tooling, but tools may be extended as well to add the requested support (if the tools are internal or open source). The Open Data Product Initiative Technical Steering Committee does not officially approve external extensions; they are fully independent. Popular extensions, however, are natural candidates for future additions to the standard.
Example of extension usage:
"Product": {
"name": "Pets of the year",
"productID": "123456are",
"description": "",
"x-internal-id": "foobar123"
}
Element name | Type | Options | Description |
---|---|---|---|
^x- | any | - | Allows extensions to the Open Data Product Schema. The field name MUST begin with x-, for example, x-internal-id. The value can be null, a primitive, an array, or an object. Can have any valid JSON format value. |
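Tooling that does not support a given extension can still process the object by separating the "x-" prefixed fields from the standard ones. A minimal sketch, assuming a hypothetical helper named `split_extensions`:

```python
def split_extensions(obj: dict) -> tuple[dict, dict]:
    """Split an object into (standard fields, extension fields).

    Extension fields are those whose name begins with "x-".
    """
    standard = {k: v for k, v in obj.items() if not k.startswith("x-")}
    extensions = {k: v for k, v in obj.items() if k.startswith("x-")}
    return standard, extensions


product = {
    "name": "Pets of the year",
    "productID": "123456are",
    "description": "",
    "x-internal-id": "foobar123",
}
standard, extensions = split_extensions(product)
print(extensions)  # {'x-internal-id': 'foobar123'}
```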
Hello world example
You will find a complete machine-readable example of a data product in the right column. It is an imaginary data product, Pets of the year, which contains derived data about the most common pets in the world. The product has 4 pricing plans, which are mostly based on a recurring subscription model. Note! Not all optional attributes are used in the example.
Example of complete working Data Product specification instance:
{
"Product": {
"name": "Pets of the year",
"productID": "123456are",
"description": "",
"visibility": "private",
"status": "draft",
"version": "0.1",
"categories": ["pets"],
"tags": ["pet"],
"brandSlogan": "Passion for the data monetization",
"type": "derived data",
"logoURL": "https://data-product-business.github.io/open-data-product-spec/images/logo-dps-ebd5a97d.png"
},
"pricing": [{
"name": "Premium subscription 1 year",
"priceCurrency": "EUR",
"price": "50.00",
"billingDuration": "year",
"unit": "recurring",
"maxTransactionQuantity": "unlimited"
},
{
"name": "Premium Package Monthly",
"priceCurrency": "EUR",
"price": "5.00",
"billingDuration": "month",
"unit": "recurring",
"maxTransactionQuantity": 10000
},
{
"name": "Freemium Package",
"priceCurrency": "EUR",
"price": "0.00",
"billingDuration": "month",
"unit": "recurring",
"maxTransactionQuantity": 1000
},
{
"name": "Revenue sharing",
"priceCurrency": "percentage",
"price": "5.50",
"billingDuration": "month",
"unit": "revenue-sharing",
"maxTransactionQuantity": 20000
}
],
"dataPipeline": {
"infrastructure": {
"platform": "Azure",
"storageTechnology": "Azure SQL",
"storageType": "sql",
"containerTool": "helm",
"format": "yaml",
"status": "development",
"scriptURL": "http://192.168.10.1/rundatapipeline.yml",
"deploymentDocumentationURL": "http://192.168.10.1/datapipeline",
"hashType": "SHA-2",
"checksum": "7b7444ab8f5832e9ae8f54834782af995d0a83b4a1d77a75833eda7e19b4c921"
},
"dataAccess": {
"type": "API",
"authenticationMethod": "OAuth",
"specification": "OAS",
"format": "JSON",
"specURL": "https://swagger.com/petstore.json",
"documentationURL": "http://192.168.10.1/test/docs/dataaccess"
},
"SLA": {
"updateFrequency": {
"unit": "hours",
"value": 1
},
"uptime": {
"unit": "percentage",
"value": 99
},
"responseTime": {
"unit": "milliseconds",
"value": 200
},
"nullValues": {
"unit": "percentage",
"value": 0.01
},
"support": {
"company": {
"phoneNumber": "",
"phoneServiceHours": "",
"chatURL": "",
"chatServiceHours": "",
"chatResponseTime": "",
"email": "",
"emailServiceHours": "",
"emailResponseTime": "",
"documentationURL": "",
"guidesURL": ""
},
"community": {
"stackoverflowURL": "",
"forumURL": "",
"slackURL": "",
"twitterURL": ""
}
},
"observability": {
"logsURL": "https://logs.com",
"dashboardURL": "https://dashboard.com",
"uptimeURL": "https://uptime.com"
}
},
"license": {
"scope": {
"definition": "The purpose of this license is to determine the terms and conditions applicable to the licensing of the data product, whereby Data Holder grants Data User the right to use the data.",
"language": "en-us",
"permanent": false,
"terminationConditions": "Cancellation before 30 days. After the expiry of the right of use, the product and its derivatives must be removed.",
"continuityConditions": "An expired license will automatically be continued without written cancellation (termination) by Data Holder",
"restrictions": "Data User agrees not to, directly or indirectly, participate in the unauthorized use, disclosure or conversion of any confidential information.",
"geographicalArea": [
"EU",
"US"
],
"modificationRight": true,
"resellingRight": true
},
"governance": {
"containsPersonalData": true,
"dpaURL": "http://192.168.10.1/dpaconditions",
"audit": "Data Holder will reasonably cooperate with Data User by providing available additional information concerning the data product. Each party will bear its own costs with respect to the audit procedures.",
"warranties": "Data Holder makes no warranties, express or implied, guarantees or conditions with respect to your use of the data product. To the extent permitted under local law, Data Holder disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from Data User use of the data product.",
"forceMajeure": "Each party may suspend the fulfilment of its contractual obligations, when the said fulfilment is impossible or objectively too costly due to an unforeseeable impediment independent from the parties, such as for example: strike, boycott, lockout, fire, war (declared or not), civil war, riots and revolutions, requisitions, embargo, power blackouts, extraordinary breakage of machinery, delays in the delivery of components or raw materials.",
"damages": "During the term of the license, except for force majeure or the Data Holder's reasons, Data User is required to act strictly in accordance with the license. If Data User wants to terminate the license early, it needs to pay a certain amount of liquidated damages.",
"confidentiality": "Data User undertakes to maintain confidentiality as regards all information of a technical (such as, by way of a non-limiting example, drawings, tables, documentation, formulas and correspondence) and commercial nature (including contractual conditions, prices, payment conditions) gained during the performance of this license.",
"applicableLaws": "This license shall be interpreted, construed and enforced in accordance with the law of Finland, Incl. Copyright Act 404/1961."
}
},
"dataHolder": {
"taxID": "12243434-12",
"vatID": "12243434-12",
"logoURL": "https://mindmote.fi/logo.png",
"description": "Digital Economy services and tools",
"URL": "https://mindmote.fi",
"telephone": "+35845 0232 2323",
"streetAddress": "Koulukatu 1",
"postalCode": "33100",
"addressRegion": "Pirkanmaa",
"addressLocality": "Tampere",
"addressCountry": "Finland",
"aggregateRating": "",
"ratingCount": 1245,
"slogan": "",
"parentOrganization": ""
}
}
}
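To illustrate how the pricing plans in the example might be consumed, here is a minimal Python sketch that compares the recurring plans by their cost over one year. It rests on assumptions not confirmed in this section: that `billingDuration` is the period covered by one payment of `price`, and that only plans with `"unit": "recurring"` are comparable this way. The names `PERIODS_PER_YEAR` and `yearly_cost` are hypothetical.

```python
from decimal import Decimal
from typing import Optional

# Pricing plans copied from the example above (only the fields used here).
pricing = [
    {"name": "Premium subscription 1 year", "priceCurrency": "EUR",
     "price": "50.00", "billingDuration": "year", "unit": "recurring"},
    {"name": "Premium Package Monthly", "priceCurrency": "EUR",
     "price": "5.00", "billingDuration": "month", "unit": "recurring"},
    {"name": "Freemium Package", "priceCurrency": "EUR",
     "price": "0.00", "billingDuration": "month", "unit": "recurring"},
    {"name": "Revenue sharing", "priceCurrency": "percentage",
     "price": "5.50", "billingDuration": "month", "unit": "revenue-sharing"},
]

# Assumption: number of billing periods per year for each billingDuration value.
PERIODS_PER_YEAR = {"year": 1, "month": 12}


def yearly_cost(plan: dict) -> Optional[Decimal]:
    """Return the cost of a recurring plan over one year, or None otherwise."""
    if plan["unit"] != "recurring":
        return None  # e.g. revenue sharing, where the price is a percentage
    # Prices are strings in the example, so Decimal avoids float rounding.
    return Decimal(plan["price"]) * PERIODS_PER_YEAR[plan["billingDuration"]]


for plan in pricing:
    cost = yearly_cost(plan)
    if cost is not None:
        print(f'{plan["name"]}: {cost} {plan["priceCurrency"]} per year')
```

Under these assumptions the monthly premium plan costs 60.00 EUR per year against 50.00 EUR for the yearly subscription, while the revenue-sharing plan is skipped because its price is a percentage.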
Terms used
Here is a list of the terms used and what we mean by them. The meanings are mostly taken from existing sources, such as articles and other trusted references. The source is always linked to the term. In some rare cases, a term is defined for the purposes of this specification only.
Term | Description |
---|---|
Data point | |
Data product | As a strategic resource for companies, data is considered an asset that, like any other material good, has a financial value and whose management generates costs. Data created, collected or used in individual business processes can be sold to other organisations as raw or processed data, so that it no longer serves as an enabler of products, but is the product itself. This leads to the paradigm that data assets can be monetised by exchanging and trading data between organisations as data products and services. There are multiple definitions of a data product. In an article authored by Jian Pei (2020), data products "refer to data sets as products and information services derived from data sets." Simon O'Regan defines a data product as a product whose primary objective is to use data to facilitate an end goal. From the academic literature we have found several subtypes of data products: raw data, derived data, data sets, reports, analytic views, 3D visualisations, algorithms, decision support (dashboards) and automated decision-making (Netflix product recommendations or Spotify’s Discover Weekly would be common examples). Typically raw data, derived data and algorithms have technical users. Most often they tend to be internal products in an organisation. If we dive into the data mesh world, this quote from Zhamak Dehghani’s book is key to understanding the definition of data as a product: “Domain data teams must apply product thinking […] to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.” While many of the standard product development rules apply (solve a customer need, learn from feedback, prioritise relentlessly, etc.), data has different characteristics compared to tangible products that prevent the direct transfer of established processes and rules of trading goods, especially in terms of pricing mechanisms. In trading data, there is less willingness to pay. For example, data buyers often do not recognise the potential value of data items because it cannot be fully disclosed prior to purchase (known as the ‘Arrow paradox’). In addition, there is often a lack of awareness that the creation, processing, storage and distribution of high-quality data is a major cost factor for the data provider. Another obstacle is the lack of trust and security, causing potential data providers to fear that competitors could benefit from disclosure of in-house data. One of the aims of this specification is to tackle the above-mentioned issues, which hinder the growth of the data ecosystem and market volatility. |
Data as a service | In computing, data as a service, or DaaS, is a term used to describe cloud-based software tools used for working with data, such as managing data in a data warehouse or analyzing data with business intelligence. It is enabled by software as a service (SaaS). DaaS, like all "as a service" (aaS) technology, builds on the concept that its data product can be provided to the user on demand, regardless of geographic or organizational separation between provider and consumer. According to Daniel Newman from Forbes (2017), DaaS is essentially a data stream that subscribers can access on demand. Some people use the term data product in a sense that also covers data commodities which have more service-like attributes than product attributes. In those cases we prefer to use the term data as a service and call the creation process data servitization. The term productizement is reserved for the process which creates data products as its end result. |
Data as a service business model | Data as a service as a business model is a concept in which two or more organizations buy, sell, or trade machine-readable data in exchange for something of value. Data as a service is a general term that encompasses data-related services. DaaS providers are now replacing traditional data analytics services or clustering with existing services to offer more added value to customers. DaaS providers curate, aggregate and analyze multi-source data in order to provide additional, more valuable analytical data or information. Typically, DaaS business is based on subscriptions, and customers pay for a package of services or for specific services. |
Data pipeline | According to Aiswarya et al., the complex chain of interconnected activities or processes from data generation through data reception constitutes a data pipeline. In other words, data pipelines are the connected chain of processes where the output of one or more processes becomes an input for another. It is a piece of software that removes many manual steps from the workflow and permits a streamlined, automated flow of data from one node to another. Moreover, it automates the operations involved in the selection, extraction, transformation, aggregation, validation, and loading of data for further analysis and visualization. It offers end-to-end speed by removing errors and avoiding bottlenecks or delays. Data pipelines can process multiple streams of data simultaneously. |
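The data pipeline definition above, where the output of one process becomes the input of the next, can be sketched in a few lines of Python. This is an illustration only: the stage functions, the sample rows and the `run_pipeline` helper are all hypothetical and do not come from the specification or the cited source.

```python
from functools import reduce
from typing import Callable

Rows = list[dict]
Stage = Callable[[Rows], Rows]


def extract(_: Rows) -> Rows:
    # Hypothetical source data; a real pipeline would read from a source system.
    return [
        {"pet": "dog", "count": "3"},
        {"pet": "cat", "count": "x"},  # invalid count, dropped by validate()
        {"pet": "dog", "count": "2"},
    ]


def validate(rows: Rows) -> Rows:
    # Keep only rows whose count is a non-negative integer string.
    return [r for r in rows if r["count"].isdigit()]


def transform(rows: Rows) -> Rows:
    # Convert the count from string to integer.
    return [{"pet": r["pet"], "count": int(r["count"])} for r in rows]


def aggregate(rows: Rows) -> Rows:
    # Sum the counts per pet.
    totals: dict[str, int] = {}
    for r in rows:
        totals[r["pet"]] = totals.get(r["pet"], 0) + r["count"]
    return [{"pet": pet, "count": count} for pet, count in totals.items()]


def run_pipeline(stages: list[Stage]) -> Rows:
    # Feed the output of each stage into the next one, starting from no data.
    return reduce(lambda data, stage: stage(data), stages, [])


result = run_pipeline([extract, validate, transform, aggregate])
print(result)  # [{'pet': 'dog', 'count': 5}]
```

Keeping every stage as a function with the same input and output type makes it possible to reorder, add or remove steps without changing the runner.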
Editors and contributors
This specification is developed openly, and much of the work comes from the community. We list all community contributors as a sign of appreciation. The editors (as initial creators of the specification) are Jarkko Moilanen and Jussi Niilahti. The editors take the feedback and draft new candidate releases, which may become versions of the specification.
List of community contributors
The work around the specification would not be possible without enormous help from the community. Here is the list of contributors so far.
- Toni Luhti
- Topi Santakivi