09 November 2024

Policy Brief: What is the Value of Data? - Coyle & Manley (2022)

Overview: This brief explores various methods for valuing data, acknowledging limitations in existing models and emphasizing the need for more comprehensive approaches. The typology of valuation methods in Part (2) below (and on page 4 of the brief) is especially helpful.

Comment: The global economy has been reshaped by data, with data-driven firms becoming dominant market leaders. This transformation extends to both private and public sectors. Although data’s value is widely acknowledged, there remains no consensus on how to quantify that value. This lack of clarity hinders optimal investments and governance.

This brief builds on prior work at the Bennett Institute for Public Policy (Coyle et al 2020, reviewed earlier in this blog - Ed) and covers the development of new methods for measuring the value of data.

Coyle, D. and A. Manley, Policy Brief: What is the Value of Data? A review of empirical methods, Bennett Institute for Public Policy, University of Cambridge, July 2022: https://www.bennettinstitute.cam.ac.uk/wp-content/uploads/2022/07/policy-brief_what-is-the-value-of-data.pdf

  • Part (1) Discusses the economic value of data to the public and private sectors
  • Part (2) Reviews proposed data valuation methodologies
  • Part (3) Presents a framework to develop an estimate of value

===============================

Part (1) Introduction

In recent years, data has become a key driver of economic transformation, with data-driven companies making up seven of the top 10 firms globally by market capitalization in 2021. This shift is particularly evident in the growing productivity and profitability gap between data-intensive firms and others.

Data’s value is increasingly recognized across sectors, including the public sector. Despite this recognition, no consensus has developed on how to empirically measure the value of data, hindering its full potential. While many firms and investors acknowledge the value of data, particularly through data services and stock market evaluations, the absence of clear valuation methods makes it difficult to guide investment decisions or govern data usage effectively. Coyle & Manley's report explores various approaches to data valuation and highlights the challenges of incorporating factors like opportunity costs, risks, and the costs associated with data collection and storage into such assessments.

===============================

Part (2) Proposed Data Valuation Methodologies

Coyle (2020) presented the Lens framework, describing data through the Economic Lens and the Information Lens. 

Building on prior research, the report identifies key characteristics influencing data's value and outlines valuation approaches. Traditional methods—cost-based, income-based, and market-based—are commonly used but fail to fully account for other inputs. 

The report also highlights newer approaches that better capture data's broader economic value, such as ascribing value using data flows and marketplaces. Comparable taxonomies and reviews are cited, including:

• Internet of Water (2018): a taxonomy of data valuation methods.

• Ker and Mazzini (2020): identify four methods: (a) cost-based; (b) income-based; (c) market capitalisation; and (d) trade flows.

• OECD Going Digital Toolkit: estimates the value of data, summarizes the System of National Accounts (SNA) cost-based frameworks being adopted by governments, and surveys other approaches.

The methods are summarized as:

  • 2.1 Cost-based Methods
  • 2.2 Income-based Methods
  • 2.3 Market-based methods (Marketplaces, Market capitalisation, Data Flows) 
  • 2.4 "Ambiguity-driven" methods (term coined by blogger - Ed)
  • 2.5 Impact-Based Methods

-----------------------------

2.1 Cost-based Methods

These methods calculate the costs of generating, storing, and replacing data, providing a lower-bound estimate of its value. Variants like the Modified Historical Cost Method (MHCM) adjust for data characteristics, and the consumption-based method reflects usage rates. National statistical offices, such as Statistics Canada and the UK Office for National Statistics, have trialed this method. Cost-based methods are widely used for valuing data, rooted in the System of National Accounts (SNA) (1).

[Government interest is to define the value to generate taxes - Ed]. The challenge is that "national level cost-based approaches rely on having well-classified data at the microlevel. This will be difficult to achieve and there are several blurred lines that make classification harder." (pp. 5-7)
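
[For illustration only - Ed: a minimal Python sketch of the cost-based mechanics described above. The figures, the quality adjustment, and the usage rate are hypothetical assumptions, not numbers from the brief.]

```python
# Illustrative only: cost-based lower-bound estimate of a dataset's value.
# Figures and adjustment factors below are hypothetical assumptions.

def cost_based_value(generation_cost, storage_cost,
                     quality_adjustment=1.0, usage_rate=1.0):
    """Historical cost of generating and storing the data, scaled by a
    Modified-Historical-Cost-style quality adjustment and by a
    consumption-based usage rate (the share of the data actually used).
    Returns a lower-bound value estimate."""
    historical_cost = generation_cost + storage_cost
    return historical_cost * quality_adjustment * usage_rate

# Example: 1.2m to generate, 0.3m to store, a 10% quality uplift,
# and roughly 80% of the records used downstream.
print(cost_based_value(1.2e6, 0.3e6, quality_adjustment=1.1, usage_rate=0.8))
```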

2.2 Income-based Methods

These methods estimate data's value through expected revenue streams generated by the data, such as selling marketing analytics. A common approach is the "relief from royalty" method, which estimates savings from owning data rather than licensing it. However, challenges arise in distinguishing data's contribution to revenue, especially for firms where data enhances products rather than being sold directly. This method also introduces uncertainty as it relies on judgment.
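
[For illustration only - Ed: a minimal sketch of the "relief from royalty" idea, valuing data as the present value of the licence fees the owner avoids. The royalty rate, revenue path, and discount rate are hypothetical assumptions, not figures from the brief.]

```python
# Illustrative only: "relief from royalty" valuation of a data asset.
# Royalty rate, revenues, and discount rate are hypothetical assumptions.

def relief_from_royalty_value(annual_revenues, royalty_rate, discount_rate):
    """Discounted sum of the royalty payments avoided by owning the data
    rather than licensing it, over the forecast period."""
    return sum(
        (revenue * royalty_rate) / (1 + discount_rate) ** year
        for year, revenue in enumerate(annual_revenues, start=1)
    )

# Example: five years of revenue attributable to the data, a 5% royalty
# rate, and an 8% discount rate.
revenues = [10e6, 11e6, 12e6, 12e6, 12e6]
print(round(relief_from_royalty_value(revenues, royalty_rate=0.05, discount_rate=0.08)))
```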

2.3 Market-based methods

These methods use observable prices for data, though such prices are rare since most data is used internally. When available, market prices offer valuable insights but reflect only a partial estimate of the broader social value of data. Key academic approaches include using data marketplaces, market capitalization of firms, and global data flows to estimate value. However, limitations remain, especially when data is aggregated or traded in complex ecosystems like credit scoring, where the true value often exceeds the sum of its parts.

2.3.1 Data Marketplaces

The literature on data marketplaces explores their potential to increase the value of data by reducing transaction costs, improving pricing transparency, and allowing multiple users to derive value from the same datasets. However, the success of such initiatives has been inconsistent, with key challenges including complex pricing mechanisms, regulatory differences, and a lack of trust. Data suppliers often bundle datasets and set prices based on consumer willingness to pay, but much of the literature remains theoretical and idealistic. Case studies from China, New Zealand, the EU, and Colombia show that effective data pricing and trust in data quality are critical for success, yet low participation often undermines marketplace efforts. Historical examples, such as Microsoft's failed Azure DataMarket, highlight the difficulty in building customer interest, while current platforms like the Shanghai Data Exchange and Ocean Market demonstrate varying approaches to data transaction and pricing. Ultimately, while data marketplaces hold significant potential, barriers such as trust, pricing, and regulation continue to limit their effectiveness.

2.3.2 Market capitalization-based

These methods are used to value data by examining its impact on a firm's market value, particularly for data-driven companies. This approach estimates the worth of these firms by looking at their overall market capitalization, which includes the value derived from data and analytics. For example, Ker and Mazzini (2020) estimate that U.S. data-driven firms, identified through lists like "The Cloud 100," are collectively worth over $5 trillion. Coyle and Li (2021) further build on this by analyzing how data-driven companies, such as Airbnb, disrupt traditional firms like Marriott, leading to the depreciation of incumbents' organizational capital. The decline in the value of non-digital firms' organizational capital due to data-driven competition helps estimate how much these firms should be willing to pay for data. Overall, these methods provide a way to quantify the value of data in relation to firm competitiveness and valuation in data-intensive industries.

2.3.3 Data Flows

Data flows are valuable in markets with limited information, as they can be observed and analyzed, with a strong correlation between the volume of data flow and its value on dominant online platforms. However, quantifying global data flows is challenging because most assessments only account for data that crosses international borders. While Ker and Mazzini (2020) and Coyle and Li (2021) suggest that the link between data flow volume and data value in specific locations is weak, due to large data hubs serving broad areas and the need for local knowledge, they highlight the economic significance of the content in data flows over volume alone. For example, video streaming generates more traffic than e-commerce but contributes less economic value. Their research also underscores the economic importance of digitally deliverable products, noting that many countries still lack a framework for categorizing digital trade. Overall, while data flows help understand data's economic value, their measurement is still evolving, hindered by definitional and geographical complexities. 

-----------------------------

2.4 "Ambiguity-driven" Methods

This section discusses Experiments and Surveys to estimate value where market prices do not exist. 

-----------------------------

2.5 Impact-Based Methods

The intent of data collection is to develop stories from which to mine insights for decision-making. Impact-based methods assess the value of data through cause-and-effect measurement, using tests such as comparative scenarios to gauge response. These create greater value-add, "making their value more persuasive than traditional quantitative approaches." Slotin (2018) reviews five data valuation methodologies, favoring impact-based approaches for their clarity and communicative strength. These include:

(a) Empirical Studies: Arrieta-Ibarra et al. (2020) estimated that "data use accounts for up to 47% of Uber’s revenue. In a scenario where drivers are fully compensated for their data, this could equate to $30 per driver per day for data generation."

(b) Decision-Based Valuation: A variant of empirical-based studies, this method "adjusts the value of data based on factors like frequency, accuracy, and quality before weighing outcomes by their contribution to decisions. This method acknowledges that value derives from improved decision-making and considers alternatives to using the data, although it requires subjective judgment."

(c) Shapley values: Rather than valuing the insights derived from data, Shapley values represent a subset of impact-based methods that value data in its raw form rather than solely through its applications in data-driven insights (a minimal sketch follows this list). Originating in game theory, Shapley values provide a unique payoff solution within a public good game, ensuring group rationality, fairness, and additivity.

Shapley values are used in computer science to evaluate the contribution of individual data points to model performance, assess data quality, and optimize feature selection. They are also applied to determine compensation for data providers by quantifying the value of their data.  Although Shapley values provide a useful framework for valuing data, they represent only one possible solution to public good problems, with alternative approaches that might offer better properties. The method has advantages, such as identifying valuable data for collection, but also faces drawbacks, including high computational costs and challenges in translating value into monetary terms. 

(d) Direct measurable economic impact: Various studies analyze the impact of data on growth and jobs.

(e) Stakeholder-based methods: These methods analyze the value of data availability to the sector supply chain: "This is a wider definition of value and may include value upstream or downstream omitted from other methods; it can encompass the non-rival aspect of data. The data consultancy Anmut is one that has developed this method and provides a case study of their valuation of Highways England data (Anmut n.d.)." The challenge of this approach is that it requires professional judgment of value rather than auditable measurements of value.

(f) Real options analysis: Real options analysis provides a method for estimating the value of data by considering its potential future use cases rather than its current applications. This flexibility enables firms to capitalize on positive opportunities while minimizing downside risks. Data is non-rival, meaning its value does not diminish with use, and its potential applications can remain undefined at the time of collection. The option value represents the "right but not the obligation" to generate insights from data in the future, allowing firms to collect data for unknown future purposes. The value is that waiting is incentivized: a firm can assess the impact of new information (policy changes, technology changes, shifts in consumer preferences) before deciding whether to analyze the data (a minimal sketch follows below).
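
[For illustration only - Ed: a minimal sketch of the real options idea in (f): the firm stores the data now and pays the analysis cost only if a valuable use case materialises. All payoffs, probabilities, and costs are hypothetical assumptions.]

```python
# Illustrative only: option value of retained data. The firm pays a small
# storage cost now and pays the analysis cost later only if the use case
# turns out to be valuable. Payoffs, probability, and costs are hypothetical.

def analyse_now(payoff_high, payoff_low, p_high, analysis_cost):
    """Expected value of committing to analysis today."""
    expected_payoff = p_high * payoff_high + (1 - p_high) * payoff_low
    return expected_payoff - analysis_cost

def wait_and_see(payoff_high, payoff_low, p_high, analysis_cost, storage_cost):
    """Expected value of storing the data and analysing only if it pays:
    the 'right but not the obligation' to generate insights later."""
    value_if_high = max(payoff_high - analysis_cost, 0.0)
    value_if_low = max(payoff_low - analysis_cost, 0.0)
    return p_high * value_if_high + (1 - p_high) * value_if_low - storage_cost

# Example: a 40% chance the data supports a 5m use case, otherwise 0.5m;
# analysis costs 1.5m, storage costs 0.1m.
now = analyse_now(5e6, 0.5e6, 0.4, 1.5e6)
wait = wait_and_see(5e6, 0.5e6, 0.4, 1.5e6, 0.1e6)
print(f"analyse now: {now:,.0f}  wait: {wait:,.0f}  option value: {wait - now:,.0f}")
```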
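
[For illustration only - Ed: a minimal sketch of the Shapley-value approach in (c), computing each data point's average marginal contribution to a toy performance score. The dataset and scoring rule are hypothetical; practical implementations use sampling approximations rather than full enumeration.]

```python
# Illustrative only: exact "data Shapley" values by enumerating all orderings
# of a handful of data points (practical methods use sampling approximations).
from itertools import permutations

def shapley_values(points, score):
    """Average marginal contribution of each data point to the score,
    taken over every possible ordering in which points could be added."""
    n = len(points)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        coalition = []
        prev_score = score(coalition)
        for idx in ordering:
            coalition.append(points[idx])
            new_score = score(coalition)
            values[idx] += new_score - prev_score
            prev_score = new_score
    return [v / len(orderings) for v in values]

# Toy scoring rule: how well the coalition's mean predicts a target value,
# expressed as negative absolute error (higher is better).
TARGET = 10.0

def mean_prediction_score(coalition):
    if not coalition:
        return -TARGET  # no data: worst-case error
    return -abs(sum(coalition) / len(coalition) - TARGET)

data_points = [9.5, 10.5, 4.0]  # the outlier earns a slightly negative value
print(shapley_values(data_points, mean_prediction_score))
```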

===============================

3. Discussion

Policymakers are stuck because there is no "consensus or best method for valuing data." The authors instead propose setting up a schema to classify methods and to validate them with external surveys. The schema goal is to determine: 

  1. What is being valued?
  2. Who is valuing the data?
  3. When is the valuation taking place?
  4. What is the purpose of the valuation?
-----------------------------

3.1 What is being valued?

There are a variety of things that could be referred to as ‘data.’ The possible distinctions are illustrated in the ‘data value chain,’ (see blog review of Coyle, 2020) which sets out different stages from the generation of raw data up to the decisions made using data insights generating the potential end-user value. The authors note that: "In general, the raw data is of least interest, and some of the literature goes as far as to state that raw data does not hold any value on its own...(as) even with cost-based methods, in many ways the most straightforward approach, it is almost impossible to distinguish between costs associated with raw data generation and database formation." 

[Note: OrbMB's ORBintel method illuminates the costs and we forecast developing the means to establish the value of raw data that is to be collected, in advance of the need - Ed.]

-----------------------------

3.2 Who is Valuing the Data? 

This section explores how the perspective of different stakeholders affects the methods used to value data (Data Producers, Private Sector Producers, Data Users, Data Hubs, Public Sector vs. Private Sector Valuation, Intangible Asset and Productivity).

The key point is that data serves as an intangible asset providing firms with a productivity advantage, especially when coupled with complementary skills. Firms that capture monopoly rents from data tend to value it more highly than alternative users, leading to a divergence between private and social valuations of data.

Valuation varies significantly based on the perspectives of different stakeholders, with public sector approaches emphasizing societal value and the monopoly rent that is taxation, while private sector valuations focus on costs and impacts. Understanding these differing perspectives is crucial for accurately assessing data's overall worth.

-----------------------------

3.3: When is the Valuation Taking Place? 

This section distinguishes between ex ante (before the event) and ex post (after the event) data valuation methods. Ex ante valuations are fraught with uncertainty, and therefore with risk and cost, so they are less widely used; ex post valuations mitigate these uncertainties.

-----------------------------

3.4 What is the purpose of the valuation? 

The authors contend that the purpose influences the choice of methodologies, with different approaches suited to different goals. 

===============================

References:

(1) The SNA is the United Nations (UN) framework setting “the internationally agreed standard set of recommendations on how to compile measures of economic activity.” The framework establishes consistent accounting rules and classifications to make cross-country comparison possible. The next revision (2025) will include updated treatment of data valuation.



09 October 2024

More than Dual-Use?

We've accepted an invitation to join the Peachscore+Gust Data-driven Accelerator. This month's post was to be a review of Cambridge researcher Diane Coyle's (2022) valuation analysis, but we are juggling commitments. So, for this month, a short post with an observation about market sectors.

The Root Taxonomy of Goods and Services - More than Dual-Use?

Goods and services generally get classed into three sector classes: 

  • Civilian
  • Military/national security
  • Dual-Use (use cases in both sectors)

This has never been exactly true, as there are numerous goods and services that are not purely civilian consumer (individual & household) offers. These include various public administration and safety segments, ranging from police and rescue to wastewater and pothole maintenance.

The better structure might be to say that there are five sector classes:

  1. Civilian;
  2. Civil Aid; and
  3. Military/National Security;
where:
  • #1, #2, and #3 are (#4) Triple-Use; and
  • #2 and #3 are (#5) Dual-Use.

Consider the carabiner: 


Comments?





28 August 2024

The Value of Data – Policy Implications – Main report – Coyle (2020)

Overview: This is a fascinating contribution to the literature. The discussion about content ("Information Lens") driving utility is especially useful to understand where and how to determine when to apply cost/benefit analyses. The authors' proposed framework clearly expresses the need to incentivize private innovation to concurrently aid the public good - a partnership of private, non-profit and public stakeholders. 


Comment: The previous articles are concerned with the internal business value of data, and the economic value of data for statistical purposes. This article dives into the economic value of data to the State. The perspective is that of public sector economists discussing opportunities and concerns for the United Kingdom, noting that "A greater understanding of the value of data would help identify where the benefits of greater investment in and sharing of data are worth the costs," and that, in this view, there is a need for regulation “for establishing a trustworthy institutional framework for managing, monitoring and enforcing the terms of access.”

Here, data is defined as intangible and is generally viewed as a homogeneous economic good (Note 1); this leads to the view that valuation is not that important, because market pricing sets the value.

The authors note the absence of costing because “available empirical studies use market valuations or transactions” to estimate value. They note that “the value of different types…can be very different,” and that value is tied to the use case (what is the utility?), which they contend leads to a need to determine utility (public, social, private) in which there is a public interest. Finally, the authors call for greater regulation to obtain greater “social welfare value” and to prevent private asymmetric information advantage, while recognizing that regulation will affect ROI, given the need to collect and clean data and to invest in complementary skills and assets.

Note: The degree of not-sharing is often key to the profitable continuance of a private enterprise - Ed.

Coyle, D., S. Diepeveen, J. Wdowin, J. Tennison, and L. Kay, (2020) The Value of Data – Policy Implications – Main Report. Bennett Institute for Public Policy, University of Cambridge, and Open Data Institute, February 2020: https://www.bennettinstitute.cam.ac.uk/publications/value-data-policy-implications/ (Accessed Q1 2024)

Next month, we review Coyle et al (2022). 

 ____________________ 

  • Part (1) Introduces the public policy interest in data value 
  • Part (2) Describes current valuation taxonomies and develops a two-lens framework, commencing with the Economic Lens 
  • Part (3) Deep dives into the second lens: the Information Lens 
  • Part (4) Provides an overview of the current UK legal framework 
  • Part (5) Discusses the features driving Market-based Valuation 
  • Parts (6,7,8) Delve into three related issues 
  • Part (9) Presents conclusions and recommends future stakeholder work plans 

____________________

Part (1) Introduction: This paper discusses the policy interest in data value and develops a schema to determine the value of data made available for public purposes encompassing all sectors of society.

 

(A)    The policy interest has two dimensions:

 

(a) Governments need to use data to make policy decisions; and

(b) Governments need to understand the value of the transaction of sharing (or not sharing) data, in the context of: 

 

(i) Understanding the value (and impact) of data transactions to government and society, and

(ii) the worrisome economic problem of private actors mining public data for private purposes at no cost (i.e. taking it for free to modify & resell or to restrict the sale of the improved resource to a limited clientele) [Similar to other State-resources that are licensed to extractors, who require return-on-investment to bear the cost of bringing the refined resource to market - ed.].

 

(B) The schema to determine data value consists of two lenses: Economic and Information (Note 2).

 

 

Part (2) Taxonomies and a new framework: This section describes current taxonomies of valuation; and then discusses part one of the new framework: the Economic Lens (The distinctive economic characteristics of data)

 

This section introduces the reader to the economic problem of valuing data, a resource which arises from "the creation of value from data of different kinds, its capture by different entities, and its distribution". There is a review of the intangible nature of the resource, high-level use cases, and the impact of externalities (impacts to data which "are often positive, such as additional data improving predictive accuracy, or enhancing the information content of other data": p.5).

 

Data is described as a non-rival asset (the same asset can be used by many users) which can be “closed, shared, or open. If access to data is restricted, its uses are limited; i.e. it becomes a private good. If it is shared with a select group of people - it is a club good - its uses and analysis can be wider, perhaps creating more value. If data is shared openly - a public good - anyone can use it.” [The economic actors who participate in the life of the State each can individually control data as a private good, group good, and public good. This includes the State, where-for example-records that are restricted are variously a private good, group good, and public good. - ed.]

 

The authors note that the unique ("intangible") nature of data gives it low utility if not shared, and that it must therefore be shared or licensed to make good use of the resource. The economic perspective expressed here is that data "is not best thought of as owned or exchanged;...that personal ‘ownership’ is an inappropriate concept for data (and that characterising data as ‘the new oil’ is similarly misleading)." [Note: this is for national statistical accounts and public services; “ownership” and “exchange” are how value is expressed by private buyers and sellers - ed].

 

The second part of the framework develops five characteristics for an improved valuation schema: (a) Marginal return; (b) Externalities; (c) Optionality; (d) Consequences; (e) Costs.

 

Part (3) discusses part two: the Information lens (the determination of 'economic utility' value):

 

The Information Lens is defined as the determination of utility to express economic value. Use cases are organized to fit five subject areas, and then five sub-frameworks:

  • (1a) People
  • (1b) Organization Type
  • (1c) Natural environment
  • (1d) Built environment; and
  • (1e) the type of goods or services being offered.

(2) Use Cases are organized into five sub-frameworks, and there is an example table:

  • (2a) Generality of the use case (as a non-rival good, data is repeatedly useful for many purposes; "Generality" is a determination of use case repeatability);
  • (2b) Granularity (this is the degree to which data is "filtered, aggregated and combined in different ways to reveal different insights" p.8)
  • (2c) Geo-spatial coverage: the area that data refers to limits or develops its utility.
  • (2d) Temporal coverage: data utility is time-bound;
  • (2e) Human User Groups: This sub-schema organizes human users into three groups: Planners (e.g., a city planner), Operators (e.g., commuters), and Historians (e.g., police investigating a crime).

The section continues with a review of technical characteristics that contribute to valuations. These include: Quality (p.10), Sensitivity and personal data (p.10), Interoperability and linkability (developing standards and common identifiers to improve aggregation) (p.10), Excludability (data that is not common and so must be actively shared) (p.11), and Accessibility (the degree to which data dissemination is organized and executed) (p.11).

 

Here, the Open Data Institute's Data Spectrum is charted to identify "some of the access conditions determining whether data is a private, shared, or public good. Access conditions can be determined by technology, licensing or terms and conditions, and regulation." (p.12).

Part (4) provides an overview of the current UK legal framework, which includes Intellectual property rights and licensing (p.14), Intellectual property rights in public sector information (p.15), and Data protection rights (p.16).

Part (5) dives into the features driving Market-based valuation of data.

 

The primary approach is to use market price methods to estimate value:

 

(a) Stock market valuations;

(b) Income-based;

(c) Cost-based valuation methods.

 

Method (a) is to compare data-driven companies and non-data-driven companies; analyses suggest that the former have become more valuable than the latter.

 

Method (b) is to estimate future cash flows (Free Cash Flow, or FCF) derived from the asset (Note 3); the "data value chain" has been developed to visualize this method. The FCF method can be successfully exploited by e-commerce companies such as Amazon that enjoy feedback loops, where insights drawn from data help improve the customer experience, which in turn improves per-customer unit sales. Note: Many companies in the space obtain customer data as a free good (not paid for).
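
[For illustration only - Ed: a minimal sketch of Method (b) as a discounted free-cash-flow calculation. The cash flows, the share attributed to data, and the discount rate are hypothetical assumptions, not figures from the report.]

```python
# Illustrative only: value a data asset as the discounted stream of free
# cash flow attributable to it. Cash flows, the data share, and the
# discount rate are hypothetical assumptions.

def data_asset_value(free_cash_flows, data_share, discount_rate):
    """Present value of the portion of each year's free cash flow judged
    attributable to the data asset."""
    return sum(
        (fcf * data_share) / (1 + discount_rate) ** year
        for year, fcf in enumerate(free_cash_flows, start=1)
    )

# Example: five years of FCF, 20% attributed to data-driven feedback loops,
# discounted at 10%.
fcf_forecast = [50e6, 55e6, 60e6, 62e6, 64e6]
print(round(data_asset_value(fcf_forecast, data_share=0.2, discount_rate=0.10)))
```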

 

Method (c): The public sector's cost approach is to estimate "the aggregate value of data to the economy in the national accounts, as there are relatively few market sales of datasets, with most being generated within the business in the process of providing other goods and services."

Parts (6,7,8) Delve into existing non-market estimates, Creating value through open and shared data, and Institutions for the data economy.

 

The authors note that: "Market valuations thus provide useful information but do not capture the full social value of data." Their contention is that, subject to an analysis of trade-offs, it would be economically better to create value through open data sharing. Here, citing O'Neill, the authors note that "the real or perceived crisis of trust in many societies reflects suspicion of authority," which (to this reader) suggests that the survival of a trusted system of government requires honesty and integrity in data management per "the social and legal ‘permissions’" of the society.

 

Trade-offs to consider are the need to:

  • incentivize investment and innovation,
  • maintain data security, and the related need to
  • protect personally and commercially sensitive data. 

It is hard to develop and adhere to this form of social navigation. The authors suggest three frameworks:

  • Elinor Ostrom’s framework for the management of shared resources: [chart: Ostrom’s principles]
  • Paying new attention to creating proper Data Infrastructure (vs. leaving it all to private actors);
  • Establishing data trusts, data pools, and brokerages (to create a level playing field of data access).

Part (9) presents conclusions and future work plans.  

The authors note that “the quality of data is a key challenge” in-and-of-itself; and that the “quality needed depends on what the data is being used for.” The authors also state that: “Asymmetries of information mean that contracts for data use are incomplete, and the regulatory framework should recognise this, particularly that schemes for sharing data in a regulated way change the returns on investment in collecting and cleaning data, and investing in complementary skills and assets.” In this context, the authors express the need to develop different use cases to determine utility (public, social, private) in which there is a public interest; whilst specifying the need to incentivize private investment to efficiently deliver data goods and services for every sector of society. The proposals here are to:

 

  • Incentivise investment without disincentivising sharing
  • Limit exclusive access to public sector data
  • Use competition policy to distribute value
  • Explore mandating access to private sector data
  • Provide a trustworthy institutional and regulatory environment
  • Simplify data regulation and licensing
  • Monitor impacts and iterate

----------------

(1) The definition is derived from the UN System of National Accounts or SNA.

 (2) "Lens analysis requires you to distill a concept, theory, method or claim from a text (i.e. the “lens”) and then use it to interpret, analyze, or explore something else" cf.: https://pressbooks.cuny.edu/qcenglish130writingguides/chapter/lens-analysis/

 
(3) cf. C. Mawer, “Valuing data is hard,” Silicon Valley Data Science blog post (2015). Accessed by the authors at: https://www.svds.com/valuing-data-is-hard/. See also C. Corrado, “Data as an Asset,” presentation at EMAEE 2019 Conference on the Economics, Governance and Management of AI, Robots and Digital Transformation, 2019; and M. Savona, “The Value of Data: Towards a Framework to Redistribute It,” SPRU Working Paper 2019-21 (October 2019).
 
(4) cf: O. O’Neill, “A Question of Trust,” BBC Radio 4 (2002). Accessed by the authors at: www.bbc.co.uk/radio4/reith2002/

