Overview: This brief explores various methods for valuing data, acknowledging the limitations of existing models and emphasizing the need for more comprehensive approaches. The typology of valuation methods in Part (2) below (and page 4 of the brief) is especially helpful.
Comment: The global economy has been reshaped by data, with data-driven firms becoming dominant market leaders. This transformation extends to both private and public sectors. Although data’s value is widely acknowledged, there remains no consensus on how to quantify that value. This lack of clarity hinders optimal investments and governance.
This report builds on prior work (reviewed earlier in this blog - Ed) at the Bennett Institute for Public Policy (Coyle et al 2020). It covers the development of new methods for measuring the value of data.
Coyle, D. and A. Manley, Policy Brief: What is the Value of Data? A review of empirical methods, Bennett Institute for Public Policy, University of Cambridge, July 2022: https://www.bennettinstitute.cam.ac.uk/wp-content/uploads/2022/07/policy-brief_what-is-the-value-of-data.pdf
- Part (1) Discusses the value of data to the public and private sectors
- Part (2) Reviews proposed data valuation methodologies
- Part (3) Presents a framework to develop an estimate of value
===============================
Part (1) Introduction
In recent years, data has become a key driver of economic transformation, with data-driven companies making up seven of the top 10 firms globally by market capitalization in 2021. This shift is particularly evident in the growing productivity and profitability gap between data-intensive firms and others.
Data’s value is increasingly recognized across sectors, including the public sector. Despite this recognition, no consensus has emerged on how to measure the value of data empirically, which prevents its potential from being fully realized. While many firms and investors acknowledge the value of data, particularly through data services and stock market valuations, the absence of clear valuation methods makes it difficult to guide investment decisions or govern data usage effectively. Coyle & Manley's report explores various approaches to data valuation and highlights the challenges of incorporating factors such as opportunity costs, risks, and the costs of data collection and storage into such assessments.
===============================
Part (2) Proposed Data Valuation Methodologies
Coyle (2020) presented the Lens framework, describing data through the Economic Lens and the Information Lens.
Building on prior research, the report identifies key characteristics influencing data's value and outlines valuation approaches. Traditional methods (cost-based, income-based, and market-based) are commonly used but struggle to separate data's contribution from that of other inputs.
The report also highlights newer approaches that better capture data's broader economic value, such as ascribing value using data flows and marketplaces. Comparable surveys and taxonomies are cited, including:
• Internet of Water: A taxonomy of various data valuation methods (2018).
• Ker and Mazzini (2020): Identified four methods: (a) cost-based; (b) income-based; (c) market capitalisation; and (d) trade flows.
• OECD Going Digital Toolkit: Estimates the value of data, summarizes the System of National Accounts cost-based frameworks being adopted by governments, and summarizes other approaches.
The methods are summarized as:
- 2.1 Cost-based Methods
- 2.2 Income-based Methods
- 2.3 Market-based methods (Marketplaces, Market capitalisation, Data Flows)
- 2.4 "Ambiguity-driven" methods (term coined by blogger - Ed)
- 2.5 Impact-Based Methods
-----------------------------
2.1 Cost-based Methods
These methods calculate the costs of generating, storing, and replacing data, providing a lower-bound estimate of its value. Variants such as the Modified Historical Cost Method (MHCM) adjust for data characteristics, and the consumption-based variant reflects usage rates. National statistical offices, such as Statistics Canada and the UK Office for National Statistics, have trialed these methods. Cost-based methods are widely used for valuing data and are rooted in the System of National Accounts (SNA) (1).
[Governments' interest is in defining value in order to generate taxes - Ed]. The challenge is that "national level cost-based approaches rely on having well-classified data at the microlevel. This will be difficult to achieve and there are several blurred lines that make classification harder." (pp. 5-7)
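To make the cost-based logic concrete, here is a minimal sketch. All figures and adjustment factors are hypothetical assumptions for illustration, not numbers or code from the brief; the adjustments loosely mimic the MHCM and consumption-based ideas described above.

```python
# Illustrative sketch of a cost-based valuation with MHCM-style adjustments.
# Every figure and factor below is a hypothetical assumption.

def cost_based_value(generation_cost, labelling_cost, storage_cost,
                     quality_factor=1.0, usage_rate=1.0):
    """Lower-bound value of a dataset from the cost of producing and holding it.

    quality_factor and usage_rate mimic the Modified Historical Cost Method's
    adjustments for data characteristics and for how much of the data is used.
    """
    historical_cost = generation_cost + labelling_cost + storage_cost
    return historical_cost * quality_factor * usage_rate

# Example: 120k to generate, 40k to label, 15k to store; above-average
# quality (1.2) but only 60% of the data actively used.
print(cost_based_value(120_000, 40_000, 15_000, quality_factor=1.2, usage_rate=0.6))
```

The point of the sketch is simply that the result is a floor: it reflects what the data cost to create, not what it could earn.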
2.2 Income-based Methods
These methods estimate data's value through expected revenue streams generated by the data, such as selling marketing analytics. A common approach is the "relief from royalty" method, which estimates savings from owning data rather than licensing it. However, challenges arise in distinguishing data's contribution to revenue, especially for firms where data enhances products rather than being sold directly. This method also introduces uncertainty as it relies on judgment.
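As a rough illustration of the relief-from-royalty logic, the sketch below discounts the notional licence fees avoided by owning the data. It is a generic sketch, not the brief's calculation; the revenue path, royalty rate, tax rate, and discount rate are all assumptions.

```python
# Illustrative relief-from-royalty calculation: the value of owning data is
# the discounted stream of royalty payments avoided. All inputs are assumed.

def relief_from_royalty(revenues, royalty_rate, discount_rate, tax_rate=0.0):
    """Present value of after-tax royalty payments avoided by owning the data."""
    value = 0.0
    for t, revenue in enumerate(revenues, start=1):
        saving = revenue * royalty_rate * (1 - tax_rate)
        value += saving / (1 + discount_rate) ** t
    return value

# Example: five years of revenue attributable to the data, a 3% notional
# royalty rate, 25% tax, and a 10% discount rate.
revenues = [2_000_000, 2_200_000, 2_400_000, 2_600_000, 2_800_000]
print(round(relief_from_royalty(revenues, 0.03, 0.10, 0.25)))
```

The judgment the report warns about sits in the inputs: how much revenue is attributable to the data, and what royalty rate a willing licensor would charge.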
2.3 Market-based methods
These methods use observable prices for data, though such prices are rare since most data is used internally. When available, market prices offer valuable insights but reflect only a partial estimate of the broader social value of data. Key academic approaches include using data marketplaces, market capitalization of firms, and global data flows to estimate value. However, limitations remain, especially when data is aggregated or traded in complex ecosystems like credit scoring, where the true value often exceeds the sum of its parts.
2.3.1 Data Marketplaces
The literature on data marketplaces explores their potential to increase the value of data by reducing transaction costs, improving pricing transparency, and allowing multiple users to derive value from the same datasets. However, the success of such initiatives has been inconsistent, with key challenges including complex pricing mechanisms, regulatory differences, and a lack of trust. Data suppliers often bundle datasets and set prices based on consumer willingness to pay, but much of the literature remains theoretical and idealistic. Case studies from China, New Zealand, the EU, and Colombia show that effective data pricing and trust in data quality are critical for success, yet low participation often undermines marketplace efforts. Historical examples, such as Microsoft's failed Azure DataMarket, highlight the difficulty in building customer interest, while current platforms like the Shanghai Data Exchange and Ocean Market demonstrate varying approaches to data transaction and pricing. Ultimately, while data marketplaces hold significant potential, barriers such as trust, pricing, and regulation continue to limit their effectiveness.
2.3.2 Market capitalization-based
These methods are used to value data by examining its impact on a firm's market value, particularly for data-driven companies. This approach estimates the worth of these firms by looking at their overall market capitalization, which includes the value derived from data and analytics. For example, Ker and Mazzini (2020) estimate that U.S. data-driven firms, identified through lists like "The Cloud 100," are collectively worth over $5 trillion. Coyle and Li (2021) further build on this by analyzing how data-driven companies, such as Airbnb, disrupt traditional firms like Marriott, leading to the depreciation of incumbents' organizational capital. The decline in the value of non-digital firms' organizational capital due to data-driven competition helps estimate how much these firms should be willing to pay for data. Overall, these methods provide a way to quantify the value of data in relation to firm competitiveness and valuation in data-intensive industries.
2.3.3 Data Flows
Data flows are useful for valuation in markets with limited information, as they can be observed and analyzed, and on dominant online platforms the volume of data flow correlates strongly with its value. However, quantifying global data flows is challenging because most assessments only account for data that crosses international borders. Ker and Mazzini (2020) and Coyle and Li (2021) suggest that the link between data flow volume and data value in specific locations is weak, because large data hubs serve broad areas and local knowledge is needed; they instead highlight the economic significance of the content of data flows over volume alone. For example, video streaming generates more traffic than e-commerce but contributes less economic value. Their research also underscores the economic importance of digitally deliverable products, noting that many countries still lack a framework for categorizing digital trade. Overall, while data flows help in understanding data's economic value, their measurement is still evolving, hindered by definitional and geographical complexities.
-----------------------------
2.4 "Ambiguity-driven" Methods
This section discusses experiments and surveys used to estimate value where market prices do not exist.
-----------------------------
2.5 Impact-Based Methods
The intent of data collection is to develop stories from which to mine insights for decision-making. Impact-based methods assess the value of data through cause-and-effect measurement, creating greater value-add and "making their value more persuasive than traditional quantitative approaches." These methods use testing, such as comparative scenarios, to measure responses. Slotin (2018) reviews five data valuation methodologies, favoring impact-based approaches for their clarity and communicative strength. These include:
(a) Empirical Studies: Arrieta-Ibarra et al. (2020) estimated that "data use accounts for up to 47% of Uber’s revenue. In a scenario where drivers are fully compensated for their data, this could equate to $30 per driver per day for data generation."
(b) Decision-Based Valuation: A variant of empirical-based studies, this method "adjusts the value of data based on factors like frequency, accuracy, and quality before weighing outcomes by their contribution to decisions. This method acknowledges that value derives from improved decision-making and considers alternatives to using the data, although it requires subjective judgment."
(c) Shapley values: This variant does not attempt to determine the value added by specific insights. Instead, Shapley values represent a subset of impact-based methods that focus on valuing data in its raw form rather than solely through its applications in data-driven insights. Originating in game theory, Shapley values provide a unique payoff solution within a public good game, ensuring group rationality, fairness, and additivity.
Shapley values are used in computer science to evaluate the contribution of individual data points to model performance, assess data quality, and optimize feature selection. They are also applied to determine compensation for data providers by quantifying the value of their data. Although Shapley values provide a useful framework for valuing data, they represent only one possible solution to public good problems, and alternative approaches might offer better properties. The method has advantages, such as identifying which data is valuable to collect, but also drawbacks, including high computational cost and the difficulty of translating value into monetary terms. (A minimal illustrative sketch appears after this list.)
(d) Direct measurable economic impact: Various studies analyze data's impact on growth and jobs.
(e) Stakeholder-based methods: These methods analyze the value of data availability to a sector's supply chain: "This is a wider definition of value and may include value upstream or downstream omitted from other methods; it can encompass the non-rival aspect of data." The data consultancy Anmut has developed this method and provides a case study of its valuation of Highways England data (Anmut n.d.). The challenge of this approach is that it requires professional judgment of value rather than auditable measurements of value.
(f) Real options analysis: Real options analysis provides a method for estimating the value of data by considering its potential future use cases rather than its current applications. This flexibility enables firms to capitalize on positive opportunities while minimizing downside risks. Data is non-rival, meaning its value does not diminish with use, and its potential applications can remain undefined at the time of collection. The option value represents the "right but not the obligation" to generate insights from data in the future, allowing firms to collect data for unknown future purposes. This creates an incentive to wait: a firm can assess the impact of new information (policy changes, technology changes, shifts in consumer preferences) before deciding whether to analyze the data.
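To make the option-value intuition in (f) concrete, here is a toy numerical sketch, not a method from the brief; the payoff distribution and analysis cost are purely assumed. Because the firm holds the right but not the obligation to analyze, it only incurs the cost when the realized payoff justifies it, so the option to wait is worth at least as much as committing up front.

```python
# Toy illustration of the real-options intuition for data: analyzing only when
# the payoff turns out to exceed the cost is worth more than committing now.
# The payoff distribution and analysis cost are hypothetical assumptions.
import random

random.seed(42)
analysis_cost = 50.0
draws = [random.gauss(mu=40.0, sigma=60.0) for _ in range(100_000)]  # uncertain future payoff

commit_now = sum(p - analysis_cost for p in draws) / len(draws)             # analyze regardless
wait_option = sum(max(p - analysis_cost, 0.0) for p in draws) / len(draws)  # analyze only if worthwhile

print(f"Commit now:     {commit_now:.1f}")
print(f"Option to wait: {wait_option:.1f}")  # always >= the commit-now value
```

Returning to the Shapley values described in (c), here is a minimal Monte Carlo permutation-sampling sketch of the underlying idea. It is a generic illustration, not the brief's implementation; the value function v is a hypothetical stand-in for whatever model-performance or decision metric is being attributed across data points.

```python
# Minimal Monte Carlo approximation of Shapley values for data points.
# The value function v(coalition) is a stand-in: a toy metric in which value
# grows with the amount of data but with diminishing returns.
import math
import random

random.seed(0)
data_points = ["A", "B", "C", "D"]

def v(coalition):
    """Hypothetical value of using this subset of data (e.g. model accuracy)."""
    return math.sqrt(len(coalition))  # toy: diminishing returns to more data

def shapley(points, value_fn, n_permutations=5_000):
    contrib = {p: 0.0 for p in points}
    for _ in range(n_permutations):
        order = random.sample(points, len(points))   # random arrival order
        coalition, prev = [], value_fn([])
        for p in order:
            coalition.append(p)
            cur = value_fn(coalition)
            contrib[p] += cur - prev                 # marginal contribution
            prev = cur
    return {p: total / n_permutations for p, total in contrib.items()}

print(shapley(data_points, v))
# Symmetric toy value function => roughly equal shares summing to v(all) = 2.0
```

Exact Shapley values require averaging over all possible orderings, which is why computational cost is cited as a drawback; sampling of permutations, as above, is the usual workaround.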
===============================
3. Discussion
Policymakers are stuck because there is no "consensus or best method for valuing data." The authors instead propose setting up a schema to classify methods and validate them with external surveys. The schema aims to determine:
- What is being valued?
- Who is valuing the data?
- When is the valuation taking place?
- What is the purpose of the valuation?
3.1 What is being valued?
There are a variety of things that could be referred to as ‘data.’ The possible distinctions are illustrated in the ‘data value chain,’ (see blog review of Coyle, 2020) which sets out different stages from the generation of raw data up to the decisions made using data insights generating the potential end-user value. The authors note that: "In general, the raw data is of least interest, and some of the literature goes as far as to state that raw data does not hold any value on its own...(as) even with cost-based methods, in many ways the most straightforward approach, it is almost impossible to distinguish between costs associated with raw data generation and database formation."
[Note: OrbMB's ORBintel method illuminates the costs, and we anticipate developing the means to establish the value of raw data that is to be collected, in advance of the need - Ed.]
-----------------------------
3.2 Who is Valuing the Data?
This section explores how the perspective of different stakeholders affects the methods used to value data (Data Producers, Private Sector Producers, Data Users, Data Hubs, Public Sector vs. Private Sector Valuation, Intangible Asset and Productivity).
The key point is that data serves as an intangible asset that provides firms with a productivity advantage, especially when coupled with complementary skills. Firms that capture monopoly rents from data tend to value it more highly than alternative users would, leading to a divergence between private and social valuations of data.
Valuation varies significantly with the perspective of the stakeholder: public sector approaches emphasize societal value and taxation (the state's monopoly rent), while private sector valuations focus on costs and impacts. Understanding these differing perspectives is crucial for accurately assessing data's overall worth.
-----------------------------
3.3: When is the Valuation Taking Place?
This section distinguishes between ex ante (before the event) and ex post (after the event) data valuation methods. Ex ante valuations are fraught with uncertainty, and therefore with risk and cost, so they are less widely used. Ex post valuations mitigate these uncertainties.
-----------------------------
3.4 What is the purpose of the valuation?
The authors contend that the purpose influences the choice of methodologies, with different approaches suited to different goals.
===============================
References:
(1) The SNA is the United Nations (UN) framework setting “the internationally agreed standard set of recommendations on how to compile measures of economic activity.” The framework establishes consistent accounting rules and classifications to make comparisons across countries possible. The next update (2025) will address the valuation of data.