A Review of Data Valuation Approaches - Fleckenstein, Obaidi & Tryfona (2023)

Comment: A valuation approach that relies on professional best judgment, given the absence of a repeatable, scalable standard for measuring the value of data.

Fleckenstein, M., Obaidi, A., & Tryfona, N. (2023). A Review of Data Valuation Approaches and Building and Scoring a Data Valuation Model. Harvard Data Science Review, 5(1). https://doi.org/10.1162/99608f92.c18db966 and https://hdsr.mitpress.mit.edu/pub/1qxkrnig/release/1 MITRE Corporation: Approved for Public Release. Distribution Unlimited. Public Release Case Number: 21-3464.

The authors report an increasing desire to treat data as an asset "in both the private and public sectors...However, this remains a challenge, as data is an intangible asset. Today, there is no standard to measure the value of data." Much like Azcoitia, this team takes the view that there is no "repeatable approach to data valuation": the use case determines which methods are used to establish value.

  • Part (1) introduces current practice.
  • Part (2) reports a classification of data valuation models.
  • Part (3) reports an assessment of the model classes.
  • Part (4) reports the results of test case analysis.
  • Part (5) presents conclusions and references.

Part (1): Discusses three overlapping approaches to valuation: Business (P&L), Public Goods (Government/Non-profit), and Dimensional (Attributes of Value).

Part (2): Data Valuation Framework & Part (3): Model Details: The authors studied methods spanning more than 40 years and grouped them into three classes:

1) Market-based models (estimates of cost and revenue): "The market-based model values data based on income (e.g., selling data), cost (e.g., buying data), and/or stock value (e.g., value of data-intensive organizations). Organizations routinely buy and sell data and data-intensive companies."

2) Economic models (estimates of economic and public benefit): "The economic model values data in terms of its economic impact. This model is frequently used by governments to assess the value of publicizing data. For example, governments share weather data, which helps sustain an ecosystem of weather forecasting."

3) Dimensional models (using categories or dimensions): "The dimensional model values data by assessing attributes inherent to a data set (e.g., data volume, variety, and quality) as well as the context in which data is used (e.g., how the data will be used and integrated with other data). For example, organizations inherently decide to acquire, keep, or prioritize one of several similar but different data sets. To date, this is an informal process."

The researchers note that the models are not fit for purpose in every use case, are speculative, can overlap, and can be influenced by factors other than the data itself. Figure 1 is a Venn diagram that shows the overlap of the model classes included in the taxonomy.

Part (3): The authors review the strengths and weaknesses of each class of model.

    1) Market-based: Sec.2.3 & 3.1
    2) Economic: Sec.2.4 & 3.2
    3) Dimensional: Sec.2.5 & 3.3

The authors note that no single method delivers a standard valuation; the model type must be selected to fit the use case. The value-add of the approach is that several methods can be combined in a "framework, with each use case leveraging one or more models" to build a multi-dimensional estimate.

Part (4): Building and Scoring a Dimensional Data Valuation Model: This part details how a dimensional data valuation model was built and scored. The model defines a set of dimensions, which were then used to assess the value of data in two use cases. The goal "was to design an easy-to-use, customizable approach that helps organizations assess the value of specific data sets for specific use cases using a small, consistent set of dimensions." The method draws on "professional data management experience" and, in the use case of "flight scheduling and navigation data, we vetted the results with the data set owners."

Part (5): Conclusions and References: The authors report developing "an easy-to-use, repeatable model to value data for two use-cases", where the model combines (a) dimensional analysis; (b) "professional data management experience"; and (c) for one of the two use cases, review of results by the data owners.

The authors conclude that the dimensional approach "can be used effectively to compare two similar data sets or to evaluate the addition of a data set to an existing data pool...(but) falls short of being able to value data in monetary terms", and that it will likely need to be combined with the other model types to fully develop a valuation.
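To make the idea of dimensional scoring concrete, here is a minimal sketch in Python. The dimension names come from the examples quoted above (volume, variety, quality). The weights, the 1-5 scale, the function name dimensional_score, and the two data sets are illustrative assumptions and are not the authors' actual model.

```python
# Hypothetical weights per dimension; not taken from the paper.
WEIGHTS = {"volume": 0.2, "variety": 0.3, "quality": 0.5}


def dimensional_score(scores: dict[str, float],
                      weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted average of per-dimension scores (assumed 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total_weight


# Two made-up candidate data sets scored for the same use case.
dataset_a = {"volume": 4, "variety": 2, "quality": 5}
dataset_b = {"volume": 3, "variety": 4, "quality": 3}

score_a = dimensional_score(dataset_a)
score_b = dimensional_score(dataset_b)
preferred = "A" if score_a >= score_b else "B"
print(f"A={score_a:.2f}  B={score_b:.2f}  preferred={preferred}")
```

The result is a unitless score that is only useful for ranking data sets against each other, which is consistent with the authors' point that the dimensional approach does not produce a monetary value.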

References included.

Notes and analysis by blogger. Image: Pxhere. CC0.

Labels: Dimensional;Economic;Market-based;Valuation;Data quality


Navigating the Pricing Conundrum - Azcoitia, Iordanou, and Laoutaris (2021)

Comment: Recommended. An independent analysis of pricing schema for data products and services.

(2021) S. A. Azcoitia, C. Iordanou, and N. Laoutaris, "Measuring the Price of Data in Commercial Data Marketplaces," in DE '22: Proceedings of the 1st International Workshop on Data Economy, December 2022, 1–7, https://doi.org/10.1145/3565011.3569053 (Accessed Q1 2024).

The work became a pre-print for: S. A. Azcoitia, C. Iordanou and N. Laoutaris, "Understanding the Price of Data in Commercial Data Marketplaces," 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 2023, pp. 3718-3728, doi: 10.1109/ICDE55515.2023.00300.

The Azcoitia, Iordanou, and Laoutaris study (2023) reported earlier is a detailed exploration of how data is priced in commercial data marketplaces. The research follows data from creation to trade and examines the pricing structures of data products, services, and marketplaces.
  • Part (1) introduces the data marketplace ("DM").
  • Parts (2) & (3) report the mechanisms of pricing of data products.
  • Part (4) deep dives into the marketplace for AWS data.
  • Part (5) compares data products across marketplaces.
  • Part (6) dives into the features driving pricing.
  • Parts (7) & (8) present related works, conclusions and future work plans.

Part (1) introduces the problem in the context of "Data-driven decision making powered by ML algorithms"; in this study, two data marketplaces are used to develop a transfer pricing study mechanism.

Part (2) reports the mechanisms of pricing of 10,772 data products, ranging from one-off purchases to telecoms, manufacturing, and gaming data. The researchers interestingly note that a majority of data products are priced through "direct negotiation between the seller and interested buyers."

It is very much an age-old story: bartering to determine value on the spot.

Parts (3) & (4) report the details of the market for data products and services, using the AWS ecosystem to explore a marketplace. Part 2.1 dives into the current vendor trifecta: Data Providers, Data Marketplaces, and Personal Information Management Systems. The section discusses the entities uncovered by the team and drills down into the characteristics of a sample of each vendor class. It also charts the market share of each class. The geographic spread demonstrates US dominance of available data. Of the products surveyed, "4,162 products from 443 distinct providers provided clear information about their prices", which led to the assessment that the median price is US$1,417 per month. Pricing ranges from free to US$500,000, with "one-third of all data products, including targeted market data and reports for example,...sold for US$2,000-5,000 per month."
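The summary statistics quoted above (share of products with a clear price, median monthly price, share falling in a price band) can be computed along the following lines. This is only a sketch: the listings are made-up values for illustration, not data from the paper, and the tuple layout and variable names are assumptions.

```python
from statistics import median

# Hypothetical listings: (product id, monthly price in US$).
# None stands for a product whose price is set by direct negotiation.
listings = [
    ("p1", None), ("p2", 0.0), ("p3", 250.0), ("p4", None), ("p5", 900.0),
    ("p6", 1500.0), ("p7", 2000.0), ("p8", 3500.0), ("p9", 5000.0),
    ("p10", 12000.0),
]

# Keep only products that publish a clear price.
priced = [price for _, price in listings if price is not None]

share_with_price = len(priced) / len(listings)
median_price = median(priced)
# Share of priced products in the US$2,000-5,000 per month band (inclusive).
share_in_band = sum(2000 <= p <= 5000 for p in priced) / len(priced)

print(f"priced share: {share_with_price:.0%}")
print(f"median monthly price: US${median_price:,.0f}")
print(f"share in US$2,000-5,000 band: {share_in_band:.1%}")
```

The median is used rather than the mean because a handful of very expensive products would otherwise dominate the figure.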

Parts (5) & (6) continue the exploration of data marketplaces by developing a pricing concordance and methodology "to build different classifiers to help us compare data products between the two DMs including more price references, namely DataRade (destination DM) and AWS (source DM)."

The authors used the classification schema to compare the pricing distribution of two categories of data products, ‘Financial’ and ‘Retail, Location and Marketing’, and concluded from this work that "it is mostly ‘what’, as captured in product description and categories, and ‘how much’ data is being traded that determine the prices of data products."

In other words, it is the buyer's evaluation of the data, its quality and quantity measured against what they want, that determines the value, and therefore the price, of data products.

Parts (7) & (8) conclude with the note that this analysis "is, to the best of our knowledge, the first empirical measurement study that deals with the prices of data products sold in commercial data marketplaces." The authors add that "the lack of empirical data around dataset prices is considered as a key challenge in data pricing research."

References section: Includes discussion of the methodology created to construct the analysis. 

Notes and analysis re-written with the assistance of a paid ChatGPT account. Image: Pxhere. Public Domain.

Labels: Costs;Biological system modeling;Ecosystems;Pricing;Data engineering;Data models;Telecommunications;Data economy;data marketplaces;measurement;data pricing.

Navigating the Value Conundrum - Azcoitia (2023)

The next several blog posts will discuss the current state of the art of data valuation.

Azcoitia, S.A., Towards a Human-Centric Data Economy, Ph.D. dissertation, Telematics Engineering, Universidad Carlos III de Madrid. https://doi.org/10.48550/arXiv.2111.04427 and https://sandresazcoitia.com/2023/04/24/towards-a-human-centric-data-economy/

Unravelling the Complex Web of Data Valuation: 

In data-driven economies, understanding the true value of data has become a central challenge. This 2023 study explores the difficulties that arise when determining the worth of data assets in commercial data marketplaces. Through an examination of market dynamics, it sheds light on the data value chain and on the nuances of trading data assets over the internet.

  • Part (1) explores the data value chain and the trading of data assets.
  • Part (2) reports development and execution of a measurement study.
  • Part (3) reports development of novel algorithms and methods to streamline data transactions.
  • Part (4) concludes, presents new research topics, and proposes policy changes.

Part 1: (p.3) Begins by dissecting the data value chain and delving into the trading mechanisms of data assets facilitated by the internet. A detailed survey and analysis of commercial data marketplaces and vendor strategies makes up this section. The subsequent market review, starting on page 29, is highly recommended for a comprehensive understanding of the current landscape.

Part 2: Reports development and execution of a measurement study that estimates the value of data and sets the price of a dataset, and uses this to analyse the behaviour of market data prices. This activity is aimed at estimating value to establish a pricing framework.

Part 3: Azcoitia reports development of a framework of “algorithms and tools to reduce the complexity”, improve market efficiency, and improve buyer profitability. The framework is used to analyze the behaviour of market data prices, providing insight into the dynamics of this complex ecosystem.

Part 4: Discusses the findings and includes proposals for policy changes arising from the author's observation that:

  • The elusive nature of data as an economic good has proven to be a central problem (pages 12-16). 
  • Estimating value has proven to be challenging, leading to seemingly contradictory estimations. 
  • The quality of data emerges as a crucial production factor, comparable to traditional factors such as land, capital, labour, or infrastructure.

The author emphasizes the need for a nuanced approach to assessing the true worth of data in economic terms. Pricing strategies are covered on pages 17-18, which review approaches such as usage-based, subscription-based, and package pricing.
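As a rough illustration of how the three pricing approaches named above differ from a buyer's point of view, the sketch below compares the cost of the same purchase under each scheme. The function names, the interpretation of package pricing as fixed-size bundles, and all prices and quantities are assumptions made for this example; they are not taken from the dissertation.

```python
def usage_based_cost(units: int, price_per_unit: float) -> float:
    """Buyer pays for each unit (e.g. record or API call) consumed."""
    return units * price_per_unit


def subscription_cost(months: int, monthly_fee: float) -> float:
    """Buyer pays a flat fee per month regardless of volume."""
    return months * monthly_fee


def package_cost(units: int, package_size: int, package_price: float) -> float:
    """Buyer pays for whole fixed-size bundles, rounding up."""
    packages = -(-units // package_size)  # ceiling division
    return packages * package_price


# Hypothetical purchase: 2,500 records needed over 3 months.
costs = {
    "usage-based": usage_based_cost(2500, price_per_unit=0.5),
    "subscription": subscription_cost(3, monthly_fee=400.0),
    "package": package_cost(2500, package_size=1000, package_price=450.0),
}
cheapest = min(costs, key=costs.get)

for scheme, cost in costs.items():
    print(f"{scheme}: US${cost:,.2f}")
print(f"cheapest for this buyer: {cheapest}")
```

Which scheme is cheapest depends entirely on the buyer's volume and time horizon, which is one reason a single list price says little about the value of a data product.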

In conclusion, the study examines the complexities that are part of estimating value and proposes algorithms and tools to reduce this complexity, enhance market efficiency, and improve buyer profitability. The author advocates for policy changes to foster a more transparent and adaptive data marketplace. 

As industries grapple with the evolving data landscape, this study serves as a crucial guide for navigating the intricacies of today's data markets.

- Summary notes re-written with the assistance of a paid ChatGPT account.
