Improving data markets: enabling diverse data pricing functions and assisting buyers to purchase data within their budgets

Liu, Mengya (2022) Improving data markets: enabling diverse data pricing functions and assisting buyers to purchase data within their budgets. University of Southampton, Doctoral Thesis, 328pp.

Record type: Thesis (Doctoral)

Abstract

Data has become a valuable commodity that is traded among data sellers and buyers on data markets. A data market cooperates with sellers to monetise their data while helping buyers to purchase data that meets their conditions. Existing studies of data markets offer good solutions for pricing buyers’ data requirements expressed as queries. However, although it is considered a critical aspect of the data business, very few studies have looked at the technical problem of meeting the financial wishes of sellers and buyers, i.e. the problem of enabling sellers to price data using diverse data pricing functions and assisting buyers to purchase data within their budgets. This thesis reviews the existing models of data markets and posits that the existing marketplace-centred models hinder both buyers and sellers by failing to consider their respective expectations: purchasing data within budget and utility constraints, and pricing data autonomously. From the perspective of sellers, pricing their data, or claiming revenue for their data contributions, is a major aspect of their business. Current marketplaces use various practices to protect sellers from arbitrage, but letting a marketplace decide the data prices and revenues of sellers does not give sellers enough flexibility to manage their data and control their revenues. On the other hand, buyers expect marketplaces to make decisions for their benefit. For instance, when query answers can be derived from multiple sellers, and the sellers sell duplicated data at different prices, a marketplace can decide to purchase the duplicated data at lower prices. If marketplaces make decisions for buyers’ benefit, data dealers will either decrease the revenue of some sellers or decrease the revenue of all sellers by splitting the payment equally. Both outcomes contravene sellers’ expectations of maximising revenue.
In addition, current markets tend to ignore the above expectation from buyers and the constraints they face, such as budget and data preferences. We explore a method to remove these impediments and investigate the question of how a data market might enable sellers to set data prices autonomously, while also assisting buyers to purchase data within their budgets. We analyse the problem from several angles, including the setup of a data market and the subsequent optimisations required in trading data. We introduce a federated data market, Data Emporium, which allows sellers to price their data independently. Then, we focus on the challenge of finding the best purchase for buyers’ money in Data Emporium: given that sellers price their data sources using different methods, the ideal purchase is a subset of query answers, referred to as an allocation, that has maximum utility within the buyer’s budget, especially when the price of the data depends on the number of data items returned or accessed in order to derive the query answer. We generalise the problem and explain its NP-hardness, and we study in particular the new challenge that arises when sellers use a class of pricing functions called access dependent pricing functions (ADPFs). An ADPF charges a price for a set of data items that differs from the total cost of the individual data items. Generating sets of data items that cost less than purchasing the items individually, and thereby reducing the cost of the solutions derived from them, thus becomes a new challenge. We introduce our two-step approach to solve the problem: a cost compression algorithm for minimising the cost of intermediate allocations, and heuristic algorithms, Greedy and 3DDP, to approximate the optimal allocation. We then present our experimental evaluation of their performance.
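The budget-constrained allocation problem described above can be illustrated with a minimal Python sketch. It is not the thesis’s Greedy or 3DDP algorithm, nor its cost compression step; it is a generic utility-per-marginal-cost heuristic written under assumptions. The function names `capped_price` and `greedy_allocation`, the item identifiers, the utilities, and the capped pricing rule (2.0 per item, with the bill for any set capped at 5.0) are all hypothetical and chosen only to show how an access dependent price for a set can differ from the sum of individual prices.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def capped_price(items: FrozenSet[str]) -> float:
    """Hypothetical ADPF: 2.0 per accessed item, but never more than 5.0 in total."""
    return min(2.0 * len(items), 5.0)


def greedy_allocation(
    answers: Dict[str, FrozenSet[str]],   # query answer -> data items it needs
    utility: Dict[str, float],            # query answer -> utility to the buyer
    price: Callable[[FrozenSet[str]], float],
    budget: float,
) -> Tuple[List[str], float]:
    """Repeatedly add the answer with the best utility per unit of marginal cost."""
    chosen: List[str] = []
    accessed: FrozenSet[str] = frozenset()
    spent = 0.0
    remaining = set(answers)
    while remaining:
        best, best_ratio, best_cost = None, -1.0, 0.0
        for name in sorted(remaining):          # sorted for deterministic tie-breaking
            cost = price(accessed | answers[name])
            if cost > budget:
                continue                        # this answer would exceed the budget
            marginal = cost - spent
            ratio = utility[name] / marginal if marginal > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio, best_cost = name, ratio, cost
        if best is None:
            break                               # nothing affordable is left
        chosen.append(best)
        accessed = accessed | answers[best]
        spent = best_cost
        remaining.remove(best)
    return chosen, spent


answers = {
    "a1": frozenset({"x", "y"}),
    "a2": frozenset({"y", "z"}),
    "a3": frozenset({"w"}),
}
utility = {"a1": 3.0, "a2": 2.0, "a3": 1.0}
allocation, cost = greedy_allocation(answers, utility, capped_price, budget=5.0)
print(allocation, cost)
```

In this toy setting the four accessed items would cost 8.0 if bought individually, but the capped price lets all three answers fit within a budget of 5.0. A greedy rule of this kind gives no optimality guarantee, which is consistent with the problem being NP-hard.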
Regarding the conflict that a marketplace faces when it tries to satisfy the expectations of both sellers and buyers, we propose a new data market architecture, Free Market, as a further solution to the above-mentioned problem. The motivation for the design of Free Market derives from the following considerations: (1) sellers doubt that their income matches their contribution; (2) buyers do not want to fall victim to high bills and to query answers that are not customised to their constraints; and (3) when a single seller or a marketplace fails to answer a query, sellers and buyers may not trust that the marketplace is fair, and buyers face the challenge of purchasing data from multiple independent sellers or competing marketplaces. Therefore, we design a free market to let buyers and sellers exchange or share data directly without involving a third party or releasing trust deposits. Distributed sellers, which can be as small as an individual trading their private data or as big as a union of data sources, such as an independent marketplace with its sellers, autonomously manage and price their data and maintain a query executor for queries. In the meantime, buyers put their data requirements on the table and keep the right to decide where their money goes and how to collect data. This design eases the arduous process of developing trust in data markets and enables sellers to receive data revenue from buyers immediately after providing data. For the technical challenge of trading data in a free market with an awareness of the diverse data pricing functions as well as the budget and utility constraints of purchases, we let a marketplace announce the catalogue of data sources and use a local query engine to match the requirements of buyers with the supplies of sellers. The announcement from a marketplace informs buyers about the data sources available in the market. It copes easily with both existing and leaving sellers.
From the buyers’ perspective, the local query engine seeks a method to execute queries over a huge number of available data sources in a data market while reducing the cost of query answers. At the same time, it allows buyers to set up constraints for query processing and helps buyers to purchase the desired answers within those constraints. This thesis presents our local query engine. We first use a minimum spanning tree to model the problem and then adapt the classic greedy algorithm into two solutions, Gen-Greedy and Sum-Greedy. Experiments conducted with thirty random pricing settings demonstrate that the two solutions can save 66% on costs compared to the state-of-the-art market, CostFed. Moreover, we demonstrate an approximate solution for planning query execution under a budget constraint. This thesis provides a comparison of general markets, Data Emporium and Free Market with respect to their structures and their services for sellers and buyers, in terms of data management, trust, pricing, dependency, budget, efficiency, searching process and settlement.
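For readers unfamiliar with the classic greedy approach to minimum spanning trees mentioned above, the following Python sketch shows Kruskal’s algorithm. It is not Gen-Greedy or Sum-Greedy, and the abstract does not specify how queries are mapped onto a spanning tree; the graph here, with nodes standing for hypothetical data sources `s1` to `s4` and edge weights standing for hypothetical access costs, is purely illustrative.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, str, str]  # (weight, node_u, node_v)


def kruskal(nodes: List[str], edges: List[Edge]) -> Tuple[List[Tuple[str, str, float]], float]:
    """Classic greedy MST: take the cheapest edge that joins two separate components."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree: List[Tuple[str, str, float]] = []
    total = 0.0
    for weight, u, v in sorted(edges):      # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # the edge connects two components
            parent[root_u] = root_v
            tree.append((u, v, weight))
            total += weight
    return tree, total


nodes = ["s1", "s2", "s3", "s4"]
edges = [
    (1.0, "s1", "s2"),
    (4.0, "s1", "s3"),
    (2.0, "s2", "s3"),
    (3.0, "s3", "s4"),
    (5.0, "s2", "s4"),
]
tree, total_cost = kruskal(nodes, edges)
print(tree, total_cost)
```

The sketch assumes fixed edge weights. Under pricing functions whose cost depends on the set of items accessed, a fixed-weight tree would no longer capture the true cost, which may be one reason the classic algorithm needs adapting.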

Text

Final thesis - Version of Record

Available under License University of Southampton Thesis Licence.

