A Canonical Form for PROV Documents
--- Dataset Underpinning Evaluation ---


This dataset accompanies the following publication:

Luc Moreau. A Canonical Form for PROV Documents and its Application to
Equality, Signature, and Validation. ACM Transactions on Internet
Technology. 2017. http://dx.doi.org/10.1145/3032990

    Abstract. We present a canonical form for PROV that is a
    normalized way of representing PROV documents as mathematical
    expressions.  As opposed to the normal form specified by the
    PROV-CONSTRAINTS recommendation, the canonical form we present is
    defined for all PROV documents, irrespective of their validity,
    and it can be serialized in a unique way.  The paper makes the
    case for a canonical form for PROV and its potential uses, namely:
    comparison of PROV documents in different formats, validation, and
    signature of PROV documents.  A signature of a PROV document
    allows the integrity and the author of provenance to be
    ascertained; since the signature is based on the canonical form,
    these checks are not tied to a particular encoding, but can be
    performed on any representation of PROV.


The dataset contains the following files:

- bench100.csv, bench100_.csv: CSV files containing the data for Figure 5
- bench100.pdf: Figure 5
- bench100.txt: log file produced by the JMH benchmark

The provenance files used for the evaluation (in PROV-N notation):

- pc1-full.provn
- pc1-with-id1.provn
- pc1-with-id2.provn
- pc1-with-id4.provn
