W3C PROV Implementations: Preliminary Analysis

At the beginning of December 2012, the W3C Provenance Working Group issued a call for implementations. As of 25 February 2013, 64 PROV implementations had been reported to the working group [1]. These implementations take different forms: stand-alone applications (30), reusable frameworks and libraries (10), services hosted by third parties (9), vocabularies (21), and constraints validation modules (3). The objective of this blog post is to examine how PROV is being used. In particular, I will identify the PROV concepts that are commonly used and attempt to explain the figures obtained. In my analysis, I will focus on the first three components of PROV [2], viz., component 1 (Entities and Activities), component 2 (Derivations), and component 3 (Agents, Responsibility, and Influence).

The chart in Figure 1 summarizes the usage of PROV concepts in implementations of type Application, Framework/API, and Service. In total, 40 implementations fall into those categories. The chart distinguishes between the consumption and the generation of a given concept by an implementation. For each concept, it shows the number of implementations that consume instances of that concept, produce them, or both consume and produce them.

prov_implementation_1

Figure 1: Coverage of PROV concepts in implementations of type Application, Framework / API, or Service.
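
The three-way split used in the chart (consume, produce, or both) can be sketched as a simple tally. This is only an illustration: the rows in REPORTS, the implementation names, and the helper functions are hypothetical and are not taken from the implementation report [1].

```python
from collections import Counter

# Hypothetical survey rows: (implementation, concept, consumes, produces).
# Illustrative values only; they are not data from the implementation report.
REPORTS = [
    ("impl-a", "Entity", True, True),
    ("impl-b", "Entity", True, False),
    ("impl-c", "Entity", False, True),
    ("impl-a", "Plan", False, True),
]


def support_mode(consumes, produces):
    """Classify one row as 'both', 'consume', 'produce', or None (no support)."""
    if consumes and produces:
        return "both"
    if consumes:
        return "consume"
    if produces:
        return "produce"
    return None


def tally(reports):
    """Count, per concept, how many implementations fall into each mode."""
    counts = {}
    for _impl, concept, consumes, produces in reports:
        mode = support_mode(consumes, produces)
        if mode is None:
            continue  # the implementation does not support this concept
        counts.setdefault(concept, Counter())[mode] += 1
    return counts
```

Calling tally(REPORTS) yields one Counter per concept, which corresponds to the three bars the chart draws for that concept.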

The chart shows that the concepts in all three components (Entities and Activities; Derivations; and Agents, Responsibility, and Influence) are covered by the implementations. However, the frequency with which those concepts are covered varies. In particular, a large proportion of implementations support most of the core concepts of PROV, which are illustrated in Figure 2. Specifically, the core concepts Entity, Activity, Agent, Usage, and Generation are supported by almost all implementations, while Association and Derivation are supported by more than three quarters of them.

prov_core_concepts

Figure 2: PROV core concepts.

In contrast, the core concepts of Attribution, Communication, and Delegation are supported by fewer than half of the implementations. Specifically, 19 out of 40 implementations support Attribution, 14 support Delegation, and 12 support Communication. In the case of Attribution and Communication, one can argue that they are actually (indirectly) supported by most implementations. This is because Attribution can be inferred from a chain of Generation and Association, both of which are supported by most implementations. Similarly, Communication can be inferred from a chain of Generation and Usage.
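
The two chains described above can be sketched as joins over sets of provenance statements. This is a minimal illustration of the reasoning in this post, not a PROV library API: the tuple layouts, function names, and identifiers such as "report" and "alice" are assumptions made for the example.

```python
def infer_communication(generated_by, used):
    """Derive (informed, informant) activity pairs.

    generated_by: set of (entity, activity) Generation statements.
    used: set of (activity, entity) Usage statements.
    If activity a1 generated an entity that activity a2 used,
    then a2 is taken to have been informed by a1.
    """
    return {
        (a2, a1)
        for (e1, a1) in generated_by
        for (a2, e2) in used
        if e1 == e2
    }


def infer_attribution(generated_by, associated_with):
    """Derive (entity, agent) pairs.

    associated_with: set of (activity, agent) Association statements.
    If an entity was generated by an activity that was associated
    with an agent, the entity is taken to be attributable to that agent.
    """
    return {
        (e, ag)
        for (e, a1) in generated_by
        for (a2, ag) in associated_with
        if a1 == a2
    }


# Hypothetical trace: "compile" generated "report", "publish" used it,
# and "compile" was associated with the agent "alice".
generated_by = {("report", "compile")}
used = {("publish", "report")}
associated_with = {("compile", "alice")}

communication = infer_communication(generated_by, used)
attribution = infer_attribution(generated_by, associated_with)
```

With this trace, communication contains the pair ("publish", "compile") and attribution contains ("report", "alice"), even though neither statement was asserted explicitly.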

We also observe that a large number of implementations support Plan, which is not part of the core concepts illustrated in Figure 2: half of the implementations support this concept. This can be explained by the fact that most implementers felt the need to link the provenance traces produced by their systems to the recipe that was followed.

prov_implementation_2

Figure 3: Coverage of PROV by vocabularies that use PROV

prov_implementation_3

Figure 4: Coverage of PROV by vocabularies that extend PROV

Figures 3 and 4 illustrate the PROV concepts that are used and extended, respectively, by implementations of type Vocabulary. In total, the working group received 8 vocabularies that use PROV concepts and 13 vocabularies that extend them. The two charts confirm the observation made for implementations of type Application, Framework/API, and Service: most PROV concepts are used and extended by vocabularies, although the frequency with which they are supported differs from one concept to another. In particular, most of the core concepts of PROV are supported by the majority of vocabularies.

It is worth underlining that PROV is still in the process of being adopted. The existing implementations analyzed in this blog post show how PROV constructs create a firm foundation for provenance interoperability.

[1] http://www.w3.org/TR/prov-implementations
[2] http://www.w3.org/TR/prov-dm