The Statistics framework in DataFusion is a foundational component of query planning and execution. It provides metadata about datasets that drives optimization decisions and influences runtime behavior. This task redesigns the Statistics representation around an enum-based structure that supports multiple distribution types, making it more flexible and expressive.


Design Objectives

1. Define a New Statistics_v2 Representation

Define a new statistics representation as an enum to accommodate different distribution types. The new Statistics_v2 enum will include the following variants:

The statistics framework will fall back to the Unknown variant whenever a calculation produces a Statistics_v2 object whose distribution is not (yet) supported, i.e. anything other than Uniform, Exponential, or Gaussian.
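To make the proposal concrete, the Rust sketch below models Statistics_v2 as an enum with the Uniform, Exponential, Gaussian, and Unknown variants named in this section, and it shows the fallback to Unknown. It is a minimal illustration, not DataFusion's implementation. The type name `StatisticsV2` (Rust's CamelCase spelling of Statistics_v2), the `f64` fields of each variant, and the helper `add_independent` are all hypothetical. The helper computes the statistics of X + Y, assuming X and Y are independent. Two Gaussians combine into another Gaussian. Any other pairing falls back to Unknown but keeps the mean and variance, which remain additive for independent inputs.

```rust
/// Hypothetical Rust rendering of Statistics_v2; variant fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum StatisticsV2 {
    /// Uniform distribution on the closed interval [lower, upper].
    Uniform { lower: f64, upper: f64 },
    /// Exponential distribution with rate `rate`, shifted right by `offset`.
    Exponential { rate: f64, offset: f64 },
    /// Gaussian (normal) distribution with the given mean and variance.
    Gaussian { mean: f64, variance: f64 },
    /// Fallback for unsupported distributions: keeps only the summary
    /// statistics that are still known.
    Unknown { mean: Option<f64>, variance: Option<f64> },
}

impl StatisticsV2 {
    /// Expected value of the distribution, if known.
    pub fn mean(&self) -> Option<f64> {
        match self {
            StatisticsV2::Uniform { lower, upper } => Some((*lower + *upper) / 2.0),
            StatisticsV2::Exponential { rate, offset } => Some(*offset + 1.0 / *rate),
            StatisticsV2::Gaussian { mean, .. } => Some(*mean),
            StatisticsV2::Unknown { mean, .. } => *mean,
        }
    }

    /// Variance of the distribution, if known.
    pub fn variance(&self) -> Option<f64> {
        match self {
            StatisticsV2::Uniform { lower, upper } => Some((*upper - *lower).powi(2) / 12.0),
            StatisticsV2::Exponential { rate, .. } => Some(1.0 / (*rate * *rate)),
            StatisticsV2::Gaussian { variance, .. } => Some(*variance),
            StatisticsV2::Unknown { variance, .. } => *variance,
        }
    }
}

/// Hypothetical helper: statistics of X + Y, assuming X and Y are independent.
pub fn add_independent(x: &StatisticsV2, y: &StatisticsV2) -> StatisticsV2 {
    match (x, y) {
        // The sum of independent Gaussians is again Gaussian.
        (
            StatisticsV2::Gaussian { mean: m1, variance: v1 },
            StatisticsV2::Gaussian { mean: m2, variance: v2 },
        ) => StatisticsV2::Gaussian { mean: m1 + m2, variance: v1 + v2 },
        // Any other combination (e.g. Uniform + Uniform, whose sum is
        // triangular or trapezoidal) is not a supported family, so fall back
        // to Unknown while keeping the additive mean and variance.
        _ => StatisticsV2::Unknown {
            mean: x.mean().zip(y.mean()).map(|(a, b)| a + b),
            variance: x.variance().zip(y.variance()).map(|(a, b)| a + b),
        },
    }
}

fn main() {
    let u = StatisticsV2::Uniform { lower: 0.0, upper: 6.0 };
    // prints: Unknown { mean: Some(6.0), variance: Some(6.0) }
    println!("{:?}", add_independent(&u, &u));
}
```

Keeping the mean and variance inside Unknown is one possible design choice: leaving a supported distribution family then does not discard every summary statistic that downstream estimates could still use.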


2. Validation and Internal Consistency

Implement methods to validate and ensure the internal consistency of each Statistics_v2 variant. Examples of validation rules include:
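As an illustration, the following Rust sketch shows one way such validation could work: an `is_valid` check per variant plus a constructor that refuses to build an inconsistent value. The specific rules are assumptions chosen because they hold for the underlying distributions (finite parameters, lower <= upper for Uniform, a strictly positive rate for Exponential, a non-negative variance for Gaussian and Unknown). The names `StatisticsV2`, `is_valid`, and `try_uniform`, and the variant fields, are hypothetical.

```rust
/// Hypothetical Statistics_v2 enum; fields are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum StatisticsV2 {
    Uniform { lower: f64, upper: f64 },
    Exponential { rate: f64, offset: f64 },
    Gaussian { mean: f64, variance: f64 },
    Unknown { mean: Option<f64>, variance: Option<f64> },
}

impl StatisticsV2 {
    /// Returns true if the parameters describe a valid distribution
    /// (assumed rules, see the lead-in).
    pub fn is_valid(&self) -> bool {
        match self {
            // Bounds must be finite and ordered; lower == upper is a point mass.
            StatisticsV2::Uniform { lower, upper } => {
                lower.is_finite() && upper.is_finite() && *lower <= *upper
            }
            // The rate of an exponential distribution must be strictly positive.
            StatisticsV2::Exponential { rate, offset } => {
                rate.is_finite() && *rate > 0.0 && offset.is_finite()
            }
            // A variance can never be negative.
            StatisticsV2::Gaussian { mean, variance } => {
                mean.is_finite() && variance.is_finite() && *variance >= 0.0
            }
            // Missing statistics are fine; present ones must be consistent.
            StatisticsV2::Unknown { mean, variance } => {
                mean.map_or(true, f64::is_finite)
                    && variance.map_or(true, |v| v.is_finite() && v >= 0.0)
            }
        }
    }

    /// Hypothetical checked constructor: returns an error instead of an
    /// internally inconsistent Uniform value.
    pub fn try_uniform(lower: f64, upper: f64) -> Result<Self, String> {
        let stats = StatisticsV2::Uniform { lower, upper };
        if stats.is_valid() {
            Ok(stats)
        } else {
            Err(format!("invalid Uniform bounds: [{lower}, {upper}]"))
        }
    }
}

fn main() {
    // A well-formed Uniform passes; reversed bounds are rejected.
    println!("{:?}", StatisticsV2::try_uniform(1.0, 5.0));
    println!("{:?}", StatisticsV2::try_uniform(5.0, 1.0));
    // Exponential requires rate > 0, so this value is reported as invalid.
    let bad = StatisticsV2::Exponential { rate: 0.0, offset: 0.0 };
    println!("valid: {}", bad.is_valid());
}
```

Routing construction through checked constructors such as `try_uniform` keeps invalid states from entering the planner, while `is_valid` remains available for values built directly.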


3. API Design and Implementation