
Definition
Provenance refers to the ability to trace and verify the origin, lineage, and creation history of a large language model (LLM).
This includes determining whether a model was derived from a specific foundational model (e.g., through fine-tuning or customisation) and validating its authenticity.
Key aspects include:
Lineage Tracking
- Identifying whether a target (child) model originates from a parent model, even after modifications such as fine-tuning or architectural adjustments
- Detecting similarities in output patterns or parameters that indicate shared lineage
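When parameter access is available, lineage can be suggested by how similar two models' weights are. The sketch below, a toy illustration with randomly generated "weights" rather than real checkpoints, compares flattened parameter vectors by cosine similarity: a fine-tuned child stays very close to its parent, while an independently trained model does not.

```python
import numpy as np

def parameter_cosine_similarity(params_a, params_b):
    """Cosine similarity between two flattened parameter collections.

    Values near 1.0 suggest shared weights (e.g. a fine-tuned child of
    the same parent); unrelated models trained from different
    initialisations score near 0.0.
    """
    a = np.concatenate([p.ravel() for p in params_a])
    b = np.concatenate([p.ravel() for p in params_b])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: the "child" is the parent plus a small fine-tuning delta.
rng = np.random.default_rng(0)
parent = [rng.normal(size=(64, 64)), rng.normal(size=(64,))]
child = [p + 0.01 * rng.normal(size=p.shape) for p in parent]
unrelated = [rng.normal(size=(64, 64)), rng.normal(size=(64,))]

print(parameter_cosine_similarity(parent, child))      # close to 1.0
print(parameter_cosine_similarity(parent, unrelated))  # near 0.0
```

In practice the threshold separating "related" from "unrelated" would be calibrated against a pool of models with known lineage, and architectural adjustments may require aligning or subsetting parameters before comparison.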
Verification Methods
- Using statistical analysis (e.g., hypothesis testing) to compare a model's outputs against baselines from unrelated models
- Leveraging black-box access to analyse responses to prompts and token predictions, avoiding reliance on proprietary internal details
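The hypothesis-testing idea can be sketched as a one-sided test on top-token agreement: query the target and the candidate parent with the same prompts, count how often their top predicted tokens match, and ask whether that rate significantly exceeds the baseline agreement observed between known-unrelated models. The counts and baseline rate below are hypothetical, and a normal-approximation z-test stands in for whatever test a real audit would use.

```python
import math

def agreement_z_test(agreements, n_prompts, baseline_rate):
    """One-sided z-test: is the observed top-token agreement rate
    significantly above the baseline seen between unrelated models?

    Returns (observed_rate, z, p_value). A small p-value supports the
    shared-lineage hypothesis. `baseline_rate` would in practice be
    estimated from a pool of known-unrelated models.
    """
    p_hat = agreements / n_prompts
    se = math.sqrt(baseline_rate * (1 - baseline_rate) / n_prompts)
    z = (p_hat - baseline_rate) / se
    # One-sided p-value from the standard normal survival function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Hypothetical numbers: 420 of 500 prompts yield the same top token,
# versus a 60% baseline agreement between unrelated models.
rate, z, p = agreement_z_test(agreements=420, n_prompts=500, baseline_rate=0.60)
print(f"agreement={rate:.2f}, z={z:.1f}, p={p:.2e}")
```

Because only prompts and predicted tokens are consumed, this test needs no access to weights or internals, matching the black-box setting described above.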
Authenticity Assurance
- Ensuring compliance with licensing terms and detecting misuse of foundational models
- Validating data inputs and transformations during model training to maintain trust in AI outputs
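One simple way to make training inputs verifiable after the fact is a tamper-evident provenance record: hash the training data and configuration, then hash the record itself so later fine-tunes can chain back to it. The field names and model identifier below are illustrative, not a standard schema.

```python
import hashlib
import json

def provenance_record(parent_model_id, dataset_bytes, training_config):
    """Build a tamper-evident provenance record for a fine-tuned model.

    Hashing the training data and configuration lets an auditor verify
    that the declared inputs actually produced this model; any change
    to either input yields a different record hash.
    """
    record = {
        "parent_model": parent_model_id,
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "config_sha256": hashlib.sha256(
            json.dumps(training_config, sort_keys=True).encode()
        ).hexdigest(),
    }
    # Hash the record itself so downstream fine-tunes can chain to it.
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

rec = provenance_record(
    parent_model_id="example-base-7b",  # hypothetical model name
    dataset_bytes=b"prompt\tcompletion\n",
    training_config={"epochs": 3, "lr": 2e-5},
)
print(rec["record_sha256"])
```

Re-running the function on the same inputs reproduces the same hash, while any edit to the dataset or configuration changes it, which is what makes the record useful for licence-compliance audits.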
Summary
Provenance is critical for intellectual property protection, bias/vulnerability tracking, and maintaining accountability in AI systems.