Provenance

Definition

Provenance refers to the ability to trace and verify the origin, lineage, and creation history of a large language model (LLM).

This includes determining whether a model was derived from a specific foundational model (e.g., through fine-tuning or customisation) and validating its authenticity.

Key aspects include:

Lineage Tracking
  • Identifying whether a target model (child) originates from a parent model, even after modifications such as fine-tuning or architectural adjustments
  • Detecting similarities in output patterns or parameters that indicate a shared lineage
Verification Methods
  • Using statistical analysis (e.g., hypothesis testing) to check whether a candidate pair's outputs agree more often than baselines established from unrelated models
  • Leveraging black-box access to analyse prompts and token predictions, avoiding reliance on proprietary internal details
Authenticity Assurance
  • Ensuring compliance with licensing terms and detecting misuse of foundational models
  • Validating data inputs and transformations during model training to maintain trust in AI outputs
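
As a rough illustration of the black-box verification idea above, the sketch below compares how often a suspected child model predicts the same next token as a parent model against the agreement rates of models known to be unrelated. Everything here is an assumption made for illustration: the function names (agreement_rate, lineage_test), the use of a one-sided z-test with a normal approximation, and the premise that each model's next-token predictions for a shared prompt set have already been collected. It is not a specific published method.

```python
from math import erf, sqrt
from statistics import mean, stdev


def agreement_rate(tokens_a, tokens_b):
    """Fraction of prompts on which two models predicted the same next token."""
    if not tokens_a or len(tokens_a) != len(tokens_b):
        raise ValueError("token lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tokens_a, tokens_b))
    return matches / len(tokens_a)


def lineage_test(parent_tokens, child_tokens, unrelated_token_sets, alpha=0.05):
    """One-sided z-test (hypothetical helper, normal approximation assumed).

    Null hypothesis: the child agrees with the parent no more often than
    unrelated models do. A small p-value is evidence of shared lineage,
    not proof of it.
    """
    if len(unrelated_token_sets) < 2:
        raise ValueError("need at least two unrelated models for a baseline")

    observed = agreement_rate(parent_tokens, child_tokens)
    baseline = [agreement_rate(parent_tokens, t) for t in unrelated_token_sets]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of baseline rates

    if sigma == 0:
        # Degenerate baseline: all unrelated models agree at the same rate.
        z = float("inf") if observed > mu else 0.0
    else:
        z = (observed - mu) / sigma

    # Upper-tail probability of a standard normal distribution at z.
    p_value = 0.5 * (1.0 - erf(z / sqrt(2.0)))

    return {
        "observed": observed,
        "baseline_mean": mu,
        "z": z,
        "p_value": p_value,
        "derived": p_value < alpha,
    }
```

To use it, pass three inputs built from the same prompts in the same order: the parent's predicted tokens, the candidate child's predicted tokens, and a list of token lists from unrelated models. With only a handful of baseline models the normal approximation is crude, so a real evaluation would likely need many more baselines or a different test.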

Summary

Provenance is critical for protecting intellectual property, tracking biases and vulnerabilities inherited from parent models, and maintaining accountability in AI systems.