validate_categorical_schema.validate_categorical_schema

validate_categorical_schema.validate_categorical_schema(
    df,
    column,
    allowed_categories,
)

Validate that a categorical column conforms to a predefined allowed-value schema.

This function checks whether all non-missing values in df[column] are contained in allowed_categories. Missing values (NaN/None) are ignored. Values not in allowed_categories are reported in invalid_records.
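The behaviour described above can be sketched as follows. This is an illustrative re-implementation, not the library's actual code: the name `validate_categorical_schema_sketch` is hypothetical, and the rule that `status` is 'fail' exactly when at least one invalid record exists is an assumption inferred from the description.

```python
import pandas as pd


def validate_categorical_schema_sketch(df, column, allowed_categories):
    """Hypothetical sketch of the documented behaviour (not the library's code)."""
    # Materialise the allowed values so any iterable (list, set, tuple) works.
    allowed = list(allowed_categories)
    values = df[column]

    # Missing values (NaN/None) are ignored; every other value must be allowed.
    is_invalid = values.notna() & ~values.isin(allowed)
    invalid = values[is_invalid]

    # One row per offending value, using the documented column names.
    invalid_records = pd.DataFrame(
        {
            "index": invalid.index.tolist(),
            "column": [column] * len(invalid),
            "raw_value": invalid.tolist(),
        },
        columns=["index", "column", "raw_value"],
    )

    # Assumption: the check fails as soon as one invalid record exists.
    status = "fail" if len(invalid_records) > 0 else "pass"
    return {"status": status, "invalid_records": invalid_records}


df = pd.DataFrame({"color": ["red", "green", None]})
result = validate_categorical_schema_sketch(df, "color", ["red", "blue"])
print(result["status"])  # fail
```

Here "green" is the only reported value: "red" is allowed and the missing entry is skipped.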

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas.DataFrame | The DataFrame containing the categorical column. | required |
| column | str | Name of the categorical column to validate. | required |
| allowed_categories | Sequence | An iterable of allowed category values (e.g., list, set, tuple). | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | dict | A validation summary containing: status ({'pass', 'fail'}): overall validation status; invalid_records (pandas.DataFrame): a DataFrame with columns ['index', 'column', 'raw_value']. |
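As a rough illustration of consuming the returned summary, the sketch below branches on `status` and counts the offending values. The `result` dict is a hand-built stand-in that follows the documented shape, not real output, and `describe_result` is a hypothetical helper.

```python
import pandas as pd

# Hand-built stand-in for a returned summary (documented shape, made-up rows).
result = {
    "status": "fail",
    "invalid_records": pd.DataFrame(
        {
            "index": [1, 4],
            "column": ["color", "color"],
            "raw_value": ["green", "green"],
        }
    ),
}


def describe_result(result):
    """Hypothetical helper: turn a validation summary into a short message."""
    if result["status"] == "pass":
        return "all values allowed"
    # Count how often each disallowed value occurs.
    counts = result["invalid_records"]["raw_value"].value_counts()
    parts = [f"{value!r} x{n}" for value, n in counts.items()]
    return "invalid values: " + ", ".join(parts)


message = describe_result(result)
print(message)  # invalid values: 'green' x2
```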

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"color": ["red", "green", None]})
>>> validate_categorical_schema(
...     df,
...     column="color",
...     allowed_categories=["red", "blue"]
... )