load_or_validate_source.load_or_validate_source

load_or_validate_source.load_or_validate_source(
    dataframe=None,
    source=None,
    expected_min_cols=1,
    sample_size=2048,
)

Load a CSV from a path/URL or validate and clean a provided DataFrame.

Parameters

Name	Type	Description	Default
dataframe	Optional[pd.DataFrame]	An already-loaded DataFrame to validate and clean.	`None`
source	Optional[str]	Path or URL to a CSV file. HTTP/HTTPS URLs and local filesystem paths are supported.	`None`
expected_min_cols	int	Minimum number of columns expected after loading (default: 1). Used to detect a probable wrong delimiter or a corrupted file.	`1`
sample_size	int	Number of characters to sample from the source when sniffing the delimiter and detecting basic corruption (default: 2048).	`2048`
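
The role of `sample_size` and `expected_min_cols` can be illustrated with a minimal sketch. This is not the library's implementation: `sniff_delimiter` is a hypothetical helper, the candidate delimiter set is an assumption, and `ValueError` stands in for `DataLoadError`. It uses only the standard-library `csv.Sniffer`.

```python
import csv


def sniff_delimiter(text, sample_size=2048, expected_min_cols=1):
    """Guess the delimiter from the first `sample_size` characters of `text`.

    Hypothetical helper for illustration; returns (delimiter, column_count).
    """
    sample = text[:sample_size]
    lines = [line for line in sample.splitlines() if line.strip()]
    # If the sample cut the text short, the last line may be truncated,
    # so drop it to avoid a false "inconsistent columns" result.
    if len(sample) < len(text) and len(lines) > 1:
        lines = lines[:-1]

    try:
        # Assumption: only these four delimiters are considered.
        dialect = csv.Sniffer().sniff("\n".join(lines), delimiters=",;\t|")
    except csv.Error as exc:
        raise ValueError(f"could not determine delimiter: {exc}") from exc

    # Every sampled row should split into the same number of fields.
    counts = {len(row) for row in csv.reader(lines, delimiter=dialect.delimiter)}
    if len(counts) != 1:
        raise ValueError("inconsistent column counts in sample (possible corruption)")

    ncols = counts.pop()
    if ncols < expected_min_cols:
        raise ValueError(f"found {ncols} columns, expected at least {expected_min_cols}")
    return dialect.delimiter, ncols
```

A larger `sample_size` gives the sniffer more rows to compare at the cost of reading more of the source up front; a higher `expected_min_cols` turns a silently mis-split file (for example, one wide column) into an error.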

Returns

Name	Type	Description
	tuple[pandas.DataFrame, ChangeReport]	`df` (pandas.DataFrame): the cleaned and validated DataFrame. Cleaning normalizes column headers (strips surrounding whitespace, converts internal whitespace to underscores, replaces illegal characters with underscores) and trims string cells. `report` (ChangeReport): a record of changes and metadata, covering the detected delimiter, the renamed-column mapping, counts of trimmed cells and illegal-character fixes, and the shape before and after.
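
The header normalization described above could look roughly like the following sketch. The helper names are hypothetical, and the definition of an "illegal" character (anything other than ASCII letters, digits, and underscores) is an assumption; the actual rule used by the library is not stated here.

```python
import re


def normalize_header(name):
    """Normalize one column header (hypothetical helper, not the library's code)."""
    cleaned = str(name).strip()                        # strip surrounding whitespace
    cleaned = re.sub(r"\s+", "_", cleaned)             # whitespace runs -> one underscore
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", cleaned)   # assumed "illegal" character set
    return cleaned


def renamed_columns(columns):
    """Return an {old: new} mapping for headers that changed, as a report might record."""
    mapping = {}
    for col in columns:
        new = normalize_header(col)
        if new != col:
            mapping[col] = new
    return mapping
```

Under these assumptions, `renamed_columns([" order id ", "total-amount", "sku"])` maps the first two headers to `order_id` and `total_amount` and leaves `sku` out of the mapping because it is unchanged.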

Raises

Name	Type	Description
	TypeError	If `source` is neither a string nor a pandas.DataFrame.
	DataLoadError	On I/O, parsing, or validation failures, including: the source cannot be read or downloaded; the sample has inconsistent column counts (possible corruption); the first row looks like data rather than a header; pandas fails to parse the CSV; the resulting DataFrame is empty or has fewer than `expected_min_cols` columns.

Examples

>>> df, rpt = load_or_validate_source(source="data.csv")
>>> df, rpt = load_or_validate_source(dataframe=existing_df)