Skill v1.0.2
currentLLM-judged scan95/1001 files
name: spreadsheet-struct-verified description: Create and modify Excel workbooks with mandatory deterministic Python verification of workbook structure, required sheets, populated columns, and sample counts before reporting completion.
Spreadsheet Structure Verified
Use this workflow whenever you need to create or modify an Excel workbook where the output must match an expected structure. The goal is to avoid delivering a workbook that does not satisfy the user's required tabs, columns, or counts by using deterministic Python verification.
Important: This workflow has two distinct modes:
| Task Type | Pre-edit Audit | Post-write Verification | |
|---|---|---|---|
| Modify existing workbook | Full audit required (Phases 1-2) | Full verification required (Phase 4) | |
| Create new workbook from source/reference data | Skip pre-edit audit (no existing file) | Deterministic Python verification REQUIRED (Phase 4) |
This is a workflow skill, not a tool-specific recipe. Apply it with any spreadsheet-capable tooling, but Phase 4 verification uses a deterministic Python script.
When to use
Use this workflow when:
- You are creating a new workbook from source/reference data
- You are modifying an existing workbook AND the workbook path is uncertain
- Multiple similar
.xlsxor.xlsmfiles may exist - The user specifies required sheet names, row counts, sample counts, or populated columns
- A silent structure mismatch would cause downstream errors
- Systematic post-write verification is required before reporting completion
Before proceeding, determine which workflow mode applies:
Mode A: Modify existing workbook - Use the full workflow (all phases) when there is an existing workbook file that must be updated in place, the user references an existing file, or the task requires preserving existing data/formulas/formatting.
Mode B: Create new workbook from source/reference data - Adapt Phase 1-2 to audit SOURCE reference files instead of targeting a non-existent workbook. Phase 1 locates and inspects source data files; Phase 2 audits their structure (sheet names, columns, data types, row counts, merged cells, formulas). Skip target workbook pre-edit audit (it doesn't exist yet), but DO audit the source/reference files thoroughly before creating. Phase 4 (deterministic Python verification) is MANDATORY for create mode.
Core principle
For modify-existing mode: Never write first. Verify before editing.
For create-new mode: You must create the file, but verify thoroughly after creation using a deterministic Python script.
Always perform these phases in order:
- Find candidate workbook files (modify mode only)
- Verify workbook structure before editing (modify mode only)
- Perform the edit or create
- Deterministic Python verification (ALL modes - MANDATORY)
If any check fails, stop and resolve the mismatch before continuing.
Workspace-path anchoring rule
All spreadsheet operations must be anchored to the exact workspace path provided for the task.
- For modify-existing mode: Discovery, inspection, editing, and verification must use the anchored path
- For create-new mode: The output path must be under the anchored workspace; verify the created file from that path
- Treat the provided workspace path as the authoritative root for all file operations
- Resolve the selected workbook to an exact path under that workspace before editing
- Do not switch to a different directory, fallback path, temp copy, or similarly named workbook unless the user explicitly authorizes it
- If a delegated tool reports a workbook path, independently verify that exact path from the current workspace before trusting the result
- If the workbook cannot be found at the anchored workspace path, stop rather than guessing
Workflow
Create mode: In create mode, Phase 1 focuses on locating SOURCE reference data files rather than a target workbook. Audit these source files to understand their structure before creating the new workbook.
Phase 1: Locate candidate Excel files (Modify mode only)
Search the filesystem broadly enough to find plausible workbooks, then narrow to the file most likely intended by the user.
What to look for
Search for:
.xlsx.xlsm.xls- files whose names resemble the user request
- recently modified files
- files inside likely project/output/data directories
- In create mode: source data files, reference reports, or input spreadsheets that will populate the new workbook
Selection heuristics
Prefer files that:
- Match names mentioned by the user
- Are in the working directory or attached-data area
- Have expected modification dates
- Contain sheet names matching the task
- Have row/column structure consistent with the requested operation
If several files are plausible, inspect them all before choosing.
Minimum evidence before selecting a file
Before deciding "this is the workbook to edit", confirm at least:
- exact path
- workbook sheet names
- approximate row counts in relevant sheets
- whether the file is readable and not obviously corrupt
- In create mode: column names, data types in source sheets, and presence of merged cells or formulas that may affect data extraction
The exact path must be recorded as a workspace-anchored path, not a vague filename.
Create mode: In create mode, Phase 2 audits SOURCE reference files to understand their structure before creating the new workbook. This prevents calculation errors, missing data, or structural mismatches.
Phase 2: Pre-edit audit (Modify mode only)
Before making any edits, inspect the workbook and compare it to the user specification.
Build a checklist from the user request
Extract the following if present:
- required tab names
- tabs to create, preserve, or modify
- expected columns
- key identifiers
- required row counts or minimum rows
- sample size or number of marked rows
- user-mandated sampling criteria that must appear in the selected sample
- whether task instructions permit a fallback when a required criterion is unavailable in the source population
- formulas, summaries, or computed fields
- whether formatting/macros must be preserved
Turn these into explicit checks.
Modify mode: Verify the target workbook structure before editing.
Create mode: Audit source/reference data files to document their structure (sheet names, column names, data types, row counts, merged cells, formulas, data quality issues). Use this audit to plan the target workbook structure.
Verify workbook structure
For each relevant workbook:
- List all sheet names
- Identify the target sheet(s)
- Count rows in each relevant sheet
- Inspect header row values
- Confirm key columns exist
- Note any merged cells, formulas, filters, tables, or protected sheets that may affect edits
Create mode source audit checklist:
- List all sheet names in each source file
- Identify which sheets contain the data needed for the new workbook
- Count rows in each relevant source sheet
- Inspect header row values and record exact column names (case-sensitive)
- Confirm key columns exist and note their data types (text, number, date, etc.)
- Critical: Note any merged cells, formulas, filters, tables, or hidden rows that may affect data extraction
- Check for blank cells, inconsistent formatting, or data quality issues in source columns
- Document the mapping from source columns to target columns in the new workbook
Workbook identity consistency check
Before any write, confirm that repeated inspections agree on workbook identity and structure.
At minimum, compare across inspections:
- exact workbook path
- sheet-name list
- relevant row counts
- relevant sheet dimensions or used ranges when available
- header values for each target sheet
If these differ across runs or tools in any unexplained way, treat the workbook identity as unconfirmed and do not edit until the discrepancy is resolved.
Pre-edit go/no-go decision
Proceed only if all of the following are true:
- The workbook is the correct one
- The target sheet names match or can be mapped unambiguously
- The required columns are present or can be created safely
- The row counts are plausible for the requested operation
- There is no unresolved ambiguity about where edits belong
If not, stop and report the discrepancy.
Phase 3: Perform the edit or create
After the workbook passes the pre-edit audit (modify mode) or when creating new (create mode):
- Preserve untouched sheets and workbook structure (modify mode)
- Avoid renaming sheets unless requested
- Write only to the intended workbook and target tabs
- Preserve formulas/macros/formatting when the task requires it
- Keep a clear mapping between user requirements and the cells/rows being updated
- In create mode: Document the planned target structure (sheet names, column names, expected row counts, formulas) based on source data audit before writing
During the edit, record what changed:
- workbook path
- edited sheets
- rows added/updated
- columns populated
- formulas inserted
- samples marked or flags applied
For create-new mode: Document the intended structure (sheet names, columns, expected row counts) before writing, so Phase 4 can verify against this specification. This documentation must be based on the source data audit from Phase 2.
Phase 4: Deterministic Python Verification (ALL modes - MANDATORY)
After saving, create and execute a deterministic Python verification script that independently validates the saved workbook against the original specification.
This phase is MANDATORY for all modes and must use a direct Python script with openpyxl (or equivalent) for deterministic verification.
Why deterministic Python verification
- Eliminates ambiguity from tool summaries or delegated-agent claims
- Provides reproducible, structured verification output
- Catches structural errors before reporting completion
- Creates an auditable verification record
Create verification script
Write a Python script that:
- Opens the saved workbook from the exact anchored workspace path
- Verifies all required sheet names exist with exact spelling
- Verifies required columns exist in each relevant sheet
- Counts rows and validates against expected counts
- Checks sample-mark counts match requested counts
- Validates that every user-mandated sampling criterion is represented
- Spot-checks content quality (non-blank values in required columns)
- Reports structured pass/fail results for each check
Verification script template
#!/usr/bin/env python3"""Deterministic workbook structure verification script."""from openpyxl import load_workbookimport sysfrom pathlib import Pathdef verify_workbook(workbook_path, specification):"""Verify workbook structure against specification.Args:workbook_path: Exact path to the saved workbookspecification: Dict with required_sheets, required_columns,expected_row_counts, sample_criteria, etc.Returns:dict with verification_results, passed (bool), errors (list)"""results = {'passed': True,'errors': [],'warnings': [],'details': {}}# Load workbooktry:wb = load_workbook(workbook_path, data_only=True)results['details']['workbook_loaded'] = Trueexcept Exception as e:results['passed'] = Falseresults['errors'].append(f"Failed to load workbook: {e}")return results# Verify required sheetsactual_sheets = wb.sheetnamesrequired_sheets = specification.get('required_sheets', [])for sheet_name in required_sheets:if sheet_name in actual_sheets:results['details'][f'sheet_{sheet_name}'] = 'present'else:results['passed'] = Falseresults['errors'].append(f"Missing required sheet: {sheet_name}")results['details'][f'sheet_{sheet_name}'] = 'missing'# Verify each required sheetfor sheet_name in required_sheets:if sheet_name not in actual_sheets:continuews = wb[sheet_name]# Get headers from first rowheaders = []if ws.max_row >= 1:for col in range(1, ws.max_column + 1):cell_value = ws.cell(row=1, column=col).valueheaders.append(str(cell_value).strip() if cell_value else '')results['details'][f'{sheet_name}_headers'] = headers# Verify required columns for this sheetsheet_required_cols = specification.get('required_columns', {}).get(sheet_name, [])for col_name in sheet_required_cols:if col_name in headers:results['details'][f'{sheet_name}_col_{col_name}'] = 'present'else:results['passed'] = Falseresults['errors'].append(f"Missing required column '{col_name}' in sheet '{sheet_name}'")results['details'][f'{sheet_name}_col_{col_name}'] = 'missing'# Count data rows (excluding header)data_row_count = max(0, ws.max_row - 1) if ws.max_row > 1 else 0results['details'][f'{sheet_name}_row_count'] = data_row_count# Verify expected row countsexpected_count = specification.get('expected_row_counts', {}).get(sheet_name)if expected_count is not None:if data_row_count < expected_count:results['passed'] = Falseresults['errors'].append(f"Sheet '{sheet_name}' has {data_row_count} rows, expected at least {expected_count}")# Verify populated columns (spot check)for col_name in sheet_required_cols:if col_name not in headers:continuecol_idx = headers.index(col_name) + 1# Check first few data rows are populatedpopulated_count = 0for row in range(2, min(ws.max_row + 1, 11)):cell_value = ws.cell(row=row, column=col_idx).valueif cell_value is not None and str(cell_value).strip() != '':populated_count += 1if populated_count == 0 and data_row_count > 0:results['warnings'].append(f"Column '{col_name}' in sheet '{sheet_name}' appears empty")# Verify sample counts if applicablesample_mark_col = specification.get('sample_mark_column')sample_mark_value = specification.get('sample_mark_value', 'Yes')expected_sample_count = specification.get('expected_sample_count')if sample_mark_col and expected_sample_count is not None:for sheet_name in required_sheets:if sheet_name not in actual_sheets:continuews = wb[sheet_name]headers = []if ws.max_row >= 1:for col in range(1, ws.max_column + 1):cell_value = ws.cell(row=1, column=col).valueheaders.append(str(cell_value).strip() if cell_value else '')if sample_mark_col not in headers:continuecol_idx = headers.index(sample_mark_col) + 1marked_count = 0for row in range(2, ws.max_row + 1):cell_value = ws.cell(row=row, column=col_idx).valueif cell_value is not None and str(cell_value).strip() == sample_mark_value:marked_count += 1results['details'][f'{sheet_name}_marked_count'] = marked_countif marked_count != expected_sample_count:results['passed'] = Falseresults['errors'].append(f"Expected {expected_sample_count} marked rows in '{sheet_name}', found {marked_count}")# Verify sampling criteria if applicablesampling_criteria = specification.get('sampling_criteria', [])if sampling_criteria:# Check that each criterion is represented in marked rows# This requires checking the actual marked row contentfor criterion in sampling_criteria:criterion_found = Falsefor sheet_name in required_sheets:if sheet_name not in actual_sheets:continuews = wb[sheet_name]# Simplified check - look for criterion value in marked rows# Full implementation would check specific columnsfor row in range(2, ws.max_row + 1):for col in range(1, ws.max_column + 1):cell_value = ws.cell(row=row, column=col).valueif cell_value is not None and str(cell_value).strip() == criterion:criterion_found = Truebreakif criterion_found:breakif criterion_found:breakresults['details'][f'criterion_{criterion}'] = 'found' if criterion_found else 'missing'if not criterion_found:# Check if fallback is permittedif not specification.get('criterion_fallback_permitted', False):results['passed'] = Falseresults['errors'].append(f"Sampling criterion '{criterion}' not found in marked rows")else:results['warnings'].append(f"Sampling criterion '{criterion}' not found (fallback permitted)")return resultsdef main():if len(sys.argv) < 3:print("Usage: verify_workbook.py <workbook_path> <spec_json>")sys.exit(1)workbook_path = sys.argv[1]spec_json = sys.argv[2]import jsonspecification = json.loads(spec_json)results = verify_workbook(workbook_path, specification)# Print structured resultsprint("=" * 60)print("WORKBOOK VERIFICATION RESULTS")print("=" * 60)print(f"Workbook: {workbook_path}")print(f"Status: {'PASSED' if results['passed'] else 'FAILED'}")print()if results['errors']:print("ERRORS:")for err in results['errors']:print(f" ✗ {err}")print()if results['warnings']:print("WARNINGS:")for warn in results['warnings']:print(f" ⚠ {warn}")print()print("DETAILS:")for key, value in results['details'].items():print(f" {key}: {value}")print("=" * 60)sys.exit(0 if results['passed'] else 1)if __name__ == '__main__':main()
Execute verification
After creating the output workbook:
- Define the verification specification as a JSON object with:
required_sheets: List of sheet names that must existrequired_columns: Dict mapping sheet names to lists of required column headersexpected_row_counts: Dict mapping sheet names to minimum expected row countsexpected_sample_count: Number of rows that should be marked/selectedsample_mark_column: Name of the column used to mark selected rowssample_mark_value: Value indicating a row is marked (e.g., "Yes", "TRUE", 1)sampling_criteria: List of values that must appear in marked rowscriterion_fallback_permitted: Boolean - whether missing criteria are acceptable
- Write the verification script to a file in the workspace
- Execute the script using execute_code_sandbox or run_shell:
``bash python3 verify_workbook.py /path/to/output.xlsx '<spec_json>' ``
- Interpret results:
- If
Status: PASSEDwith no errors: Proceed to report completion - If
Status: FAILEDwith errors: Review errors, fix the workbook, re-verify - If script execution fails: Debug the script or try alternative verification
Verification go/no-go decision
Only report completion if:
- The verification script executes successfully
- The verification returns
Status: PASSED - All required sheets are present with exact names
- All required columns are populated (not all blank)
- Row counts meet specifications
- Sample counts match expectations
- All sampling criteria are represented (or fallback is explicitly permitted)
If verification fails:
- Review the specific errors reported
- Fix the identified issues in the workbook
- Re-run the verification script
- Only report completion after verification passes
Independent verification requirement
Post-write verification must include an independent direct file read of the saved workbook from the anchored workspace path.
- Do not rely only on delegated-agent summaries, success messages, or memory of what was written
- The Python verification script performs an independent verification by directly loading the workbook from disk
- If the verification script fails to load or verify the workbook, do not declare completion until the issue is resolved
- If a delegated tool performed the edit, the verification script separately confirms the resulting file
Minimum verification checklist
At minimum, the verification script must validate:
- [ ] The expected workbook file exists at the output path
- [ ] Required tab names are present with exact spelling
- [ ] Required columns exist in each relevant sheet
- [ ] Required columns are populated (not all blank/null)
- [ ] Relevant sheets contain expected row counts
- [ ] Sample-mark counts match the requested count (if applicable)
- [ ] Every user-mandated sampling criterion is represented (if applicable)
- [ ] No required source sheets disappeared during the write
Error handling
If verification script fails to execute
- Check that openpyxl is available (
pip install openpyxlif needed) - Verify the workbook path is correct and accessible
- Try running the script with explicit error output
- If Python execution is unavailable, use an alternative method but document the limitation
If verification finds errors
- Document each error clearly
- Determine the root cause (wrong sheet name? missing column? wrong count?)
- Fix the workbook
- Re-run verification
- Iterate until verification passes
If workbook cannot be verified
If the verification script cannot load or inspect the workbook:
- Confirm the file exists at the expected path
- Check file permissions
- Verify the file is a valid Excel workbook (not corrupted)
- Try opening with alternative methods
- If still failing, report the issue and do not claim completion
Completion criteria
A task using this workflow is complete only when:
- The workbook has been created or modified as requested
- The deterministic Python verification script has been executed
- Verification returns
Status: PASSEDwith no errors - Any warnings have been reviewed and deemed acceptable
- The verified workbook is at the correct anchored workspace path
Never report completion without successful Phase 4 verification.