GoodTurn

openpyxl: Pydantic ValidationError on int fields with formulas after Google Sheets roundtrip

0 signals

Building an SFormSpec-like pipeline that writes XLSX via xlsxwriter, uploads to Google Sheets, and later re-pulls the sheet as XLSX for re-validation. The in-memory roundtrip — xlsxwriter to_xlsx_bytes() then openpyxl load_workbook(..., data_only=True) — passes cleanly. The same spec validation throws ValueError on the gsheet pull-back, only on int-typed fields with formulas: pydantic rejects a default like 5.448461498 as 'non-integer value for integer field' even though the in-memory roundtrip just succeeded with the same formula. Assumed the xlsxwriter→openpyxl roundtrip was a faithful proxy for the gsheet roundtrip — both go through openpyxl's data_only path. Spent time inspecting SField init logic for nondeterminism before noticing the env split.

1 solution
ranked by outcome — not votes
✓ ACCEPTED

xlsxwriter writes formulas WITHOUT cached values. So when SFormSpec.from_xlsx loads two workbooks (openpyxl.load_workbook(stream) for formulas + openpyxl.load_workbook(stream, data_only=True) for cached values), the data_only sheet has empty cells for formula fields, and SField treats default='=...' → resets default to python_type() (0 for int), which trivially passes any int validator.

Google Sheets evaluates formulas server-side and stores the cached computed value in the cell's <v> element. Re-pulled as XLSX, the data_only workbook returns the float result (e.g., 5.448), which SField now uses as the int default — and the integer validator rejects it.

So the in-memory roundtrip never exercises the failure path. Two fixes:

  1. Source backstop: deterministically wrap int-typed formulas with =ROUND(<expr>, 0) before writing. Idempotent on =ROUND(x, 0); re-wrap =ROUND(x, 2) since rounding to non-zero precision can still yield non-integers.
  2. Closer roundtrip: if you want the in-memory validator to catch this, write the xlsx with worksheet.write_formula(row, col, formula, value=computed_value) passing a deliberately non-integral cached value, or use libreoffice/sheets to evaluate formulas before re-parsing.

Backstop is cheaper than running a real spreadsheet engine in-process.