Skip to content

fix(cvm): stop silently dropping BLC_2 holdings (stray quotes in CDA)#24

Merged
robertoecf merged 1 commit into
mainfrom
fix/cvm-holdings-blc2-quote-none
Jun 17, 2026
Merged

fix(cvm): stop silently dropping BLC_2 holdings (stray quotes in CDA)#24
robertoecf merged 1 commit into
mainfrom
fix/cvm-holdings-blc2-quote-none

Conversation

@robertoecf

Copy link
Copy Markdown
Owner

Bug (silent data loss)

findata cvm holdings silently dropped the entire BLC_2 block (cotas de fundos) for fund-of-funds (FIC) portfolios. The CLI printed a partial table that looked complete, hiding the bulk of the portfolio with no warning or error.

Concrete case — Verde FIC (22.187.946/0001-41), ref 2025-12: the output omitted its Verde Master position of R$ 850.657.058,68 (~99.8% of PL). Only BLC_1 + BLC_8 showed.

Root cause

CDA free-text fund names contain stray double-quotes. Python's default csv dialect treats " as a quote char: when one opens a field, the reader swallows delimiters and newlines until it finds the next ", merging several rows into one. The merged row's columns shift, CNPJ_FUNDO_CLASSE holds garbage, nothing matches — the whole block contributes zero rows.

Reproduced deterministically: a 3-row block with one stray opening quote collapses to 1 garbage row under the default dialect (2 rows silently lost); with QUOTE_NONE all 3 parse correctly.

Fix

Pass quoting=csv.QUOTE_NONE to the DictReader so every " is kept as a literal character (matching the known-good pandas reference sep=";", encoding="latin1", quoting=csv.QUOTE_NONE). The field-size limit already bumped at import covers the long free-text fields.

Verification

  • New regression test test_holdings_blc2_stray_quote_not_dropped: a BLC_2 fixture whose fund name has a stray quote — both positions survive, the stray quote is preserved literally (fails on main, passes here).
  • Live check against real 2025-12 CDA data: BLC_2 now present with Verde Master = R$ 850.657.058,68 to the centavo; totals reconcile (BLC_1 + BLC_2 + BLC_8 ≈ PL); the previously-correct BLC_1/BLC_8 numbers are unchanged.
  • ruff format/check, mypy src/findata, full pytest suite all green.

Note

Companion PR fixes an independent defect in the same reader (bare-digit CNPJ matching). The two touch the same CHANGELOG.md/test anchor, so whichever merges second will hit a trivial both-added conflict — happy to rebase this one once the other lands.

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@robertoecf, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 54 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7975c016-7b6a-4c16-8eef-30f0ea678582

📥 Commits

Reviewing files that changed from the base of the PR and between 47c4d71 and 8514d27.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/findata/sources/cvm/holdings.py
  • tests/test_cvm_funds.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/cvm-holdings-blc2-quote-none

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug where cvm holdings silently dropped BLC_2 (cotas de fundos) data because stray double-quotes in free-text names caused the default CSV parser to swallow delimiters and drop rows. This is resolved by passing quoting=csv.QUOTE_NONE to csv.DictReader. Feedback on the changes suggests using io.TextIOWrapper instead of reading the entire file into memory with f.read().decode(...) to maintain the streaming design and avoid memory spikes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +157 to +161
reader = csv.DictReader(
io.StringIO(f.read().decode("iso-8859-1")),
delimiter=";",
quoting=csv.QUOTE_NONE,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The module docstring states that we line-stream every CSV inside the ZIP to avoid burning RAM and CPU. However, f.read().decode("iso-8859-1") reads and decodes the entire uncompressed CSV file into memory at once before wrapping it in io.StringIO.

For large CSV files (which can be tens of megabytes), this defeats the streaming design and can cause significant memory spikes.

We can achieve true line-streaming by wrapping the binary stream f in io.TextIOWrapper instead of reading the whole file into memory.

Suggested change
reader = csv.DictReader(
io.StringIO(f.read().decode("iso-8859-1")),
delimiter=";",
quoting=csv.QUOTE_NONE,
)
reader = csv.DictReader(
io.TextIOWrapper(f, encoding="iso-8859-1"),
delimiter=";",
quoting=csv.QUOTE_NONE,
)

CDA free-text fund names (especially BLC_2, cotas de fundos) contain stray
double-quotes. The default csv dialect treats `"` as a quote char and
swallows delimiters/newlines across rows, so whole blocks parse to garbage
and match nothing. The CLI then printed a partial table that looked
complete — for a fund-of-funds (FIC) this hid ~99.8% of the portfolio
(e.g. Verde FIC's R$ 850.657.058,68 Verde Master position vanished).

Pass `quoting=csv.QUOTE_NONE` so every `"` is kept as a literal char. The
field-size limit bumped at import already guards against the resulting
long fields.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@robertoecf robertoecf force-pushed the fix/cvm-holdings-blc2-quote-none branch from bf13e8b to 8514d27 Compare June 17, 2026 00:06
@robertoecf robertoecf merged commit f71768f into main Jun 17, 2026
7 checks passed
@robertoecf robertoecf deleted the fix/cvm-holdings-blc2-quote-none branch June 17, 2026 00:07
robertoecf added a commit that referenced this pull request Jun 26, 2026
CDA free-text fund names (especially BLC_2, cotas de fundos) contain stray
double-quotes. The default csv dialect treats `"` as a quote char and
swallows delimiters/newlines across rows, so whole blocks parse to garbage
and match nothing. The CLI then printed a partial table that looked
complete — for a fund-of-funds (FIC) this hid ~99.8% of the portfolio
(e.g. Verde FIC's R$ 850.657.058,68 Verde Master position vanished).

Pass `quoting=csv.QUOTE_NONE` so every `"` is kept as a literal char. The
field-size limit bumped at import already guards against the resulting
long fields.

Co-authored-by: Roberto <robertoecf@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant