
Add paper-review skill for academic paper methodology extraction #134

Merged
igerber merged 4 commits into main from paper-review-skill
Feb 8, 2026


igerber commented Feb 8, 2026

Summary

  • Add /paper-review skill (.claude/commands/paper-review.md) that reads academic paper PDFs and produces structured methodology documentation
  • Multi-phase architecture: scout agent (haiku) for paper structure detection, parallel extraction agents for long papers (>20 pages), synthesis agent for combining results
  • Short-paper fast path reads entire paper in main context for papers ≤20 content pages
  • Output is a Methodology Registry entry at docs/methodology/papers/{paper-name}-review.md
  • Add .claude/paper-review/ to .gitignore as safety net for interrupted runs
  • Add docs/methodology/papers/ documentation entry to CLAUDE.md

Methodology references (required if estimator / math changes)

  • N/A - no methodology changes, this is a developer tooling skill

Validation

  • No test changes (skill is a markdown instruction file, not code)
  • Manual testing applicable: short paper fast path, long paper multi-agent pipeline, invalid PDF handling, --confirm flag pauses, existing output collision

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Multi-phase skill that reads academic paper PDFs and produces structured
methodology documentation (equations, algorithms, SEs, edge cases) in
the project's Methodology Registry format. Uses scout agent for paper
structure detection, parallel extraction agents for long papers, and
a synthesis agent to combine results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Feb 8, 2026

Overall assessment: ⚠️ Needs changes

Executive summary

  • No estimator/math changes in this PR; methodology registry content is not directly modified.
  • P1 edge case: scout page count caps at 120 pages, which can truncate long papers and miss assumptions/SE details.
  • P2 edge case: references detection only scans the last 8 pages, so long references/appendices can be misclassified as content.
  • P2 methodology/docs mismatch: “direct inclusion” template doesn’t match the actual REGISTRY heading level/labels, risking inconsistent registry entries.
  • No security concerns observed.

Methodology

  • P2 | Impact: The template claims it is “formatted for direct inclusion” in the Methodology Registry but uses ### headings and label variants that don’t match the registry’s ##-level entries, which will break structure/ToC consistency when pasted. | Fix: Align the template to the registry format (## headings and matching section labels) or change the “direct inclusion” claim. Ref: .claude/commands/paper-review.md#L41, docs/methodology/REGISTRY.md#L29.

Code Quality

  • P1 | Impact: The scout binary search hard-codes high = 120, so papers longer than 120 pages are silently truncated and extraction can miss key assumptions, equations, or SE details. | Fix: Use exponential search to find an upper bound (e.g., doubling high until a page fails) before binary search. Ref: .claude/commands/paper-review.md#L183.
  • P2 | Impact: References detection scans only the last 8 pages; long references or early-start appendices can be treated as content, diluting extraction and increasing noise. | Fix: Scan backward until a references heading is found (or to a larger configurable window), or scan all pages for a references heading. Ref: .claude/commands/paper-review.md#L196.
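The P1 fix above (exponential search for an upper bound, then binary search) can be sketched as follows. This is a minimal illustration, not the skill's actual logic: `page_exists` is a hypothetical callback standing in for a page read, and the sketch assumes readability is monotonic (every page up to the last one can be read).

```python
from typing import Callable


def find_last_page(page_exists: Callable[[int], bool], start: int = 64) -> int:
    """Return the highest 1-indexed page that exists, or 0 if page 1 is missing.

    `page_exists` is a hypothetical probe; it must be True for every page up
    to the last one and False afterwards.
    """
    if not page_exists(1):
        return 0
    # Exponential probe: keep doubling `high` until a page is missing,
    # so no fixed cap (such as 120) can truncate a long paper.
    low, high = 1, start
    while page_exists(high):
        low, high = high, high * 2
    # Invariant: page `low` exists and page `high` does not.
    while high - low > 1:
        mid = (low + high) // 2
        if page_exists(mid):
            low = mid
        else:
            high = mid
    return low
```

With a starting bound of 64, a 300-page document needs four probes to bracket the end (64, 128, 256, 512) before the binary search narrows it down, so the number of reads stays logarithmic in the page count.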

Performance

  • P3 | Impact: None observed in this PR. | Fix: N/A.

Maintainability

  • P3 | Impact: None observed in this PR. | Fix: N/A.

Tech Debt

  • P3 | Impact: None observed in this PR. | Fix: N/A.

Security

  • P3 | Impact: None observed in this PR. | Fix: N/A.

Documentation/Tests

  • P3 | Impact: No testable code changes; documentation-only skill addition. | Fix: N/A.

Fix P1: Replace hard-coded 120-page binary search cap with exponential
probe (64→128→256…) so papers >120 pages are not silently truncated.

Fix P2: Increase references scan window from 8 to 15 pages to cover
papers with large reference sections.

Fix P2: Align template heading levels and labels with REGISTRY.md
(h2 not h3, matching assumption/equation label wording).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
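For illustration, the widened references scan described in this commit could look roughly like the sketch below. The heading pattern, the `page_texts` input, and the function name are assumptions made for the example; the skill itself is a markdown instruction file and contains no such function.

```python
import re
from typing import Optional, Sequence

# Assumed heading spellings; a real paper may use other labels.
REFERENCES_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_references_start(page_texts: Sequence[str], window: int = 15) -> Optional[int]:
    """Return the 1-indexed page of the first references heading found in the
    last `window` pages, or None if no heading appears in that window."""
    total = len(page_texts)
    first = max(1, total - window + 1)
    for page in range(first, total + 1):
        if REFERENCES_HEADING.search(page_texts[page - 1]):
            return page
    return None
```

A `None` result is worth surfacing to the caller rather than treating every page as content, since it may simply mean the reference section is longer than the window.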

igerber commented Feb 8, 2026

/ai-review


github-actions bot commented Feb 8, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: a2c35a7add0f4c261bce555834a3c8585b2638f0


Overall assessment: ⚠️ Needs changes

Executive summary:

  • No estimator/math changes; methodology registry content is unaffected.
  • The new skill uses --name directly in output paths without sanitization, which can lead to accidental path traversal or invalid filenames.
  • Scout logic treats any Read error as “page beyond document,” which can truncate extraction if a page is unreadable.
  • Minor inconsistency in slug derivation example vs rule could cause inconsistent filenames.

Methodology

No issues found. (No estimator, identification, or inference changes in this PR.)

Code Quality

  • P2 – Impact: Scout treats any Read error as “page beyond document,” so a single unreadable page can truncate content_pages and skip methodology sections. – Fix: differentiate out‑of‑range vs parse errors; on error, retry or prompt user and fall back to a linear scan with a warning. (.claude/commands/paper-review.md:L178-L181)
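One way to picture the suggested fix is a linear fallback scan that keeps "page does not exist" and "page exists but could not be parsed" apart. The two exception classes and the `read_page` callback are hypothetical stand-ins introduced for this sketch; the actual Read tool may report errors differently.

```python
from typing import Callable, List, Tuple


class PageOutOfRange(Exception):
    """Hypothetical signal: the requested page is past the end of the PDF."""


class PageUnreadable(Exception):
    """Hypothetical signal: the page exists but could not be parsed."""


def count_pages_linear(
    read_page: Callable[[int], str], max_pages: int = 2000
) -> Tuple[int, List[int]]:
    """Walk pages one by one; return (last existing page, unreadable pages).

    An unreadable page is recorded for a warning but does not end the scan,
    so a single corrupt page cannot truncate the document.
    """
    unreadable: List[int] = []
    for page in range(1, max_pages + 1):
        try:
            read_page(page)
        except PageOutOfRange:
            return page - 1, unreadable
        except PageUnreadable:
            unreadable.append(page)
    return max_pages, unreadable
```

The list of unreadable pages can then be shown to the user as a warning before extraction proceeds.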

Performance

No issues found.

Maintainability

  • P3 – Impact: Slug derivation rules say “first author’s last name,” but the example uses two surnames (callaway-santanna), risking inconsistent filenames across reviews. – Fix: clarify and standardize the slug rule (first author only vs first+second) and update the example. (.claude/commands/paper-review.md:L16-L19, .claude/commands/paper-review.md:L140-L143)

Tech Debt

No issues found.

Security

  • P2 – Impact: --name is interpolated into output paths without sanitization, allowing path traversal or illegal filenames and possible accidental overwrite outside docs/methodology/papers. – Fix: slugify to [a-z0-9-], strip path separators, and confirm the final path before writing. (.claude/commands/paper-review.md:L16-L19, .claude/commands/paper-review.md:L156-L157)
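The sanitization suggested here (restrict to `[a-z0-9-]`, strip path separators, confirm the final path) might be sketched as below. The function names are illustrative, and the output directory and `-review.md` suffix are taken from the PR summary; nothing here is copied from the skill file.

```python
import re
from pathlib import Path

# Output location as described in the PR summary.
OUTPUT_DIR = Path("docs/methodology/papers")


def slugify(name: str) -> str:
    """Lowercase the name and collapse every run of other characters to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a slug from {name!r}")
    return slug


def output_path(name: str) -> Path:
    """Build the review path and confirm it stays inside OUTPUT_DIR."""
    path = OUTPUT_DIR / f"{slugify(name)}-review.md"
    # Second check: the resolved path must have OUTPUT_DIR as an ancestor.
    if OUTPUT_DIR.resolve() not in path.resolve().parents:
        raise ValueError(f"{path} escapes {OUTPUT_DIR}")
    return path
```

Because slashes and dots are replaced before the path is built, an input such as `../../etc/passwd` becomes the harmless slug `etc-passwd`; the ancestor check is a second line of defence in case the slug rule is loosened later.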

Documentation/Tests

No issues found (docs/skill-only change; tests not expected).

Fix scout treating any Read error as end-of-document by adding corrupt-page
caveat and post-search sanity check. Fix slug derivation rule to match
two-author example (callaway-santanna-2021). Add slug sanitization to
prevent path traversal via --name argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber commented Feb 8, 2026

/ai-review


github-actions bot commented Feb 8, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 1230e75bb410d66b1014f1f94a990a42f9f0e0a8


Methodology

  • P3 — No estimator, math, weighting, variance/SE, or default-behavior changes in this PR; Impact: none; Fix: none.

Code Quality

  • P1 — Missing guard for content_pages == 0 leads to an invalid Read range ("1-0") when references start on page 1 or there is no content; Impact: skill fails on empty-content PDFs; Fix: after parsing scout results, explicitly handle content_pages < 1 (error or AskUserQuestion for manual ranges, or fall back to 1-total_pages) and skip short-path Read in that case; Location: .claude/commands/paper-review.md:L230-L259.
  • P2 — Short-paper fast path skips output verification and cleans up immediately; Impact: a Write failure could be silently reported as success; Fix: add a read/exists check for docs/methodology/papers/{paper-name}-review.md before cleanup (reuse Phase 4 check logic); Location: .claude/commands/paper-review.md:L256-L263.
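Both findings can be illustrated with a small sketch. It assumes that content ends on the page before the references start (which is what the "1-0" range in the P1 finding implies) and uses the 20-page threshold from the PR summary; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

SHORT_PAPER_MAX_PAGES = 20  # short-paper threshold stated in the PR summary


def plan_read(total_pages: int, references_start_page: Optional[int]) -> Tuple[str, str]:
    """Return (mode, page_range); raise ValueError when there is no content."""
    # Assumption: content runs from page 1 to the page before the references.
    if references_start_page is None:
        content_pages = total_pages
    else:
        content_pages = references_start_page - 1
    # Guard: never emit an invalid range such as "1-0".
    if content_pages < 1:
        raise ValueError("no content pages; ask the user for manual page ranges")
    mode = "short" if content_pages <= SHORT_PAPER_MAX_PAGES else "long"
    return mode, f"1-{content_pages}"


def output_written(path: Path) -> bool:
    """True only if the review file exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0
```

Running `output_written` before cleanup on both the short and the long path means a failed write is reported as a failure instead of being silently lost with the temporary files.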

Performance

  • P3 — No runtime code changes; Impact: none; Fix: none.

Maintainability

  • P3 — PDF extension check is case-sensitive (.pdf only); Impact: avoidable UX friction for .PDF paths; Fix: accept case-insensitive extension or validate via file existence/MIME sniffing; Location: .claude/commands/paper-review.md:L14-L26.
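A case-insensitive extension check, plus the optional content sniff mentioned in the fix, could be as small as the sketch below. The helper names are made up for illustration; the `%PDF-` prefix is the standard marker at the start of a PDF file.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Accept .pdf, .PDF, .Pdf and so on by comparing the lowercased suffix."""
    return Path(path).suffix.lower() == ".pdf"


def has_pdf_magic(path: str) -> bool:
    """Sniff the file header: PDF files start with the bytes '%PDF-'."""
    try:
        with open(path, "rb") as fh:
            return fh.read(5) == b"%PDF-"
    except OSError:
        # Missing or unreadable file: treat as not a PDF.
        return False
```

The suffix check is cheap and catches typos early; the header sniff is stricter and also rejects a renamed non-PDF file.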

Tech Debt

  • P3 — No additional tech debt identified beyond issues above; Impact: none; Fix: none.

Security

  • P3 — No secrets or security-sensitive code changes in this diff; Impact: none; Fix: none.

Documentation/Tests

  • P3 — Documentation updates are present and adequate for a command file; no tests expected; Impact: none; Fix: none.

Open Questions / Assumptions

  • Should a PDF with references_start_page == 1 be treated as invalid input (error) or should the skill prompt for manual content ranges?

Overall assessment: ⚠️ Needs changes

Executive summary

  • One P1 edge-case gap: content_pages == 0 triggers an invalid Read range in the short-path flow.
  • One P2 robustness gap: short-paper path doesn’t verify output before cleanup.
  • Minor UX issue: .PDF extensions are rejected due to case-sensitive check.
  • No methodology changes; registry content remains unaffected.

Guard against zero content pages, add output verification to short-paper
fast path, and make PDF extension check case-insensitive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber commented Feb 8, 2026

/ai-review


github-actions bot commented Feb 8, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: ee59f004a2535f05798d2d2bc179ac3e355e1187


Overall assessment: ✅ Looks good

Executive summary

  • No estimator/math/SE/assumption/default behavior changes; diff is a Claude command + doc updates.
  • Paper-review template explicitly aligns to the Methodology Registry format and preserves equation numbering for cross-checks.
  • Edge-case checklist not triggered; no inference code paths added/modified.
  • No security-sensitive changes; temp outputs are gitignored.
  • Testing gap: no automated validation for the new command (manual use only).

Methodology

  • No findings. (Registry format reviewed in docs/methodology/REGISTRY.md:1 and referenced by the new command in .claude/commands/paper-review.md:45 and .claude/commands/paper-review.md:459.)

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3: The new /paper-review command is not covered by automated checks; correctness relies on manual runs. Impact: regressions in command structure/templates could slip in unnoticed. Fix: add a lightweight doc-lint/smoke test or document a manual validation checklist in CI/CLAUDE.md. (Related: .claude/commands/paper-review.md:1)

igerber merged commit 4bb60bf into main on Feb 8, 2026
igerber deleted the paper-review-skill branch on February 8, 2026 18:13