Commit Graph

2 Commits

Author SHA1 Message Date
Matt
ed5e782f61 Fix document analysis: switch to unpdf + mammoth for PDF/Word parsing
All checks were successful
Build and Push Docker Image / build (push) Successful in 11m26s
pdf-parse v2 requires DOMMatrix (browser API) which fails in Node.js.
Replaced with unpdf (serverless PDF.js build) for PDFs and mammoth for
Word .docx files. Also fixed the same broken pdf-parse usage in
file-content-extractor.ts used by AI filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 10:27:36 +01:00
Matt
c9640c6086 Add document analysis: page count, text extraction & language detection
All checks were successful
Build and Push Docker Image / build (push) Successful in 11m7s
Introduces a document analyzer service that extracts page count (via pdf-parse),
text preview, and detected language (via franc) from uploaded files. Analysis runs
automatically on upload (configurable via SystemSettings) and can be triggered
retroactively for existing files. Results are displayed as badges in the FileViewer
and fed to AI screening for language-based filtering criteria.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 10:09:04 +01:00