Dang good Office parsing on the web with officeParser

6 hours ago · Tech

A few weeks ago I wrote about using Chrome's built-in AI support to summarize documents: "Summarizing Docs with Built-in AI". That post was a follow-up to an earlier, PDF-only post, and it used an excellent library, officeParser, to work with Microsoft Office files. The library worked well, but it had one issue that made it a bit harder to use.

Parsing a doc itself was super easy:

```javascript
const getAST = async (file, config) => (await OfficeParser.parseOffice(file, config));
```

The issue I ran into was taking that result and turning it into something meaningful for Chrome's model to analyze. PDFs supported a toText() method, but for other formats I had to do a bit of work to get a text value. For example, here's the code I used to turn an Excel file into CSV:

```javascript
let arrayBuffer = await file.arrayBuffer();
let data = await getAST(arrayBuffer, {});

const getCSV = s => {
  let result = '';
  let rows = s.children.filter(c => c.type === 'row');
  console.log(rows[0]);
  rows.forEach(r => {
    let data = [];…
```
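The excerpt cuts off partway through getCSV, so here is a rough sketch of the same idea. It is not the author's finished code and not part of officeParser's API. It assumes a hypothetical node shape: a sheet node whose `children` include nodes with `type: 'row'`, each row holding children with `type: 'cell'` and a `text` string. The helper names `escapeCSVField` and `sheetToCSV` are illustrative only. The one piece worth getting right regardless of the AST shape is CSV quoting: a field containing a comma, a double quote, or a line break has to be wrapped in quotes, with any inner quotes doubled.

```javascript
// ASSUMPTION: hypothetical node shape, not officeParser's documented API:
//   { type: 'row', children: [{ type: 'cell', text: '...' }, ...] }

// Quote a field (RFC 4180 style) when it contains a comma, a double quote,
// or a line break; inner double quotes are doubled. null/undefined become ''.
function escapeCSVField(value) {
  const s = value == null ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Turn one sheet-like node into CSV text: one line per 'row' child,
// one comma-separated field per 'cell' child. Uses '\n' line endings for simplicity.
function sheetToCSV(sheet) {
  const rows = (sheet.children ?? []).filter(c => c.type === 'row');
  return rows
    .map(row =>
      (row.children ?? [])
        .filter(c => c.type === 'cell')
        .map(cell => escapeCSVField(cell.text))
        .join(',')
    )
    .join('\n');
}

// Usage with a hand-built sample node (made-up data, shaped per the assumption above).
const sheet = {
  type: 'sheet',
  children: [
    { type: 'row', children: [{ type: 'cell', text: 'name' }, { type: 'cell', text: 'note' }] },
    { type: 'row', children: [{ type: 'cell', text: 'Ray' }, { type: 'cell', text: 'says "hi", twice' }] },
  ],
};
console.log(sheetToCSV(sheet));
// name,note
// Ray,"says ""hi"", twice"
```

If the real AST nests cell text deeper (for example, inside child text nodes), only the `cell.text` lookup would need to change; the row/cell walk and the quoting stay the same.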
