Package: tableParser 1.0.5

tableParser: Parse Tabled Content to Text Vector and Extract Statistical Standard Results

tableParser extracts tabled content from NISO-JATS-coded XML, any native HTML or HML file, DOCX, and PDF documents, and collapses it into human-readable text by mimicking the behavior of a screen reader. Because tables in PDF documents are extracted with the 'tabulapdf' package and their captions and footnotes cannot be extracted, results for PDF tables should be considered less precise. The function table2matrix() returns the tables within a document as a list of character matrices. table2text() collapses the matrix content into a list of character strings by imitating the behavior of a screen reader. The textual representation of characters and numbers can be unified with unifyMatrixContent() before parsing. The function table2stats() extracts the tabled statistical test results from the collapsed text with the function standardStats() from the 'JATSdecoder' package and, if activated, checks the reported and coded p-values for consistency. Because table structures vary widely and can be complex, parsing accuracy may vary. A detailed description of how 'tableParser' works is provided here: <doi:10.48550/arXiv.2603.19756>.
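To illustrate what "collapsing a matrix by imitating a screen reader" can mean, here is a minimal base-R sketch. It is not the package's implementation: collapse_rows() is a hypothetical helper written for this example, and it assumes a simple table whose first row holds column headers and whose first column holds row labels. Real tables (multi-level headers, spanning cells, footnote codings) need considerably more handling.

```r
# Hypothetical helper, NOT part of 'tableParser': reads a character matrix
# row by row and pairs every cell with its column header, roughly the way
# a screen reader announces table cells.
collapse_rows <- function(m) {
  header <- m[1, -1]                 # column labels (first row without the corner cell)
  body   <- m[-1, , drop = FALSE]    # data rows, kept as a matrix
  out <- apply(body, 1, function(row) {
    cells <- paste0(header, ": ", row[-1])
    paste0(row[1], ": ", paste(cells, collapse = ", "))
  })
  unname(out)
}

# Made-up example table, for illustration only
m <- rbind(
  c("Predictor", "b",     "t",     "p"),
  c("Age",       "0.12",  "2.31",  ".021"),
  c("Gender",    "-0.05", "-0.87", ".385")
)

collapse_rows(m)
# [1] "Age: b: 0.12, t: 2.31, p: .021"
# [2] "Gender: b: -0.05, t: -0.87, p: .385"
```

Each resulting string keeps a statistic next to its label, which is the kind of flat text a downstream extractor of statistical results can search.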

Authors: Ingmar Böschen [aut, cre]

tableParser_1.0.5.tar.gz
tableParser_1.0.5.zip (r-4.7), tableParser_1.0.5.zip (r-4.6), tableParser_1.0.5.zip (r-4.5)
tableParser_1.0.5.tgz (r-4.6-any), tableParser_1.0.5.tgz (r-4.5-any)
tableParser_1.0.5.tar.gz (r-4.7-any), tableParser_1.0.5.tar.gz (r-4.6-any)
tableParser_1.0.5.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
tableParser/json (API)

# Install 'tableParser' in R:
install.packages('tableParser', repos = c('https://ingmarboeschen.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ingmarboeschen/tableparser/issues

Uses libs:
  • openjdk – OpenJDK Java runtime, using Hotspot JIT

4.48 score | 3 stars | 448 downloads | 16 exports | 31 dependencies

Last updated from: 321a2ac610. Checks: 4 NOTE, 2 OK, 3 ERROR. Indexed: yes.

Target                Result  Time
linux-devel-x86_64    NOTE    140
source / vignettes    OK      167
linux-release-x86_64  NOTE    140
macos-release-arm64   NOTE    137
macos-oldrel-arm64    NOTE    93
windows-devel         ERROR   93
windows-release       ERROR   124
windows-oldrel        ERROR   85
wasm-release          OK      111

Exports: docx2matrix, get.caption, get.footer, get.HTML.tables, guessCaptionFootnote, html2unicode, legendCodings, matrix2text, parseMatrixContent, prepareMatrix, table2matrix, table2stats, table2text, tableClass, unifyMatrixContent, unifyStats

Dependencies: bit, bit64, cli, clipr, cpp11, crayon, glue, hms, JATSdecoder, lifecycle, magrittr, NLP, openNLP, openNLPdata, pillar, pkgconfig, png, prettyunits, progress, R6, readr, rJava, rlang, tabulapdf, tibble, tidyselect, tzdb, utf8, vctrs, vroom, withr

Readme and manuals

Help Manual

Help page             Topics
docx2matrix           docx2matrix
get.caption           get.caption
get.footer            get.footer
get.HTML.tables       get.HTML.tables
get.tables            get.tables
guessCaptionFootnote  guessCaptionFootnote
html2unicode          html2unicode
legendCodings         legendCodings
matrix2text           matrix2text
parseMatrixContent    parseMatrixContent
prepareMatrix         prepareMatrix
table2matrix          table2matrix
table2stats           table2stats
table2text            table2text
tableClass            tableClass
unifyMatrixContent    unifyMatrixContent
unifyStats            unifyStats