This is really cool, thank you. Maintaining good PDF parsers is a full-time job...
I want to be honest and open here: I did not write the PDF parser on my own. I relied heavily on the PDF.js project from Mozilla. There is a disclosure in the footer, but perhaps I should communicate it more clearly.
Very neat!
I spotted a typo, which led me to a bug. When I click on a "stream contents" node, the right panel says "It's a actual content" (instead of "an"), and there is some mouse handling issue that prevents me from selecting the text in the right panel.
Thanks! I've fixed the typo and also enabled selection of the node's text on the left panel (it was disabled by default).
That's very cool!
1. Is the source available anywhere? I'm curious to see how it works.
2. Is there a way to connect the structure displayed here to the rendered version of the PDF, to visually display the subcomponents?
I haven't decided whether I want to create an open-source version. I originally kept it private so that I could worry less about code quality and finish the product before losing interest in it.
It relies heavily on the core part of PDF.js: I forked the PDF.js project, removed everything not related to the core part, and added an export for low-level primitives [1].
As inspiration, I also used the pdf.js.utils [2] project, which does almost the same thing but in a different form.
1. https://github.com/hyzyla/pdf.js-core
2. https://github.com/brendandahl/pdf.js.utils
Regarding 2.: Most of these objects do not directly correspond to rendered elements. Basically, every page (typically) has one content stream, which contains all rendered elements. The main rendered things you see outside of that are annotations (link boxes, form fields, actual annotations, ...).
It's a bit different if you are looking at a tagged PDF, where the tagging structure is in there, but if you want to look at that in detail you are probably better served with e.g. ngPDF (https://ngpdf.com/) which will show the tagging structure including the mapping to rendered elements.
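To make the point about content streams more concrete, here is a minimal Python sketch of what a page's content stream looks like once it is decoded: a flat list of operands followed by an operator (postfix notation), not a tree of objects. The sample stream, the `TOKEN` pattern, and `parse_content_stream` are made up for illustration and are not taken from the tool or from PDF.js. The tokenizer is deliberately simplified and ignores arrays, hex strings, dictionaries, inline images, and nested parentheses.

```python
import re

# Simplified token pattern (an assumption for this sketch, not a full PDF lexer).
TOKEN = re.compile(
    r"""
    \((?:\\.|[^\\()])*\)        # literal string, no nested parentheses
  | /[^\s/()\[\]<>{}%]*         # name, for example a font resource name
  | [+-]?(?:\d+\.?\d*|\.\d+)    # integer or real number
  | [A-Za-z][A-Za-z0-9*]*       # operator keyword
    """,
    re.VERBOSE,
)


def parse_content_stream(data: str) -> list[tuple[str, list[str]]]:
    """Group tokens into (operator, operands) pairs.

    Operands are collected until an operator keyword is seen, then emitted
    together with that operator.
    """
    instructions = []
    operands: list[str] = []
    for match in TOKEN.finditer(data):
        token = match.group()
        if token[0].isalpha():
            # Operator keyword: it consumes the operands gathered so far.
            instructions.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return instructions


# Hypothetical decoded content stream that draws one line of text.
SAMPLE = """
BT
/F1 12 Tf
72 712 Td
(Hello, PDF) Tj
ET
"""

if __name__ == "__main__":
    for operator, operands in parse_content_stream(SAMPLE):
        print(operator, operands)
```

In this sample, `BT` and `ET` open and close a text object, `Tf` selects a font and size, `Td` moves the text position, and `Tj` shows a string. All of that lives inside a single stream object in the file, which is why a node in the object tree rarely maps to one visible element on the page.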