Note: must the Windows binary really be 78MB?
Another seemingly extremely similar project released in the last few days: https://github.com/raulcd/datanomy
Right now parqeye looks mainly single-file focused. Do you have plans for a “dataset mode” that takes a dir/S3 prefix and surfaces per-file/row-group summaries (row counts, min/max, null %, schema diffs vs a reference file) using just Parquet stats so it scales to tens of GB? Or do you see parqeye intentionally staying a single-file inspector?
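To make the "just Parquet stats" idea concrete, here is a rough sketch of how per-row-group statistics could be rolled up into a per-file summary and how a schema diff against a reference file might look. Everything here is hypothetical: `ColumnStats`, `merge_stats` and `schema_diff` are illustrative names, not part of parqeye, and the sketch assumes the stats have already been read from each file's footer by whatever Parquet reader is in use.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnStats:
    """Footer statistics for one column in one row group (hypothetical container)."""
    num_rows: int                 # rows in the row group, nulls included
    null_count: int               # nulls reported for this column chunk
    min_value: Optional[float]    # None when the writer stored no min
    max_value: Optional[float]    # None when the writer stored no max


def merge_stats(groups: Iterable[ColumnStats]) -> dict:
    """Roll row-group stats up to a file-level summary without reading data pages."""
    rows = nulls = 0
    lo = hi = None
    for g in groups:
        rows += g.num_rows
        nulls += g.null_count
        if g.min_value is not None:
            lo = g.min_value if lo is None else min(lo, g.min_value)
        if g.max_value is not None:
            hi = g.max_value if hi is None else max(hi, g.max_value)
    null_pct = 100.0 * nulls / rows if rows else 0.0
    return {"rows": rows, "null_pct": null_pct, "min": lo, "max": hi}


def schema_diff(reference: dict, other: dict) -> dict:
    """Compare two {column_name: type_name} mappings against a reference schema."""
    shared = reference.keys() & other.keys()
    return {
        "missing": sorted(set(reference) - set(other)),
        "extra": sorted(set(other) - set(reference)),
        "type_changed": sorted(c for c in shared if reference[c] != other[c]),
    }
```

Since only footers are touched, the cost would grow with the number of row groups rather than with data size, which is what should let this scale to tens of GB.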
duckdb -c "from 'foo.parquet'"
but maybe still useful for other formats or multi-file or remote situations
[1] https://github.com/Vitruves/nail-parquet
[2] https://github.com/NixOS/nixpkgs/pull/449066
Native Mac/Windows app with multi-threaded parsing (simdjson), automatic nested object flattening, and handles 10M+ rows instantly.
For HN: Use code HN100 for free access
https://iotdatasystems.gumroad.com/
Built with C++ for native performance (~6MB app, not Electron).
Would love feedback from folks working with large JSONL files.
I think you can afford the extra characters to show the whole page in portrait mode. (iPhone 16 pro Safari)
Also just added a Data Plot feature for visualizing numeric columns.
Thanks to everyone who reported the issue!
Will take a look when I get to my laptop!
I did submit a feature request for vi keybindings; though I could look into contributing this myself if I find a bit of spare time.
The other thing that surprised me was the size of the binaries: 90MB for a TUI tool (x64 Linux)? I wonder what the bulk of that is. Is there an issue with LTO? Another commenter noticed this as well.
It also looks like you are building against a relatively recent glibc (2.34), which limits compatibility with older systems. Building against an older glibc can be hard to do, so I am not faulting you here, and you do provide a musl fallback, which is appreciated. (Mandatory notice that the musl allocator can dramatically degrade the performance of Rust programs, just in case you were not aware of this.)
A few more ideas for improvements (you probably already have your own laundry list):
- Mouse support?
- Seeing that you do have graphs, it would be fun to see a scatter plot as well as a distribution plot under statistics in the "Row Groups" tab (though you probably pull these from the metadata, so that would require further processing, which may be out of scope).
I tried to install with brew, but it told me my CLI tools were "too out of date". Never seen that before, and I had just upgraded!
Will try again tomorrow