A bunch of enhancements for dealing with large tar archives #37
Merged
splitbrain merged 6 commits into splitbrain:master on Dec 9, 2024
In 2001, GNU tar introduced support for large and negative numbers (https://www.gnu.org/software/tar/manual/html_node/Extensions.html#Extensions). This is required to handle files bigger than 8 GiB.
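The 8 GiB limit comes from the classic 12-byte size field, which holds 11 octal digits plus a terminator, so its largest value is 8^11 - 1 bytes. The GNU extension sets the high bit of the first byte and stores the rest as a big-endian binary number. The following is a minimal sketch of that encoding, assuming 64-bit PHP; `encodeTarNumber()` and `decodeTarNumber()` are hypothetical names for illustration, not the methods added by this pull request.

```php
<?php
// Hypothetical helpers illustrating the GNU tar number encoding.
// These are NOT the library's actual methods. Assumes 64-bit PHP integers.

function encodeTarNumber(int $value, int $length = 12): string
{
    // Largest value that fits into ($length - 1) octal digits.
    $maxOctal = 8 ** ($length - 1) - 1;
    if ($value >= 0 && $value <= $maxOctal) {
        // Classic format: zero-padded octal digits plus a NUL terminator.
        return sprintf('%0' . ($length - 1) . 'o', $value) . "\0";
    }
    // GNU extension: a marker byte followed by big-endian base-256 digits.
    $bytes = '';
    for ($i = 0; $i < $length - 1; $i++) {
        $bytes = chr($value & 0xFF) . $bytes;
        $value >>= 8; // arithmetic shift, keeps the sign
    }
    // After shifting, $value is 0 for positive input and -1 for negative input.
    return ($value < 0 ? "\xFF" : "\x80") . $bytes;
}

function decodeTarNumber(string $field): int
{
    if ($field === '') {
        return 0;
    }
    $first = ord($field[0]);
    if ($first & 0x80) {
        // High bit set: base-256. A leading 0xFF means a negative
        // number stored in two's complement.
        $value = ($first === 0xFF) ? -1 : ($first & 0x7F);
        for ($i = 1, $n = strlen($field); $i < $n; $i++) {
            $value = ($value << 8) | ord($field[$i]);
        }
        return $value;
    }
    // Classic format: octal digits padded with spaces and/or NUL bytes.
    return (int) octdec(trim($field, " \0"));
}

$tenGiB = 10 * 1024 ** 3;
$field = encodeTarNumber($tenGiB);        // too big for octal, uses base-256
$roundTrip = decodeTarNumber($field);     // same value as $tenGiB
```

Because both functions work on plain strings, they can be exercised with values above 8 GiB without creating a file of that size.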
Until now there was no way to read the data of a file in an archive without extracting it, and extracting a single file required rereading the whole archive. This commit changes yieldContents() so that it no longer skips to the next header entry before yielding the current header. Instead, the position of the next header entry is remembered, and the reader moves there only on the following next() call on the generator. This allows the content of the current entry to be read until that next() call. The Tar::readCurrentEntry() method was added for this purpose.
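The deferred skip can be illustrated with a toy reader. This is a sketch only: `LazyTarReader`, `entries()`, `readCurrent()` and `toyEntry()` are hypothetical names, the header parsing handles just the name and an octal size, and the sketch assumes a seekable stream, so it does not reflect how the library's Tar class is actually implemented.

```php
<?php
// Toy reader illustrating the "remember the next header, skip later" idea.
// All names here are hypothetical; this is not the library's Tar class.

final class LazyTarReader
{
    private $fh;
    private int $remaining = 0; // unread bytes of the current entry

    public function __construct($fh)
    {
        $this->fh = $fh;
    }

    public function entries(): Generator
    {
        while (strlen($header = (string) fread($this->fh, 512)) === 512
            && trim($header, "\0") !== ''
        ) {
            $name = rtrim(substr($header, 0, 100), "\0");
            $size = (int) octdec(trim(substr($header, 124, 12), " \0"));
            // Remember where the next header starts instead of skipping now.
            $next = ftell($this->fh) + intdiv($size + 511, 512) * 512;
            $this->remaining = $size;
            yield ['name' => $name, 'size' => $size];
            // Resumed by next(): only now move on to the next header.
            $this->remaining = 0;
            fseek($this->fh, $next);
        }
    }

    public function readCurrent(int $length): string
    {
        $length = min($length, $this->remaining);
        if ($length <= 0) {
            return '';
        }
        $data = (string) fread($this->fh, $length);
        $this->remaining -= strlen($data);
        return $data;
    }
}

// Builds a minimal header (name at offset 0, size at offset 124) plus padded data.
function toyEntry(string $name, string $data): string
{
    $header = str_pad($name, 100, "\0") . str_repeat("\0", 24)
        . sprintf('%011o', strlen($data)) . "\0";
    $padded = intdiv(strlen($data) + 511, 512) * 512;
    return str_pad($header, 512, "\0") . str_pad($data, $padded, "\0");
}

$fh = fopen('php://memory', 'w+b');
fwrite($fh, toyEntry('a.txt', 'hello')
    . toyEntry('b.txt', str_repeat('x', 600))
    . str_repeat("\0", 1024)); // end-of-archive marker
rewind($fh);

$reader = new LazyTarReader($fh);
$sums = [];
foreach ($reader->entries() as $entry) {
    // Hash each entry in small chunks without extracting it.
    $ctx = hash_init('sha256');
    while (($chunk = $reader->readCurrent(256)) !== '') {
        hash_update($ctx, $chunk);
    }
    $sums[$entry['name']] = hash_final($ctx);
}
```

The loop body may read all, part, or none of an entry. In every case the generator lands on the remembered header position when it resumes, so the archive is traversed only once.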
Tar::addData(): pad only the last block of data and write everything else with a single writebytes() call, without pack(). Tar::addFile(): move the read chunk size to a class constant.
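A minimal sketch of the padding idea, assuming a plain stream resource instead of the library's writebytes(); `writePadded()` and the `BLOCK` constant are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: write whole 512-byte blocks as they are and
// NUL-pad only the trailing partial block. Not the library's code.

const BLOCK = 512;

function writePadded($fh, string $data): int
{
    $tail = strlen($data) % BLOCK;   // bytes in the last, partial block
    $bulk = strlen($data) - $tail;   // bytes that already fill whole blocks
    $written = 0;
    if ($bulk > 0) {
        // Everything except the last partial block goes out in one call.
        $written += fwrite($fh, substr($data, 0, $bulk));
    }
    if ($tail > 0) {
        // pack('a512', ...) pads the string with NUL bytes up to 512 bytes.
        $written += fwrite($fh, pack('a' . BLOCK, substr($data, $bulk)));
    }
    return $written;
}

$out = fopen('php://memory', 'w+b');
$count = writePadded($out, str_repeat('x', 1030)); // 2 full blocks + 6 bytes
rewind($out);
$bytes = stream_get_contents($out);
```

Data whose length is already a multiple of 512 bytes is written unchanged, and pack() is never called for it.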
Owner
That's pretty cool. Thanks!
This pull request contains the following changes to the Tar class:
- Support for GNU tar large numbers, implemented as static public methods to make them testable without the need to create >8GiB files.
- Tar::addData() without unneeded pack() calls (only the very last 512-byte block needs it).
- A Tar::readCurrentEntry() method which allows reading tar entry content while iterating through the archive with the generator returned by Tar::yieldContents(). This allows efficient inspection of large tar archive content without the need to extract it. In my particular case it allows me to check backup consistency with code like […] ($chunkSize only) and reads the archive only once, no matter the tar archive size and format (compressed or not).