Does compression into one large archive result in better compression than individual compression of folders?
I have several folders of around 8GB or so. Together these folders total around 60GB of data. I can compress these folders one of two ways: either individually, creating one compressed archive for each of them, or altogether into a single large compressed archive.
Generally speaking, assuming all the data to be compressed is of the same type and the compression algorithm used is the same (and that I don't care about the time it would take to decompress the larger file), will one method result in better compression than the other, or will the total sizes of the compressed files in the two scenarios tend to be equal?
windows compression 7-zip archiving
asked Dec 5 at 0:07
Hashim
3 Answers
accepted
Does compression into one large archive result in better compression than individual compression of folders? Not necessarily.
Only if the archive is using solid compression. A non-solid archive (like a Zip archive) compresses files individually. This enables you to easily decompress single files from the archive. It also allows you to add files to the archive without having to recompress everything.
With solid archives, all this is a lot harder: To decompress a file at the very end of the stream, everything has to be decompressed (though not necessarily written to disk). When adding a file, the algorithm also needs to go through everything.
There is a middle ground, however: using “solid blocks”. The archive is split into blocks that are each compressed as a unit, so the archiver only has to process the block containing a file rather than the entire archive.
In the 7-Zip GUI, it’s the “Solid Block size” option in the “Add to Archive” dialog.
Without taking into account the data being compressed, it’s really simple:
- Non-solid: Fast interactive access, worst compression
- Solid blocks: Somewhat efficient interactive access, better compression
- Solid: No interactive access, best compression
Depending on the predicted access pattern, you should select a suitable variant.
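The three variants above can be illustrated with a small sketch. It uses Python's standard `lzma` module on synthetic in-memory "files" rather than 7-Zip itself. The helper `compressed_size` and its `files_per_block` parameter are made up for this illustration (7-Zip sizes solid blocks in bytes, not in number of files), and archive headers are ignored.

```python
import lzma
import random


def compressed_size(files, files_per_block):
    """Total size when every `files_per_block` files share one LZMA stream.

    files_per_block=1 mimics a non-solid archive, len(files) a fully solid
    one, and anything in between a solid-block layout. Only the compressed
    payload is counted; archive metadata is ignored.
    """
    total = 0
    for start in range(0, len(files), files_per_block):
        block = b"".join(files[start:start + files_per_block])
        total += len(lzma.compress(block))
    return total


# Synthetic data: eight "files" that share the same 20 kB of random
# (individually incompressible) content plus a short unique suffix.
rng = random.Random(0)
shared = bytes(rng.randrange(256) for _ in range(20_000))
files = [shared + f"unique tail {i}".encode() for i in range(8)]

non_solid = compressed_size(files, 1)
solid_blocks = compressed_size(files, 4)
solid = compressed_size(files, len(files))

print("non-solid:   ", non_solid)
print("solid blocks:", solid_blocks)
print("solid:       ", solid)
```

On data like this, where the files resemble each other, the solid variants should come out clearly smaller. With files that have nothing in common, the three totals would be much closer together.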
answered Dec 5 at 21:03
Daniel B
While it is impossible to say with absolute certainty, one larger archive should theoretically be smaller, because the compressor can find more repeated blocks of data across the whole set. This assumes the data is as homogeneous as you say.
However, it is entirely possible that certain folders contain files with more similar blocks of data and would therefore compress better as their own individual archives.
The only true way to know which method is best would be to test both ways.
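One way to run such a test without writing any archives to disk is sketched below. It is only a stand-in: it uses Python's standard `tarfile` module with xz compression instead of 7-Zip, which behaves like a fully solid archive, so the numbers indicate a trend rather than what 7-Zip would produce with your settings. The names `ByteCounter`, `tar_xz_size` and `compare` are invented for this example.

```python
import os
import tarfile


class ByteCounter:
    """Write-only sink that counts bytes instead of storing them."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)


def tar_xz_size(folders):
    """Compressed size in bytes of one .tar.xz stream holding all `folders`."""
    sink = ByteCounter()
    # "w|xz" is tarfile's streaming mode: it only needs write() on the sink.
    with tarfile.open(fileobj=sink, mode="w|xz") as archive:
        for folder in folders:
            name = os.path.basename(os.path.normpath(folder))
            archive.add(folder, arcname=name)
    return sink.count


def compare(folders):
    """Return (sum of per-folder archive sizes, size of one combined archive)."""
    separate = sum(tar_xz_size([folder]) for folder in folders)
    combined = tar_xz_size(folders)
    return separate, combined
```

Calling `compare([r"D:\data\folder1", r"D:\data\folder2"])` (hypothetical paths) returns both totals for a direct comparison. Every folder is read and compressed twice, so on 60GB of data this will take a while; a representative sample of each folder may be enough to see the trend.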
answered Dec 5 at 0:36
Keltari
The single archive will almost always be smaller, though not for the reason you think.
Put simply, by having only one archive, you don't waste space with multiple archive file headers. There's some minimal amount of space an archive file takes up just to be a valid archive, and you end up taking up that much space with each archive you create. The only widely used exception to this is the cpio format, which has no header for the archive itself, but instead just has per-file headers.
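The per-archive overhead can be made visible with a short sketch. It uses Python's standard `zipfile` module with in-memory buffers; the helper `zip_size` and the sample file names are made up for the illustration. Zip is convenient here because it compresses each member on its own, so splitting the files across archives should change nothing except the fixed bookkeeping each archive carries.

```python
import io
import zipfile


def zip_size(members):
    """Size in bytes of an in-memory zip archive holding `members` (name -> bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


# Eight small sample members with identical content.
members = {f"file{i}.txt": b"hello world\n" * 100 for i in range(8)}

empty = zip_size({})
one_archive = zip_size(members)
many_archives = sum(zip_size({name: data}) for name, data in members.items())

print("empty archive:         ", empty)
print("one archive, 8 files:  ", one_archive)
print("eight archives, 1 file:", many_archives)
```

The gap between the last two numbers is the cost of the extra archives. It is a fixed amount per archive, so next to folders of several gigabytes it is negligible.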
More realistically, you will usually get at least as good a compression ratio using just one archive instead of several, and with some archivers it can be significantly better (for example, zpaq does deduplication within the archive, so it can save a lot of space if there's lots of duplicated data).
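The idea behind deduplication can be sketched in a few lines. This is an illustration of the principle only, not zpaq's actual algorithm: the fixed 4096-byte block size, the use of SHA-256 and the function name `deduplicated_size` are all assumptions chosen for simplicity.

```python
import hashlib
import random


def deduplicated_size(files, block_size=4096):
    """Bytes left to store when identical fixed-size blocks are kept only once."""
    seen = set()
    stored = 0
    for data in files:
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                # First time this block content appears: it must be stored.
                seen.add(digest)
                stored += len(block)
    return stored


rng = random.Random(1)
original = bytes(rng.randrange(256) for _ in range(16_384))
duplicate = bytes(original)  # an exact copy living in "another folder"

print("one file:      ", deduplicated_size([original]))
print("file plus copy:", deduplicated_size([original, duplicate]))
```

A duplicate only costs nothing if both copies end up in the same archive, which is why this kind of saving is lost when each folder is archived separately.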
There's another question you need to ask before you decide on this though: Is the overhead of having to handle a single large archive instead of multiple smaller ones worth the space savings? Depending on where you're storing the data, it may be more economical to just use the smaller archives, especially if you're likely to only need one of the folders at a time.
Overall, though, Keltari is correct: the only way to know for sure is to test it.
answered Dec 5 at 20:31
Austin Hemmelgarn