Does compression into one large archive result in better compression than individual compression of folders?

I have several folders of around 8 GB each. Together these folders total around 60 GB of data. I can compress them in one of two ways: either individually, creating one compressed archive for each folder, or all together into a single large compressed archive.

Generally speaking, assuming all the data to be compressed is of the same type and the compression algorithm used is the same (and that I don't care about the time it would take to decompress the larger file), will either method result in better compression than the other, or will the total sizes of the compressed files in the two scenarios tend to be equal?

asked Dec 5 at 0:07

Hashim

windows compression 7-zip archiving

3 Answers

accepted

Does compression into one large archive result in better compression than individual compression of folders? Not necessarily.

Only if the archive is using solid compression. A non-solid archive (like a Zip archive) compresses files individually. This enables you to easily decompress single files from the archive. It also allows you to add files to the archive without having to recompress everything.

With solid archives, all this is a lot harder: To decompress a file at the very end of the stream, everything has to be decompressed (though not necessarily written to disk). When adding a file, the algorithm also needs to go through everything.
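The difference between the two modes can be sketched in a few lines of Python. This is only an illustration, assuming LZMA via the standard `lzma` module as a stand-in for whatever the archiver uses; the function names and the sample data are made up for the example.

```python
import lzma


def non_solid_size(files):
    """Compress every file on its own and sum the results (Zip-like behaviour)."""
    return sum(len(lzma.compress(data)) for data in files)


def solid_size(files):
    """Compress all files as one continuous stream (solid behaviour)."""
    return len(lzma.compress(b"".join(files)))


# Hypothetical sample data: 50 small files that share most of their content.
files = [
    b"common header and boilerplate text\n" * 20 + str(i).encode()
    for i in range(50)
]

print("non-solid:", non_solid_size(files), "bytes")
print("solid:    ", solid_size(files), "bytes")
```

With data like this, the solid stream can reuse matches from earlier files, while the per-file variant starts from scratch every time.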

There is a middle ground, however: using “solid blocks”. The files are grouped into blocks that are each compressed as a unit, so the archiver doesn’t have to process the entire archive every time, only the block that contains the file in question.

In the 7-Zip GUI, it’s the “Solid Block size” option in the Add to Archive dialog:

[Screenshot: 7-Zip Add dialog]

Without taking into account the data being compressed, it’s really simple:

Non-solid: Fast interactive access, worst compression

Solid blocks: Somewhat efficient interactive access, better compression

Solid: No interactive access, best compression

Depending on the predicted access pattern, you should select a suitable variant.
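To get a feel for the three variants, here is a rough sketch that groups files into blocks of a given size and compresses each block separately. It is not how 7-Zip implements solid blocks internally; `compress_in_blocks`, the block sizes, and the sample files are assumptions chosen for illustration.

```python
import lzma


def compress_in_blocks(files, block_size):
    """Group files into blocks of at most block_size bytes and compress each block.

    block_size=0 behaves like a non-solid archive (one file per block);
    a very large value behaves like a fully solid archive (one block).
    """
    blocks, current, current_len = [], [], 0
    for data in files:
        # Start a new block when adding this file would exceed the limit.
        if current and current_len + len(data) > block_size:
            blocks.append(b"".join(current))
            current, current_len = [], 0
        current.append(data)
        current_len += len(data)
    if current:
        blocks.append(b"".join(current))
    return [lzma.compress(block) for block in blocks]


# Hypothetical sample data: 40 files of about 2 KB with mostly shared content.
files = [b"shared line of text\n" * 100 + b"file %d\n" % i for i in range(40)]

for label, size in [("non-solid", 0), ("solid blocks", 8000), ("solid", 10**9)]:
    blocks = compress_in_blocks(files, size)
    print(label, "->", len(blocks), "blocks,", sum(map(len, blocks)), "bytes")
```

To read one file back, only the block that contains it has to be decompressed, which is where the access-versus-ratio trade-off comes from.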

answered Dec 5 at 21:03

Daniel B

While it is impossible to say with absolute certainty, one larger archive should in theory be smaller, because the compressor can find more repeated blocks of data across the files. This assumes the data is as homogeneous as you say.

However, it is entirely possible that certain folders contain files with more similar blocks of data, and those folders might therefore compress better as their own individual archives.

The only true way to know which method is best would be to test both ways.
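Such a test could look roughly like the following. This is a sketch using Python's `tarfile` with xz compression rather than 7-Zip, so the numbers will not match what 7-Zip produces; the helper names are invented, and the demo data is an extreme case (the same random payload in every folder) chosen only so that the difference is visible.

```python
import os
import tarfile
import tempfile


def archive_size(paths, out_path):
    """Write the given folders into one .tar.xz archive and return its size in bytes."""
    with tarfile.open(out_path, "w:xz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    return os.path.getsize(out_path)


def compare(folders, work_dir):
    """Return (total size of one archive per folder, size of one combined archive)."""
    separate = sum(
        archive_size([folder], os.path.join(work_dir, f"part{i}.tar.xz"))
        for i, folder in enumerate(folders)
    )
    combined = archive_size(folders, os.path.join(work_dir, "all.tar.xz"))
    return separate, combined


# Demo with hypothetical data: three folders holding an identical payload.
with tempfile.TemporaryDirectory() as tmp:
    payload = os.urandom(50_000)
    folders = []
    for name in ("a", "b", "c"):
        folder = os.path.join(tmp, name)
        os.makedirs(folder)
        with open(os.path.join(folder, "data.bin"), "wb") as fh:
            fh.write(payload)
        folders.append(folder)
    separate, combined = compare(folders, tmp)

print("separate archives:", separate, "bytes")
print("single archive:   ", combined, "bytes")
```

For the real folders, pass their paths to `compare` and pick whichever total is smaller.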

answered Dec 5 at 0:36

Keltari

The single archive will almost always be smaller, though not for the reason you think.

Put simply, by having only one archive, you don't waste space with multiple archive file headers. There's some minimal amount of space an archive file takes up just to be a valid archive, and you end up taking up that much space with each archive you create. The only widely used exception to this is the cpio format, which has no header for the archive itself, but instead just has per-file headers.
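The fixed per-archive cost is easy to measure. The sketch below uses the Zip format through Python's `zipfile` module purely as an example, with files stored uncompressed so that only the container overhead differs; the file names and contents are placeholders.

```python
import io
import zipfile


def empty_zip_size():
    """Size in bytes of a valid Zip archive that contains no files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w"):
        pass
    return len(buf.getvalue())


def zip_size(files):
    """Size in bytes of a stored (uncompressed) Zip archive for a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


# Hypothetical files, either packed together or one archive each.
files = {"a.txt": b"x" * 100, "b.txt": b"y" * 100}
one_archive = zip_size(files)
two_archives = sum(zip_size({name: data}) for name, data in files.items())

print("empty archive:", empty_zip_size(), "bytes")
print("one archive:  ", one_archive, "bytes")
print("two archives: ", two_archives, "bytes")
```

The gap between the last two numbers is the archive-level overhead paid once more for the extra archive. Against gigabytes of data it is negligible, which is why the compression behaviour matters far more in practice.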

More realistically, you will usually get at least as good a compression ratio from one archive as from several, and with some archivers it can be significantly better (for example, zpaq deduplicates data within the archive, so it can save a lot of space if there is a lot of duplicated data).
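To illustrate why deduplication favours a single archive, here is a deliberately simplified sketch that stores each distinct fixed-size chunk only once, identified by its SHA-256 hash. It is not zpaq's actual algorithm; the function name, the chunk size, and the sample data are assumptions.

```python
import hashlib


def dedup_stored_bytes(files, chunk_size=4096):
    """Return the bytes that must be stored if each distinct chunk is kept only once."""
    seen = set()
    stored = 0
    for data in files:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored


# Hypothetical folders that both contain the same 8 KiB file plus one unique file.
shared = b"".join(i.to_bytes(4, "big") for i in range(2048))
folder_a = [shared, b"only in a"]
folder_b = [shared, b"only in b"]

separate = dedup_stored_bytes(folder_a) + dedup_stored_bytes(folder_b)
together = dedup_stored_bytes(folder_a + folder_b)

print("two archives:", separate, "bytes stored")
print("one archive: ", together, "bytes stored")
```

Duplicates can only be detected inside the archive that is doing the deduplication, so a file that appears in two folders is stored twice when the folders are archived separately.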

There's another question you need to ask before you decide on this though: Is the overhead of having to handle a single large archive instead of multiple smaller ones worth the space savings? Depending on where you're storing the data, it may be more economical to just use the smaller archives, especially if you're likely to only need one of the folders at a time.

Overall, though, Keltari is correct: the only way to know for sure is to test it.

answered Dec 5 at 20:31

Austin Hemmelgarn
