Questions about storage virtualization
I am trying to understand and compare storage virtualization methods, including RAID and LVM. I hope to get a general idea of the big picture and of how the various concepts relate.
I was wondering whether the various storage virtualization methods can be classified into virtualization at the device (disk), partition, or filesystem level, as follows:
- RAID belongs to virtualization at the device/disk level, which replaces physical disks with logical/virtual disks.
- LVM belongs to virtualization at the partition level, which replaces partitions with logical/virtual partitions (also called logical volumes).
- There is also virtualization at the filesystem level, which replaces filesystems with logical/virtual filesystems, for example network-attached storage (NAS).
If my understanding above is correct, does virtualization at each level also implement virtualization at all lower levels? For example, does virtualization at the partition level also implement virtualization at the device level, and does virtualization at the filesystem level also implement virtualization at both the partition and device levels?
How do the different levels of virtualization affect or determine their areas of application? For example, are there applications suitable for RAID but not for LVM, and for LVM but not for RAID?
There is a Wikipedia article on storage virtualization, which has two main categories of methods: block virtualization (further classified into storage device-based, host-based, and network-based) and file virtualization.
Comparing the article with my understanding in part 1:
- Is it correct that storage device-based block virtualization is the same as virtualization at the device level, that host-based block virtualization is the same as virtualization at the partition level, and that file virtualization is the same as virtualization at the filesystem level?
- But in the "Specific examples" section under host-based block virtualization, it looks like host-based block virtualization includes virtualization at the filesystem level. How should one understand file virtualization then?
- I would rather single out network-based virtualization from block virtualization in the aforementioned Wikipedia article, because for storage virtualization over a network I think we can also classify the various methods into the levels of device, partition, and filesystem. For example, can I say that a Storage Area Network (SAN) belongs to the device level, and network-attached storage (NAS) to the filesystem level?
Thanks and regards!
virtualization storage
2
I don't think you are being very accurate in describing RAID and LVM as "disk" or "partition" level virtualization. Storage virtualization refers to the abstraction of multiple, commonly network-linked pieces of equipment that are centrally managed and allow access to the system as a whole rather than on a per-server basis. RAID/LVM has little to do with storage virtualization per se, although (of course) they are commonly used in SAN clusters.
– bubu
May 31 '11 at 19:17
Thanks! But I don't understand "RAID/LVM has little to do with Storage virtualization per se". From the Wikipedia articles for storage virtualization (en.wikipedia.org/wiki/Storage_virtualization), LVM (en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)) and RAID (en.wikipedia.org/wiki/RAID), both RAID and LVM are methods of storage virtualization and storage virtualization is not just about the case of network-linked storage devices.
– Tim
May 31 '11 at 19:40
1
The Wikipedia article on storage virtualization is mediocre at best.
– bubu
May 31 '11 at 19:51
Then are there any references worth recommending?
– Tim
May 31 '11 at 19:53
Have a look: www-03.ibm.com/systems/resources/…
– bubu
May 31 '11 at 20:08
asked May 31 '11 at 19:01 by Tim
edited May 31 '11 at 20:02 by Tim
3 Answers
Might as well provide an answer. Refer to @soandos's answer for more detailed responses to your specific questions.
LVM vs RAID
RAID, as many have mentioned, is a family of technologies in which multiple disk drives are grouped together as an array, providing varying levels of performance and reliability. For example, RAID 0 stripes data across all the drives for the best throughput the hard drives can deliver, but is extremely sensitive to disk loss (one loss = essentially total loss), whereas RAID 6 still provides redundancy while the array is being rebuilt after the loss of one drive. A RAID array is usually seen by the OS as one single drive.
One can say that RAID is a many-to-one mapping.
LVM, on the other hand, allows logical "disk drives" (block devices, to be accurate) to be formed from parts of different disk drives. They exist in a many-to-many mapping. While one can use LVM to accomplish what RAID can accomplish, LVM can actually do much more. For example, to add another disk drive to a RAID array it would likely be necessary to rebuild the whole array from scratch. With LVM, it is just a matter of adding a disk drive to the machine, mapping the new drive into a logical volume, and using it (the actual configuration is a little more complicated, but certainly less work than rebuilding a whole array).
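The two mappings can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (one-block stripes for RAID 0, a plain in-memory extent table for LVM); the volume and disk names (`lv_home`, `sda`, and so on) are hypothetical, and this is not how `md` or LVM actually store their metadata.

```python
def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
    """RAID 0 with one-block stripes: the array looks like one disk, and
    every logical block lives on exactly one member disk (many-to-one)."""
    return logical_block % n_disks, logical_block // n_disks


# LVM-style extent tables (hypothetical layout): each logical volume is an
# ordered list of (physical disk, extent index) pairs, so several volumes
# can share one disk and one volume can span several disks (many-to-many).
volumes = {
    "lv_home": [("sda", 0), ("sda", 1), ("sdb", 0)],
    "lv_data": [("sdb", 1), ("sdb", 2)],
}


def lv_locate(lv: str, logical_extent: int) -> tuple[str, int]:
    """Translate a logical extent of a volume to its physical location."""
    return volumes[lv][logical_extent]


def lv_extend(lv: str, disk: str, first_extent: int, count: int) -> None:
    """Grow a volume by appending extents from a (possibly new) disk.
    Existing extents are left untouched, so nothing has to be rebuilt."""
    volumes[lv].extend((disk, first_extent + i) for i in range(count))
```

Calling `lv_extend("lv_data", "sdc", 0, 2)` models adding a drive and growing a volume onto it: the table simply gets longer, whereas changing `n_disks` in `raid0_locate` moves almost every block.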
- RAID is a redundancy technology that ensures that, in the case of a drive failure, all data remains intact, and LVM is the Logical Volume Manager, which can do many things.
- It's not.
- That seems like an odd question. RAID is a way of spreading data across drives so that if one fails no data is lost. LVM is a volume manager that can be used to change the way a user/OS looks at all the hard drives. They have nothing to do with each other (though LVM can implement RAID 1 and RAID 0, that is not its primary focus).
- The first means you don't have to know what physical device the data is on, and the second means that you can store, for lack of a better word, the links between files in a more abstract way.
- As stated above, there is no "device level" or "partition level" to talk about, so no, you can't refer to them as such.
Thanks! But I still can't see how RAID and LVM are fundamentally different, as both are realizations of storage virtualization. So I still don't understand why they are used for different purposes.
– Tim
May 31 '11 at 19:36
RAID is not about virtualization. It stands for "Redundant Array of Independent Disks." The word virtualization does not even appear in the Wikipedia article about it. Read this: en.wikipedia.org/wiki/RAID for a longer explanation.
– soandos
May 31 '11 at 19:40
Thanks! But I still don't understand your opinion. (1) "virtual disk" and "virtual device" appear in the article for RAID, and I think RAID is storage virtualization at the device level. In the article for storage device, RAID is mentioned in storage device-based block virtualization (en.wikipedia.org/wiki/…).
– Tim
May 31 '11 at 19:50
They might have something in common, but the goals are totally different. In RAID, you need a virtual disk because of data striping. Since the data that would generally be written to one disk is now written to more than one, you need a way to read the whole file as if it was contiguous. This is a side point to the idea of RAID though, and is not needed in RAID 1 for example.
– soandos
May 31 '11 at 19:50
(2) As to "RAID is not about virtualization", doesn't RAID make several physical storage devices look like, and be used as, a single virtual/logical storage device? Quoted from the RAID article: "this is achieved by combining multiple disk drive components into a logical unit".
– Tim
May 31 '11 at 19:51
The article is not well written.
One of the biggest problems is that there are multiple layers of abstraction built into the full stack of storage, and "virtualization" is a fuzzy enough word as to be hard to definitively assign a place to put it. For a good look at the many layers of abstraction in storage, I'll point you at a blog piece I did last year (read it here for the gory details).
In marketing-speak, "Storage Virtualization" is just introducing abstraction where there previously hasn't been any. That can happen at many points depending on the market segment. But that's just marketing. Time for technical.
The storage stack (somewhat simplified):
- Disk
- RAID controller
- Software RAID
- Volume manager
- Filesystem
- Network filesystem
- Network filesystem client
Disks, even old-school spinning magnetic disks, do a level of virtualization. They present a logical view of the actual blocks on the platters (or storage cells for an SSD), and it has been this way since the mid 80's or so. Magnetic drives reserve a certain number of blocks for reassigning blocks that go bad, and the logical view is how this is abstracted away from the disk controller. Technologies like SMART can catch this in the act and report that the drive is "pre-fail" so you can plan your transition accordingly.
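As a rough illustration of that remapping, here is a minimal Python sketch. The class name, the spare-pool layout, and the counter are assumptions made for the example; real drive firmware is far more involved.

```python
class RemappingDisk:
    """Toy model of a drive that hides bad-block reassignment.

    Callers address logical blocks 0..user_blocks-1; a reserved pool of
    spare physical blocks sits after them and is handed out on failure.
    """

    def __init__(self, user_blocks: int, spare_blocks: int):
        self.user_blocks = user_blocks
        self.spares = list(range(user_blocks, user_blocks + spare_blocks))
        self.remap: dict[int, int] = {}  # logical block -> spare physical block

    def physical(self, lba: int) -> int:
        """Return the physical block that currently backs a logical block."""
        if not 0 <= lba < self.user_blocks:
            raise IndexError(lba)
        return self.remap.get(lba, lba)

    def mark_bad(self, lba: int) -> None:
        """Reassign a failing logical block to the next spare."""
        self.physical(lba)  # range check only
        if not self.spares:
            raise RuntimeError("no spare blocks left")
        self.remap[lba] = self.spares.pop(0)

    def reallocated_count(self) -> int:
        """The kind of counter a SMART-like monitor might watch."""
        return len(self.remap)
```

The caller keeps using the same logical block number before and after `mark_bad`, which is the abstraction the paragraph above describes.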
RAID cards provide another abstraction layer, hiding the true shape of storage from an operating system. This has been in place since the first RAID cards came out in the late 80's, and they've only gotten more complex since then. Cards with write-caches on them provide still another abstraction layer, as writes can be reported as committed before they're actually on a disk somewhere. The really fancy ones (such as those in Storage Area Network arrays) can even write to two separate disk arrays for realtime replication, and the OS is none the wiser.
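The write-cache behaviour can be sketched as follows. This is a simplified model, assuming a dictionary stands in for the disks and ignoring battery backup, ordering, and failure handling that real controllers have to deal with.

```python
class WriteBackCache:
    """Toy write-back cache: writes are acknowledged before they reach disk."""

    def __init__(self, backing: dict):
        self.backing = backing  # stands in for the disks behind the card
        self.dirty: dict[int, bytes] = {}  # blocks accepted but not yet on disk

    def write(self, block: int, data: bytes) -> bool:
        """Accept the write and report success immediately."""
        self.dirty[block] = data
        return True

    def read(self, block: int):
        """Serve the newest data, whether or not it has been flushed."""
        if block in self.dirty:
            return self.dirty[block]
        return self.backing.get(block)

    def flush(self) -> None:
        """Push all pending writes down to the backing store."""
        self.backing.update(self.dirty)
        self.dirty.clear()
```

Between `write` and `flush` the OS believes the data is committed even though the backing store has not seen it yet, which is why this counts as one more layer of abstraction.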
Once you get into the operating system things get a lot more murky, as each does its own thing. Software RAID (md in Linux) is typically implemented as a low-level storage driver that presents the logically combined storage to higher storage layers. As with the RAID cards, you can do all sorts of interesting things here. Some of the "Storage Virtualization" products you see out there are implemented at this stage.
Going higher you get to the volume managers (LVM), which can provide for some seriously complex configurations. Where the next layer down aggregates disks into a single virtual volume, the volume managers can combine multiple volumes into a single bigger volume... or split a pool of volumes into an arbitrary number of volumes. Again, some of the Storage Virtualization products you see have a presence in this layer as well.
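Combining several volumes into one bigger volume boils down to an address lookup. The sketch below assumes simple linear concatenation, and the member names (`md0`, `md1`) are hypothetical; real volume managers also support striping, mirroring, and finer-grained extents.

```python
from bisect import bisect_right
from itertools import accumulate


class LinearVolume:
    """Several underlying volumes presented as one bigger block device."""

    def __init__(self, members: list[tuple[str, int]]):
        # members: (name, size in blocks), in concatenation order
        self.names = [name for name, _ in members]
        self.ends = list(accumulate(size for _, size in members))

    def locate(self, block: int) -> tuple[str, int]:
        """Map a block of the combined volume to (member, block in member)."""
        if not 0 <= block < self.ends[-1]:
            raise IndexError(block)
        i = bisect_right(self.ends, block)  # first member ending after block
        start = self.ends[i - 1] if i else 0
        return self.names[i], block - start
```

For example, with `LinearVolume([("md0", 100), ("md1", 50)])`, block 100 of the combined volume is block 0 of `md1`. Splitting a pool into several volumes is the same idea run the other way, with each volume owning a sub-range of the pool.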
The next step up is the filesystem. This is the layer where the well-known abstractions of "file" and "directory" come into existence. Some filesystems (btrfs, zfs) have volume-manager-like features built in, which allow things like snapshotting, deduplication, replication to other devices, and even migration of files between storage tiers. That last bit is not in many filesystems yet, but is definitely a target for Storage Virtualization vendors.
The next step up is the network filesystem. This includes things like Samba/CIFS, Netatalk/AppleTalk, NFS, and others. If written the right way, these network filesystems can further abstract storage. One product I'm thinking of, Novell's Open Enterprise Server and their ShadowVolumes, takes multiple volumes on different storage (presumably of differing speed and cost) and presents them as a single volume to the network user, and then migrates files between the volumes based on usage statistics. Some of the "Storage Virtualization" appliances you can buy actually do their heavy lifting at this layer.
The last stop on our trip up the storage stack is the network filesystem client in the client machine. It is at this level that the Distributed File System (DFS) exists, which allows a single logical presentation of a filesystem to span multiple network filesystems. The client knows that this is a DFS share and that a specific object is a DFS link, and when following the link it presents the specified network share as a sub-directory of the parent directory. There have been other examples of abstraction at this level, but DFS is perhaps the most common.
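The link-following step can be pictured as a longest-prefix lookup on the client. This is a minimal sketch of the idea only, not the actual DFS referral protocol; the namespace paths and server names are made up for the example.

```python
def resolve(path: str, links: dict[str, str]) -> str:
    """Translate a path in the logical namespace to a network share path.

    `links` maps a namespace directory to the share that really holds it;
    the longest matching prefix wins.
    """
    parts = path.strip("/").split("/")
    for cut in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:cut])
        target = links.get(prefix)
        if target is not None:
            return "/".join([target.rstrip("/"), *parts[cut:]])
    raise LookupError(f"no link covers {path!r}")


# Hypothetical namespace: two sub-directories served by different machines.
LINKS = {
    "/corp/projects": "//fileserver1/projects",
    "/corp/home": "//fileserver2/users",
}
```

To the user, `/corp/projects` and `/corp/home` look like sibling directories of one filesystem, even though `resolve` sends them to different servers.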
One thing to keep in mind is that through all of this, each layer of the storage stack is independent of those above it. Many layers are already doing block-level abstraction, so adding one more doesn't change a whole lot. File-level abstraction has to happen near the top of the stack (for that's where the file-systems live) and the impacts lower down are highly decoupled to the point that it may not even be noticed.
At its core, "Storage Virtualization" is still mostly a marketing term for something that has been happening since the dawn of the PC era (if not earlier), only this time the new abstraction layers are happening when virtualization is the buzzword of the moment.
The one new abstraction layer I know of is something called a "Storage Router", which you'll only ever see on large Storage Area Networks. This device has several different storage arrays behind it, and presents those separate arrays as a single array with multiple LUNs. The fancier ones can do interesting block-level abstractions like moving rarely used blocks to slower/cheaper storage and moving the highly used blocks to SSD layers, or handle realtime replication between storage arrays that normally wouldn't allow that kind of thing.
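The block-tiering idea can be sketched as a simple placement policy. This assumes access counts are already being collected and uses a plain "hottest blocks first" rule; actual products use their own, usually proprietary, heuristics.

```python
from collections import Counter


def plan_tiers(access_counts: dict[int, int], fast_capacity: int) -> dict[int, str]:
    """Place the most frequently used blocks on the fast tier.

    access_counts: block number -> number of recent accesses
    fast_capacity: how many blocks fit on the SSD tier
    """
    # Hottest first; ties are broken by block number so the result is stable.
    ranked = sorted(access_counts, key=lambda b: (-access_counts[b], b))
    fast = set(ranked[:fast_capacity])
    return {b: ("ssd" if b in fast else "hdd") for b in access_counts}


# Example: block 1 was read three times, block 3 twice, block 2 once.
counts = Counter([1, 1, 1, 2, 3, 3])
placement = plan_tiers(counts, fast_capacity=1)
```

The hosts behind the router keep addressing the same LUN and block numbers; only the placement table changes.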
P.S.: RAID is not just device-level virtualization. I'm working with a storage array right now that takes slices of disks and assigns them to different RAID groups. It is working just fine (I'm doing it right now), and I have both RAID1 and RAID5 volumes on the same disk device. Lose two drives and the RAID5 volumes are toast, but the RAID1 volumes on the same disks are just fine.
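To make the slice arrangement concrete, here is a small sketch of which RAID groups survive a given set of failed drives. The layout (four disks, a RAID5 group sliced across all of them, a RAID1 group on slices of two) is an assumption for illustration, and the survival rules are the textbook ones; whether a mirror survives depends on which drives are lost.

```python
def survives(level: str, members: list[str], failed: set[str]) -> bool:
    """Does a RAID group built from slices of `members` survive `failed`?"""
    lost = len(set(members) & failed)
    if level == "raid1":
        return lost < len(members)  # at least one mirror copy remains
    if level == "raid5":
        return lost <= 1  # single parity tolerates one loss
    raise ValueError(f"unsupported level: {level}")


# Hypothetical layout: both groups use slices of the same physical disks.
groups = {
    "vol_r5": ("raid5", ["d0", "d1", "d2", "d3"]),
    "vol_r1": ("raid1", ["d0", "d1"]),
}


def status(failed: set[str]) -> dict[str, bool]:
    """Survival of every group for a given set of failed disks."""
    return {name: survives(level, disks, failed)
            for name, (level, disks) in groups.items()}
```

With `d1` and `d2` lost, `status` reports the RAID5 volume as gone while the RAID1 volume, which still has its copy on `d0`, survives.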
answered May 31 '11 at 20:07
bubububu
- RAID is a redundancy technology that ensures that, in the case of a drive failure, all data remains intact, and LVM is a Logical Volume Manager that can do many things.
- It's not.
- That seems like an odd question. RAID is a way of distributing data across drives so that if one fails no data is lost. LVM is a volume manager that can be used to change the way a user/OS looks at all the hard drives. They have nothing to do with each other (though LVM can implement RAID 1 and RAID 0, that is not its primary focus).
- The first means you don't have to know what physical device the data is on, and the second means that you can store, for lack of a better word, the links between files in a more abstract way.
- As stated above, there is no "device level" or "partition level" to talk about, so no, you can't refer to them as such.
Thanks! But I still can't see how RAID and LVM are fundamentally different, as both are realizations of storage virtualization. So I still don't understand why they are used for different purposes.
– Tim
May 31 '11 at 19:36
RAID is not about virtualization. It stands for "Redundant Array of Independent Disks." The word virtualization does not even appear in the Wikipedia article about it. Read this: en.wikipedia.org/wiki/RAID for a longer explanation.
– soandos
May 31 '11 at 19:40
Thanks! But I still don't understand your opinion. (1) "virtual disk" and "virtual device" appear in the article for RAID, and I think RAID is storage virtualization at the device level. In the article for storage device, RAID is mentioned in storage device-based block virtualization (en.wikipedia.org/wiki/…).
– Tim
May 31 '11 at 19:50
They might have something in common, but the goals are totally different. In RAID, you need a virtual disk because of data striping. Since the data that would generally be written to one disk is now written to more than one, you need a way to read the whole file as if it were contiguous. This is a side point to the idea of RAID though, and is not needed in RAID 1, for example.
– soandos
May 31 '11 at 19:50
(2) As to "RAID is not about virtualization", doesn't RAID make several physical storage devices look like, and be used as, a single virtual/logical storage device? Quoted from the RAID article: "this is achieved by combining multiple disk drive components into a logical unit".
– Tim
May 31 '11 at 19:51
edited Feb 7 at 11:28
karel
answered May 31 '11 at 19:31
soandos
The article is not well written.
One of the biggest problems is that there are multiple layers of abstraction built into the full stack of storage, and "virtualization" is a fuzzy enough word that it is hard to definitively assign it to any one of them. For a good look at the many layers of abstraction in storage, I'll point you at a blog piece I did last year (read it here for the gory details).
In marketing-speak, "Storage Virtualization" is just introducing abstraction where there previously hasn't been any. That can happen at many points depending on the market segment. But that's just marketing. Time for technical.
The storage stack (somewhat simplified):
- Disk
- RAID controller
- Software RAID
- Volume manager
- Filesystem
- Network filesystem
- Network filesystem client
Disks, even old-school spinning magnetic disks, do a level of virtualization. They present a logical view of the actual blocks on the platters (or storage cells for an SSD), and it has been this way in some form since the mid 80's or so. Magnetic drives reserve a certain number of blocks for reassigning blocks that go bad, and the logical view is how this is abstracted away from the disk controller. Technologies like SMART can catch this in the act and report that the drive is "pre-fail" so you can plan your transition accordingly.
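As a toy illustration of that logical view, the sketch below models a drive that silently redirects bad blocks to a reserved spare area. The class and method names are invented for this example, and real firmware is far more involved; the point is only that the caller keeps using the same block numbers.

```python
class RemappingDisk:
    """Toy model of a drive that hides bad blocks behind a logical view."""

    def __init__(self, visible_blocks: int, spare_blocks: int):
        self.visible_blocks = visible_blocks
        # Spare blocks live past the end of the address range the host can see.
        self.spares = list(range(visible_blocks, visible_blocks + spare_blocks))
        self.remap: dict[int, int] = {}

    def mark_bad(self, lba: int) -> None:
        """Redirect a failing logical block to the next free spare."""
        if not 0 <= lba < self.visible_blocks:
            raise ValueError("block is outside the visible range")
        if not self.spares:
            raise RuntimeError("out of spare blocks: the drive is failing")
        self.remap[lba] = self.spares.pop(0)

    def physical(self, lba: int) -> int:
        """Physical location actually used for a logical block address."""
        return self.remap.get(lba, lba)

    @property
    def reallocated(self) -> int:
        """Count of remapped blocks, the kind of figure health monitoring watches."""
        return len(self.remap)


disk = RemappingDisk(visible_blocks=1000, spare_blocks=8)
disk.mark_bad(42)
print(disk.physical(42), disk.physical(43), disk.reallocated)
```

A steadily growing reallocated count is the sort of signal that lets monitoring flag a drive before the spare pool runs out.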
RAID cards provide another abstraction layer, hiding the true shape of storage from the operating system. This has been in place since the first RAID cards came out in the late 80's, and they've only gotten more complex since then. Cards with write caches on them provide still another abstraction layer, as writes can be reported as committed before they're actually on a disk somewhere. The really fancy ones (such as those in Storage Area Network arrays) can even write to two separate disk arrays for real-time replication, and the OS is none the wiser.
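The write-cache behaviour can be sketched in a few lines. This is a deliberately simplified model with made-up names, assuming the cache acknowledges a write as soon as it is held in memory and only pushes it to the disks on a later flush.

```python
class WriteBackCache:
    """Toy write-back cache: acknowledges writes before they reach the disk."""

    def __init__(self):
        self.disk: dict[int, bytes] = {}   # what is really on the platters
        self.dirty: dict[int, bytes] = {}  # acknowledged but not yet written out

    def write(self, block: int, data: bytes) -> bool:
        self.dirty[block] = data
        return True  # reported as committed to the caller right away

    def read(self, block: int):
        # The newest data wins, so the caller never notices the delay.
        return self.dirty.get(block, self.disk.get(block))

    def flush(self) -> None:
        """Destage everything that is still only in the cache."""
        self.disk.update(self.dirty)
        self.dirty.clear()


cache = WriteBackCache()
cache.write(7, b"hello")
print(7 in cache.disk)   # the write was acknowledged but is not on disk yet
cache.flush()
print(cache.disk[7])
```

The gap between the acknowledgement and the flush is the reason such cards usually protect the cache against power loss.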
Once you get into the operating system, things get a lot more murky, as each OS does its own thing. Software RAID (md in Linux) is typically implemented as a low-level storage driver that presents the logically combined storage to higher storage layers. As with the RAID cards, you can do all sorts of interesting things here. Some of the "Storage Virtualization" products you see out there are implemented at this stage.
Going higher, you get to the volume managers (LVM), which can provide for some seriously complex configurations. Where the next layer down aggregates disks into a single virtual volume, the volume managers can combine multiple volumes into a single bigger volume... or split a pool of volumes into an arbitrary number of volumes. Again, some of the Storage Virtualization products you see have a presence in this layer as well.
The next step up is the filesystem. This is the layer where the well-known abstractions of "file" and "directory" come into existence. Some filesystems (btrfs, zfs) have volume-manager-like features built into them, which allow things like snapshotting, deduplication, replication to other devices, and even migration of files between storage tiers. That last bit is not in many filesystems yet, but is definitely a target for Storage Virtualization vendors.
The next step up is the network filesystem. These are things like Samba/CIFS, Netatalk/AppleTalk, NFS, and others. If written the right way, these network filesystems can further abstract storage. One product I'm thinking of, Novell's Open Enterprise Server and their ShadowVolumes, takes multiple volumes on different storage (presumably of differing speeds/cost) and presents them as a single volume to the network user, and then migrates files between the volumes based on usage statistics. Some of the "Storage Virtualization" appliances you can buy actually do their heavy lifting at this layer.
The last stop on our trip up the storage stack is the network filesystem client in the client machine. It is at this level that the Distributed File System (DFS) exists, which allows a single logical presentation of a filesystem to exist on multiple network filesystems. The client knows that this is a DFS share, and that a specific object is a DFS link, and when following it presents the specified network share as a subdirectory of the parent directory. There have been other examples of abstraction at this level, but DFS is perhaps the most common.
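A simplified model of what the client does when it follows such a link: look up the longest matching link in the namespace and substitute the real share for it. The `resolve` function, the namespace paths and the server names are all hypothetical, and this ignores the actual referral protocol entirely.

```python
def resolve(path: str, links: dict[str, str]) -> str:
    """Rewrite a path in the logical namespace to the share that really holds it."""
    # Only whole path components may match, so "/corp/projects" must not
    # capture "/corp/projects-old".
    matches = [p for p in links if path == p or path.startswith(p + "/")]
    if not matches:
        return path  # not under any link: served by the namespace root itself
    best = max(matches, key=len)  # the most specific link wins
    return links[best] + path[len(best):]


# Hypothetical namespace: two links pointing at shares on different servers.
links = {
    "/corp/projects": "//fs1/projects",
    "/corp/projects/archive": "//fs2/archive",
}

print(resolve("/corp/projects/plan.txt", links))
print(resolve("/corp/projects/archive/2010/a.txt", links))
```

To the user both files sit in one directory tree, even though they are served by two different network filesystems.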
One thing to keep in mind is that through all of this, each layer of the storage stack is independent of those above it. Many layers are already doing block-level abstraction, so adding one more doesn't change a whole lot. File-level abstraction has to happen near the top of the stack (for that's where the filesystems live), and the impacts lower down are so highly decoupled that they may not even be noticed.
At its core, "Storage Virtualization" is still mostly a marketing term for something that has been happening since the dawn of the PC era (if not earlier), only this time the new abstraction layers are happening when virtualization is the buzzword of the moment.
The one new abstraction layer I know of is something called a "Storage Router", which you'll only ever see on large Storage Area Networks. This device has several different storage arrays behind it, and presents those separate arrays as a single array with multiple LUNs. The fancier ones can do interesting block-level abstractions like moving rarely used blocks to slower/cheaper storage and moving the highly used blocks to SSD layers, or handle real-time replication between storage arrays that normally wouldn't allow that kind of thing.
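The block-tiering idea can be sketched as a simple placement decision based on access counts. This is only an illustration under the assumption that "highly used" means "most frequently accessed in a recent log"; the function name and the tier labels are made up, and real products use much more elaborate policies.

```python
from collections import Counter


def plan_tiers(access_log: list[int], ssd_slots: int) -> dict[int, str]:
    """Decide, per block, whether it belongs on the fast or the slow tier."""
    counts = Counter(access_log)
    # The most frequently accessed blocks get the limited SSD capacity.
    hot = {block for block, _ in counts.most_common(ssd_slots)}
    return {block: ("ssd" if block in hot else "hdd") for block in counts}


# Block 1 was read three times, block 3 twice, block 2 once.
print(plan_tiers([1, 1, 1, 2, 3, 3], ssd_slots=2))
```

Because the hosts only ever address logical blocks on a LUN, the router is free to move the data behind those addresses without the hosts noticing.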
P.S.: RAID is not just device-level virtualization. I'm working with a storage array right now that takes slices of disks and assigns them to different RAID groups. It is working just fine, and I have both RAID1 and RAID5 volumes on the same disk devices. Lose two drives and the RAID5 volumes are toast, but the RAID1 volumes on the same disks are just fine.
add a comment |
The article is not well written.
One of the biggest problems is that there are multiple layers of abstraction built into the full stack of storage, and "virtualization" is a fuzzy enough word as to be hard to definitively assign a place to put it. For a good look at the many layers of abstraction in storage, I'll point you at a blog piece I did last year (read it here for the gory details).
In marketing-speak, "Storage Virtualization" is just introducing abstraction where there previously hasn't been any. That can happen at many points depending on the market segment. But that's just marketing. Time for technical.
The storage stack (somewhat simplified):
- Disk
- RAID controller
- Software RAID
- Volume manager
- Filesystem
- Network filesystem
- Network filesystem client
Disk, even old school spinning magnetic disks, do a level of virtualization. They present a logical view of the actual blocks on the platters (or storage cells for an SSD), and this has been this way since the mid 80's or so. Magnetic drives reserve a certain number of blocks for reassigning blocks that go bad, and the logical view is how this is abstracted away from the disk controller. Technologies like SMART can catch this in the act and report that the drive is "pre-fail" so you can plan your transition accordingly. This has been in place in some form since the 80's.
RAID cards provide another abstraction layer, hiding the true shape of storage from an operating system. This has been in place since the first RAID cards came out in the late 80's, and they've only gotten more complex since then. Cards with write-caches on them provide still another abstraction layer, as writes can be reported as committed before they're actually on a disk somewhere. The really fancy ones (such as those in Storage Area Network arrays) can even write to two separate disk arrays for realtime replication, and the OS is none the wiser.
Once you get into the operating system things get a lot more murky, as each does their own thing. Software RAID (md in Linux) is typically implemented as a low level storage driver that presents the logically combined storage to higher storage layers. As with the RAID cards, you can do all sorts of interesting things here. Some of the "Storage Virtualization" products you see out there are implemented at this stage.
Going higher you get to the volume managers (LVM) can provide for some seriously complex configurations. Where the next layer down aggregates disks into a single virtual volume, the volume managers can combine multiple volumes into a single bigger volume... or split a pool of volumes into an arbitrary number of volumes. Again, some of the Storage Virtualization products you see have a presence in this layer as well.
The next step up is the filesystem. This is the layer where the well known abstractions of "file" and "directory" come into existence. Some filesystems (btrfs, zfs) have volume-manager like features built into them which allows things like snapshotting, deduplication, replication to other devices, and even migration of files between storage tiers. That last bit is not in many filesystems yet, but is definitely a target for Storage Virtualization vendors.
The next step up is the network filesystem. This is things like Samba/CIFS, NetATalk/Appletalk, NFS, and others. If written the right way, these network filesystems can further abstract storage. One product I'm thinking of, Novell's Open Enterprise Server and their ShadowVolumes, takes multiple volumes on different storage (presumably differing speeds/cost) and presents them as a single volume to the network user, and then migrates files between the volumes based on usage statistics. Some of the "Storage Virtualization" appliances you can buy actually do their heavy lifting at this layer.
The last stop on our trip up the storage stack is the network filesystem client in the client machine. It is at this level that the Distributed File System (DFS) exists, which allows a single logical presentation of a filesystem to exist on multiple network filesystems. The client knows that this is a DFS share, and that specific object is a DFS link, and when following it present the specified network-share as a sub-directory of the parent directory. There have been other examples of abstraction at this level, but DFS is perhaps the most common.
One thing to keep in mind is that through all of this, each layer of the storage stack is independent of those above it. Many layers are already doing block-level abstraction, so adding one more doesn't change a whole lot. File-level abstraction has to happen near the top of the stack (for that's where the file-systems live) and the impacts lower down are highly decoupled to the point that it may not even be noticed.
At it's core "Storage Virtualization" is still mostly a marketing term for something that has been happening since the dawn of the PC era (if not earlier), only this time the new abstraction layers are happening when virtualization is the buzzword of the moment.
The one new abstraction layer I know of is something called a "Storage Router", which you'll only ever see on large Storage Area Networks. This device has several different storage arrays behind it, and presents those separate arrays as single array with multiple LUNs. The fancier ones can do interesting block-level abstractions like moving rarely used blocks to slower/cheaper storage and moving the highly used blocks to SSD layers, or handle realtime replication between storage arrays that normally wouldn't allow that kind of thing.
P.S.: RAID is not just device-level virtualization. I'm working with a storage array right now that takes slices of disks and assigns them to different RAID groups. It is working just fine (I'm doing it right now), and I have both RAID1 and RAID5 volumes on the same disk device. Lose two drives and the RAID5 volumes are toast, but the RAID1 volumes on the same disks are just fine.
add a comment |
The article is not well written.
One of the biggest problems is that there are multiple layers of abstraction built into the full stack of storage, and "virtualization" is a fuzzy enough word as to be hard to definitively assign a place to put it. For a good look at the many layers of abstraction in storage, I'll point you at a blog piece I did last year (read it here for the gory details).
In marketing-speak, "Storage Virtualization" is just introducing abstraction where there previously hasn't been any. That can happen at many points depending on the market segment. But that's just marketing. Time for technical.
The storage stack (somewhat simplified):
- Disk
- RAID controller
- Software RAID
- Volume manager
- Filesystem
- Network filesystem
- Network filesystem client
Disk, even old school spinning magnetic disks, do a level of virtualization. They present a logical view of the actual blocks on the platters (or storage cells for an SSD), and this has been this way since the mid 80's or so. Magnetic drives reserve a certain number of blocks for reassigning blocks that go bad, and the logical view is how this is abstracted away from the disk controller. Technologies like SMART can catch this in the act and report that the drive is "pre-fail" so you can plan your transition accordingly. This has been in place in some form since the 80's.
RAID cards provide another abstraction layer, hiding the true shape of storage from an operating system. This has been in place since the first RAID cards came out in the late 80's, and they've only gotten more complex since then. Cards with write-caches on them provide still another abstraction layer, as writes can be reported as committed before they're actually on a disk somewhere. The really fancy ones (such as those in Storage Area Network arrays) can even write to two separate disk arrays for realtime replication, and the OS is none the wiser.
Once you get into the operating system things get a lot more murky, as each does their own thing. Software RAID (md in Linux) is typically implemented as a low level storage driver that presents the logically combined storage to higher storage layers. As with the RAID cards, you can do all sorts of interesting things here. Some of the "Storage Virtualization" products you see out there are implemented at this stage.
Going higher you get to the volume managers (LVM) can provide for some seriously complex configurations. Where the next layer down aggregates disks into a single virtual volume, the volume managers can combine multiple volumes into a single bigger volume... or split a pool of volumes into an arbitrary number of volumes. Again, some of the Storage Virtualization products you see have a presence in this layer as well.
The next step up is the filesystem. This is the layer where the well known abstractions of "file" and "directory" come into existence. Some filesystems (btrfs, zfs) have volume-manager like features built into them which allows things like snapshotting, deduplication, replication to other devices, and even migration of files between storage tiers. That last bit is not in many filesystems yet, but is definitely a target for Storage Virtualization vendors.
The next step up is the network filesystem. This is things like Samba/CIFS, NetATalk/Appletalk, NFS, and others. If written the right way, these network filesystems can further abstract storage. One product I'm thinking of, Novell's Open Enterprise Server and their ShadowVolumes, takes multiple volumes on different storage (presumably differing speeds/cost) and presents them as a single volume to the network user, and then migrates files between the volumes based on usage statistics. Some of the "Storage Virtualization" appliances you can buy actually do their heavy lifting at this layer.
The last stop on our trip up the storage stack is the network filesystem client in the client machine. It is at this level that the Distributed File System (DFS) exists, which allows a single logical presentation of a filesystem to exist on multiple network filesystems. The client knows that this is a DFS share, and that specific object is a DFS link, and when following it present the specified network-share as a sub-directory of the parent directory. There have been other examples of abstraction at this level, but DFS is perhaps the most common.
One thing to keep in mind is that through all of this, each layer of the storage stack is independent of those above it. Many layers are already doing block-level abstraction, so adding one more doesn't change a whole lot. File-level abstraction has to happen near the top of the stack (for that's where the file-systems live) and the impacts lower down are highly decoupled to the point that it may not even be noticed.
At it's core "Storage Virtualization" is still mostly a marketing term for something that has been happening since the dawn of the PC era (if not earlier), only this time the new abstraction layers are happening when virtualization is the buzzword of the moment.
The one new abstraction layer I know of is something called a "Storage Router", which you'll only ever see on large Storage Area Networks. This device has several different storage arrays behind it, and presents those separate arrays as single array with multiple LUNs. The fancier ones can do interesting block-level abstractions like moving rarely used blocks to slower/cheaper storage and moving the highly used blocks to SSD layers, or handle realtime replication between storage arrays that normally wouldn't allow that kind of thing.
P.S.: RAID is not just device-level virtualization. I'm working with a storage array right now that takes slices of disks and assigns them to different RAID groups. It is working just fine (I'm doing it right now), and I have both RAID1 and RAID5 volumes on the same disk device. Lose two drives and the RAID5 volumes are toast, but the RAID1 volumes on the same disks are just fine.
The article is not well written.
One of the biggest problems is that there are multiple layers of abstraction built into the full stack of storage, and "virtualization" is a fuzzy enough word as to be hard to definitively assign a place to put it. For a good look at the many layers of abstraction in storage, I'll point you at a blog piece I did last year (read it here for the gory details).
In marketing-speak, "Storage Virtualization" is just introducing abstraction where there previously hasn't been any. That can happen at many points depending on the market segment. But that's just marketing. Time for technical.
The storage stack (somewhat simplified):
- Disk
- RAID controller
- Software RAID
- Volume manager
- Filesystem
- Network filesystem
- Network filesystem client
Disks, even old-school spinning magnetic disks, do a level of virtualization. They present a logical view of the actual blocks on the platters (or storage cells for an SSD), and it has been this way since the mid '80s or so. Magnetic drives reserve a certain number of blocks for reassigning blocks that go bad, and the logical view is how this is abstracted away from the disk controller. Technologies like SMART can catch this in the act and report that the drive is "pre-fail" so you can plan your transition accordingly.
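To make the bad-block reassignment concrete, here is a toy sketch in Python. It is not how any real drive firmware works; the class name `RemappingDisk` and all of its methods are made up for illustration. The host only ever talks in logical block addresses, and the drive quietly redirects failing ones to its spare area.

```python
class RemappingDisk:
    """Toy model of a drive that hides bad physical blocks behind a logical view."""

    def __init__(self, data_blocks, spare_blocks):
        self.data_blocks = data_blocks
        # Hypothetical layout: spares sit past the end of the host-visible range.
        self.spares = list(range(data_blocks, data_blocks + spare_blocks))
        self.remap = {}   # logical block -> spare physical block
        self.media = {}   # physical block -> stored bytes

    def physical(self, lba):
        """Translate a logical block address to the physical block backing it."""
        if not 0 <= lba < self.data_blocks:
            raise IndexError("LBA out of range")
        return self.remap.get(lba, lba)

    def mark_bad(self, lba):
        """Move a failing logical block onto a spare; the host sees no change."""
        if not self.spares:
            raise RuntimeError("no spare blocks left")
        old = self.physical(lba)
        new = self.spares.pop(0)
        if old in self.media:
            self.media[new] = self.media.pop(old)
        self.remap[lba] = new

    def write(self, lba, data):
        self.media[self.physical(lba)] = data

    def read(self, lba):
        return self.media.get(self.physical(lba))

    def reallocated_count(self):
        """The kind of counter a SMART-style report could expose."""
        return len(self.remap)


disk = RemappingDisk(data_blocks=8, spare_blocks=2)
disk.write(3, b"payload")
disk.mark_bad(3)                 # block 3 now lives in the spare area
print(disk.read(3), disk.physical(3), disk.reallocated_count())
```

The host keeps reading and writing block 3 as before; only the growing reallocation count hints that anything happened underneath.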
RAID cards provide another abstraction layer, hiding the true shape of storage from an operating system. This has been in place since the first RAID cards came out in the late 80's, and they've only gotten more complex since then. Cards with write-caches on them provide still another abstraction layer, as writes can be reported as committed before they're actually on a disk somewhere. The really fancy ones (such as those in Storage Area Network arrays) can even write to two separate disk arrays for realtime replication, and the OS is none the wiser.
Once you get into the operating system things get a lot murkier, as each OS does its own thing. Software RAID (md in Linux) is typically implemented as a low-level storage driver that presents the logically combined storage to higher storage layers. As with the RAID cards, you can do all sorts of interesting things here. Some of the "Storage Virtualization" products you see out there are implemented at this stage.
Going higher you get to the volume managers (LVM), which can provide for some seriously complex configurations. Where the next layer down aggregates disks into a single virtual volume, a volume manager can combine multiple volumes into a single bigger volume, or split a pool of volumes into an arbitrary number of volumes. Again, some of the Storage Virtualization products you see have a presence in this layer as well.
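The combine-and-split behaviour is easier to see with a small model. The following Python sketch is loosely inspired by the extent idea used by volume managers, but it is not LVM's actual on-disk format or API; `VolumeGroup`, `create_lv` and `locate` are hypothetical names, and sizes are in arbitrary "blocks".

```python
class VolumeGroup:
    """Toy pool of physical volumes carved into fixed-size extents."""

    def __init__(self, pv_sizes, extent_size=4):
        self.extent_size = extent_size
        # Every free extent is identified by (physical volume, extent index).
        self.free = [(pv, i)
                     for pv, size in pv_sizes.items()
                     for i in range(size // extent_size)]
        self.lvs = {}  # logical volume name -> ordered list of extents

    def create_lv(self, name, size):
        """Allocate enough extents for `size` blocks, from whichever PVs have room."""
        needed = -(-size // self.extent_size)  # ceiling division
        if needed > len(self.free):
            raise ValueError("not enough free extents")
        self.lvs[name] = self.free[:needed]
        self.free = self.free[needed:]

    def locate(self, name, offset):
        """Map an offset inside a logical volume to (physical volume, offset on it)."""
        extent, within = divmod(offset, self.extent_size)
        pv, pe = self.lvs[name][extent]
        return pv, pe * self.extent_size + within


vg = VolumeGroup({"sda": 8, "sdb": 8})
vg.create_lv("big", 12)     # larger than either disk, so it spans both
vg.create_lv("small", 4)    # carved out of what is left
print(vg.locate("big", 9), vg.locate("small", 0))
```

The filesystem sitting on "big" just sees one 12-block device; it has no idea that the last third of it lives on a different disk.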
The next step up is the filesystem. This is the layer where the well-known abstractions of "file" and "directory" come into existence. Some filesystems (btrfs, zfs) have volume-manager-like features built in, which allow things like snapshotting, deduplication, replication to other devices, and even migration of files between storage tiers. That last bit is not in many filesystems yet, but is definitely a target for Storage Virtualization vendors.
The next step up is the network filesystem. This covers things like Samba/CIFS, Netatalk/AppleTalk, NFS, and others. If written the right way, these network filesystems can further abstract storage. One product I'm thinking of, Novell's Open Enterprise Server and their ShadowVolumes, takes multiple volumes on different storage (presumably of differing speed and cost) and presents them as a single volume to the network user, and then migrates files between the volumes based on usage statistics. Some of the "Storage Virtualization" appliances you can buy actually do their heavy lifting at this layer.
The last stop on our trip up the storage stack is the network filesystem client in the client machine. It is at this level that the Distributed File System (DFS) exists, which allows a single logical presentation of a filesystem to be spread across multiple network filesystems. The client knows that this is a DFS share and that a specific object in it is a DFS link, and when following the link it presents the specified network share as a sub-directory of the parent directory. There have been other examples of abstraction at this level, but DFS is perhaps the most common.
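A rough sketch of that link-following step, in Python. This only illustrates the idea of mapping one logical namespace onto several real shares; it is not the actual DFS referral protocol, the `resolve` function is hypothetical, the server and share names are invented, and forward slashes are used purely to keep the example readable.

```python
def resolve(path, links):
    """Return (real share, remaining path) for a path in the logical namespace.

    `links` maps a logical prefix to the share that actually holds the data.
    The longest matching prefix wins, so nested links behave sensibly.
    """
    parts = [p for p in path.split("/") if p]
    for n in range(len(parts), 0, -1):
        prefix = "/".join(parts[:n])
        if prefix in links:
            return links[prefix], "/".join(parts[n:])
    raise LookupError(f"no link covers {path!r}")


# Hypothetical namespace: one logical tree, two different file servers behind it.
links = {
    "corp/public/projects": "fs2/proj",
    "corp/public/home": "fs1/home",
}
print(resolve("//corp/public/projects/2011/plan.doc", links))
print(resolve("//corp/public/home/tim", links))
```

From the user's side both paths hang off the same `corp/public` tree; only the client knows they end up on different servers.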
One thing to keep in mind is that through all of this, each layer of the storage stack is independent of those above it. Many layers are already doing block-level abstraction, so adding one more doesn't change a whole lot. File-level abstraction has to happen near the top of the stack (for that's where the file-systems live) and the impacts lower down are highly decoupled to the point that it may not even be noticed.
At its core, "Storage Virtualization" is still mostly a marketing term for something that has been happening since the dawn of the PC era (if not earlier), only this time the new abstraction layers are arriving when virtualization is the buzzword of the moment.
The one new abstraction layer I know of is something called a "Storage Router", which you'll only ever see on large Storage Area Networks. This device has several different storage arrays behind it, and presents those separate arrays as a single array with multiple LUNs. The fancier ones can do interesting block-level abstractions like moving rarely used blocks to slower/cheaper storage and moving the highly used blocks to SSD layers, or handle realtime replication between storage arrays that normally wouldn't allow that kind of thing.
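The hot/cold block shuffling can be sketched in a few lines of Python. This is only an illustration of the general idea (count accesses, then periodically promote the busiest blocks); real products use their own policies, and `TieringRouter` and its methods are made-up names.

```python
from collections import Counter


class TieringRouter:
    """Toy block tiering: the most-read blocks get promoted to a small fast tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # how many blocks fit on the fast tier
        self.hits = Counter()               # reads per block since the last rebalance
        self.fast = set()                   # blocks currently on the fast tier

    def read(self, block):
        """Record the access and report which tier served it."""
        self.hits[block] += 1
        return "ssd" if block in self.fast else "disk"

    def rebalance(self):
        """Promote the hottest blocks, demote everything else, reset the counters."""
        self.fast = {b for b, _ in self.hits.most_common(self.fast_capacity)}
        self.hits.clear()


router = TieringRouter(fast_capacity=1)
for block in (7, 7, 7, 9):
    router.read(block)
router.rebalance()
print(router.read(7), router.read(9))
```

The servers attached to the router keep addressing the same LUN and block numbers throughout; which physical tier answers is decided behind their backs.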
P.S.: RAID is not just device-level virtualization. I'm working with a storage array right now that takes slices of disks and assigns them to different RAID groups. It is working just fine (I'm doing it right now), and I have both RAID1 and RAID5 volumes on the same disk device. Lose two drives and the RAID5 volumes are toast, but a RAID1 volume on the same disks is just fine as long as one of its mirror slices sits on a drive that is still alive.
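Here is a small Python sketch of that failure scenario. The rules are the textbook ones (a mirror survives while any copy is left, single-parity RAID5 tolerates one lost member), the disk names and group layout are invented, and it assumes each group puts at most one slice on any given disk.

```python
def survives(level, member_disks, failed):
    """Does a RAID group built from slices on `member_disks` survive these failures?"""
    lost = len(set(member_disks) & set(failed))
    if level == "raid1":
        return lost < len(member_disks)   # at least one mirror copy remains
    if level == "raid5":
        return lost <= 1                  # single parity covers one lost member
    raise ValueError(f"unsupported RAID level: {level}")


# Hypothetical layout: both groups use slices of the same four disks.
groups = {
    "mirror": ("raid1", ["d0", "d1"]),
    "parity": ("raid5", ["d0", "d1", "d2", "d3"]),
}
failed = {"d1", "d3"}
for name, (level, disks) in groups.items():
    print(name, "ok" if survives(level, disks, failed) else "lost")
```

Swap the failed set for `{"d0", "d1"}` and the mirror goes too, which is why it matters which two drives you lose.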
edited Jun 3 '11 at 12:02
answered Jun 3 '11 at 4:20
SysAdmin1138
I don't think you are being very accurate in describing RAID and LVM as "disk" or "partition" level virtualization. Storage virtualization refers to abstracting multiple, commonly network-linked, pieces of equipment that are centrally managed, so that storage is accessed as a whole system rather than on a per-server basis. RAID/LVM has little to do with storage virtualization per se, although (of course) they are commonly used in SAN clusters.
– bubu
May 31 '11 at 19:17
Thanks! But I don't understand "RAID/LVM has little to do with Storage virtualization per se". From the Wikipedia articles for storage virtualization (en.wikipedia.org/wiki/Storage_virtualization), LVM (en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)) and RAID (en.wikipedia.org/wiki/RAID), both RAID and LVM are methods of storage virtualization and storage virtualization is not just about the case of network-linked storage devices.
– Tim
May 31 '11 at 19:40
The Wikipedia article on storage virtualization is mediocre at best.
– bubu
May 31 '11 at 19:51
Then are there any references worth recommending?
– Tim
May 31 '11 at 19:53
have a look: www-03.ibm.com/systems/resources/…
– bubu
May 31 '11 at 20:08