Haipai (Shanghai-Style) Open Source Community

Join in open source together and help build the Haipai open source community (Kernel/Fedora/CentOS/Perl/Drupal)

Using ZFS through FUSE

By: Ben Martin

ZFS is an advanced filesystem created by Sun Microsystems, but it is not supported in the Linux kernel. The ZFS on FUSE project allows you to use ZFS through the Linux kernel as a FUSE filesystem, so a ZFS filesystem becomes accessible just like any other filesystem the Linux kernel lets you use.

Apart from any technical or funding issues, a major reason that ZFS support has not been integrated into the Linux kernel is that Sun has released it under its Common Development and Distribution License, which is incompatible with the GPL used by the kernel. There are also patent issues with ZFS. However, the source code for ZFS is available, and running ZFS through FUSE does not violate any licenses, because you are not linking CDDL and GPL code together. You're on your own as far as patents go.

The idea of running what is normally an in-kernel filesystem through FUSE will make some in-kernel filesystem developers grumble about inefficiency. When an application makes a call into the kernel, a context switch must be performed, and the x86 architecture is not particularly fast at performing context switches. Because a FUSE filesystem runs outside the kernel, the kernel must at times perform a further context switch to the FUSE filesystem process. Overall, then, more context switches are required to run a filesystem through FUSE than in-kernel. However, accessing information that is stored on disk is so much slower than a context switch that performing two context switches instead of one is likely to have minimal impact, if any, on benchmarks. It has been reported that NTFS running through FUSE has results comparable to those of a native Linux filesystem.

Installation

No packages for zfs-fuse exist for Ubuntu, openSUSE, or Fedora. As of writing, the latest release of zfs-fuse, 0.4.0 beta, is from March 2007. Looking at the source repository for the 0.4.x version of zfs-fuse, it appears the developers have made many desirable additions since then that are not in the March 2007 release, for example the ability to compile using recent versions of gcc. I used the 0.4.x version from the source repository instead of the latest released tarball and performed benchmarking on a 64-bit Fedora 8 machine.

The source repository uses the Mercurial revision control system, which is itself available in the main Hardy and Fedora 9 repositories. To compile zfs-fuse you will need SCons and the development package for libaio. Both of these are packaged for Hardy (libaio-dev, scons), openSUSE 10.3 1-Click installs (libaio-devel, scons), and in the Fedora 9 repository. The installation step places five executables into /usr/local/sbin.
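
Before starting the build, it may help to confirm that the tools named above are actually installed. The following is a minimal sketch rather than part of the original procedure: the `missing_tools` helper is a hypothetical name, the tool list (hg, scons, gcc) is taken from this article, and the `/usr/include/libaio.h` path is an assumption about where the libaio development package places its header on your distribution.

```shell
#!/bin/sh
# Print the names of any given commands that are not found on PATH.
# (missing_tools is a hypothetical helper, not part of zfs-fuse.)
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing the list.
    echo "${missing# }"
}

result=$(missing_tools hg scons gcc)
if [ -n "$result" ]; then
    echo "missing build tools: $result"
else
    echo "all build tools found"
fi

# Assumed header location for the libaio development package.
if [ ! -f /usr/include/libaio.h ]; then
    echo "libaio header not found in /usr/include (is the libaio development package installed?)"
fi
```

If anything is reported missing, install it through your distribution's package manager before running scons.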

$ hg clone http://www.wizy.org/mercurial/zfs-fuse/0.4.x
$ cd 0.4.x/src
$ scons
$ sudo scons install
$ sudo zfs-fuse

Once the zfs-fuse daemon is started you use the zpool and zfs commands to set up your zfs filesystems. If you have not used ZFS before, you might like to read the OpenSolaris intro or the more serious documentation for it.

Performance

I tested performance inside a VMware Server virtual machine. I created a new virtual disk, preallocating 8GB of space for the disk. The use of virtualization would likely affect the overall benchmark, but the relative performance of ZFS vs. the in-kernel filesystem should still be indicative of the performance you might expect from ZFS running through FUSE. As the in-kernel Linux filesystem I used XFS because it performs well on large files, such as those written by the Bonnie++ benchmark I used.

The design of ZFS is a little different from that of most Linux filesystems. Given one or more partitions, you set up a ZFS "pool," and then create as many filesystems as you like inside that pool. For the benchmark I created a pool on a single partition on the 8GB virtual disk and created two ZFS filesystems on that pool. To benchmark XFS I created an XFS filesystem directly on the partition that ZFS was using, wiping out the ZFS data in the process.

Shown below are the setup and benchmarking of ZFS. First I use fdisk to create a new partition spanning the whole disk. I use the zpool create command to create new pools, associating physical disks with the pool. The -n option informs you of what would have been done but doesn't actually make the pool. I include its output here to make things easier to follow. Once I create the tank/testfs ZFS filesystem with the zfs command, I have a new filesystem that I can access through the Linux kernel at /tank/testfs, as shown using the standard df command. I then ran the Bonnie++ benchmark multiple times to make sure that the figures were not taken from a first run that was disadvantaged in any manner.

# fdisk /dev/sdd
...
Disk /dev/sdd: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
...
/dev/sdd1 1 1044 8385898+ 83 Linux
...
# zfs-fuse
# zpool create -n tank /dev/sdd1
would create 'tank' with the following layout:
tank
sdd1

# zpool create tank /dev/sdd1
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
tank 7.94G 92.5K 7.94G 0% ONLINE -
# zfs create tank/testfs
# df -h /tank/testfs/
Filesystem Size Used Avail Use% Mounted on
tank/testfs 7.9G 18K 7.9G 1% /tank/testfs
$ cd /tank/testfs
$ /usr/sbin/bonnie++ -d `pwd`
...
$ /usr/sbin/bonnie++ -d `pwd`
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
linuxcomf8       4G 12373  24 14707  11 10604   8 33935  50 36985   3 109.0   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2272  17  3657  20  2754  18  2534  15  3736  20  3061  20
linuxcomf8,4G,12373,24,14707,11,10604,8,33935,50,36985,3,109.0,0,16,2272,17,3657,20,2754,18,2534,15,3736,20,3061,20

The commands below show how the Bonnie++ benchmark was performed on the XFS filesystem. Once again, I ran the benchmark multiple times.

# mkfs.xfs /dev/sdd1
meta-data=/dev/sdd1 isize=256 agcount=8, agsize=262059 blks
= sectsz=512 attr=0
data = bsize=4096 blocks=2096472, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal log bsize=4096 blocks=2560, version=1
= sectsz=512 sunit=0 blks, lazy-count=0
realtime =none extsz=4096 blocks=0, rtextents=0
# mkdir /raw
# mount /dev/sdd1 /raw

$ cd /raw
$ /usr/sbin/bonnie++ -d `pwd`
...
$ /usr/sbin/bonnie++ -d `pwd`
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
linuxcomf8       4G 38681  65 34840   6 16528   6 18312  40 18585   5 365.8   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1250  26 +++++ +++  3032  39  2883  69 +++++ +++  3143  59
linuxcomf8,4G,38681,65,34840,6,16528,6,18312,40,18585,5,365.8,2,16,1250,26,+++++,+++,3032,39,2883,69,+++++,+++,3143,59

As you can see from the benchmark results, for output operations you might only get 30-60% of the performance of XFS when using ZFS through FUSE. On the other hand, the caching that FUSE performs allowed zfs-fuse to perform noticeably better than XFS in both the character and block input tests. In real-world terms, this means that there is no speed penalty for using ZFS through FUSE for a filesystem that is read more than it is written. Write operations do suffer a performance loss with zfs-fuse as opposed to an in-kernel filesystem, but the loss should not render the system unusable. As always, you should benchmark for the task you have at hand to make sure you can get the performance you expect.
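
The relative figures can be recomputed from the CSV summary line that Bonnie++ prints at the end of each run. The following is a minimal sketch, not part of the original benchmark: `ratio` is a hypothetical helper name, and the field positions assume the Bonnie++ 1.03 CSV layout as it appears in the two runs above (field 3 is per-character output, 5 block output, 7 rewrite, 9 per-character input, 11 block input, 13 random seeks).

```shell
#!/bin/sh
# CSV summary lines copied from the two Bonnie++ runs in this article.
zfs='linuxcomf8,4G,12373,24,14707,11,10604,8,33935,50,36985,3,109.0,0,16,2272,17,3657,20,2754,18,2534,15,3736,20,3061,20'
xfs='linuxcomf8,4G,38681,65,34840,6,16528,6,18312,40,18585,5,365.8,2,16,1250,26,+++++,+++,3032,39,2883,69,+++++,+++,3143,59'

# Print field $3 of the first CSV line as a rounded percentage
# of the same field in the second CSV line.
ratio() {
    awk -v a="$1" -v b="$2" -v f="$3" 'BEGIN {
        split(a, x, ",")
        split(b, y, ",")
        printf "%d\n", (x[f] / y[f]) * 100 + 0.5
    }'
}

# Each entry is "<csv field>:<label>".
for entry in 3:char-output 5:block-output 7:rewrite \
             9:char-input 11:block-input 13:random-seeks; do
    field=${entry%%:*}
    label=${entry#*:}
    echo "$label: ZFS at $(ratio "$zfs" "$xfs" "$field")% of XFS"
done
```

Values below 100% mean XFS was faster in that test, and values above 100% mean zfs-fuse was faster. To compare your own runs, replace the two CSV strings with your own summary lines.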

There are many issues with running ZFS under Linux. For instance, the fact that the zfs-fuse FUSE process runs as the root user implies potential security issues and gives any bugs that might be present in zfs-fuse free rein over the system. Also, the sharenfs ZFS directive does not currently work with zfs-fuse, and if you wish to export your ZFS filesystems manually then you'll likely have to recompile your FUSE kernel module too.

zfs-fuse does bring to Linux the flexibility of creating many filesystems using ZFS, along with the ZFS approach to quotas and space reservation, which can make system administration easier. Because of the way ZFS uses pools to let you quickly create as many filesystems as you like, it's not uncommon to create a new ZFS filesystem in your pool for each new project you are working on. Filesystems that are quick and easy to create fit well with the rest of ZFS administration, where you can snapshot a ZFS filesystem in its current state and export the current filesystem or a snapshot to another machine. As mentioned above, though, the sharenfs directive is currently not supported by zfs-fuse.
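
As a rough illustration of that per-project workflow, here is a minimal sketch using standard zfs subcommands (create, set quota, set reservation, snapshot). The dataset name tank/project, the sizes, and the snapshot name are hypothetical, and I have not verified which of these operations a given zfs-fuse build supports. The `run` wrapper is also an assumption of this sketch: it only prints each command unless you set APPLY=1, so the script is safe to try on a machine without ZFS.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical dataset inside the "tank" pool used earlier in the article.
fs=tank/project

run zfs create "$fs"                # new filesystem, mounted at /tank/project
run zfs set quota=2G "$fs"          # cap the space this filesystem may use
run zfs set reservation=1G "$fs"    # guarantee it at least this much space
run zfs snapshot "$fs@baseline"     # record the current state as a snapshot
run zfs list -t snapshot            # list the snapshots that exist
```

Run it once as-is to review the commands, then as root with APPLY=1 if they look right for your pool.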

ZFS also reimplements much of the functionality already found in the Linux kernel, such as software RAID and logical volume management (LVM). One downside of this, as is noted in the March 2008 ZFS administration documentation on page 60, is that you cannot attach an additional disk to an existing RAID-Z configuration. With Linux, you can grow an existing RAID-5 array, adding new disks as you desire.
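
For comparison, growing a Linux software RAID-5 array is usually done with mdadm. The sketch below only prints the two commands involved rather than running them. The `grow_raid5_plan` helper is a hypothetical name, and the device names /dev/md0 and /dev/sde1 and the disk count are placeholders, not values from this article.

```shell
#!/bin/sh
# Print (do not run) the mdadm steps for adding one disk to a RAID-5 array.
#   $1 = md array device, $2 = new member partition, $3 = new total disk count
grow_raid5_plan() {
    array=$1
    new_disk=$2
    total=$3
    # Step 1: add the new partition to the array as a spare.
    echo "mdadm --add $array $new_disk"
    # Step 2: reshape the array so the spare becomes an active member.
    echo "mdadm --grow $array --raid-devices=$total"
}

# Example: a three-disk array gaining a fourth disk (placeholder devices).
grow_raid5_plan /dev/md0 /dev/sde1 4
```

After the reshape finishes, the filesystem on the array still has to be enlarged with its own resize tool before the extra space becomes usable.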

Reserved by www.17LAMP.net