12-03-2016, 11:54 #1
[EN] Ubuntu and ZFS: Possibly Illegal
March 10, 2016
The project originally known as the Zettabyte File System was born the same year that Windows XP began shipping. Conceived and originally written by Bill Moore, Jeff Bonwick, and Matthew Ahrens, among others, it was a true next-generation project, designed for needs that could not be imagined at the time. It was a filesystem built for the future.
Fifteen years later, ZFS’s features remain attractive enough that Canonical – the company behind the Ubuntu distribution – wants to ship ZFS as a default. Which wouldn’t seem terribly controversial as it’s an open source project, except for the issue of its licensing.
Questions about open source licensing, once common, have thankfully subsided in recent years as projects have tended to coalesce around standard, well-understood models – project-level copyleft (e.g. GPL), file-level copyleft (e.g. MPL) or permissive (e.g. Apache). The steady rise in share of the latter category has further throttled licensing controversy: permissive licenses impose few if any restrictions on the consumption of open source, so potential complications are minimized.
ZFS, and the original OpenSolaris codebase it was included with, were not permissively licensed, however. When Sun made its Solaris codebase available for the first time in 2005, it was offered under the CDDL (Common Development and Distribution License), an MPL (Mozilla Public License) derivative previously written by Sun and later approved by the OSI. Why this license was selected for Solaris remains a matter of some debate, but one of the plausible explanations centered on questions of compatibility with the GPL – or the lack thereof.
At the time of its release, and indeed still to this day as examples like ZFS suggest, Solaris was technically differentiated from the far more popular Linux, offering features that were unavailable on alternative operating systems. For this reason, the theory went, Sun chose the CDDL at least in part to avoid having its operating system strip-mined, with its best features poached and ported to Linux specifically.
Whether this was actually the intent or whether the license was selected entirely on its merits, the perceived incompatibility between the licenses (verbal permission from Sun’s CEO notwithstanding) – along with healthy doses of antagonism and NIH between the communities – kept Solaris’ most distinctive features out of Linux codebases. There were experimental ports in the early days, and the quality of these has progressed over the years and been made available as on-demand packages, but no major Linux distributions have ever shipped CDDL-licensed features by default.
That may change soon, however. In February, Canonical announced its intent to include ZFS in its next Long Term Support version, 16.04. This prompted a wide range of reactions.
Many Linux users, who have eyed ZFS's distinctive feature set with envy, were excited by the prospect of having official, theoretically legitimate access to the technology in a mainstream distribution. Even some of the original Solaris authors were enthusiastic about the move. Observers with an interest in licensing issues, however, were left with questions, principally: aren't these two licenses incompatible? That had, after all, been the prevailing assumption for over a decade.
The answer is, perhaps unsurprisingly, not clear. Canonical, for its part, was unequivocal, saying:
We at Canonical have conducted a legal review, including discussion with the industry’s leading software freedom legal counsel, of the licenses that apply to the Linux kernel and to ZFS.
And in doing so, we have concluded that we are acting within the rights granted and in compliance with the terms of both of those licenses. Others have independently reached the same conclusion.
The Software Freedom Conservancy, for its part, was equally straightforward:
We are sympathetic to Canonical’s frustration in this desire to easily support more features for their users. However, as set out below, we have concluded that their distribution of zfs.ko violates the GPL.
If those contradictory opinions weren’t confusing enough, the Software Freedom Law Center’s position is dependent on a specific interpretation of the intent of the GPL:
Canonical, in its Ubuntu distribution, has chosen to provide kernel and module binaries bundling ZFS with the kernel, while providing the source tree in full, with the relevant ZFS filesystem code in files licensed as required by CDDL.
If there exists a consensus among the licensing copyright holders to prefer the literal meaning to the equity of the license, the copyright holders can, at their discretion, object to the distribution of such combinations.
The one thing that seems certain here, then, is that very little is certain about Canonical’s decision to ship ZFS by default.
The evidence suggests that Canonical either believes its legal position is defensible, or that none of the actors would be interested in or willing to pursue litigation on the matter, or both. As stated elsewhere, this is if nothing else a testament to the quality of the original ZFS engineering. The fact that, on the evidence, Canonical perceives the benefits of this fifteen-year-old technology to outweigh its potential overhead is remarkable.
But if there are questions for Canonical, there are for their users as well. Not about the technology, for the most part: it has withstood impressive amounts of technical scrutiny, and remains in demand. But as much as it would be nice for questions of its licensing to give way before its attractive features, it will be surprising if conservative enterprises consider Ubuntu ZFS a viable option.
If ZFS were a technology less fundamental than a filesystem, reactions might be less binary. As valuable as DTrace is, for example, it is optional for a system in a way that a filesystem is not. With technology like filesystems or databases, however, enterprises will build the risk of having to migrate into their estimates of support costs, making it problematic economically. Even if we assume the legal risks to end users of the ZFS version distributed with Ubuntu to be negligible, concerns about support will persist.
According to the SFLC, for example, the remedy for an objection from “licensing copyright holders” would be for distributors to “cease distributing such combinations.” End users could certainly roll their own versions of the distribution including ZFS, and Canonical would not be under legal restriction from supporting the software, but it’s difficult to imagine conservative buyers being willing to invest long term in a platform that their support vendor may not legally distribute. Oracle could, as has been pointed out, remove the uncertainty surrounding ZFS by relicensing the asset, but the chances of this occurring are near zero.
The uncertainty around the legality of shipping ZFS notwithstanding, this announcement is likely to be a net win for both Canonical and Ubuntu. If we assume that the SFLC's analysis is correct, the company's economic downside is relatively limited as long as it complies promptly with objections from copyright holders. Even in such a scenario, meanwhile, developers are at least reminded that ZFS is an available option for the distribution, regardless of whether the distribution's sponsor is able to provide it directly. It's also worth noting that the majority of Ubuntu in use today is commercially unsupported, and therefore unlikely to be particularly concerned with questions of commercial support. If you browse various developer threads on the ZFS announcement, in fact, you'll find notable developers from high-profile web properties who are already using Ubuntu and ZFS in production.
Providing developers with interesting and innovative tools – which most certainly describes ZFS – is in general an approach we recommend. While this announcement is not without its share of controversy, then, and may not be significant ultimately in the commercial sense, it’s exciting news for a lot of developers. As one developer put it in a Slack message to me, “i’d really like native zfs.”
One way or another, they’ll be getting it soon.
12-03-2016, 12:35 #2
The future of file systems
Queue: A Conversation with Jeff Bonwick and Bill Moore
November 15, 2007
Volume 5, issue 6
This month ACM Queue speaks with two Sun engineers who are bringing file systems into the 21st century. Jeff Bonwick, CTO for storage at Sun, led development of the ZFS file system, which is now part of Solaris. Bonwick and his co-lead, Sun Distinguished Engineer Bill Moore, developed ZFS to address many of the problems they saw with current file systems, such as data integrity, scalability, and administration. In our discussion this month, Bonwick and Moore elaborate on these points and what makes ZFS such a big leap forward.
Also in the conversation is Pawel Jakub Dawidek, a FreeBSD developer who successfully ported ZFS to FreeBSD. Ports to other operating systems, such as Mac OS X, Linux, and NetBSD, are already under way, and his experience could pave the way for even wider adoption of ZFS.
Leading the discussion is David Brown, who works in Sun’s Solaris engineering group and is a valued member of Queue’s editorial advisory board.
DAVID BROWN To start things off, can you discuss what your design goals were in creating ZFS?
BILL MOORE We had several design goals, which we'll break down by category. The first one we focused on quite heavily is data integrity. If you look at the trend in storage devices over the past decade, you'll see that while disk capacities have been doubling every 12 to 18 months, one thing that has remained relatively constant is the bit-error rate on the disk drives, which is about one uncorrectable error every 10 to 20 terabytes. The other interesting thing to note is that, at least in a server environment, the number of disk drives per deployment is increasing, so the amount of data people have is actually growing at a super-exponential rate. That means that with the bit-error rate relatively constant, you have an ever-decreasing amount of time until you notice some form of uncorrectable data error. That's not really cool, because once you get to, say, about 20 terabytes or so, you will see either a silent or a noisy data error.
JEFF BONWICK In retrospect, it isn’t surprising either because the error rates we’re observing are in fact in line with the error rates the drive manufacturers advertise. So it’s not like the drives are performing out of spec or that people have got a bad batch of hardware. This is just the nature of the beast at this point in time.
BM So, one of the design principles we set for ZFS was: never, ever trust the underlying hardware. As soon as an application generates data, we generate a checksum for the data while we’re still in the same fault domain where the application generated the data, running on the same CPU and the same memory subsystem. Then we store the data and the checksum separately on disk so that a single failure cannot take them both out.
When we read the data back, we validate it against that checksum and see if it’s indeed what we think we wrote out before. If it’s not, we employ all sorts of recovery mechanisms. Because of that, we can, on very cheap hardware, provide more reliable storage than you could get with the most reliable external storage. It doesn’t matter how perfect your storage is, if the data gets corrupted in flight—and we’ve actually seen many customer cases where this happens—then nothing you can do can recover from that. With ZFS, on the other hand, we can actually authenticate that we got the right answer back and, if not, enact a bunch of recovery scenarios. That’s data integrity.
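The end-to-end scheme Moore describes can be sketched in a few lines. This is a hypothetical illustration of the idea only, not ZFS's actual implementation (real ZFS keeps checksums in the parent block pointers and uses Fletcher or SHA-256 checksums; the toy store below is invented for the example):

```python
import hashlib

# Toy block store. Data and checksums live in separate "locations"
# (two dicts standing in for separate on-disk blocks), so a single
# failure cannot take out both.
data_store = {}
checksum_store = {}

def write_block(addr, data):
    # Checksum is computed in the same fault domain as the writer,
    # before the data ever travels toward "disk."
    data_store[addr] = data
    checksum_store[addr] = hashlib.sha256(data).hexdigest()

def read_block(addr):
    data = data_store[addr]
    if hashlib.sha256(data).hexdigest() != checksum_store[addr]:
        # Real ZFS would try another mirror or reconstruct from parity.
        raise IOError("checksum mismatch on block %r" % addr)
    return data

write_block(0, b"hello")
assert read_block(0) == b"hello"

# Simulate silent on-disk corruption: the next read fails loudly
# instead of returning bad data to the application.
data_store[0] = b"hellp"
try:
    read_block(0)
except IOError as err:
    print(err)
```

The point of the sketch is the separation: because the checksum travels a different path from the data, a corrupted block cannot also corrupt the evidence used to detect it.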
Another design goal we had was to simplify storage management. When you’re thinking about petabytes of data and hundreds, maybe even thousands of disk drives, you’re talking about something that no human would ever willingly take care of.
By comparison, when you have a server and you want to upgrade its memory, the process is pretty straightforward. You power down the server, plug in some DIMMs, power it back on, and you’re done. You don’t run dimmconfig, you don’t edit /etc/dimmtab, and you don’t create virtual DIMMs that you mount on applications. The memory is simply a pooled resource that’s managed by the operating system on behalf of the application. If Firefox wants another megabyte of memory, it asks for it, and if it’s available, it gets it. When it’s done, it frees it, and back it goes into the pool for other applications to use. It’s a very simple, very natural way of thinking about storage.
With ZFS, we asked this question: why can’t your on-disk storage be the same way? That’s exactly what we do in ZFS. We have a pooled storage model. The disks are like DIMMs, and the file systems are like applications. You add devices into the storage pool, and now the file system is no longer tied to the concept of a physical disk. It grabs data from the pool as it needs to store your files, and as you remove or delete your files, it releases that storage back to the pool for other file systems to use. Again, it’s a very natural, very simple way to administer large quantities of data.
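The DIMM analogy can be made concrete with a toy allocator. The classes below are hypothetical, purely to illustrate the pooled model: no filesystem is given a fixed size, and space freed by one is immediately available to another:

```python
# Toy pooled-storage model: filesystems draw blocks from a shared
# pool on write and return them on delete, like processes and RAM.
class Pool:
    def __init__(self, total_blocks):
        self.free = total_blocks

class Filesystem:
    def __init__(self, pool):
        self.pool, self.used = pool, 0

    def write(self, blocks):
        if blocks > self.pool.free:
            raise OSError("pool out of space")
        self.pool.free -= blocks
        self.used += blocks

    def delete(self, blocks):
        self.used -= blocks
        self.pool.free += blocks  # freed space goes back to the shared pool

pool = Pool(total_blocks=1000)
home, var = Filesystem(pool), Filesystem(pool)
home.write(600)
var.write(300)    # no per-filesystem size was ever configured
home.delete(500)
var.write(500)    # space freed by one filesystem is usable by another
print(pool.free)  # 100
```

Contrast this with traditional volume management, where growing one filesystem at the expense of another requires explicit resizing of partitions or volumes.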
Finally, in designing ZFS, we also wanted to consider performance. A lot of people, until they get bitten by data corruption that causes them severe pain, are of the mindset, “I don’t care, I just want it to go fast. I’ve never lost any data. My laptop is just fine after all these years.” So unless you make it fast, people will fundamentally be uninterested—except for those who have experienced data corruption firsthand.
JB All of whom become rabid converts, as it turns out. There’s nothing like having data corruption that is detected and healed for you to change your perspective on the importance of that problem and the value of having it solved.
DB Can you identify other things that have changed over the years, and that are going to be disruptive, that we haven’t yet realized?
JB One thing that has changed, as Bill already mentioned, is that the error rates have remained constant, yet the amount of data and the I/O bandwidths have gone up tremendously. Back when we added large file support to Solaris 2.6, creating a one-terabyte file was a big deal. It took a week and an awful lot of disks to create this file.
Now for comparison, take a look at, say, Greenplum's database software, which is based on Solaris and ZFS. Greenplum has created a data-warehousing appliance consisting of a rack of 10 Thumpers (SunFire x4500s). It can scan data at a rate of one terabyte per minute. That's a whole different deal. Now if you're getting an uncorrectable error once every 10 to 20 terabytes, that's once every 10 to 20 minutes—which is pretty bad, actually.
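A quick back-of-the-envelope check of the arithmetic above (a sketch, not from the interview):

```python
# At a scan rate of 1 TB per minute, one uncorrectable error per
# 10-20 TB read works out to one expected error every 10-20 minutes.
scan_rate_tb_per_min = 1.0
for tb_per_error in (10, 20):
    minutes_between_errors = tb_per_error / scan_rate_tb_per_min
    print("one error per %d TB -> every %.0f minutes"
          % (tb_per_error, minutes_between_errors))
```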
DB You’re saying that it’s not just scale, it’s also the rate at which we’re accessing that stuff, so the failure rates are really very visible now.
BM Yes, that’s right, and as the incident rate goes up, people are going to become more and more aware that this is a problem, after all. When it happens only once a year or once every several months, it’s easy to chalk that up to any number of problems—“Ah, stupid Windows, it already needs to be reinstalled every six months.” Now people are thinking, “Well, maybe there’s a reason. Maybe Windows isn’t as bad as everyone says it is.”
Or pick your favorite operating system, or your favorite laptop. Bad things happen, and while I’m not saying software isn’t buggy, a lot of this stuff that we tend to chalk up to buggy software, if you dig into it, often has an underlying cause.
DB I’m recalling Jim Gray a little bit and maybe channeling some of those ideas that I’ve heard him discuss over the years, such as the teraserver that he put together. Originally he had been shipping disks around between people, and then he discovered it was an inconvenient way of connecting, and ultimately he shipped around whole PCs and was using Ethernet and NFS (network file system) as the logical, easy point of connectivity. Did you think at all about what’s an appropriate representational level? I know ZFS is a file system, but have you given any thought to those kinds of issues?
BM Yes, as a matter of fact, I was going to mention earlier that ZFS is composed of several layers, architecturally, but the core of the whole thing is a transactional object store. The bulk of ZFS, the bulk of the code, is providing a transactional store of objects.
You can have up to 2^64 objects, each up to 2^64 bytes in size, and you can perform arbitrary atomic transactions on those objects. Moreover, a storage pool can have up to 2^64 sets of these objects, each of which is a logically independent file system. Given this foundation, a lot of the heavy lifting of writing a POSIX file system is already done for you. You still have to interface with the operating system and the virtual memory subsystem, but all the stuff you don't have to worry about is kind of nice: you don't have to worry about on-disk format; you don't have to worry about modifying things consistently; and you don't have to worry about FSCK (file system check).
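The commit discipline behind that transactional store can be sketched as a toy copy-on-write structure. The class below is hypothetical and only illustrates the atomicity property: a transaction's updates become visible all at once, and a crash before commit leaves the last consistent state untouched, which is why no fsck is needed:

```python
import copy

# Toy transactional object store: updates accumulate against a copy
# of the last committed state and become visible atomically when the
# new root is swapped in (copy-on-write, never overwrite in place).
class ObjectStore:
    def __init__(self):
        self.committed = {}  # last consistent on-"disk" state

    def transaction(self, updates):
        # Build the new state beside the old one...
        pending = copy.deepcopy(self.committed)
        pending.update(updates)
        # ...then "commit" by swapping in the new root. A crash before
        # this line leaves self.committed fully consistent.
        self.committed = pending

store = ObjectStore()
store.transaction({"obj1": b"alpha", "obj2": b"beta"})
store.transaction({"obj2": b"gamma"})
assert store.committed == {"obj1": b"alpha", "obj2": b"gamma"}
```

In real ZFS the "root swap" is the rewrite of the überblock at the end of each transaction group; the sketch compresses all of that into a single assignment.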
JB We can thank the skeptics in the early days for that. In the very beginning, I said I wanted to make this a FSCK-less file system, though it seemed like an unattainable goal given the trends in how much storage people have. Pretty much everybody said it's not possible, and that's just not a good thing to say to me. My response was, "OK, by God, this is going to be a FSCK-less file system." We shipped in '05, and here we are in '07, and we haven't needed to FSCK anything yet.
12-03-2016, 12:37 #3
BM Having your data offline, while you try to find the needle in the haystack, is not really where most people want to live.
DB Well, especially when the scale of the storage is many terabytes or petabytes versus circa 1972 when the file system on Unix was five megabytes.
You talked about essentially building a layer of abstraction on the physical storage so that you have a more reliable transactional storage model down low. Historically, we’ve had storage welded into the operating system in terms of the specific file systems that a given operating system offers. This makes it not as amenable to moving that storage around, if you want, whether it’s hot-pluggability or mobility. Did you look at those things as well?
BM If you were to go back five years and ask people, “Would you mind having all your data over the network somewhere?” they would say, “Ha, ha, ha, that’s funny. Way too low performance. I’ll never get my data fast enough.” Now you look around and gigabit is ubiquitous and 10-gigabit is starting to emerge. Right now with gigabit you get more than 100 megabytes a second over a $5 interface that’s on every device you can buy—you can hardly buy anything slower anymore.
The interesting thing to note about that is the network is now faster than any local disk you can buy. One of the things we see in the HPC (high-performance computing) industry is people moving away from local storage and using high-speed, low-latency interconnects—which Ethernet will eventually catch up with—and using that as the primary means of accessing their data.
My prediction is that in the next five to 10 years, for the most part, your data is going to be somewhere else. You’ll still have local storage when you’re not connected to any network, but in an office environment or a corporate environment or a data center, I just don’t think you’re going to have a whole lot of local storage.
JB Yes, because if you’ve got a relatively small client compared with a relatively large server providing the data, you have to consider the cost of disk access on your local system versus the latency of going over the wire to get something that in all likelihood is in memory on the server side—and that actually is faster now that we’re up to 10-gigabit Ethernet.
BM I think this will only become more true as we go forward; only in recent history have the trend lines crossed to the point where having your storage remote is acceptable and can perhaps even exceed the performance of a local storage solution.
DB I wonder if there are some emerging issues to talk about in this regard. There is ongoing discourse about NAS (network-attached storage), the pNFS (parallel NFS) type of approach versus the SAN (storage area network).
JB Whether you talk about storage as SAN or NAS, you’re describing it fundamentally as a relatively dumb thing that you plug in, and then some smart thing comes along to access it. The big change that we’re going to see is more intelligence in the storage itself. Rather than simply saying, “Give me the following region of bytes out of some file,” you’re going to come at it with a question, and then it’s going to do some work for you and come up with an answer. If you think about it, this is exactly how you interact with Google. You don’t go to Google and have it feed two petabytes of data over your DSL line, and then run grep locally. That would be kind of bad.
Instead, you send a small question over the wire to Google, which runs the compute right near the data, where it’s all low latency, and then sends you back a very small answer. We see storage in general moving in that direction because you have more and more of it and you want to be able to do unstructured queries.
You need to have that kind of technology. It’s too expensive to do brute-force searches without any kind of support from underlying storage systems, just because of the change in capacity. In fact, there was a prediction made about 10 years ago that by roughly today, all data would be in databases, and file systems were going away. I think it’s really all about human behavior. If you want to sell to people who are organized in the way they store things, the only people you’ll be selling to will be in the database market. I’d rather sell to people who are disorganized in the way they store things because there are more of them and they have a lot more stuff. People are pack rats and they don’t plan.
DB I would like to ask Pawel about some of the problems and challenges he faced while porting ZFS to FreeBSD. What made it difficult? What were the things perhaps you expected to be difficult that were less so?
PAWEL JAKUB DAWIDEK At first I just wanted to see how much work it would take to port ZFS to FreeBSD. I started by making it compile on FreeBSD, and once I did that, I was quite sure it would take at least six months to have the first prototype working. The funny thing was that after another week or so, ZFS was running on my test machine. It was truly surprising that the code was so portable; it was self-contained and I had initial read-write support after 10 days of work. The parts that need to be ported are actually only ZVOL, ZPL, and VDEV; all the rest is mostly self-contained and doesn't need a lot of work. Of course, after those 10 days there was still a lot of work to do, and I encountered some problems. I found that the VFS interface was the hardest part of the ZPL layer, but I managed to make it work.
There were some parts I needed to port from OpenSolaris itself, such as the GFS (generic file system) framework, which in ZFS is responsible for handling the .zfs/ directory and snapshot mountpoint. There were a few really tough problems, but the project was so exciting that it somehow kept me working. I had this feeling that what I’m doing is really cool and that I should continue. So, basically, it went much, much better and faster than I ever expected.
The FreeBSD port of ZFS is almost feature-complete. We even have support for FreeBSD’s jails working the same way as ZFS support for Solaris zones. We, of course, support snapshots, ZVOLs, NFS exports, etc.
The biggest problem, which was recently fixed, was the very high ZFS memory consumption. Users were often running out of KVA (kernel virtual address space) on 32-bit systems. The memory consumption is still high, but the problem we had was related to a vnode leak. FreeBSD 7.0 will be the first release that contains ZFS.
DB Is that going to be 64 bits?
PJD We want to release ZFS for every architecture we support, but we don’t recommend ZFS for 32-bit architectures.
JB We debated that in the early days because obviously it has been a bug for a long time now. Back when virtual memory was first created, the idea was that you had far more virtual addressing than you had physical memory. As physical memory capacity crossed the four-gigabyte (32-bit) threshold, however, the contents of the bag got bigger than the bag. That’s an awkward situation to be in, to have an eight-gigabyte system and only two or four gigabytes that are actually addressable at any one time.
I had actually hoped that this problem would go away a little sooner than it has—and Bill is laughing because he gave me a hard time about this—but as we were developing ZFS, I basically said, “Look, by the time we actually ship this thing, this problem will have taken care of itself because everything will be 64-bit.” I wasn’t entirely off base.
BM You were right within a decade.
JB Maybe even within half a decade, but it’s one of those things where you look at it and say, “Gee, the code is so much simpler if I don’t have to deal with that whole set of problems,” keeping only a subset of the data mapped.
DB I think the assumption is pretty valid in terms of what has happened with hardware commoditization and cost. Hardware has become free and you can have large tracts of physical memory and disk and everything, but, unfortunately, people still have these old applications, relying on these old operating systems.
JB Sometimes it’s new ones, too. It’s not your next desktop; it’s your next laptop, or PDA, or cellphone, and all of a sudden, you have this new class of device and you’re back to square one. You have more limited addressing, a smaller amount of memory, and less compute power, and you want to be able to fit in those devices.
DB The “ZFS for my cellphone” kind of problem.
BM Actually, we’ve released ZFS not as a be-all, end-all product; we’ve tried to design and maintain it in such a way that it’s really not just an endpoint but rather a starting point. It’s a framework, with the transactional object store and the pooled storage, on top of which you can build far more interesting things.
12-03-2016, 12:38 #4
DB What kinds of interesting things do you have in mind?
BM There are a lot of other things we’ve been planning to do with it, and haven’t quite found the time to do. One thing we have found the time for is basically to take one of these objects and export it as a raw block device, as a virtual disk, if you will.
With this virtual disk you can do anything you would do with the real disk. You can put UFS (Unix file system) on it, or you can export it as an iSCSI device. You can run your favorite database on it or you can cat(1) it, if that’s what you’re into.
Imagine if you took, say, a database such as Postgres or MySQL and put the SQL engine on top of ZFS’s transactional object store. Then you would have something that has all the file system and volume manager stuff integrated into it, so there’s no waste doing the same job five different times on your way down from your database app before it gets to disk. It’s all handled once, yet you still have the same programming interface and the SQL search engine that you know and love and are used to.
JB If you have a database sitting on top of a transactional file system, the database is sitting up there being very careful about ordering its writes to the log and being clear about saying when it wants to know this stuff is flushed out. Then beneath it, you’ve got this transactional file system creating transaction groups, putting in a bunch of changes, and then committing those out to disk atomically.
Why not have a blending of the layers where basically the whole back end to the database—the part that isn’t parsing the SQL—participates directly in the transaction groups that ZFS provides, so that consistency is no longer a problem the database has to solve? It can be handled entirely by the storage software below.
BM That is especially nice when you have a transactional object store and you have the capability of rolling back. If bad things, such as a power fail, should happen, you don’t ever have to do an offline FSCK, as you would in a traditional file system.
DB What are the provocative problems in storage that are still outstanding, and does ZFS help? What’s next? What’s still left? What are the things that you see down the pike that might be the big issues that we’ll be dealing with?
JB There are not just issues, but opportunities, too. I’ll give you an example. We were looking at the spec sheets for one of the newest Seagate drives recently, and they had an awful lot of error-correction support in there to deal with the fact that the media is not perfect.
BM They’re pushing the limits of the physics on these devices so hard that there’s a statistical error rate.
JB Right, so we looked at the data rates coming out of the drive. The delivered bandwidth from the outer tracks was about 80 megabytes per second, but the raw data rate—the rate that is actually coming off the platter—was closer to 100. This tells you that some 20 percent of the bits on that disk are actually error corrections.
BM Error correcting, tracking, bad sector remapping.
JB Exactly, so one of the questions you ask yourself is, "Well, if I'm going to start moving my data-integrity stuff up into the file system anyway—because I can actually get end-to-end data integrity that way, which is always stronger—then why not get some additional performance out of the disk drive? Why not give me an option with this disk drive?" The file system can handle the bad sectors, and it doesn't even have to remap them: it suffices to allocate the data elsewhere and deliberately leak the block that is defective. It wouldn't take a whole lot of file-system code to do that.
Then you can say, “Put the drive in this mode,” and you’ve got a drive with 20 percent more capacity and 20 percent higher bandwidth because you’re running ZFS on top of it. That would be pretty cool.
DB That’s a really exciting idea. Have you had those discussions with the drive vendors about whether they would offer that mode?
BM Not quite, because they're most interested in moving up the margin chain, if you will; offering less reliable devices at a lower cost isn't really something they care to entertain all that thoroughly. I think we will see this again with flash, however, because flash has a much more interesting failure mode. With flash, as you wear out a given region of the device, you wind up getting statistically higher and higher bit-error rates in that region. In ZFS we have a notion of built-in compression. Say the application gives us 128 kilobytes of data. We compress it down to 37 kilobytes. Then we write a single 37-kilobyte block of data to disk, and to the application it looks like it's reading and writing 128 kilobytes, while we're compressing it to whatever size works.
Conversely, you could go another way, which is potentially to expand the data with some sort of error-correcting code such that when you read the data back—or before you write the data—you would say, “All right, with the media I’m going to write to, I expect this kind of error rate, and I want this kind of recoverability. Let me expand it using my own error-correcting code, write it out, and then when I read it back, undo that and correct any errors you wind up with.” The idea being that now that it’s under ZFS control, it could be software controlled. As you wear out a region of a flash device, you say, “All right, that region is getting worn out. Maybe I’ll start expanding the data in that region of the device a little more,” or, “This area is not worn out, so maybe I won’t expand those as much and get the capacity.”
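The compression half of this can be illustrated with a toy per-block example. Here zlib stands in for ZFS's own block compressors (LZJB, gzip), and the sizes are illustrative only:

```python
import zlib

# Toy per-block compression: the application sees a fixed 128 KB
# logical block, while the store writes however few bytes the block
# compresses down to.
block = (b"some highly compressible application data\n" * 3200)[:128 * 1024]

compressed = zlib.compress(block)
print(len(block), "->", len(compressed), "bytes on 'disk'")

# Read path: decompress back to the full logical block before
# handing it to the application, which never sees the compression.
assert zlib.decompress(compressed) == block
```

The ECC-expansion idea Moore floats is the mirror image: instead of shrinking a block below its logical size, the filesystem would grow it with redundancy tuned to the known wear of the region being written.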
There are some very interesting ideas that can come from putting everything up at the file-system level.
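The transparent-compression behavior Moore describes (a 128-kilobyte logical block stored as whatever it compresses down to) can be sketched in a few lines of Python. Note that zlib here is a stand-in for ZFS's actual compression algorithms; this is an illustration of the idea, not ZFS code:

```python
import zlib

def store_block(data: bytes) -> bytes:
    """Compress a logical block before writing: the application sees the
    full-size block, but only the compressed bytes would hit the disk."""
    return zlib.compress(data)

def read_block(stored: bytes) -> bytes:
    """Decompress on read so the application sees the original data."""
    return zlib.decompress(stored)

# A highly compressible ~128 KB logical block.
logical = b"log entry: all systems nominal\n" * 4229
physical = store_block(logical)

assert read_block(physical) == logical  # transparent to the application
assert len(physical) < len(logical)     # far fewer bytes actually written
```

The same wrapper pair is where the error-correcting-code expansion Bonwick describes would slot in: instead of shrinking the data, `store_block` would grow it by an amount chosen per region of the device.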
DB I think it’s also a very provocative point. When we thought about nonvolatile storage through most of our history in the field, it’s been disk drives. Then all of a sudden, we’ve got flash, which has totally different properties than the disk drive. When you started out with ZFS, were you thinking at all about the application of hybrid storage?
BM Not explicitly. We were thinking more about keeping everything as compartmentalized as possible, so that as time moves on and things we couldn’t have anticipated come up, the amount of code we have to change to implement some new radical technology should be minimal.
JB We did have an overall design principle of not designing the file system assuming that you’re writing fundamentally to cylindrical storage, that other things will come into existence. At the time it wasn’t clear that it was going to be flash; that wasn’t even on the radar screen. We were thinking it would be more like MRAM or Ovonic unified memory or holographic memory. It’s hard to say which one of these things is ultimately going to take over, but I don’t think that on the deck of the Enterprise we’ll have rotating rust.
BM Every year inside Sun, the Solaris kernel engineers have a session at the beginning of the year called predicteria where we make one-, three-, and six-year predictions for ourselves and the industry. For the past 10 years, Jeff has been betting that five years out we’ll have nonvolatile memory and there will be no more DRAM in computers. So far he has been losing that bet every time it comes up, but one of these years he’ll probably be right. I just don’t know which year that’ll be.
12-03-2016, 12:48 #5
Nov 01, 2009
You knew this day was coming: ZFS now has built-in deduplication.
If you already know what dedup is and why you want it, you can skip the next couple of sections. For everyone else, let's start with a little background.
What is it?
Deduplication is the process of eliminating duplicate copies of data. Dedup is generally either file-level, block-level, or byte-level. Chunks of data -- files, blocks, or byte ranges -- are checksummed using some hash function that uniquely identifies data with very high probability. When using a secure hash like SHA256, the probability of a hash collision is about 2^-256 ≈ 10^-77 or, in more familiar notation, 0.00000000000000000000000000000000000000000000000000000000000000000000000000001. For reference, this is 50 orders of magnitude less likely than an undetected, uncorrected ECC memory error on the most reliable hardware you can buy.
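The collision probability quoted above is easy to evaluate directly. This Python snippet (purely illustrative, not ZFS code) computes 2^-256 and shows SHA256 producing distinct checksums for two different blocks:

```python
import hashlib

# Probability that two distinct blocks share a SHA256 checksum.
p_collision = 2.0 ** -256
print(f"{p_collision:.2e}")  # ~8.64e-78, i.e. about 10^-77

# Two different blocks get two different 256-bit signatures.
a = hashlib.sha256(b"block A").hexdigest()
b = hashlib.sha256(b"block B").hexdigest()
assert a != b
```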
Chunks of data are remembered in a table of some sort that maps the data's checksum to its storage location and reference count. When you store another copy of existing data, instead of allocating new space on disk, the dedup code just increments the reference count on the existing data. When data is highly replicated, which is typical of backup servers, virtual machine images, and source code repositories, deduplication can reduce space consumption not just by percentages, but by multiples.
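A toy sketch of such a table, in Python rather than anything resembling the ZFS implementation, makes the refcount mechanics concrete: duplicate writes only bump a counter, so a hundred copies of the same block cost one physical write:

```python
import hashlib

class DedupStore:
    """Toy block store keyed by SHA256, mirroring the dedup table
    described above: checksum -> (data, reference count)."""

    def __init__(self):
        self.table = {}          # checksum -> [data, refcount]
        self.blocks_written = 0  # physical writes actually performed

    def write(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        if key in self.table:
            self.table[key][1] += 1   # duplicate: bump the refcount only
        else:
            self.table[key] = [block, 1]
            self.blocks_written += 1  # unique data: really write it
        return key

store = DedupStore()
for _ in range(100):              # 100 identical blocks arrive...
    store.write(b"guest OS block")
assert store.blocks_written == 1  # ...but only one is stored
```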
What to dedup: Files, blocks, or bytes?
Data can be deduplicated at the level of files, blocks, or bytes.
File-level dedup assigns a hash signature to an entire file. It has the lowest overhead when the natural granularity of data duplication is whole files, but it also has significant limitations: any change to any block in the file requires recomputing the checksum of the whole file, and once even one block changes, the two versions of the file are no longer identical, so any space savings are lost. This is fine when the expected workload is something like JPEG or MPEG files, but is completely ineffective when managing things like virtual machine images, which are mostly identical but differ in a few blocks.
Block-level dedup has somewhat higher overhead than file-level dedup when whole files are duplicated, but unlike file-level dedup, it handles block-level data such as virtual machine images extremely well. Most of a VM image is duplicated data -- namely, a copy of the guest operating system -- but some blocks are unique to each VM. With block-level dedup, only the blocks that are unique to each VM consume additional storage space. All other blocks are shared.
Byte-level dedup is in principle the most general, but it is also the most costly because the dedup code must compute 'anchor points' to determine where the regions of duplicated vs. unique data begin and end. Nevertheless, this approach is ideal for certain mail servers, in which an attachment may appear many times but not necessarily be block-aligned in each user's inbox. This type of deduplication is generally best left to the application (e.g. Exchange server), because the application understands the data it's managing and can easily eliminate duplicates internally rather than relying on the storage system to find them after the fact.
ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256).
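To see why block-level granularity wins for mostly-identical images, consider this illustrative Python sketch (the 16-byte block size is a toy value, not a ZFS recordsize): changing a single byte defeats file-level matching entirely but leaves every other block shared:

```python
import hashlib

BLOCK = 16  # toy block size for illustration

def block_hashes(data: bytes):
    """Checksum each fixed-size block, as block-level dedup would."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

base = bytes(range(256)) * 4   # a 1 KB "guest OS image"
clone = bytearray(base)
clone[0] = 0xFF                # one modified byte in the cloned VM

# File-level view: one changed byte means zero sharing.
assert hashlib.sha256(base).digest() != hashlib.sha256(bytes(clone)).digest()

# Block-level view: only the block containing the change is unique.
unique = [h1 for h1, h2 in zip(block_hashes(base), block_hashes(bytes(clone)))
          if h1 != h2]
assert len(unique) == 1        # 63 of 64 blocks are still shared
```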
When to dedup: now or later?
In addition to the file/block/byte-level distinction described above, deduplication can be either synchronous (aka real-time or in-line) or asynchronous (aka batch or off-line). In synchronous dedup, duplicates are eliminated as they appear. In asynchronous dedup, duplicates are stored on disk and eliminated later (e.g. at night). Asynchronous dedup is typically employed on storage systems that have limited CPU power and/or limited multithreading to minimize the impact on daytime performance. Given sufficient computing power, synchronous dedup is preferable because it never wastes space and never does needless disk writes of already-existing data.
ZFS deduplication is synchronous. ZFS assumes a highly multithreaded operating system (Solaris) and a hardware environment in which CPU cycles (GHz times cores times sockets) are proliferating much faster than I/O. This has been the general trend for the last twenty years, and the underlying physics suggests that it will continue.
How do I use it?

If you have a storage pool named 'tank', just set the dedup property on it:

zfs set dedup=on tank
What are the tradeoffs?
It all depends on your data.
If your data doesn't contain any duplicates, enabling dedup will add overhead (a more CPU-intensive checksum and on-disk dedup table entries) without providing any benefit. If your data does contain duplicates, enabling dedup will both save space and increase performance. The space savings are obvious; the performance improvement is due to the elimination of disk writes when storing duplicate data, plus the reduced memory footprint due to many applications sharing the same pages of memory.
Most storage environments contain a mix of data that is mostly unique and data that is mostly replicated. ZFS deduplication is per-dataset, which means you can selectively enable dedup only where it is likely to help. For example, suppose you have a storage pool containing home directories, virtual machine images, and source code repositories. You might choose to enable dedup as follows:
zfs set dedup=off tank/home
zfs set dedup=on tank/vm
zfs set dedup=on tank/src
Trust or verify?
If you accept the mathematical claim that a secure hash like SHA256 has only a 2^-256 probability of producing the same output given two different inputs, then it is reasonable to assume that when two blocks have the same checksum, they are in fact the same block. You can trust the hash. An enormous amount of the world's commerce operates on this assumption, including your daily credit card transactions. However, if this makes you uneasy, that's OK: ZFS provides a 'verify' option that performs a full comparison of every incoming block with any alleged duplicate to ensure that they really are the same, and ZFS resolves the conflict if not. To enable this variant of dedup, just specify 'verify' instead of 'on':
zfs set dedup=verify tank
Selecting a checksum
Given the ability to detect hash collisions as described above, it is possible to use much weaker (but faster) hash functions in combination with the 'verify' option to provide faster dedup. ZFS offers this option for the fletcher4 checksum, which is quite fast:

zfs set dedup=fletcher4,verify tank
The tradeoff is that unlike SHA256, fletcher4 is not a pseudo-random hash function, and therefore cannot be trusted not to collide. It is therefore only suitable for dedup when combined with the 'verify' option, which detects and resolves hash collisions. On systems with a very high data ingest rate of largely duplicate data, this may provide better overall performance than a secure hash without collision verification.
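The verify-on-match flow can be sketched as follows. The byte-sum checksum below is a deliberately weak stand-in for fletcher4 (which this snippet does not implement), chosen so that a collision is easy to construct on purpose:

```python
def weak_checksum(block: bytes) -> int:
    # Stand-in for a fast, non-cryptographic checksum; collisions are
    # possible, so every checksum match must be verified byte-for-byte.
    return sum(block) % 65536

table = {}  # checksum -> list of [data, refcount] entries

def dedup_write_verify(block: bytes) -> None:
    key = weak_checksum(block)
    for entry in table.setdefault(key, []):
        if entry[0] == block:         # 'verify': full byte comparison
            entry[1] += 1             # true duplicate: bump refcount
            return
    table[key].append([block, 1])     # collision or new data: store it

dedup_write_verify(b"ab")  # checksum 195
dedup_write_verify(b"ba")  # same checksum, different data -- a collision
dedup_write_verify(b"ab")  # a genuine duplicate
assert len(table[195]) == 2     # the collision kept both blocks intact
assert table[195][0][1] == 2    # the duplicate only bumped a refcount
```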
Unfortunately, because there are so many variables that affect performance, I cannot offer any absolute guidance on which is better. However, if you are willing to make the investment to experiment with different checksum/verify options on your data, the payoff may be substantial. Otherwise, just stick with the default provided by setting dedup=on; it's cryptographically strong and it's still pretty fast.
Scalability and performance
Most dedup solutions only work on a limited amount of data -- a handful of terabytes -- because they require their dedup tables to be resident in memory.
ZFS places no restrictions on your ability to dedup. You can dedup a petabyte if you're so inclined. The performance of ZFS dedup will follow the obvious trajectory: it will be fastest when the DDTs (dedup tables) fit in memory, a little slower when they spill over into the L2ARC, and much slower when they have to be read from disk. The topic of dedup performance could easily fill many blog entries -- and it will over time -- but the point I want to emphasize here is that there are no limits in ZFS dedup. ZFS dedup scales to any capacity on any platform, even a laptop; it just goes faster as you give it more hardware.
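For a rough feel for when the DDT fits in memory, here is a back-of-the-envelope calculation. The 320-byte per-entry cost is an assumption for illustration only (the real figure varies with ZFS version and ZAP overhead), and the pool is assumed to hold nothing but unique 128K blocks:

```python
# Assumed in-core cost per DDT entry -- illustrative, not a ZFS constant.
ENTRY_BYTES = 320
RECORDSIZE = 128 * 1024             # default ZFS recordsize

pool_bytes = 1 << 40                # 1 TiB of unique data
entries = pool_bytes // RECORDSIZE  # one DDT entry per unique block
ddt_bytes = entries * ENTRY_BYTES

print(entries, "entries,", ddt_bytes // (1024 * 1024), "MiB of DDT")
# Under these assumptions, 1 TiB of unique data needs about 2.5 GiB of DDT.
```

Smaller average block sizes or more unique data push the DDT out of RAM and into the L2ARC or disk, which is exactly the performance trajectory described above.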
Bill Moore and I developed the first dedup prototype in two very intense days in December 2008. Mark Maybee and Matt Ahrens helped us navigate the interactions of this mostly-SPA code change with the ARC and DMU. Our initial prototype was quite primitive: it didn't support gang blocks, ditto blocks, out-of-space, and various other real-world conditions. However, it confirmed that the basic approach we'd been planning for several years was sound: namely, to use the 256-bit block checksums in ZFS as hash signatures for dedup.
Over the next several months Bill and I tag-teamed the work so that at least one of us could make forward progress while the other dealt with some random interrupt of the day.
As we approached the end game, Matt Ahrens and Adam Leventhal developed several optimizations for the ZAP to minimize DDT space consumption both on disk and in memory, key factors in dedup performance. George Wilson stepped in to help with, well, just about everything, as he always does.
Well, kids, dedup is done. We're going to have some fun now.
12-03-2016, 12:53 #6
EMC goes all in with DSSD
5 May, 2014
The drama with XtremIO’s delays only whetted EMC’s appetite for more drama. Or the hot breath of competition and declining VMAX sales.
Let’s go with the latter.
StorageMojo has been following DSSD for some time. The core of the Sun ZFS team, Jeff Bonwick and Bill Moore, are founders, and Andy Bechtolsheim is an investor and advisor.
In the post on DSSD, StorageMojo opined:
So what are they building? They are taking a radically different approach to the problem of high-performance transaction processing storage. The use of flash is a given in TP, and the extra durability, scalability and guaranteed read latency would be very attractive in large TP applications.
The EMC presser:
Menlo Park-based DSSD is the developer of an innovative new rack-scale flash storage architecture for I/O-intensive in-memory databases and Big Data workloads like SAP HANA and Hadoop.
EMC and StorageMojo agree! Will it last? No.
EMC also said:
Menlo Park-based DSSD will operate as a standalone unit within EMC’s Emerging Technology Products Division reporting to Chirantan “C.J.” Desai. DSSD President and CEO Bill Moore, formerly Sun Microsystems’ Chief Storage Engineer, ZFS co-lead and 3Par’s first employee, will lead the DSSD business within EMC. . . .
Andy Bechtolsheim said, “The prospects of what EMC and DSSD can achieve together are truly remarkable. We ventured out to create a new storage tier for transactional and Big Data applications that have the highest performance I/O requirements. Working together with EMC, DSSD will deliver a new type of storage system with game-changing latency, IOPS and bandwidth characteristics while offering the operational efficiency of shared storage.”
Good, they’re calling it what it is, an emerging technology. Nor are they being as aggressive on the timeline as they were with XtremIO:
Products based on the new DSSD rack-scale flash storage architecture are expected to be available in 2015 and will be optimized for:
- In-memory databases (e.g. SAP HANA, GemFire, etc.)
- Real-time analytics (e.g. risk management, fraud detection, high-frequency applications, Pivotal HD, etc.)
- High-performance applications used by research and government agencies (e.g. genomics, facial recognition, climate analysis, etc.)
Customers desiring a platform capable of delivering unprecedented performance for I/O- intensive Big Data and in-memory applications like SAP HANA and Hadoop will choose DSSD rack-scale flash storage as the fastest tier in their multi-tier storage architecture.
The StorageMojo take
Joe Tucci is reinventing EMC one more time. Will it work?
XtremIO is the VNX replacement. DSSD is the VMAX replacement.
It’s been clear for years that standard RAID arrays – say, VNX and VMAX – are on the way out. Object/cloud storage is winning the bulk storage market and flash-based storage the high-performance market.
This is a return to the EMC of the mid-90s, where a young and flaky Symm was positioned as the high-performance alternative to IBM’s tired mirrored disks. As soon as others caught up on performance, EMC promptly repositioned it.
DSSD’s hardware development has not been trouble-free, despite the promising architecture, patents and dev team. Sometimes you just need to lay out a PC board, and if high performance is the goal, that takes serious and specialized experience and skill.
Competitors and would-be competitors need to rev up their engines. You’ve got a year to get a footprint in as many EMC customers as possible before they bring this new VMAX hammer down.
12-03-2016, 13:07 #7
EMC says its DSSD D5 rack-scale flash storage attracts interest in financial services
March 10, 2016
By Renee Caruthers
EMC recently launched what it calls a new category of flash storage designed for data-intensive, high-performance, low-latency applications. The company has said the product, DSSD D5, which moves into general release this month, has been attracting interest from the financial services industry and has been beta tested by financial firms looking to process massive data sets with high speeds.
The product is the result of EMC's acquisition of DSSD, Inc., first announced in May of 2014. The product was in an early beta stage at the time of acquisition, according to Matt McDonough, senior director of product management and marketing at EMC, but has since logged more than 200,000 hours of testing time with 20 beta customers.
The product has been beta tested within the financial industry for several high-intensity applications, including market risk modeling to speed the processing of risk applications; capturing massive data sets for data-intensive real time fraud detection; and other financial applications involving capturing and analyzing massive data sets in real time, McDonough said.
Key to the product is that it eliminates bottlenecks in underlying storage systems that hinder the performance of some of the newer, higher-intensity data processing applications that have emerged in recent years. For example, many companies have bought server-attached flash cards (PCIe cards) in recent years because they offer high performance and low latency and can be plugged directly into a local machine. However, the cards can be limited in their storage capacity and can introduce multiple single points of failure into the infrastructure, McDonough said.
"Really what the rack-scale flash is, is a new category of storage that is really about enabling the architecture of high performance and borrowing the best of both worlds," McDonough said. "These are shared storage devices. You are getting a lot of the great things from the traditional all-flash array market, but similar to PCIE flash cards, we are plugged into the local machine – then run over the wire to the shared chassis."
Overall EMC is targeting three main buckets of use cases, high performance data bases and data warehouses, Hadoop, and a third bucket that EMC refers to as the custom applications bucket, McDonough said. With Hadoop, for example, while Hadoop has matured for traditional batch analytics use cases, EMC said DSSD D5 would help Hadoop move into real-time enterprise analytics because DSSD D5 is ten times faster than traditional Hadoop infrastructures. For the custom applications bucket, EMC has an API, allowing customers to write natively to the API for high performance low latency capability.
"We expect customers to be able to start to do new things they haven't even imagined before based on this additional level of performance, so we think there will be a lot of new innovation in the financial sector in the application side of the house to be able to take advantage of the performance of this large pool of very fast flash storage," McDonough said.
12-03-2016, 23:31 #8
GPL Violations Related to Combining ZFS and Linux
Oracle is the primary copyright holder of ZFS and Oracle continues to license their code under their own GPL-incompatible license.
The Basic Facts
Sun released the Z File System (ZFS) code under the Common Development and Distribution License, version 1 (CDDLv1) as part of OpenSolaris. Sun was ultimately acquired by Oracle. Community members have improved ZFS and adapted it to function with Linux, but unfortunately, CDDLv1 is incompatible with GPLv2, so distribution of binaries is not permitted (see below for details).
The situation escalated last week because Canonical, Ltd. announced their plans to commercially distribute, in Ubuntu 16.04, a binary distribution of ZFS as a Linux kernel module, which adapts ZFS natively for Linux.
Conservancy contacted Canonical to inform them of their GPL violation, and Canonical encouraged us to speak publicly. We're glad to do so to clarify the differing views on this issue. As you'll read below, Conservancy disagrees with Canonical's decision, and Conservancy hopes to continue dialogue with Canonical regarding their violation.
Specifically, we provide our detailed analysis of the incompatibility between CDDLv1 and GPLv2 — and its potential impact on the trajectory of free software development — below. However, our conclusion is simple: Conservancy and the Linux copyright holders in the GPL Compliance Project for Linux Developers believe that distribution of ZFS binaries is a GPL violation and infringes Linux's copyright. We are also concerned that it may infringe Oracle's copyrights in ZFS. As such, we again ask Oracle to respect community norms against license proliferation and simply relicense its copyrights in ZFS under a GPLv2-compatible license.
The license of Linux, the GNU General Public License, version 2 (GPLv2), is conceptually known as a strong copyleft license. A strong copyleft license extends software freedom policies as far as copyright law allows. As such, GPLv2 requires that, when combinations and/or derivatives are made under copyright law with GPLv2'd works, the license of the resulting combination and/or derivative is also GPLv2.
The Free Software Foundation (FSF) has long discussed the question of licenses incompatible with the GPL, pointing out that:
In order to combine two programs (or substantial parts of them) into a larger work, you need to have permission to use both programs in this way. If the two programs' licenses permit this, they are compatible. If there is no way to satisfy both licenses at once, they are incompatible.
License compatibility is not merely a question for Free Software licenses. We can analyze any two copyright licenses and consider whether they are compatible.
In the proprietary software world, rarely are two licenses ever compatible. You can't, by default, license a copy of Oracle's database, and then make a combination with Apple's iOS. To do so, you would need to negotiate (and pay for) a special license from both Apple and Oracle to make that combination.
Furthermore, with proprietary software, there is a practical problem somewhat unrelated to the legal permission: you must procure a copy of the source code for Apple's and Oracle's proprietary software to have the practical ability to make the combination.
Since the GPL, and all copyleft licenses, are fundamentally copyright licenses, the analysis is similar. However, GPL requires that all software distributions include complete corresponding source code to any binaries, so the practical problem never presents itself. Nevertheless, when you wish to combine GPL'd software with some other software and distribute the resulting combination, both the copyright holders of the GPL'd software and the copyright holders of the other software must provide a license that allows distribution of the combination.
Most prefer to discuss the issue of combining truly proprietary, no-source-available copyrighted material with GPL'd software, as it creates the most stark practical contrast, and is the most offensive fact pattern. Proprietary software gives the users no freedom to even examine, let alone modify, rebuild, and reinstall the software. The proprietary license neither permits redistribution of source code nor gives users the practical ability to do so, while the GPL mandates source distribution whenever binary distribution occurs. The incompatibility is intuitively obvious. Few consider the fact that proprietary software licensing is just one (rather egregious) example of a GPL-incompatible license.
In that context, we can imagine licenses that are GPL-incompatible, but do give some interesting permissions to users. An example is source-code-available systems that prohibit commercial distribution and forbid modification to the source code. The GPL has terms that permit modification and allow commercial distribution of GPL'd software, and as such, even though source code is available for non-commercial, non-modifiable software, the license is nonetheless GPL-incompatible.
Finally, we can consider the most subtle class of GPL-incompatibility, in which we find ZFS's license, the Common Development and Distribution License, version 1 (CDDLv1). The CDDLv1 is considered both a Free Software and an Open Source license, and is a weak copyleft license. Nevertheless, neither CDDLv1 nor GPLv2 permits combination of copyrighted material under the other license.
To understand this non-intuitive incompatibility, we can analyze in detail the requirements of both licenses. First, GPLv2 requires:
[§2](b) You must cause any work that you distribute … that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole … under the terms of this License.…
[§]3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also…
a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above…
[§]6. …You may not impose any further restrictions on the recipients' exercise of the rights granted herein.
According to these provisions of GPLv2, if you create a binary work that incorporates components from the GPLv2'd program, you must provide the complete corresponding source for that entire work with the binary, and the terms of the GPLv2 apply to that source. If the sources as a whole cannot be outbound-licensed under GPLv2, you have no permission under copyright law to distribute the binary work, since GPLv2 didn't grant you that permission.
GPLv2-compatible licenses do not contradict the requirements of GPLv2, which is what makes them compatible. For example, highly permissive licenses like the ISC license allow imposition of additional licensing requirements (even proprietary ones), and so combining ISC-licensed source and GPLv2'd source into a binary work is permitted; compliance with GPLv2 is possible when distributing binaries based on the combined sources.
CDDLv1, however, contains various provisions that are incompatible with that outcome. Specifically, CDDLv1 requires (emphasis ours):
[§]3.1 … Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License. …
[§] 3.4 … You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License. …
CDDLv1 is a weak copyleft license in that it allows you to create a binary work with components under different terms (see CDDLv1§3.6). However, as seen in the text above, with regard to the specific copyrighted material already under CDDLv1, that material must remain licensed only under the terms of the CDDLv1. Furthermore, when redistributing the source code, you cannot alter the terms of the license on that copyrighted material.
GPLv2, as shown above, requires that you alter those terms for the source code — namely, as a strong copyleft, the terms of GPLv2 apply to the entire complete corresponding source for any binary work. Furthermore, downstream users need permission to make GPLv2'd modifications to that source. This creates a contradiction; you cannot simultaneously satisfy that obligation of GPLv2 and also avoid alter[ing] the terms of CDDLv1-licensed source. Thus, the licenses are incompatible, and redistributing a binary work incorporating CDDLv1'd and GPLv2'd copyrighted portions constitutes copyright infringement in both directions. (In addition to this big incompatibility, there are also other smaller incompatibilities throughout CDDLv1.)
We believe Sun was aware when drafting CDDLv1 of the incompatibilities; in fact, our research into its history indicates the GPLv2-incompatibility was Sun's design choice. At the time, Sun's apparent goal was to draw developers away from GNU and Linux development into Solaris. Not only did Sun not want code from GNU and Linux in Solaris, more importantly, Sun did not want technological advantages from Solaris' kernel to appear in Linux.
12-03-2016, 23:38 #9
What Constitutes a Combined/Derivative Work?
Once license incompatibility is established, the remaining question is solely whether or not combining ZFS with Linux creates a combined and/or derivative work under copyright law (which then would, in turn, trigger the GPLv2 obligations on the ZFS code).
Conservancy has helped put similar questions (still pending) before a Court, in Hellwig's VMware case that Conservancy currently funds. In fact, the same questions come up with all sorts of GPL-incompatible Linux modules and reuses of Linux code.
Courts have not spoken specifically on this question; precedents that exist are not perfectly on-topic. Citing an opinion of a lawyer is often not helpful in this context, because lawyers advise clients, and argue zealously for their clients' views. When Courts are unclear on a matter, it generates disputes, and only Courts (or possibly new legislation) can ultimately resolve those disputes.
Nevertheless, our lawyers have analyzed these situations with the assistance of our license compliance and software forensics staff for many years, and we have yet to encounter a Linux module that — when distributed in binary form — did not, in our view, yield a combined work with Linux. The FSF, stewards of the GPL, have stated many times over the past decades that they believe there is no legal distinction between dynamic and static linking of a C program, and we agree. Accordingly, the analysis is quite obvious to us: if ZFS were statically linked with Linux and shipped as a single work, few would argue it was not a “work based on the Program” under GPLv2. And, if we believe there is no legal difference when we change that linking from static to dynamic, we conclude easily that binary distribution of ZFS plus Linux — even with ZFS in a .ko file — constitutes distribution of a combined work, which we name Linux+ZFS.
Canonical has found some lawyers who disagree — a minority position, from our understanding of community norms. But Canonical's public position on the matter contributes to license uncertainty, and opponents of Free Software may use this as an opportunity to marginalize copyleft enforcement generally. Canonical can resolve the situation by ceasing the infringing distribution, but Oracle can also unilaterally resolve this trivially with a simple relicense of ZFS to a GPL-compatible license.
Thus, all parties currently stand at an impasse. Conservancy (as a Linux copyright holder ourselves), along with the members of our coalition in the GPL Compliance Project for Linux Developers, all agree that Canonical and others infringe Linux copyrights when they distribute zfs.ko. Canonical's lawyers disagree. Oracle refuses to relicense their ZFS copyrights under a GPL-compatible license.
Ultimately, various Courts in the world will have to rule on the more general question of Linux combinations. Conservancy is committed to working towards achieving clarity on these questions in the long term. That work began in earnest last year with the VMware lawsuit, and our work in this area will continue indefinitely, as resources permit. We must do so, because, too often, companies are complacent about compliance. While we and other community-driven organizations have historically avoided lawsuits at any cost in the past, the absence of litigation on these questions caused many companies to treat the GPL as a weaker copyleft than it actually is.
Is The Analysis Different With Source-Only Distribution?
We cannot close discussion without considering one final unique aspect to this situation. CDDLv1 does allow for free redistribution of ZFS source code. We can also therefore consider the requirements when distributing Linux and ZFS in source code form only.
Pure distribution of source with no binaries is undeniably different. When distributing source code and no binaries, requirements in those sections of GPLv2 and CDDLv1 that cover modification and/or binary (or “Executable”, as CDDLv1 calls it) distribution do not activate. Therefore, the analysis is simpler, and we find no specific clause in either license that prohibits source-only redistribution of Linux and ZFS, even on the same distribution media.
Nevertheless, there may be arguments for contributory and/or indirect copyright infringement in many jurisdictions. We present no specific analysis ourselves on the efficacy of a contributory infringement claim regarding source-only distributions of ZFS and Linux. However, in our GPL litigation experience, we have noticed that judges are savvy at sniffing out attempts to circumvent legal requirements, and they are skeptical about attempts to exploit loopholes. Furthermore, we cannot predict Oracle's view — given its past willingness to enforce copyleft licenses, and Oracle's recent attempts to adjudicate the limits of copyright in Court. Downstream users should consider carefully before engaging in even source-only distribution.
We note that Debian's decision to place source-only ZFS in a relegated area of its archive called contrib is an innovative solution. Debian fortunately has a long-standing policy that contrib is specifically designed for source code that, while licensed under a license acceptable for Debian's Free Software Guidelines, also has a default use that can cause licensing problems for downstream Debian users. Debian therefore communicates clearly to its users that this code is problematic by keeping it out of the main archive. Furthermore, Debian does not distribute any binary form of zfs.ko.
(Full disclosure: Conservancy has a services agreement with Debian in which Conservancy occasionally gives its opinions, in a non-legal capacity, to Debian on topics of Free Software licensing, and gave Debian advice on this matter under that agreement. Conservancy is not Debian's legal counsel.)
Do Not Rely On This Document As Legal Advice
You cannot and should not rely on this document as legal advice. Our lawyers, in conjunction with our GPL compliance and software forensics experts, have analyzed the Linux+ZFS combination that Canonical includes in its Ubuntu 16.04 prereleases. Conservancy has determined, with the advice of both in-house and outside law firm legal counsel, that the binary distribution constitutes a derivative and/or combined work of ZFS and Linux together, and therefore violates the GPL, as explained above. We also know from Canonical's blog post that it has found other lawyers to give it contradictory advice. Such situations are common on groundbreaking legal issues, and, after all, copyleft is perhaps the most novel legal construction ever built on copyright. Lawyers and their clients who oppose copyleft will attempt to limit copyleft's scope (with litigation, FUD, and moxie), and those of us who use copyleft as a tool for software freedom will work just as hard to uphold its scope, to achieve the broad impact for software freedom that the license's drafters and licensors intended.
Indeed, Conservancy believes this situation is one battle in a larger proxy war by those who seek to limit the scope of strong copyleft generally. Yet the GPL benefits not only charitable community organizations like Conservancy but also for-profit companies, since the GPL ensures that your competitors cannot circumvent the license and gain an unfair advantage. We therefore urge charities, trade associations, and companies who care about Linux to stand with us in opposition to GPL violations like this one.
[0] More work might be required to relicense all modern ZFS code, since others have contributed, but we expect that those contributors would gladly relicense in the same manner if Oracle does so first.
[1] More discussion of these issues can be found in this section of Copyleft and the GNU General Public License: A Comprehensive Tutorial and Guide, which is part of copyleft.org, a project co-sponsored by the FSF and Software Freedom Conservancy.
Posted by Bradley M. Kuhn and Karen M. Sandler on February 25, 2016.