A revised version of this post has been published on the blog of the Apache Software Foundation.
Introduction
If you’re reading this post, you have already been using Apache software. The Apache web server is used by roughly every second website on the WWW, including this one. You could say that Apache software runs the WWW. But it doesn’t stop there: Apache is more than a web server. Apache software also runs on mobile devices and is part of enterprise and banking software. It is practically everywhere in today’s software world.
Apache has become a powerful brand and a philosophy of software development which remains unmatched in the world of open-source. Although the Apache® trademark is well known even among less tech-savvy people, many struggle to define what Apache software is really about, and what role it plays in today’s software development and businesses.
Over the last few years I’ve learned a lot about Apache through my work on Apache Flink and Apache Beam with Data Artisans. In this post I present some of the things I learned by giving an overview of the Apache Software Foundation and its history. Moreover, I want to show how the “Apache way” of software development shaped open-source software development as it is today.
The History of the Foundation
The Apache Software Foundation (ASF) was founded in 1999 by a group of open-source enthusiasts and some corporate entities which were eager to sponsor the foundation’s work. Among the first projects was the famous web server called Apache HTTP, which is also simply referred to as “Apache web server”. At that time, the Apache web server was already quite mature. In fact, not only did the Apache web server give the foundation its name but it became the role model for the “Apache way” of open and collaborative software development. To see how that took place, we have to go back a bit further in time.
A Web Server goes a long way
As early as 1994, Rob McCool at the National Center for Supercomputing Applications (NCSA) in Illinois created a simple web server which served pages using one of the early versions of today’s HTTP protocol. Web servers were not as ubiquitous as they are today. At that time, the Web was still in its early days and there was only one web browser, developed at CERN, where the WWW had been invented shortly before. Rob’s web server was widely adopted throughout the web due to its extensible nature. As its source code spread, website administrators around the world developed extensions for the web server and helped to fix errors. When Rob left the NCSA in late 1994, he left a void because nobody maintained the web server along with its extensions. It quickly became apparent that the existing users and developers needed to join forces to be able to maintain NCSA HTTP.
At the beginning of 1995, the Apache Group was formed to coordinate the development of the NCSA HTTP web server. This led to the first release of the Apache web server in April 1995. Around the same time, development at NCSA started picking up again, and the two teams were in lively exchange about ideas to improve the web server. However, the Apache Group was able to develop its version of the web server much faster because its structure encouraged worldwide collaboration. By the end of the year, the Apache server’s architecture had been redone to be modular and to execute much faster.
Apache 1.0 was finally released on Dec 1, 1995. At the beginning of 1996, one year after the Apache Group had formed, the Apache web server had already surpassed the popularity of NCSA HTTP, which had been the most popular web server on the Internet until then. The web server continued to thrive and is still the most widely used web server as of this writing.
The Rise of the Foundation
The team effort that led to the development and adoption of the Apache web server was a huge success. The Apache project kept receiving feedback and code changes (also called patches) from people all over the world. Could this be the development model for future software? It became apparent that more and more projects were starting to organize their groups similarly to the Apache Group. Out of this need, the Apache Software Foundation (ASF) was formed as a non-profit corporation in June 1999.
The ASF became a framework for open-source software development which, in its entirety, remains unmatched by other forms of open-source software development. The secret of its success is its unique approach, in which the foundation does not get in the way of the individual developers. Instead, it focuses on providing developers with the infrastructure and a minimal set of rules to manage their projects. The projects themselves remain relatively autonomous.
Apache Governance - How does the foundation work?
There are about 200 independent projects running under the Apache umbrella. The question may arise: how does the foundation govern its projects? First of all, the ASF is an organization that is run almost entirely by developers. Developers hate to spend too much time on administrative things (who doesn’t?), so the organization is structured in a way that requires little central control and favors the autonomy of the projects which run under its umbrella.
Per-Project Entities
For every project (e.g. Apache HTTP, Apache Hadoop, Apache Commons, Apache Flink, Apache Beam, etc.), there is a Project Management Committee (PMC), Committers, and Users.
Project Management Committee (PMC)
The Project Management Committee (PMC) manages a project and decides on its development direction. In that sense it has a similar function to the original Apache Group which led the development of the Apache web server. When a new project is formed, the proposers constitute the initial PMC. Later on, new PMC members can be elected by the existing PMC. Note that this does not require the permission of the foundation’s central bodies. PMC members are also committers (see below).
Committers
Committers can modify the code base of the project but they can’t make major project-changing decisions. They are trusted by the PMC to work in the interest of the project. When they contribute changes, they commit (hence the name) these changes to the project. Committers don’t only change code; they can also update documentation or write blog posts on the project’s website. Committers are selected from the users of the project; more about this process in the Meritocracy section.
Users
Users are as important as the developers because they try out the project’s software, report bugs, and request new features. The term is slightly confusing because, in the Apache world, most users are actually developers themselves. They are users in the sense that they are using an Apache project for their own work; they are not actively developing the Apache software they are using. However, they may also provide patches to the Committers. Users who contribute to a project are called Contributors. Contributors may eventually become committers.
In the following, the per-project entities are represented as circles. They exist for every project. The larger the circles, the more people. The redder the background color, the more decisional power the group has. Note that the user group circle is too large to fit in the image which is an accurate depiction of the user/developer ratio :)
Foundation-Wide Entities
The ASF does not work without some central instances. Here are the most important entities:
Apache Members
Apache members are the heart of the foundation. A prerequisite to becoming a member is to be active in at least one project. To become a member, you have to show a deep interest in the foundation and try to promote its values. Existing members can then invite you to become a member. Becoming a member is not only an honor; it also grants the right to elect the Board.
The Board of Directors (Board)
The Board of Directors (Board) takes care of the overall governance of the foundation. In particular, it is concerned with legal and financial matters like brand/patent issues, fundraising, and financial planning. The board is elected annually and is composed of Apache members. The current board can be viewed here.
Again, we use circles to set the per-project and foundation-wide entities into relation. Note that there is only one central Board for the entire foundation but Board members can be PMC members in different projects.
Officers of the corporation
Officers of the corporation are the executive part of the administration. They execute the decisions of the board and take care of everyday business.
Infrastructure (INFRA)
The support and administration team (INFRA) runs the Apache infrastructure and provides tools and support for developers. This includes running the apache.org web site and the mailing lists, which are Apache’s main means of communication. Over time, the need for various tools to assist developers became apparent. The main tools, used by almost all projects, are:
- Mailing lists, for discussing the roadmap of the project, exchanging ideas, or reporting bugs (unwanted software behavior). Typically the mailing lists are divided into a developer and a user mailing list.
- Bug trackers, which help developers to keep track of new features or bugs.
- Version control, which helps developers to keep track of the code changes.
- Build servers, which help to integrate/test new code or changes to existing code.
The Incubator
The Incubator is a division of the foundation dedicated to forming (bootstrapping) new Apache projects. The process is as follows. People (volunteers, enthusiasts, or company employees) make a proposal to the Incubator. The proposal contains the name, the list of initial PMC members, and the motivation and goals for the new project. When the proposal meets the standards of the Apache Software Foundation, the project enters the incubation phase. During incubation, projects carry “incubating” in their names, which is dropped once they graduate. To graduate, a project has to show that it adheres to the Apache standards and manages to develop a community. Formally, the project needs to prove this to the Incubator Project Management Committee (IPMC), which is composed of Apache members. All existing work which is donated in the course of entering the incubator and, more importantly, all future work inside the project has to be licensed to the ASF under the Apache License. This ensures that development remains open source according to the Apache philosophy. More about incubation can be found on the official website.
Meritocracy - How are decisions made?
The Apache Software Foundation uses the term “meritocracy” to describe how it governs itself. Going back to the ancient Greeks, meritocracy was a political system that put into power those who had proved their ability and talent in the relevant field. The core of this philosophy can be found throughout history, from ancient China to medieval Europe, and is still present in many of today’s cultures in the sense that effort, increased responsibility, and service to a part of society ought to pay off in terms of decision-making power, social status, or money.
Meritocracy in the Apache Software Foundation means that people who work in the interest of the foundation or a project get promoted. Users who submit patches may be offered committer status. Committers who drive the project constructively may gain PMC status. PMC members active across projects may earn member status.
From there on, decision-making within the foundation and its projects is typically performed using Lazy Consensus. Lazy consensus implies that even a few people can drive a discussion and make decisions for the entire community as long as nobody objects. The discussions have to be held in public on the mailing list. For instance, if a committer decides to introduce a new feature X, she may do so by proposing the feature on the mailing list. If nobody objects, she can go ahead and develop the feature. If lazy consensus does not work because an argument cannot be settled, a majority-based vote can be started.
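The decision flow described above can be sketched in a few lines of Python. This is only an illustrative model, not an official ASF tool or process definition: the `Proposal` class, the `decide` function, and the simple-majority fallback are assumptions made for this sketch, and the voting rules that real projects follow are more nuanced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proposal:
    """A proposal posted on a project's public mailing list (hypothetical model)."""
    title: str
    objections: List[str] = field(default_factory=list)
    votes: Dict[str, bool] = field(default_factory=dict)  # voter -> in favor?

    def object(self, who: str) -> None:
        """Record that somebody objected on the mailing list."""
        self.objections.append(who)

    def vote(self, who: str, in_favor: bool) -> None:
        """Record one vote; a second vote by the same person replaces the first."""
        self.votes[who] = in_favor


def decide(proposal: Proposal) -> str:
    """Return 'accepted', 'rejected', or 'needs vote' for a proposal."""
    # Lazy consensus: if nobody objects, silence counts as agreement.
    if not proposal.objections:
        return "accepted"
    # Somebody objected and the argument is unsettled: a vote has to be started.
    if not proposal.votes:
        return "needs vote"
    # Fallback assumed here: simple majority of the votes cast.
    in_favor = sum(proposal.votes.values())
    against = len(proposal.votes) - in_favor
    return "accepted" if in_favor > against else "rejected"
```

With no objections, `decide(Proposal("feature X"))` yields "accepted". Once someone objects, the result becomes "needs vote" until votes are recorded.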
Meritocracy and Lazy Consensus are the core principles for governance within the Apache Software Foundation. On the one hand, Meritocracy ensures that new people can join those already in power. On the other hand, Lazy Consensus creates the opportunity to split up decision-making among the group such that it doesn’t always require the action of all members of the community.
The Apache License - A license for the world of open-source
With the incorporation of the foundation in 1999, a license had to be created to prevent conflicts with the intellectual property contributed by others to the ASF. Originally, the license was meant to be used exclusively by the ASF but it quickly became one of the most widely used software licenses for all kinds of open-source software development.
The Apache license is very liberal in the sense that source code modifications are not required to be open-sourced (= made publicly available) even when the software is distributed or sold to other entities. This is in contrast to “Copyleft” licenses like the GNU General Public License (GPL) which, upon redistribution, requires public attribution and publication of changes made to the source code.
The current version of the Apache License is 2.0, released in January 2004. The changes made since the initial release are only minor but they laid the groundwork for its prevalence. Initially, the license was only available to Apache projects. Due to the success of the Apache model, people also wanted to use the license outside the foundation. This was made possible in version 2.0. Also, the new version made it possible to combine GPL code with Apache-licensed code. In this case, the resulting product would have to be licensed under the GPL to be compatible with the GPL. The last minor change for version 2.0 was to make inclusion of the license easier and to require an explicit patent license for patent-relevant contributions.
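To illustrate what easier inclusion looks like in practice: with version 2.0, a source file can carry a short notice that refers to the license instead of the full license text. The sketch below prepends that standard notice to a source file. The helper `with_license_header` and its `comment_prefix` parameter are hypothetical names invented for this example; the notice text is the boilerplate from the appendix of the license, with its placeholders left unfilled.

```python
# Standard notice from the appendix of the Apache License 2.0.
# The bracketed placeholders are meant to be filled in by the copyright owner.
APACHE_2_NOTICE = """\
Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""


def with_license_header(source: str, comment_prefix: str = "# ") -> str:
    """Return `source` with the notice prepended as line comments (hypothetical helper)."""
    commented = [
        (comment_prefix + line).rstrip()  # blank notice lines become a bare "#"
        for line in APACHE_2_NOTICE.splitlines()
    ]
    return "\n".join(commented) + "\n\n" + source
```

For a language with different comment syntax, a different prefix such as `"// "` could be passed as `comment_prefix`.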
Apache Today
The ASF today is no longer the small circle it was back in 1999. At the time of this writing, the Apache Software Foundation hosts 177 committees (same as PMCs) with close to 300 projects (latest statistics). Note that a PMC may decide to host multiple projects if necessary. For instance, the Apache Commons PMC has broken up the Apache Commons library into separate parts, e.g. CLI, Email, Daemon, etc. Also, about 25 of the 300 projects have been retired and about 60 are currently in the incubation phase. So realistically, the number of projects is about 200.
The Apache Software Foundation regularly organizes conferences around the world called ApacheCons. These conferences are dedicated to the Apache community or certain topics like Big Data or IoT. They are a place to meet the developers and learn about the latest ideas and trends within the global Apache community. Apart from the official conferences, there are conferences on Apache software organized by companies or external organizations, e.g. Strata, FlinkForward, Kafka Summit, Spark Summit, Elasticon.
Here’s a list of some projects that I have run across in the past. I grouped them into categories for a better overview. I realize you might not know a lot of the projects but maybe this list can be the starting point to discover more about these Apache projects :)
Big Data
- Hadoop
- Flink
- Spark
- Beam
- Samza
- Storm
- NiFi
- Kafka
- Flume
- Tez
- Zeppelin
Database
- CouchDB
- HBase
- ZooKeeper
- Derby
- Cassandra
Query Tools / APIs
- Hive
- Pig
- Drill
- Crunch
- Ignite
- Solr
- Lucene
Programming Languages
- Groovy
Distributions
- Bigtop
- Ambari
Cloud
- Mesos
- CloudStack
- Libcloud
Machine Learning
- Mahout
- SAMOA
Office
- OpenOffice
Libraries
- Commons
- Avro
- Thrift
- ActiveMQ
- Parquet
Developer Tools
- Ant
- Maven
- Ivy
- Subversion
Web Servers
- HTTP Server (the one!)
- Tomcat
Web Frameworks
- Cocoon
- Struts
- Sling
Apache - A Successful Open-Source Development Model
My first attempt to learn more about Apache goes back several years. I was using the Apache License while working on Scalaris at Zuse Institute Berlin. I realized that the license was somehow connected to the Apache Software Foundation but I didn’t really understand the depth of this relationship until I started working on Apache Flink with Data Artisans. Besides the official homepage of the foundation, relatively little information was available on the Internet about the foundation and its projects. In hindsight, the best source of information would have been to read the email archives, ask the developers, or become a developer yourself :)
Even today, I couldn’t find an introductory guide to the ASF. So I wrote this blog post. I hope I have provided an overview of the ASF and shown you how significant the foundation has been for open-source software development.
Thank you
Thank you for reading this article. Please drop me a message if I got something wrong or you would like to comment on anything.
Thank you to the Apache Flink project, especially to Robert Metzger, Vasia Kalavri, Henry Saputra, Aljoscha Krettek, Matthias Sax, Ufuk Celebi, Till Rohrmann, Fabian Hueske, Stephan Ewen, and Kostas Tzoumas. You taught me a lot about the Apache way. Thanks also to the Apache Beam community, which just graduated from the Incubator and has proven to be an excellent member of the Apache family.