I have been meaning to update this blog regularly. Ehem. So here comes a short and hopefully entertaining post for the Maven/SBT/Gradle users out there.
Ever wondered who or what is behind the central repository which is used to download your dependencies for Maven or SBT builds? Did you specify the repository URL in your build file? You didn’t?
The actual URL of the repository is http://repo1.maven.org/maven2. You can browse all the artifacts by package name if you click the link.
This URL is hardcoded inside Maven and is used to look up and retrieve build artifacts.
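To make the lookup a bit more concrete, here is a minimal Java sketch of how a base URL like the one above can be combined with an artifact's coordinates to form a download URL. It assumes the standard Maven repository layout (dots in the group ID become slashes, followed by the artifact ID and the version). The class and method names are made up for illustration; this is not Maven's actual resolution code.

```java
import java.net.URI;

class CentralUrlSketch {
    // Base URL as quoted in this post.
    static final String CENTRAL = "http://repo1.maven.org/maven2";

    /**
     * Builds the jar URL for groupId:artifactId:version, assuming the
     * standard layout: group/with/slashes/artifactId/version/artifactId-version.jar
     */
    static URI jarUrl(String base, String groupId, String artifactId, String version) {
        String path = groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
        return URI.create(base + "/" + path);
    }

    public static void main(String[] args) {
        // Prints the URL where the commons-lang3 3.12.0 jar would be expected.
        System.out.println(jarUrl(CENTRAL, "org.apache.commons", "commons-lang3", "3.12.0"));
    }
}
```

This is also why you can browse the repository "by package name": the directory tree simply follows the group ID.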
In fact, when you build Maven itself, this URL is used to retrieve Maven’s dependencies from Maven Central (crazy, no?). Why? Because Maven is built with…Maven :)
Here is a snippet from Maven’s bootstrap code for constructing the initial Maven version which is then used to build Maven itself:
Hardcoded URL from ArtifactDownloader:
Note that this kind of bootstrapping is pretty common in the compiler world, where you want to write the compiler in the very language it compiles. Likewise, the authors of Maven wanted to use Maven to build…Maven.
The author already had some doubts whether hardcoding the URL would be a good idea and, thus, commented in the same file:
In fact, the Maven super POM sets the Maven Central URL.
If you haven’t worked with Maven, the pom.xml (POM) is the build file where you define the dependencies and properties of your project. When you define your pom.xml, by default, you inherit from the super POM, which defines the central repository.
However, it looks like it is not possible to go without the Maven Central repository because Maven contains code which checks if Maven Central is defined as a repository and, if not, re-adds it:
Taken from DefaultMavenExecutionRequestPopulator.java.
And indeed, the above method createDefaultRemoteRepository uses the default central repository URL:
DEFAULT_REMOTE_REPO_URL is stored in the
So you can add your own repositories through the <repository> XML tag, but you can’t get rid of the default. That’s interesting.
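The "re-add it if missing" behavior can be sketched in a few lines of Java. This is only an illustration under my own assumptions: repositories are modeled as a plain map from ID to URL, the names DefaultRepoSketch and withDefault are hypothetical, and the default URL is assumed from the host name mentioned in this post. Maven's real code works on its own request and repository types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DefaultRepoSketch {
    // Repository ID checked for in this sketch.
    static final String CENTRAL_ID = "central";
    // Assumed default URL, based on the host named in this post.
    static final String CENTRAL_URL = "https://repo.maven.apache.org/maven2";

    /**
     * Returns a copy of the configured repositories (ID -> URL) and adds
     * the default central repository only if no entry with that ID exists.
     */
    static Map<String, String> withDefault(Map<String, String> configured) {
        Map<String, String> result = new LinkedHashMap<>(configured);
        result.putIfAbsent(CENTRAL_ID, CENTRAL_URL);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> mine = new LinkedHashMap<>();
        // Hypothetical in-house repository.
        mine.put("my-repo", "https://repo.example.com/maven2");
        // The result contains both my-repo and the injected central entry.
        System.out.println(withDefault(mine));
    }
}
```

In this model, whatever you configure, a "central" entry is always present in the end, which matches what the post describes.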
Wait, now the URL
What is going on here?
Well, the truth is that Maven Central is a CDN. A CDN is basically a set of servers which distribute data (e.g. web pages, videos, Maven artifacts, etc.) in such a way that the data is reliably and quickly accessible throughout the internet.
Who operates the CDN? Surprisingly, this is a service of a company called Sonatype Inc. Sonatype offers commercial services for open-source software with products related to repositories, continuous integration, and security. Their non-commercial branch is sonatype.org which gives back some of their services to open-source projects. One of these services is Maven Central.
The Maven Central website lists “Producers” who publish artifacts to Maven Central. The following organizations are currently “Producers”:
- eXo Platform
- Oracle / java.net
As we discovered, the default Maven Central URL has been changed from
Keep in mind that maven.org is a domain owned by Sonatype, as opposed to apache.org, which is owned by the Apache Software Foundation.
Maven Central is a crucial component for Apache projects. As we have learned, Apache projects publish their artifacts on Maven Central and almost all of their dependencies are hosted on Maven Central as well.
Some stats of the central repository:
- Total number of artifacts indexed (GAV): 2,426,748
- Total number of unique artifacts indexed (GA): 215,031
That’s a lot of artifacts. If the central repository breaks one day, for whatever reason, developers would be in for a treat :)
Luckily, thanks to the repository URL now defaulting to repo.maven.apache.org, we have the Apache Software Foundation to completely take over any traffic in case the Sonatype CDN shuts down. And I’m assuming they also have a copy of the artifacts.
Phew. So we are good after all.
How do other build systems serve their build dependencies? Is the situation any different there? Not really. Python build tools, for instance, use PyPI, which is also backed by a central repository. Ruby Gemfiles use RubyGems. C/C++ build tools typically assume the dependencies have already been installed, e.g. via the package manager of the operating system. Of course, the package manager also accesses central repositories. I have yet to find out who runs all these repositories :)
Thanks for reading this post. It was fun to learn about the central repository that we mostly take for granted. Thank you Sonatype for your service!
So we are really good after all :)