I have been meaning to update this blog regularly. Ahem. So here comes a short and hopefully entertaining post for the Maven/SBT/Gradle users out there.

Ever wondered who or what is behind the central repository which is used to download your dependencies for Maven or SBT builds? Did you specify the repository URL in your build file? You didn’t?

The actual URL of the repository is http://repo1.maven.org/maven2. You can browse all the artifacts by package name if you click the link.

This URL is hardcoded inside Maven and is used to look up and retrieve build artifacts.

In fact, when you build Maven itself, this URL is used to retrieve Maven’s dependencies from Maven Central (crazy, no?). Why? Because Maven is built with…Maven :)

Here is a snippet from Maven’s bootstrap code for constructing the initial Maven version which is then used to build Maven itself:

Hardcoded URL from ArtifactDownloader:

private static final String REPO_URL = "http://repo1.maven.org/maven2";

Note that this bootstrap process is pretty common in the compiler domain, where you want to write the compiler in the very language it compiles. Likewise, the authors of Maven wanted to use Maven to build…Maven.

The author apparently already had some doubts about whether hardcoding the URL was a good idea and left this comment in the same file:

// TODO: use super POM?
Repository repository = new Repository( REPO_URL, Repository.LAYOUT_DEFAULT );

In fact, the Maven super POM sets the Maven Central URL. If you haven’t worked with Maven, the pom.xml (POM) is the build file where you define the dependencies and properties of your project. When you define your pom.xml, you inherit by default from the super POM, which defines the central repository in the <repositories> section:

<repositories>
    <repository>
      <id>central</id>
      <name>Central Repository</name>
      <url>https://repo.maven.apache.org/maven2</url>
      ...
    </repository>
</repositories>

However, it looks like it is not possible to go without the Maven Central repository because Maven contains code which checks if Maven Central is defined as a repository and, if not, re-adds it:

if ( !definedRepositories.contains( RepositorySystem.DEFAULT_REMOTE_REPO_ID ) ) {
    try {
        request.addRemoteRepository(
            repositorySystem.createDefaultRemoteRepository( request ) );
    } catch ( Exception e ) {
        throw new MavenExecutionRequestPopulationException(
            "Cannot create default remote repository.", e );
    }
}

Taken from DefaultMavenExecutionRequestPopulator.java.

And indeed, the above method createDefaultRemoteRepository uses the default central repository URL:

public ArtifactRepository createDefaultRemoteRepository( MavenExecutionRequest request ) throws Exception {
    return createRepository( RepositorySystem.DEFAULT_REMOTE_REPO_URL, RepositorySystem.DEFAULT_REMOTE_REPO_ID, true, ArtifactRepositoryPolicy.UPDATE_POLICY_DAILY, false, ArtifactRepositoryPolicy.UPDATE_POLICY_DAILY, ArtifactRepositoryPolicy.CHECKSUM_POLICY_WARN );
}

DEFAULT_REMOTE_REPO_URL is stored in the RepositorySystem class:

String DEFAULT_REMOTE_REPO_URL = "https://repo.maven.apache.org/maven2";

So you can add your own repositories through the <repository> XML tag but you can’t get rid of the default. That’s interesting.
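
To make the "re-add if missing" behavior concrete, here is a minimal, self-contained sketch of the same idea. It does not use Maven's real classes: DefaultRepoSketch, withDefault, and the my-company repository are made up for illustration, and repositories are simplified to an id → URL map. Only the central id and the URL are taken from the snippets above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the check quoted above; NOT Maven's actual API.
final class DefaultRepoSketch {
    // Values as they appear in the super POM and RepositorySystem snippets.
    static final String DEFAULT_REMOTE_REPO_ID = "central";
    static final String DEFAULT_REMOTE_REPO_URL = "https://repo.maven.apache.org/maven2";

    // Returns the user's repositories (id -> URL), plus the default
    // repository if no entry already uses the default id.
    static Map<String, String> withDefault(Map<String, String> userRepos) {
        Map<String, String> effective = new LinkedHashMap<>(userRepos);
        effective.putIfAbsent(DEFAULT_REMOTE_REPO_ID, DEFAULT_REMOTE_REPO_URL);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> userRepos = new LinkedHashMap<>();
        // Hypothetical company repository, for illustration only.
        userRepos.put("my-company", "https://repo.example.com/maven2");

        // The result contains both my-company and central.
        System.out.println(withDefault(userRepos));
    }
}
```

Note that in this sketch, as in the quoted check, only the repository id is compared: an entry that already uses the id central is left untouched.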

Wait, now the URL is https://repo.maven.apache.org/maven2/?!

What is going on here?

Well, the truth is that Maven Central is a CDN. A CDN is basically a set of servers which distribute data (e.g. web pages, videos, Maven artifacts, etc.) in such a way that the data is reliably and quickly accessible throughout the internet.

Who operates the CDN? Surprisingly, this is a service of a company called Sonatype Inc. Sonatype offers commercial services for open-source software with products related to repositories, continuous integration, and security. Their non-commercial branch is sonatype.org which gives back some of their services to open-source projects. One of these services is Maven Central.

The Maven Central website lists “Producers” who publish artifacts to Maven Central. The following organizations are currently “Producers”:

  • Apache
  • Atlassian
  • eXo Platform
  • JBoss/RedHat
  • Liferay
  • Oracle / java.net

As we discovered, the default Maven Central URL has been changed from http://repo1.maven.org/maven2 to https://repo.maven.apache.org/maven2/. Keep in mind that maven.org is a domain owned by Sonatype, as opposed to apache.org which is owned by the Apache Software Foundation.

Maven Central is a crucial component for Apache projects. As we have learned, Apache projects publish their artifacts on Maven Central and almost all of their dependencies are hosted on Maven Central as well.

Some stats of the central repository:

Total number of artifacts indexed (GAV): 2,426,748
Total number of unique artifacts indexed (GA): 215,031
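
In case the abbreviations are unfamiliar: GAV stands for groupId:artifactId:version, while GA drops the version, so every release of the same library counts once. Dividing the two numbers gives roughly 11 versions per artifact on average. The small sketch below shows the difference on three sample coordinates; the class and method names are made up for this post, and it assumes plain three-part coordinates without packaging or classifier.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper, not part of Maven.
final class CoordinateCountSketch {
    // Strips the version from "groupId:artifactId:version".
    static String toGa(String gav) {
        int first = gav.indexOf(':');
        int last = gav.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("expected groupId:artifactId:version, got " + gav);
        }
        return gav.substring(0, last);
    }

    // Counts distinct groupId:artifactId pairs in a list of GAV strings.
    static int countUniqueGa(List<String> gavs) {
        Set<String> gas = new LinkedHashSet<>();
        for (String gav : gavs) {
            gas.add(toGa(gav));
        }
        return gas.size();
    }

    public static void main(String[] args) {
        List<String> gavs = Arrays.asList(
            "junit:junit:4.11",
            "junit:junit:4.12",
            "org.apache.commons:commons-lang3:3.4");

        System.out.println("GAV: " + gavs.size());          // 3 coordinates
        System.out.println("GA:  " + countUniqueGa(gavs));  // 2 distinct artifacts
    }
}
```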

That’s a lot of artifacts. If the central repository broke one day for whatever reason, developers would be in for a treat :)

Luckily, thanks to the repository URL now defaulting to repo.maven.apache.org, the Apache Software Foundation could take over all the traffic if the Sonatype CDN ever shut down. And I’m assuming they also have a copy of the artifacts.

Phew. So we are good after all.

How do other build systems serve their build dependencies? Is the situation any different there? Not really. Python build tools, for instance, use PyPI, which is also a central repository. Ruby Gemfiles use RubyGems. C/C++ build tools typically assume the dependencies have already been installed, e.g. via the package manager of the operating system. Of course, the package manager also accesses central repositories. I have yet to find out who runs all these repositories :)

Thanks for reading this post. It was fun to learn about the central repository that we mostly take for granted. Thank you Sonatype for your service!

If you don’t know Maven, it’s a great build system which powers a lot of open-source projects. Check it out on GitHub. Or check out The 5 Minute Guide to Maven.


EDIT: As pointed out by Robert Scholte, there are two Maven Central mirrors listed on the maven.org Central Repository page. One at ibiblio.org, the other hosted by Google.

So we are really good after all :)