TL;DR
Similar to our APT Repository Internals and YUM Repository Internals posts, this post aims to illustrate the inner workings of a Maven repository. Read on if you have ever been curious as to how mvn compile figures out which dependencies to download and how to retrieve them in order to build your project.
Overview
In this post we’ll examine how dependencies are defined and resolved within your Maven project, then we’ll dive into how Maven repositories make these dependencies available for consumption.
What is a maven dependency?
A Maven dependency is an artifact that your project (or Maven itself, in the case of Maven plugins) needs to have during the Maven build lifecycle.
These are declared in the <dependencies/> section of your project’s pom.xml file like this:
Maven Coordinates
Most dependency declarations consist of groupId, artifactId, and version fields. A group of these key/value pairs is referred to as the Maven Coordinates for a particular dependency and, much like geographical coordinates, they allow you to specify a particular dependency precisely and unambiguously.
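To make the idea of coordinates concrete, here is a minimal Python sketch that models them as a small immutable record. The Coordinates class and parse_coordinates helper are hypothetical names invented for illustration, and the colon-separated groupId:artifactId:version shorthand is an assumption borrowed from common Maven usage rather than something defined in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinates:
    """The three fields most dependency declarations carry."""
    group_id: str
    artifact_id: str
    version: str


def parse_coordinates(text: str) -> Coordinates:
    # Assumption: input uses the "groupId:artifactId:version" shorthand.
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {text!r}")
    return Coordinates(*parts)


coords = parse_coordinates("io.packagecloud:client:3.0.0")
print(coords.group_id, coords.artifact_id, coords.version)
```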
How does maven locate and resolve dependencies?
Unlike other repository formats (APT, YUM, RubyGems), there is no main index file that enumerates all possible artifacts available in that repository. Instead, Maven uses the coordinate values for a given dependency to construct a URL according to the Maven repository layout.
Maven Repository Layout mapping
For primary artifacts (explained below) the URL template looks like:
Rules for $groupId
According to the specification, the rule is:
$groupId is a array of strings made by splitting the groupId’s on “.” into directories.
So for the groupId value of org.example.subdepartment, our $groupId array would be [org, example, subdepartment], which when translated into directories, becomes org/example/subdepartment.
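The splitting rule above can be sketched in a couple of lines of Python. The function name group_id_to_path is made up for this illustration.

```python
def group_id_to_path(group_id: str) -> str:
    # Split the groupId on "." and join the pieces as directories.
    segments = group_id.split(".")
    return "/".join(segments)


print(group_id_to_path("org.example.subdepartment"))
# → org/example/subdepartment
```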
Primary Artifacts
One of the core features of Maven is its ability to handle Transitive Dependencies. That is, to find and download the dependencies of your dependencies, and their dependencies also, recursively, until they are all satisfied.
Just as your own Maven project has a pom.xml file listing its main dependencies, those dependencies also have a remote pom file serving a similar purpose. Maven uses this file to figure out what other dependencies to download. When a coordinate does not contain a classifier, it is considered a primary artifact and is expected to have a pom available.
Let’s resolve the pom and jar for the given coordinates at the beginning of this post:
<dependency>
<groupId>io.packagecloud</groupId>
<artifactId>client</artifactId>
<version>3.0.0</version>
</dependency>
pom
We turn the groupId of io.packagecloud into /io/packagecloud, then construct the rest of the URL with $artifactId and $version, like so:
jar
Similarly, for the extension of jar:
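As a rough sketch, both URLs can be derived with one small Python function that follows the standard layout of group directories, then artifactId, then version, then a file named artifactId-version.extension. The base URL https://repo.example.com/maven2 is a placeholder assumption, not the real repository for this artifact, and primary_artifact_url is a hypothetical helper name.

```python
def primary_artifact_url(base_url: str, group_id: str, artifact_id: str,
                         version: str, extension: str) -> str:
    # groupId segments become directories.
    group_path = group_id.replace(".", "/")
    # Primary artifacts are named <artifactId>-<version>.<extension>.
    filename = f"{artifact_id}-{version}.{extension}"
    return f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Assumption: a placeholder repository root, used only for illustration.
BASE_URL = "https://repo.example.com/maven2"

pom_url = primary_artifact_url(BASE_URL, "io.packagecloud", "client", "3.0.0", "pom")
jar_url = primary_artifact_url(BASE_URL, "io.packagecloud", "client", "3.0.0", "jar")
print(pom_url)
# → https://repo.example.com/maven2/io/packagecloud/client/3.0.0/client-3.0.0.pom
print(jar_url)
# → https://repo.example.com/maven2/io/packagecloud/client/3.0.0/client-3.0.0.jar
```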
Secondary Artifacts
Secondary artifacts, or “attached artifacts”, are dependencies that you want Maven to download that are ancillary to your project. Most often they are used to download the javadocs and/or sources for a particular dependency. However, unlike a primary artifact, a secondary artifact is not expected to have a remote pom and thus never has any dependencies.
They can be specified in the <dependencies/> section just like primary artifacts:
Or, you can download them using mvn install:install-file, like so:
The URL template for secondary artifacts is just like the one for primary artifacts, but with an additional $classifier variable:
javadoc
sources
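The classifier rule can be sketched in Python as follows: the classifier is appended to the file name after the version, separated by a hyphen. As before, the base URL is a placeholder assumption and artifact_url is a hypothetical helper name.

```python
from typing import Optional


def artifact_url(base_url: str, group_id: str, artifact_id: str, version: str,
                 extension: str, classifier: Optional[str] = None) -> str:
    group_path = group_id.replace(".", "/")
    # Secondary artifacts insert "-<classifier>" between version and extension.
    suffix = f"-{classifier}" if classifier else ""
    filename = f"{artifact_id}-{version}{suffix}.{extension}"
    return f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Assumption: a placeholder repository root, used only for illustration.
BASE_URL = "https://repo.example.com/maven2"

javadoc_url = artifact_url(BASE_URL, "io.packagecloud", "client", "3.0.0", "jar", "javadoc")
sources_url = artifact_url(BASE_URL, "io.packagecloud", "client", "3.0.0", "jar", "sources")
print(javadoc_url)
# → https://repo.example.com/maven2/io/packagecloud/client/3.0.0/client-3.0.0-javadoc.jar
print(sources_url)
# → https://repo.example.com/maven2/io/packagecloud/client/3.0.0/client-3.0.0-sources.jar
```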
Checksums
To verify the downloaded artifacts, Maven computes the md5 and sha1 checksums for each artifact and compares them to the values found in the checksum files located at $ARTIFACT_URL.md5 and $ARTIFACT_URL.sha1, respectively.
NOTE: This is strictly meant as a way to quickly verify downloads, and it is NOT meant to be used for authentication or security purposes. This is also NOT a substitute for using HTTPS, as checksums can be trivially intercepted and modified along with the modified artifacts.
sha1
For example, the sha1 file for our jar artifact would be located at:
md5
Similarly, the md5 file for our pom artifact would be located at:
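The verification step can be sketched in Python with the standard hashlib module. This illustrates the comparison only and is not Maven's actual implementation. The helper names are hypothetical, and the assumption that the digest is the first whitespace-separated token in the checksum file is a defensive guess, since some tools append a file name after the hash.

```python
import hashlib


def checksum_urls(artifact_url: str) -> dict:
    # Checksum files sit next to the artifact with an extra suffix.
    return {"md5": artifact_url + ".md5", "sha1": artifact_url + ".sha1"}


def matches_checksum(data: bytes, checksum_file_text: str, algorithm: str) -> bool:
    # Assumption: the hex digest is the first token in the checksum file.
    tokens = checksum_file_text.split()
    if not tokens:
        return False
    expected = tokens[0].lower()
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected


# Stand-in bytes for a downloaded artifact.
artifact_bytes = b"example artifact contents"
sha1_file_text = hashlib.sha1(artifact_bytes).hexdigest() + "\n"

print(matches_checksum(artifact_bytes, sha1_file_text, "sha1"))  # → True
print(matches_checksum(b"tampered", sha1_file_text, "sha1"))     # → False
```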
Signed Artifacts
To absolutely ensure the authenticity of downloaded artifacts, you can configure Maven to download and validate the cryptographic signatures for the artifacts and checksums it downloads (if available).
$artifact.asc
The artifact is signed and deployed to a repository at the following URLs:
$checksum.asc
The checksums for those artifacts are also signed and deployed at the following URLs:
What is a maven repository?
A Maven repository is wherever these constructed artifact URLs live. Most of the time, this is a web server with a /maven2 document root, but it can be served over any protocol Maven has a transport plugin for.
To make it easier for humans to discover artifacts, most web-based repositories are configured to render virtual directory listings. For instance, the Maven Central repository lets you browse the entire org.apache group this way: http://repo1.maven.org/maven2/org/apache/.
The local repository
Before Maven attempts to download a particular artifact from a remote repository, it checks the local repository. This is usually located at $HOME/.m2/repository. The local repository follows the same standard repository layout as remote repositories.
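Because the local repository uses the same layout, the on-disk location of an artifact can be predicted the same way as its URL. Here is a minimal Python sketch, assuming the default $HOME/.m2/repository location. The function name and the local_repo override parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def local_artifact_path(group_id: str, artifact_id: str, version: str,
                        extension: str, local_repo: Optional[str] = None) -> Path:
    # Assumption: fall back to the usual default location when none is given.
    root = Path(local_repo) if local_repo is not None else Path.home() / ".m2" / "repository"
    filename = f"{artifact_id}-{version}.{extension}"
    return root.joinpath(*group_id.split("."), artifact_id, version, filename)


jar_path = local_artifact_path("io.packagecloud", "client", "3.0.0", "jar")
# A resolver would look here first and only go to a remote repository on a miss.
print(jar_path, "(cached)" if jar_path.exists() else "(not cached)")
```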
Remote repositories
Remote repositories are defined in your project’s pom.xml file under the <repositories/> section. For example:
You’ll notice that besides the <url/> and <id/> elements, there are two boolean settings, <releases> and <snapshots>.
If you are on Maven 2.x, then this would be <repository/> and <snapshotRepository/>, respectively. Previously, <repository/> definitions were implicitly release repositories, and it was not possible to support both releases and snapshots.
Repository search order
As of Maven 3.x, repositories are searched in the order in which they are declared.
Release and SNAPSHOT repositories
As seen above, there are two features that can be enabled on repositories, even at the same time.
Release repositories
This is enabled by default on all defined repositories and it simply means that this repository should be added to the list of repositories to use for resolving “released” artifacts. These are artifacts that once published to a coordinate, must not be changed.
Because of the heavily cached and distributed nature of Maven repositories (think of everyone's local repository and remote mirrors), you are strongly discouraged from deleting and republishing a changed artifact under the same coordinates. Unless every copy of the previous artifact can be purged from all repositories containing it, this makes it difficult to ensure that everyone receives the same artifact given the same coordinates.
SNAPSHOT repositories
When a repository has the “snapshot” feature enabled, this means that Maven will add this to the list of repositories to use only when resolving SNAPSHOT versions of your dependencies.
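The interplay between declaration order and the two features can be sketched in Python. This is a simplified model, not Maven's resolver: the Repository record, its field names, and the example URLs are all hypothetical. The only behavior it illustrates is that release versions consult release-enabled repositories, SNAPSHOT versions consult snapshot-enabled ones, and the declared order is preserved.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repository:
    repo_id: str
    url: str
    releases: bool   # usable for released versions
    snapshots: bool  # usable for -SNAPSHOT versions


def is_snapshot(version: str) -> bool:
    return version.endswith("-SNAPSHOT")


def candidate_repositories(repositories: List[Repository], version: str) -> List[Repository]:
    # Filtering with a list comprehension keeps the declared order intact.
    if is_snapshot(version):
        return [r for r in repositories if r.snapshots]
    return [r for r in repositories if r.releases]


# Hypothetical repository definitions, in declaration order.
declared = [
    Repository("releases-only", "https://repo.example.com/releases", True, False),
    Repository("both", "https://repo.example.com/both", True, True),
    Repository("snapshots-only", "https://repo.example.com/snapshots", False, True),
]

print([r.repo_id for r in candidate_repositories(declared, "3.0.0")])
# → ['releases-only', 'both']
print([r.repo_id for r in candidate_repositories(declared, "3.0.0-SNAPSHOT")])
# → ['both', 'snapshots-only']
```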
What are SNAPSHOT versions?
Having to increase the version and permanently release your software every iteration can painfully lengthen your feedback cycles. Maven solves this problem with SNAPSHOT versions.
SNAPSHOT version dependencies look just like regular dependencies, except the version will have -SNAPSHOT appended to it. For example:
The idea is that you can continuously push your latest changes to 3.0.0-SNAPSHOT and anyone depending on it will get the latest changes every time they build their project. Then, after a few iterations, once everyone is happy with the latest state of 3.0.0-SNAPSHOT, it can be permanently released as 3.0.0, and rapid development can continue on 3.0.1-SNAPSHOT.
maven-metadata.xml
In order to determine the latest artifact to download for a particular SNAPSHOT version, Maven uses the Standard Repository Layout to locate a maven-metadata.xml file for that dependency. For example, using our SNAPSHOT dependency above, Maven constructs the following URL:
This file looks like this:
According to version 1.1.0 of the Maven Repository Metadata Model (latest at time of writing), <snapshotVersion> contains the latest artifact corresponding to this snapshot version.
Using the <value> of that <snapshotVersion> as the $version in our URL construction scheme, we get the following URL for the jar extension:
Checksums and signatures work as expected:
As more snapshot artifacts are pushed to 3.0.0-SNAPSHOT, the maven-metadata.xml will always get updated to reflect the latest <snapshotVersion> to use.
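The lookup described above can be sketched in Python with the standard xml.etree module. The sample metadata below is made up for illustration, including its timestamp and build number, and is not the real file for this artifact. The base URL is a placeholder and the helper names are hypothetical. Note that in the resulting URL the directory keeps the -SNAPSHOT version, while the file name uses the timestamped <value>.

```python
import xml.etree.ElementTree as ET

# Assumption: an invented example following the 1.1.0 metadata model.
SAMPLE_METADATA = """
<metadata modelVersion="1.1.0">
  <groupId>io.packagecloud</groupId>
  <artifactId>client</artifactId>
  <version>3.0.0-SNAPSHOT</version>
  <versioning>
    <snapshotVersions>
      <snapshotVersion>
        <extension>pom</extension>
        <value>3.0.0-20160101.120000-1</value>
      </snapshotVersion>
      <snapshotVersion>
        <extension>jar</extension>
        <value>3.0.0-20160101.120000-1</value>
      </snapshotVersion>
    </snapshotVersions>
  </versioning>
</metadata>
""".strip()


def latest_snapshot_value(metadata_xml: str, extension: str) -> str:
    root = ET.fromstring(metadata_xml)
    for entry in root.iterfind("./versioning/snapshotVersions/snapshotVersion"):
        # Skip classified entries (sources, javadoc) when looking for the main file.
        if entry.findtext("extension") == extension and not entry.findtext("classifier"):
            value = entry.findtext("value")
            if value:
                return value
    raise LookupError(f"no snapshotVersion found for extension {extension!r}")


def snapshot_artifact_url(base_url: str, group_id: str, artifact_id: str,
                          snapshot_version: str, value: str, extension: str) -> str:
    group_path = group_id.replace(".", "/")
    # Directory: the -SNAPSHOT version. File name: the resolved timestamped value.
    return (f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/"
            f"{snapshot_version}/{artifact_id}-{value}.{extension}")


value = latest_snapshot_value(SAMPLE_METADATA, "jar")
url = snapshot_artifact_url("https://repo.example.com/maven2", "io.packagecloud",
                            "client", "3.0.0-SNAPSHOT", value, "jar")
print(url)
```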
Unique vs Non-Unique Snapshots
There are two snapshot “styles” that Maven can use.
Unique Snapshots
These are the snapshot versions detailed in the example above. They use a high-resolution timestamp as a version, and clients must use a maven-metadata.xml file to resolve the latest. This is the only snapshot style supported by Maven 3.
Non-Unique Snapshots
Maven 2 allowed you to set <uniqueVersion>false</uniqueVersion> on a repository definition. When this behavior is selected, no maven-metadata.xml file is used and “-SNAPSHOT” versions are not treated any differently. The artifact is resolved just like any other. Thus, the URL for our example in a non-unique repository context would look like this:
This artifact URL simply gets overwritten every time there is a new version pushed up at those coordinates.
Due to the obvious issues this introduces, this style has been deprecated for a while now and is completely unsupported in Maven 3.
Maven Central and the Super pom.xml
In addition to your project pom.xml, Maven uses a “Super” pom.xml to inherit some default configuration shared by all Maven installations. This is where the default repository, Maven Central, is defined:
That is why you can depend on artifacts hosted at Maven Central without having to define the repository.
Conclusion
Knowing how Maven constructs URLs and resolves dependencies can help you debug issues with your Maven repository. For more information, be sure to check out the official Maven documentation and Maven Source Code.