Apache httpd config to cache the main maven repo

At work, I'm experimenting with Maven as a way to manage java package builds and dependencies.

The maven mirror documentation says pretty explicitly "Do not rsync the entire repo" - it recommends the use of caching proxies. We can do this trivially with apache httpd.

# httpd.conf:
ProxyPass /maven/proxy http://repo1.maven.org/maven2

# What http path to cache for
CacheEnable disk /maven/proxy
# Where on disk to store the cached data
CacheRoot   /srv/repo/maven/proxy

CacheDirLength 2
CacheDirLevels 3 

# Override default cache expiration and control
CacheDefaultExpire 2419200
CacheMaxExpire 2419200

# Ignore requests to not serve from cache. Maven data never changes.
CacheIgnoreCacheControl On

# Default max file size is 1000000 bytes (about 1MB). Set to 1GB.
CacheMaxFileSize 1073741824
The above config takes requests to http://yourserver/maven/proxy/..., proxies them through to the main maven repo, and caches each fetched file on your webserver's disk so future fetches are served locally.
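
The two large numbers in the config are easier to audit when written out. A quick shell check, purely illustrative (the variable names are mine, not httpd's), shows where they come from:

```shell
#!/bin/sh
# Illustrative arithmetic only; these variable names are not httpd settings.
cache_expire_seconds=$((28 * 24 * 60 * 60))   # 28 days, in seconds
cache_max_file_bytes=$((1024 * 1024 * 1024))  # 1 GiB, in bytes

echo "CacheDefaultExpire $cache_expire_seconds"
echo "CacheMaxExpire $cache_expire_seconds"
echo "CacheMaxFileSize $cache_max_file_bytes"
```

So cached artifacts are kept for up to four weeks, and anything up to 1GB is eligible for caching.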

You tell maven to use your local repo in ~/.m2/settings.xml:

  <settings>
    <mirrors>
      <mirror>
        <id>local-mirror</id>
        <name>Local maven mirror</name>
        <!-- Replace 'repo.local' with whatever your webserver's name is -->
        <url>http://repo.local/maven/proxy</url>
        <mirrorOf>*</mirrorOf>
      </mirror>
    </mirrors>
  </settings>
Now all my maven dependency fetches are going through a local repo and files get cached to disk for future requests.
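
To see what maven will actually ask the mirror for, it helps to map an artifact's coordinates onto a URL. This is a minimal sketch assuming the standard repository layout (dots in the groupId become slashes) and a plain jar artifact; `mirror_artifact_url` is a made-up helper name, not a maven command, and `MIRROR_URL` is the `<url>` from settings.xml above.

```shell
#!/bin/sh
# Sketch: build the URL maven would request from the mirror for one jar.
# mirror_artifact_url is a hypothetical helper, not part of maven.
MIRROR_URL="http://repo.local/maven/proxy"

mirror_artifact_url() {
    group_id=$1 artifact_id=$2 version=$3
    # Standard repository layout: dots in the groupId become slashes.
    group_path=$(printf '%s' "$group_id" | tr '.' '/')
    printf '%s/%s/%s/%s/%s-%s.jar\n' \
        "$MIRROR_URL" "$group_path" "$artifact_id" "$version" \
        "$artifact_id" "$version"
}

mirror_artifact_url junit junit 4.8.1
```

Fetching the printed URL with curl should, if the cache is working, leave new entries under the CacheRoot directory.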

Since I already had a repo server for local rpm and rubygems, pushing this 5 line httpd config change with puppet was practically a no-op in terms of implementation.

Session affinity and load distribution with Tomcat and Apache

You can scale tomcat webapps somewhat well using session affinity and load distribution. But how? Apache to the rescue.

For each tomcat server, modify the server.xml and change the value for 'jvmRoute' to the ip address of the tomcat server. Example:

  <Engine name="Standalone" defaultHost="localhost" jvmRoute="192.168.0.10">
This affects the last token in your jsessionid cookie. Visiting my tomcat, my cookie gets set to the following:
C40ABF646B07162A621856F459977E9B.192.168.0.10
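
The part after the first dot is the jvmRoute. A small sketch of pulling it out in shell, using the cookie value above (`session_route` is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: print the jvmRoute suffix of a Tomcat session id.
# session_route is a hypothetical helper name.
session_route() {
    # Strip everything up to and including the first dot.
    # A session id without a dot is printed unchanged.
    printf '%s\n' "${1#*.}"
}

session_route "C40ABF646B07162A621856F459977E9B.192.168.0.10"
```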

Use apache's mod_rewrite (together with mod_proxy, which the [P] flag needs) to put apache in front of your tomcat servers. That is, use apache as a reverse proxy. In your httpd.conf:

RewriteEngine On
RewriteMap SERVERS rnd:/etc/httpd/conf/frontends.conf
RewriteCond "%{HTTP_COOKIE}"          "(^|;\s*)JSESSIONID=\w*\.([0-9.]+)($|;)"
RewriteRule "^(.*)"                   "http://%2:8080%{REQUEST_URI}"  [P,L]
RewriteRule "^.*;jsessionid=\w*\.([0-9.]+)($|;)"  "http://$1:8080%{REQUEST_URI}"  [P,L]
RewriteRule "^(.*)"                    "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
This technique is quite similar to the one on tomcat.apache.org in the docs, but I think it's better. Why? Fewer files to modify when you add or remove tomcat servers means less complexity, fewer errors and less effort.

  1. RewriteCond "%{HTTP_COOKIE}" "(^|;\s*)JSESSIONID=\w*\.([0-9.]+)($|;)"
    If a jsessionid cookie is found, go to #2 and store match groups (backreferences) as %1, %2, etc.
  2. RewriteRule "^(.*)" "http://%2:8080%{REQUEST_URI}" [P,L]
    Session affinity: Proxy everything, via an internal proxy request, to the host captured by the 2nd group in the previous RewriteCond. Since we use the IP as the jvmRoute, that's what is matched, and your request is proxied to the server that gave you your cookie.
  3. RewriteRule "^.*;jsessionid=\w*\.([0-9.]+)($|;)" "http://$1:8080%{REQUEST_URI}" [P,L]
    Session affinity: Tomcat appends ";jsessionid=blah" to the end of the url when it first sets your cookie, because it can't yet tell whether your browser accepts cookies. In case no cookie is found, this will proxy your request to the proper server just like the previous rule.
  4. RewriteRule "^(.*)" "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
    Load distribution: Catch-all for anything that didn't have a cookie or jsessionid thing in the url. "ALL" is just a key from the RewriteMap listed below. A random one is chosen and inserted.

Since the server ip is stored in the cookie, apache (using regular expressions) can pull it out and will internally proxy your request through to the proper tomcat server.
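
The decision those four steps make can be mimicked outside apache, which is handy for checking the regular expressions against real cookie strings before touching httpd.conf. This is only a sketch: `pick_backend` and its arguments are hypothetical, the patterns are POSIX rewrites of the ones above (`\w` spelled out as `[A-Za-z0-9_]`), and apache itself never runs any of it.

```shell
#!/bin/sh
# Sketch of the routing decision made by the rewrite rules, outside apache.
# pick_backend is a hypothetical helper.
#   $1 = Cookie header value, $2 = request URI, $3 = "|"-separated fallbacks
pick_backend() {
    cookie=$1 uri=$2 servers=$3
    # Steps 1 and 2: a JSESSIONID cookie whose suffix looks like an IP.
    route=$(printf '%s\n' "$cookie" |
        sed -n -E 's/(^|.*;[[:space:]]*)JSESSIONID=[A-Za-z0-9_]*\.([0-9.]+)($|;.*)/\2/p')
    # Step 3: ";jsessionid=..." appended to the URL.
    if [ -z "$route" ]; then
        route=$(printf '%s\n' "$uri" |
            sed -n -E 's/^.*;jsessionid=[A-Za-z0-9_]*\.([0-9.]+)($|;.*)/\1/p')
    fi
    # Step 4: no session yet, so pick one fallback server at random.
    if [ -z "$route" ]; then
        route=$(printf '%s\n' "$servers" | tr '|' '\n' |
            awk 'BEGIN { srand() } { s[NR] = $0 } END { print s[int(rand() * NR) + 1] }')
    fi
    printf 'http://%s:8080%s\n' "$route" "$uri"
}

pick_backend "JSESSIONID=C40ABF646B07162A621856F459977E9B.192.168.0.10" "/app/" "192.168.0.10|192.168.0.11"
```

With a cookie present, the output names the server from the cookie; with an empty cookie and a plain URL, it names one of the fallbacks.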

That works great for sessions that already exist, but what about for sessions that don't exist? That's what ${SERVERS:ALL} is for. You need something like this in your frontends.conf file:

ALL 192.168.0.10|192.168.0.11
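
If you do keep the map file, it's just a key followed by "|"-separated alternatives, so it is easy to generate from a list of servers. A minimal sketch, with `frontends_line` being a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: build one line of the rnd: map from a key and a list of tomcat IPs.
# frontends_line is a hypothetical helper name.
frontends_line() {
    key=$1
    shift
    # Join the remaining arguments with "|"; the subshell keeps the
    # IFS change from leaking into the caller.
    ( IFS='|'; printf '%s %s\n' "$key" "$*" )
}

frontends_line ALL 192.168.0.10 192.168.0.11
```

Redirecting that output to /etc/httpd/conf/frontends.conf would regenerate the file. I believe httpd notices a changed map file by its modification time, but treat that as an assumption to verify on your version.
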
This would be even better if you only used DNS for this. Then, you wouldn't need to update any config files when you added or removed tomcat servers.

If you replaced the fallback rule (the first line below) with one that points at a hostname (the second line):

RewriteRule "^(.*)"       "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
RewriteRule "^(.*)"       "http://mytomcats.foo.com:8080%{REQUEST_URI}" [P,L]
Apache should proxy internally to "mytomcats.foo.com", which should result in a dns lookup of that hostname. If that hostname has multiple address records, you get round-robin balancing across all tomcats for new sessions. When you add or remove tomcat servers, you don't have to update any config files.

No config files to change when you add new servers? That makes for healthy, dynamic scaling.

The best way to solve this would be to have tomcat share its session data, but that uses multicast, and the network where tomcat lives doesn't have multicast routing enabled, so that doesn't seem like an option.

boredom + apache

Here's a silly oneliner that'll attempt to calculate per-file usage from an apache log.

awk '{print $7}' - | perl -e 'while (<>) { chomp; s!^/([^/]+)!/.html_pages!; 
$u = $1; next if ($u !~ s/^~//); @a = getpwuid(getpwnam($u)); $_ = $a[7] . $_;
$f{$_} += (stat($_))[7] }; map { print $f{$_} . " $_\n" if ($f{$_}) } keys(%f)'

Reads the log data from stdin. Output is unsorted. I'd make it smaller but I'm lazy and tired.
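
For comparison, here's a different and simpler approach, not a rewrite of the oneliner: instead of calling stat on each user's files, sum the response-size field apache already logs. This sketch assumes the common or combined log format, where field 7 is the request path and field 10 is the byte count ("-" when nothing was sent); `bytes_per_path` is a made-up function name. It counts bytes actually sent for every path, not just ~user pages.

```shell
#!/bin/sh
# Sketch: total bytes served per request path, read from stdin.
# Assumes common/combined log format: $7 = path, $10 = response bytes.
bytes_per_path() {
    # Skip lines whose byte field is not a number (e.g. "-" on a 304).
    awk '$10 ~ /^[0-9]+$/ { bytes[$7] += $10 }
         END { for (p in bytes) print bytes[p], p }' | sort -rn
}
```

Like the original, it reads the log from stdin, e.g. `bytes_per_path < access_log`; unlike the original, the output is sorted, largest first.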