You can scale tomcat webapps somewhat well using session affinity and load
distribution. But how? Apache to the rescue.
For each tomcat server, modify the server.xml and change the value for
'jvmRoute' to the ip address of the tomcat server. Example:
<Engine name="Standalone" defaultHost="localhost" jvmRoute="192.168.0.10">
This affects the last token in your jsessionid cookie. Visiting my tomcat, my cookie gets set to the following:
C40ABF646B07162A621856F459977E9B.192.168.0.10
Use apache's mod_rewrite to put apache in front of your tomcat servers as a reverse proxy (the [P] flag also needs mod_proxy loaded). In your httpd.conf:
RewriteMap SERVERS rnd:/etc/httpd/conf/frontends.conf
RewriteCond "%{HTTP_COOKIE}" "(^|;\s*)JSESSIONID=\w*\.([0-9.]+)($|;)"
RewriteRule "^(.*)" "http://%2:8080%{REQUEST_URI}" [P,L]
RewriteRule "^.*;jsessionid=\w*\.([0-9.]+)($|;)" "http://$1:8080%{REQUEST_URI}" [P,L]
RewriteRule "^(.*)" "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
This technique is quite similar to the one in the docs on
tomcat.apache.org, but I think it's better. Why? Fewer files to
modify when you add or remove tomcat servers means less complexity, fewer errors
and less effort.
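- RewriteCond "%{HTTP_COOKIE}" "(^|;\s*)JSESSIONID=\w*\.([0-9.]+)($|;)"
If a jsessionid cookie is found, the next RewriteRule runs, with the match groups (backreferences) stored as %1, %2, etc. A RewriteCond only applies to the single RewriteRule that follows it, so without a cookie that rule is skipped.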
- RewriteRule "^(.*)" "http://%2:8080%{REQUEST_URI}" [P,L]
Session affinity: proxy the request internally ([P]) to the host captured by the 2nd group
of the previous RewriteCond, and stop rewriting ([L]). Since we use the IP as the jvmRoute,
that's what is matched, and your request is proxied to the server that gave
you your cookie.
- RewriteRule "^.*;jsessionid=\w*\.([0-9.]+)($|;)" "http://$1:8080%{REQUEST_URI}" [P,L]
Session affinity: tomcat likes to add ";jsessionid=blah" to the end of the
URL when it first sets you up the cookie, because it can't yet tell whether your
browser accepts cookies. In case no cookie is found, this rule proxies your request
to the proper server just like the previous one.
- RewriteRule "^(.*)" "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
Load distribution: a catch-all for anything that didn't have a cookie or a
jsessionid in the URL. "ALL" is just a key in the RewriteMap file shown
below; one of its values is chosen at random and inserted.
Since the server ip is stored in the cookie, apache (using regular expressions)
can pull it out and will internally proxy your request through to the proper
tomcat server.
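To make the rule order concrete, here is a small Python simulation (not apache itself) that applies the same regular expressions in the same order; for these simple patterns Python's re behaves the same way as apache's regex engine. pick_backend and its backends list are hypothetical names, and the list stands in for the rnd: map.

```python
import random
import re

# The same patterns as the RewriteCond and the second RewriteRule above.
COOKIE_RE = re.compile(r"(^|;\s*)JSESSIONID=\w*\.([0-9.]+)($|;)")
URL_RE = re.compile(r"^.*;jsessionid=\w*\.([0-9.]+)($|;)")


def pick_backend(cookie_header, path, backends):
    """Return the proxy target the three rewrite rules would choose (a simulation)."""
    match = COOKIE_RE.search(cookie_header or "")
    if match:  # rule 1: the cookie's jvmRoute suffix names the server (%2)
        return f"http://{match.group(2)}:8080{path}"
    match = URL_RE.search(path)
    if match:  # rule 2: ;jsessionid=... in the URL names the server ($1)
        return f"http://{match.group(1)}:8080{path}"
    # rule 3: no session yet, pick any server at random, like ${SERVERS:ALL}
    return f"http://{random.choice(backends)}:8080{path}"
```

The order matters: an existing cookie wins, then a session ID in the URL, and only a brand-new visitor gets a random server.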
That works great for sessions that already exist, but what about sessions
that don't exist yet? That's what ${SERVERS:ALL} is for. You need
something like this in your frontends.conf file:
ALL 192.168.0.10|192.168.0.11
This would be even better if you only used DNS for this. Then, you wouldn't
need to update any config files when you added or removed tomcat servers.
If you replaced the fallback rule:
RewriteRule "^(.*)" "http://${SERVERS:ALL}:8080%{REQUEST_URI}" [P,L]
with:
RewriteRule "^(.*)" "http://mytomcats.foo.com:8080%{REQUEST_URI}" [P,L]
apache should proxy internally to "mytomcats.foo.com", which should result in
a DNS lookup of that hostname. If that hostname has multiple A records,
you get round-robin balancing across all tomcats for new sessions. When you add
or remove tomcat servers, you only touch DNS, not config files.
No config files to change when you add new servers? That makes for healthy,
dynamic scaling.
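To see what a round-robin name will hand out, you can ask the resolver for every address behind it. A minimal Python sketch, assuming IPv4 A records; resolve_all is a hypothetical helper, and it only shows what DNS returns, not how often apache re-resolves the name.

```python
import socket


def resolve_all(hostname, port=8080):
    """Return each distinct IPv4 address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:  # sockaddr is (ip, port) for AF_INET
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


# With several A records, every tomcat behind the name shows up, e.g.:
# print(resolve_all("mytomcats.foo.com"))  # example name from above, not a real host
```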
The best way to solve this would be to have tomcat share its session data, but
it uses multicast, and the network where tomcat lives doesn't have multicast
routing enabled, so that doesn't seem like an option.
Tags: tomcat, apache, scalability
Permalink: /geekery/session-balancing-across-tomcats-with-apache
posted at: 05:08
-bash-3.1# yum install django
No Match for argument: django
Nothing to do
-bash-3.1# yum install Django
Downloading Packages:
(1/1): Django-0.95.1-1.fc 100% |=========================| 1.5 MB 00:02
Ahh. Clearly.
Tags: rants, fedora, linux
Permalink: /rants/fedora-yum
posted at: 01:48
I get emails from this site when someone comments.
This morning, this showed up:
Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]
Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP
A quick google search for the strange trailing token, "WBR LeoP", gives a
clear indication that this is comment spam (as if the content didn't give it
away).
The url the spammer used points at pharmacynewsblog.com, which looks like a normal blog.
It's not.
The content is entirely viagra-and-friends related, which is fine. However, examine this simple visible text snippet from the front page:
Drug treatment may beat psychotherapy at ...
Google for this phrase and you'll find that it's been plagiarized. But deliciously so:
View the source, and you'll see:
<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy
The css class 'ne' sets 'display: none', among other properties that keep those
fragments out of sight in the browser.
This is quite clever, and appears automated.
pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that
takes news postings about viagra and the like and injects random html into them,
with the intention of defeating antispam solutions. Anti-spam engines probably
aren't smart enough to know that they should ignore the text pieces that are
invisible. Who knows.
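Ignoring invisible text isn't hard once you know which classes hide it. Here's a minimal Python sketch using the standard library's html.parser; it assumes the hidden class names (here 'ne') are already known, whereas a real filter would have to work them out from the page's CSS. VisibleTextExtractor and visible_text are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text, skipping anything inside an element with a hidden class."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.hidden_depth = 0  # > 0 while we are inside a hidden element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.hidden_depth or self.hidden_classes.intersection(classes):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html, hidden_classes):
    parser = VisibleTextExtractor(hidden_classes)
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind by the removed fragments.
    return " ".join(" ".join(parser.parts).split())


snippet = ("<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of\n"
           "purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small\n"
           "class=ne>and paxil vs </small>psychotherapy")
print(visible_text(snippet, {"ne"}))  # Drug treatment may beat psychotherapy
```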
But, back to the spam comment. I use javascript to poke parts of the comment
form indicating that a javascript-capable browser was used to submit the
comment. If javascript is not detected, the comment is denied.
This comment got through, which means that javascript was enabled, which means
that it was probably a web browser that did it.
Here's the apache log snippet:
72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull style sheets, which is strange
behavior if it's a simple spam bot that doesn't care about how a page looks. It
also pulled the blog posting page first, then submitted a comment. That's further
indication that either this bot is really clever or a person is behind the
wheel.
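Eyeballing four lines is easy; for a bigger log a script can do the same check. A minimal Python sketch that parses apache's combined log format with my own regular expression (not a parser for every log variant) and lists what each client IP requested; requests_by_host and fetched_images are hypothetical helpers.

```python
import re
from collections import defaultdict

# apache's "combined" format:
# host ident user [time] "request line" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def requests_by_host(lines):
    """Map each client IP to its (method, path) requests, in log order."""
    by_host = defaultdict(list)
    for line in lines:
        match = COMBINED.match(line)
        if match:  # silently skip lines in some other format
            by_host[match.group("host")].append(
                (match.group("method"), match.group("path")))
    return by_host


def fetched_images(requests):
    """Return the requested paths that look like image files."""
    return [path for _, path in requests if path.lower().endswith(IMAGE_SUFFIXES)]
```

Fed the four lines above, it lists the page GET, two style.css GETs and the POST for 72.36.199.114, and fetched_images() comes back empty.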
If you search for the ip, 72.36.199.114, the first hit on google is an automagically updated list of
known comment spam hosts.
Tags: spam, site, web
Permalink: /geekery/comment-spam-got-through
posted at: 13:41
There once was a database named MySQL.
It had a query cache, because caching helps performance.
It also had queries you could "prepare" on the server side, in the hope that
your database server can make some smart decisions about what to do with a query
you're going to execute N times during this session.
I told mysql to enable its caching and use a magic value of 1GB for memory storage. Much to my surprise, I see the following statistics after testing an application:
mysql> show status like 'Qcache_%';
+-------------------------+------------+
| Variable_name | Value |
+-------------------------+------------+
| Qcache_free_blocks | 1 |
| Qcache_free_memory | 1073732648 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 814702 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 1 |
+-------------------------+------------+
8 rows in set (0.00 sec)
Why are so many (all!?) of the queries not cached? Surely I must be doing
something wrong. The doc on caching explained what I can only
understand as a complete lapse of judgement on the part of the MySQL developers:
from http://dev.mysql.com/doc/refman/5.0/en/query-cache.html
Note: The query cache is not used for server-side prepared statements. If you're using server-side prepared statements consider that these statement won't be satisfied by the query cache. See Section 22.2.4, C API Prepared Statements.
Any database performance guide anywhere will tell you to use prepared
statements. They're useful from both a security and a performance perspective.
Security, because you feed the prepared query data and it knows what data types
to expect, erroring when you pass something invalid. It also handles
strings properly, so you worry less about SQL injection. You also get
convenience, in that you don't have to escape your data.
Performance, because telling the database what you are about to do lets it
optimize the query.
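The security half of that argument is easy to demonstrate. This minimal sketch uses Python's built-in sqlite3 module as a stand-in for JDBC and MySQL (the table and the hostile input are made up); it shows only why bound parameters resist SQL injection, not anything about MySQL's server-side prepare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

hostile = "nobody' OR '1'='1"

# Pasting input into the SQL text: the quote in the input rewrites the query.
spliced = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'").fetchall()

# Binding input to a placeholder: it is only ever compared as a value.
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(sorted(spliced))  # [('alice',), ('bob',)]
print(bound)            # []
```

Note that the bound version also needed no escaping of the quote characters, which is the convenience point above.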
This performance is defeated, however, if you want to use caching. So, I've got
a dilemma! There are two mutually exclusive (because MySQL sucks) performance-enhancing options available to me: using prepared statements or using caching.
Prepared statements give you two performance benefits (maybe more?). The first
is that the server parses the query string when you prepare it, and executes the
"parsed" version whenever you invoke it. This saves parsing time; parsing text
is expensive. The second is that if your database is nice, it will try to
optimize your queries before execution. Using prepared statements lets
the server optimize query execution once, and then remember it. Good, right?
Prepared statements improve CPU utilization, in that the CPU can work less
because you're teaching the database about what's coming next. Cached query
responses improve disk utilization, and depending on implementation should
vastly outperform most (all?) of the gains from prepared statements. This
rests on the assumption that disk is slow and CPU is fast.
Cached queries will (should?) cache results of complex queries. This means that
a select query with multiple, complex joins should be cached, mapping the query
string to the result. No amount of statement preparation will improve complex
queries, because they still have to hit disk. Large joins require lots of disk
access, and therefore are slow. Remembering that "this complex query" returned "this
happy result" is fast regardless of whether it's stored on disk or in
memory. Caching also saves CPU utilization.
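A toy Python sketch of that mapping (ToyQueryCache is a hypothetical name; this is not how MySQL implements its cache, and sqlite3 just provides something to query): results are stored under the exact query text, so a repeat is answered without touching the table. MySQL's cache likewise only matches byte-identical query text. A real cache must also drop entries when the underlying tables change, which this sketch skips.

```python
import sqlite3


class ToyQueryCache:
    """Maps exact query text to its result rows; never invalidates on writes."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}
        self.hits = 0     # answered from the dict (cf. Qcache_hits)
        self.inserts = 0  # executed and stored (cf. Qcache_inserts)

    def select(self, sql):
        if sql in self.results:
            self.hits += 1
            return self.results[sql]
        rows = self.conn.execute(sql).fetchall()
        self.results[sql] = rows
        self.inserts += 1
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (artist TEXT, title TEXT)")
conn.execute("INSERT INTO songs VALUES ('Beck', 'Loser')")

cache = ToyQueryCache(conn)
query = "SELECT title FROM songs WHERE artist = 'Beck'"
cache.select(query)        # miss: runs the query and stores the rows
cache.select(query)        # hit: same text, answered without touching the table
cache.select(query + " ")  # miss: one extra space makes it a different key
print(cache.hits, cache.inserts)  # 1 2
```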
I can't believe preparing a query will prevent it from being pulled from the
query cache, but this is clearly the case. Thanks, MySQL, for making a stupid
design decision.
Maybe there's some useful JDBC (oh yeah, the app I'm testing is written in
Java) function that'll give you all the convenience/security benefits of
prepare, but without the server-side bits, and thus let you use the query
cache.
Tags: mysql, rants, performance
Permalink: /geekery/mysql-prepare-queries-not-cached
posted at: 21:26
I often see people write their email addresses as "foo at bar dot
org" in the hope of keeping spammers from scraping them.
Heck, mail archive systems often have (and are deployed with) options to
obfuscate email addresses systematically, using the same pattern: foo at bar dot com.
All it does is hurt usability.
Googling for "* at * dot *" shows lots of matches. It also matches all of the following variants, because google searches ignore brackets and such in words:
- foo at bar dot com
- foo [at] bar [dot] com
- foo (at) bar (dot) com
- ... etc ...
Query, scrape, replace 'at' and 'dot' as desired. I now have 54 million email addresses. What now?
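Here's roughly what the "replace" step could look like: a minimal Python sketch with my own regular expression (deobfuscate is a hypothetical helper) that handles the three variants listed above.

```python
import re

# " at " / " dot " between spaces, or [at] / (at) / [dot] / (dot) with or without them.
AT = r"(?:\s+at\s+|\s*[\[(]\s*at\s*[\])]\s*)"
DOT = r"(?:\s+dot\s+|\s*[\[(]\s*dot\s*[\])]\s*)"

OBFUSCATED = re.compile(
    r"([\w.+-]+)" + AT + r"([\w-]+(?:" + DOT + r"[\w-]+)+)", re.IGNORECASE)
DOT_RE = re.compile(DOT, re.IGNORECASE)


def deobfuscate(text):
    """Return every 'foo at bar dot com'-style address in text as foo@bar.com."""
    return [user + "@" + DOT_RE.sub(".", domain)
            for user, domain in OBFUSCATED.findall(text)]


print(deobfuscate("mail foo at bar dot com or baz [at] example (dot) org"))
# ['foo@bar.com', 'baz@example.org']
```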
Seems like this effort only serves to have people fool themselves as well as to
impede usability. It certainly won't protect you from spam. Why is this method used?
Tags: spam, mail, google, rants
Permalink: /geekery/anti-spam-obfuscation-easily-defeated
posted at: 22:55
Amazon provides lots of web services. One of these is its
E-Commerce API, which allows you to search its vast product database (among other things).
In Pimp, the page for any given listening station shows you the current song
being played. Along with that, I wanted to provide the album cover for the
current track.
You can use Amazon's API to search for a given artist and album, which eventually
leads you to a picture of the album cover. To that end, I wrote
albumcover.py, a little python prototype that turns an artist and album name
into a url for the album cover image. It works for
the 20 or so tests I've put through it.
Tags: python, web services, amazon, pimp, music, web scraping
Permalink: /geekery/pull-album-covers-from-amazon
posted at: 00:52
So, I've been reading docs on python's xml stuff, hoping there's something
simple or comes-default-with-python that'll let me do xpath. Everyone
overcomplicates xml processing. I have no idea why. Python seems to have enough
alternatives to make dealing with xml less painful.
Standard python docs will lead you astray:
kenya(...ojects/pimp/pimp/controllers) % pydoc xml.dom | wc -l
643
Clearly, the pydoc for "xml.dom" has some nice things, right? I mean, documentation is clearly an indication that THE THING THAT IS DOCUMENTED IS AVAILABLE. Right?
Sounds great. Let's try to use this 'xml.dom' module!
kenya(...ojects/pimp/pimp/controllers) % python -c 'import xml; xml.dom'
Traceback (most recent call last):
  File "<string>", line 1, in ?
AttributeError: 'module' object has no attribute 'dom'
WHAT. THE. HELL.
Googling around, it turns out that 'xml' is a fake module that only actually works if you have the 4Suite modules installed? Maybe?
Why include fake modules that provide complete documentation to modules that do not exist in the standard distribution?
Who's running this ship? I want off. I'll swim if necessary.
As it turns out, I made too strong an assumption about python's affinity
towards java-isms. I roughly equated 'import foo' in python with 'import foo.*'
in java. That was incorrect. Importing foo doesn't get you access to things in
its directory; they have to be imported explicitly.
In summary, 'import xml' gets you nothing. 'import xml.dom' gets you nothing.
If you really want minidom's parser, you'll need 'import xml.dom.minidom' or a
'from import' variant.
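For the record, here's what the explicit imports look like in modern Python 3. The standard library also ships xml.etree.ElementTree, whose find/findall understand a limited subset of XPath, which may be enough when all you want is simple XPath. The sample XML is made up.

```python
import xml.dom.minidom            # "import xml" alone would not load this
import xml.etree.ElementTree as ET

doc = "<station><song artist='Beck'>Loser</song><song artist='Wilco'>Kamera</song></station>"

# minidom: a DOM-style API, once you import the actual submodule.
dom = xml.dom.minidom.parseString(doc)
first = dom.getElementsByTagName("song")[0]
print(first.getAttribute("artist"), first.firstChild.data)  # Beck Loser

# ElementTree: path expressions with simple attribute predicates.
root = ET.fromstring(doc)
print([song.text for song in root.findall("./song[@artist='Wilco']")])  # ['Kamera']
```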
On another note, the following surprised me. I had a module, foo/bar.py. I
figured 'from foo import *' would grab it. It doesn't, which also means 'from xml.dom import *'
doesn't get you minidom and friends.
Perhaps I was hoping for too much, and maybe it's better to import explicitly.
If that's the case, then why allow '*' to import everything from modules, but
not from packages?
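A quick way to see the package rule for yourself: this Python 3 sketch builds throwaway packages, each containing a bar.py, and star-imports them. Without __all__ the submodule isn't pulled in; listing it in the package's __all__ makes 'from package import *' import it. star_import_names and the package names are arbitrary.

```python
import os
import sys
import tempfile


def star_import_names(pkg_name, init_source):
    """Build a package with one submodule (bar.py), star-import it, return the new names."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, pkg_name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(init_source)
    with open(os.path.join(pkg_dir, "bar.py"), "w") as f:
        f.write("GREETING = 'hi'\n")
    sys.path.insert(0, root)
    try:
        namespace = {}
        exec(f"from {pkg_name} import *", namespace)  # "import *" needs module-level code
        return {name for name in namespace if not name.startswith("__")}
    finally:
        sys.path.remove(root)


print(star_import_names("plainpkg", ""))                 # set()
print(star_import_names("allpkg", "__all__ = ['bar']"))  # {'bar'}
```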
Tags: rants, python, xml
Permalink: /geekery/python-and-xml
posted at: 21:23
It's very hard to believe that 2006 is gone. What a year!
Basic life summary: Graduated from RIT and started working for Google.
This year has been great fun for me. I've had a chance to work on a very wide
range of projects. Some of them were silly, some of them were serious, and some
were useful.
Taking the silly category by storm was my Yahoo! Hack Day '06 demo of
TastyDrive. That same day involved my presentation of keynav. If you missed
Yahoo!'s event, then you certainly missed the amazing concert Beck put on! My
presentation at this event resulted in a kick-ass article about me in the Wall
Street Journal.
Runner-up for silly is definitely pam_captcha, a PAM module implementing a
text-based captcha system. My favorite captcha was obviously Dance Dance
Authentication, which caused widespread angst at the 2006 SPARSA security
competition.
However, pam_captcha ended up being useful in that it caused me to study the
behavior of brute-force ssh zombies. Good times ;)
Looking over my last year of posts, I am reminded of many project ideas that I
never worked on. Grok, my expert-like pattern matching tool, has fallen victim
to forgetfulness. Furthermore, many grok-related projects have fallen by the
wayside, including sysadmin secret sauce and its obvious children: temporal
(ooh, fancy word!?) data storage, a grok and eventdb marriage, and some neat
rrdtool tricks.
Hacks were a-plenty this year. Not all of them received written notes, but some
of the neater ones are my wakeup script, a hack using squid and selenium to
allow you to unit-test webpages by injecting functionality (xss using squid),
and a long-forgotten touch-screen keyboard in javascript.
And how can I forget BarCamp? I attended three BarCamp events this year:
New York, San Francisco, and Stanford.
Many friends made. These kinds of conferences are absolutely my kind of events.
Signal-to-noise at BarCamp is bliss by comparison to standard computer
conferences.
This year also brought me a new hat: FreeBSD src committer, so I can
further my work on the mouse system changes.
I miss the free time and opportunities granted as a student. I haven't made up
my mind about the "real world" quite yet, but I'm glad there's no homework.
With that, I say goodbye to 2006. It was a good year. I'm looking forward to 2007!
Tags: tales-of-olden-time
Permalink: /geekery/year-in-review-2006
posted at: 10:42