
SSL handshake latency and HTTPS optimizations.

At work today, I started investigating the latency differences between similar requests over HTTP and HTTPS. Historically, I'd been operating under the assumption that higher latency on HTTPS (SSL) traffic was to be expected, since SSL handshakes are more CPU intensive. I didn't really think about the network consequences of SSL until today.

It's all in the handshake.

The TCP handshake is a 3-packet event. The client sends 2 packets, the server sends 1. Best case, you're looking at one round-trip to establish your connection. We can show this empirically by comparing ping and TCP connect times:

% fping -q -c 5 www.csh.rit.edu
www.csh.rit.edu : xmt/rcv/%loss = 5/5/0%, min/avg/max = 112/115/123

The average ping round-trip is 115ms. How about TCP? Let's ask curl how long a TCP connect takes:

% seq 5 | xargs -I@ -n1 curl -so /dev/null -w "%{time_connect}\n" http://www.csh.rit.edu
0.117
0.116
0.117
0.116
0.116

There's your best case. This is because when you (the client) receive the 2nd packet in the handshake (SYN+ACK), you reply with ACK and consider the connection open. Exactly one round-trip is required before you can send your HTTP request.

What about when using SSL? Let's ask curl again:

% curl -kso /dev/null -w "tcp:%{time_connect}, ssldone:%{time_appconnect}\n" https://www.csh.rit.edu
tcp:0.117, ssldone:0.408

# How about to google?
% curl -kso /dev/null -w "tcp:%{time_connect}, ssldone:%{time_appconnect}\n" https://www.google.com
tcp:0.021, ssldone:0.068

That's a 3.5x jump in latency just for adding SSL to the mix, and this is before we've even sent the HTTP request.

The reason for this is easily shown with tcpdump. For this test, I'll use tcpdump to sniff HTTPS traffic and then use openssl s_client to simply connect to the web server over SSL and do nothing else. Start tcpdump first, then run openssl s_client.

terminal1 % sudo tcpdump -ttttt -i any 'port 443 and host www.csh.rit.edu'
...

terminal2 % openssl s_client -connect www.csh.rit.edu:443
...

tcpdump output, trimmed for brevity:

# Start TCP Handshake
00:00:00.000000 IP snack.home.40855 > csh.rit.edu.https: Flags [S] ...
00:00:00.114298 IP csh.rit.edu.https > snack.home.40855: Flags [S.] ...
00:00:00.114341 IP snack.home.40855 > csh.rit.edu.https: Flags [.] ...
# TCP Handshake complete.

# Start SSL Handshake.
00:00:00.114769 IP snack.home.40855 > csh.rit.edu.https: Flags [P.] ...
00:00:00.226456 IP csh.rit.edu.https > snack.home.40855: Flags [.] ...
00:00:00.261945 IP csh.rit.edu.https > snack.home.40855: Flags [.] ...
00:00:00.261960 IP csh.rit.edu.https > snack.home.40855: Flags [P.] ...
00:00:00.261985 IP snack.home.40855 > csh.rit.edu.https: Flags [.] ...
00:00:00.261998 IP snack.home.40855 > csh.rit.edu.https: Flags [.] ...
00:00:00.273284 IP snack.home.40855 > csh.rit.edu.https: Flags [P.] ...
00:00:00.398473 IP csh.rit.edu.https > snack.home.40855: Flags [P.] ...
00:00:00.436372 IP snack.home.40855 > csh.rit.edu.https: Flags [.] ...

# SSL handshake complete, ready to send HTTP request. 
# At this point, openssl s_client is sitting waiting for you to type something
# into stdin.

Summarizing the tcpdump data above for this SSL handshake:
  • 12 packets for SSL, vs 3 for TCP alone
  • TCP handshake took 114ms
  • Total SSL handshake time was 436ms
  • Number of network round-trips was 3.
  • SSL portion took 322ms (network and crypto)
The server tested above has a 2048-bit SSL certificate. Running 'openssl speed rsa' on the web server shows it can compute a signature in 22ms:
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.022382s 0.000542s     44.7   1845.4
Anyway. The point is: no matter how fast your SSL accelerators (hardware load balancer, etc.), if your SSL endpoints aren't near the user, your first connect will be slow. As shown above, the crypto piece of the SSL handshake is about 22ms, which means roughly 300ms of the 322ms SSL portion above was network latency and other overhead.

Once the SSL session is established, though, the connection switches to a symmetric cipher (3DES, AES, etc.), which is much faster, and the resource (network, CPU) overhead is tiny by comparison.

Summarizing from above: using SSL incurs roughly a 3.5x latency overhead for each new connection, but after the handshake it's generally as fast as plain TCP. If you accept this conclusion, let's examine how it can affect website performance.

Got Firebug? Open any website. Seriously. Watch the network activity. How many HTTP requests are made? Can you tell how many of the requests to the same domain reuse a connection (HTTP pipelining or keep-alive)? How many open a new connection each time? You can track this with tcpdump by watching for SYN packets if you want (tcpdump 'tcp[tcpflags] == tcp-syn').

What about the street wisdom for high-performance web servers? HAProxy's site says:

"If a site needs keep-alive, there is a real problem. Highly loaded sites often disable keep-alive to support the maximum number of simultaneous clients. The real downside of not having keep-alive is a slightly increased latency to fetch objects. Browsers double the number of concurrent connections on non-keepalive sites to compensate for this."
Disabling keep-alive on SSL connections means every single HTTP request is going to take 3 round-trips before even asking for data. If your server is 100ms away and you have 10 resources to serve on a single page, that's 3 seconds of network latency before you include SSL crypto or resource transfer time. With keep-alive, you eat that handshake cost once instead of 10 times.
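
To make that arithmetic concrete, here's a tiny back-of-envelope sketch in plain Python. The numbers are the assumptions from the paragraph above, not measurements:

# Back-of-envelope latency sketch. Assumed: 100ms round-trip to the server,
# 3 round-trips per HTTPS connection setup (1 TCP + 2 SSL), 10 resources.
rtt = 0.100          # seconds per round-trip
handshake_rtts = 3   # TCP handshake + SSL handshake round-trips
resources = 10

without_keepalive = resources * handshake_rtts * rtt  # handshake before every request
with_keepalive = 1 * handshake_rtts * rtt             # handshake paid only once

print("handshake latency, no keep-alive:   %.1fs" % without_keepalive)  # 3.0s
print("handshake latency, with keep-alive: %.1fs" % with_keepalive)     # 0.3s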

Many browsers will open multiple simultaneous connections to a given web server if they need to fetch multiple resources; the idea is that parallelism gets you more tasty HTTP resources in a shorter time. But if the browser opens two connections in parallel, you'll still incur many sequential SSL handshakes that slow your resource fetching down, and more SSL handshakes in parallel means a higher CPU burden, too. Keeping connections open trades memory for CPU, and memory per open connection scales more cheaply than CPU time does: above, one new connection cost 22ms of mostly-CPU time, which is far more expensive than the memory an idle kept-alive connection holds, and it's easier to grow memory than CPU.

For some data, Google and Facebook both permit keep-alive:

% URL=https://s-static.ak.facebook.com/rsrc.php/zPET4/hash/9e65hu86.js
% curl  -w "tcp: %{time_connect} ssl:%{time_appconnect}\n" -sk -o /dev/null $URL -o /dev/null $URL
tcp: 0.038 ssl:0.088
tcp: 0.000 ssl:0.000

% URL=https://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
% curl  -w "tcp: %{time_connect} ssl:%{time_appconnect}\n" -sk -o /dev/null $URL -o /dev/null $URL
tcp: 0.054 ssl:0.132
tcp: 0.000 ssl:0.000
The 2nd line of output reports zero time spent in TCP and SSL handshaking, because the second request reused the first connection. Further, if you tell curl to output response headers (curl -D -) you'll see "Connection: keep-alive". This is data showing that at least some of the big folks with massive qps are using keep-alive.

Remember that new handshakes are CPU-intensive, but established SSL connections generally aren't, since they use a much cheaper symmetric cipher after the handshake. Disabling keep-alive ensures that every request incurs an SSL handshake, which can quickly overload a moderately-busy server without SSL acceleration hardware if you use a large SSL key (2048 or 4096 bits).

Even if you offload SSL to special hardware, you're still incurring the higher network latency, which faster hardware can't compensate for. Frankly, in most cases it's more cost-effective to buy a weaker SSL certificate (1024-bit) than it is to buy SSL hardware - see Google's Velocity 2010 talk on SSL.

By the way, on modern hardware you can do a decent number of SSL handshakes per second with 1024-bit keys, but 2048-bit and 4096-bit keys are much more expensive:

# 'openssl speed rsa' done on an Intel X5550 (2.66GHz)
                  sign    verify    sign/s verify/s
rsa 1024 bits 0.000496s 0.000027s   2016.3  36713.2
rsa 2048 bits 0.003095s 0.000093s    323.1  10799.2
rsa 4096 bits 0.021688s 0.000345s     46.1   2901.5
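
To put those numbers in perspective, here's a rough sketch using the sign/s column above; the request rate is a made-up illustration, not a measurement. With keep-alive disabled, every request pays one signature, so a few hundred requests per second on a 2048-bit key already eats roughly a full core:

# Rough CPU sketch based on the 'openssl speed rsa' sign/s numbers above.
# The 300 requests/second figure is hypothetical, purely for illustration.
signs_per_sec_per_core = {1024: 2016.3, 2048: 323.1, 4096: 46.1}
requests_per_sec = 300  # every request is a new handshake when keep-alive is off

for bits in sorted(signs_per_sec_per_core):
    cores = requests_per_sec / signs_per_sec_per_core[bits]
    print("%d-bit key: ~%.2f cores spent on handshake signatures" % (bits, cores))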

Fixing SSL latency is not totally trivial. The CPU-intensive part can be handled by special hardware if you can afford it, but the only sure way to solve network round-trip latency is to be closer to your user and/or to minimize the total number of round-trips. Not forcing keep-alive off lets you be further from your users, which can save you money in the long run by giving you better choices of datacenter locations.

Site design refresher / CSS overflow property on IE 7

I updated some of this site's design. The most obvious changes are cosmetic (new banner logo at the top, slight color/layout differences). The least obvious change (hopefully) is that this site is now laid out purely with CSS, not tables.

After refreshing my rotting CSS braincells, I got a layout working properly and was quite happy. Then I tested in IE and saw my <pre> tags being displayed incorrectly (incorrect according to my intent, not necessarily according to the code; the two don't always align):

The above should have been two lines of text with a horizontal scroll bar. What I want is a scroll bar for text that's too wide, but vertical expansion without a scroll bar: never a vertical scroll bar, only ever a horizontal one.

I've spent a good bit of today scouring the web without much luck. I randomly permuted CSS values for overflow, overflow-x, overflow-y, min-height, etc. Having failed at that, I read everything I could find from randomly permuted search queries, one of which led me to a depressingly long writeup about IE6's expanding box problems. That page claims (as do several others) that IE7 fixes several of the box expansion problems.

I created a very small demo with minimal CSS to show the problem here: Click here to view the demo. It includes the solution I found, detailed below.

After some more random permutation, I gave up and tried wrapping the pre in a div and applying the overflow properties to the div. It worked. It's 2009 and I still have to deal with weird and obscure browser rendering inconsistencies. I came up with this:

<style>
  .scroll-wide {
    height: auto;
    overflow-x: auto;
    overflow-y: hidden;
    max-width: 500px;
  }

  /* On firefox, pre tags have a top and bottom margin > 0, which makes your 
   * scrolling div have a blank top line, which isn't what
   * we want. Fix one weirdness to find another? I didn't fully investigate.
   * Here's the fix: */
  div pre {
    margin-top: 0;
    margin-bottom: auto;
  }
</style>

...

<div class="scroll-wide">
  <pre>
    stuff
  </pre>
</div>

Solving this with wrap-pre-in-a-div can be automated with jQuery and a CSS definition:

// javascript
$("pre").wrap("<div class='pre-wrap'></div>");

// In CSS
div.pre-wrap {
  /* overflow/height/whatever options */
}

It's still possible I was doing something wrong and that this hack isn't necessary, but I don't know. I'm just glad to have it working now.

PS: If you use Meyer's reset.css, you'll want to include pre { margin-bottom: auto; }, or IE will again clip the bottom of the pre contents with the scrollbar.

Comment spam that got through

I get emails from this site when someone comments.

This morning, this showed up:

Name: Virtual Pharmacy
Email: [snipped]
URL: [snipped]
Hostname: 114.199.36.72.reverse.layeredtech.com (72.36.199.114)
Entry URL: http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2
Comment location: [snipped]

Everyone repeat, what alcohol should be consumed moderately, but what it means? Why to women
 recommend to drink more moderately than to men? What is the female alcoholism? WBR LeoP

A quick Google search for the strange tail token, "WBR LeoP", reveals a clear indication that this is comment spam (as if the content didn't give it away).

The url the spammer used points at pharmacynewsblog.com, which looks like a normal blog.

It's not.

The content is entirely viagra-and-friends related, which is fine. However, examine this snippet of visible text from the front page:

Drug treatment may beat psychotherapy at ...

Google for this phrase and you'll find that it's been plagiarized. But deliciously so:

View source, you'll see:

<p>Drug <b class=ne>joint pain are </b>treatment <BLINK class=ne>of
purchase </BLINK>may <sup class=ne>wellbutrin at </sup>beat <small
class=ne>and paxil vs </small>psychotherapy

The CSS class 'ne' sets 'display: none', among other properties that keep the injected text out of the way of the browser's rendering.

This is quite clever, and appears automated.

pharmacynewsblog.com seems to be a somewhat autogenerated spam blog that takes news postings about viagra and the like and injects random HTML into them, with the intention of defeating antispam solutions. Anti-spam engines probably aren't smart enough to know that they should ignore the pieces of text that are invisible. Who knows.

But, back to the spam comment. I use javascript to poke parts of the comment form indicating that a javascript-capable browser was used to submit the comment. If javascript is not detected, the comment is denied.

This comment got through, which means JavaScript was enabled, which means it was probably a real web browser that submitted it.

Here's the apache log snippet:

72.36.199.114 - - [29/Jan/2007:13:01:17 -0500] "GET /blog/geekery/barcamp-sanfrancisco-2.html HTTP/1.1" 200 15903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:18 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:20 -0500] "POST /blog/geekery/barcamp-sanfrancisco-2 HTTP/1.1" 200 16392 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
72.36.199.114 - - [29/Jan/2007:13:01:21 -0500] "GET /style.css HTTP/1.1" 200 2584 "http://www.semicomplete.com/blog/geekery/barcamp-sanfrancisco-2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
It didn't fetch any images, but it did pull style sheets, which is strange behavior for a simple spam bot that doesn't care how a page looks. It also pulled the blog post page first, then submitted a comment. Further indication that this bot is either really clever, or that a person is behind the wheel.

If you search for the IP, 72.36.199.114, the first hit on Google is an automagically updated list of known comment spam hosts.

Squid + Selenium == Win

Selenium cannot be used to test remote sites because browsers have cross-site scripting protections that prevent you from modifying content from other domains. What happens when we fool the browser into believing the content comes from a given domain? An extremely simple Squid configuration can do just this.

I wrote one test. This test visits Technorati's homepage, searches for 'barcamp', and verifies that a text string exists in the search results.

Steps:

  1. Tell Firefox to use my squid proxy as an HTTP proxy
  2. Visit http://www.technorati.com/_selenium/
  3. Run the tests
Simple, right?

The Squid proxy intercepts any requests for '/_selenium' and redirects them internally to my Selenium web server. This has been tested in IE, Firefox, and Safari with 100% success over vanilla HTTP. HTTPS probably doesn't work, for obvious "Duh, it's encrypted" reasons. Squid can fix this as well with SSL reverse proxying.
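
For reference, a minimal squid.conf sketch of the idea (the proxy port, Selenium server address/port, and peer name here are placeholders, not the actual values):

# Sketch: route /_selenium requests to a local Selenium server,
# let everything else pass through to the real site.
http_port 3128
acl selenium urlpath_regex ^/_selenium
cache_peer 127.0.0.1 parent 4444 0 no-query originserver name=seleniumsrv
cache_peer_access seleniumsrv allow selenium
cache_peer_access seleniumsrv deny all
never_direct allow selenium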

If I run my single test, the result is something that looks like the following (Firefox and IE, respectively):

Pyblosxom single-entry page title plugin

The page titles pyblosxom provides are usually great. However, when there is only one entry displayed, I feel it would be better to rely on that entry's title.

I wrote a very short plugin to do just that. Turns out the plugin api for pyblosxom is quite easy to understand, and this hack was only about 10 lines.

pagetitle.py adds a new variable which will contain the standard page title, unless there is only one entry in view. If there is only one entry in view, the page title is augmented with the story title as well. This makes search engine results and browsers happier, as they can recognize what your page is about from the title. Good for user experience, and good for search engines.

The new variable you want to use is: $blog_title_or_entry_title

If you want to get a better idea of what this plugin does, you can click the permalink below to view only this entry. The page title (in the url bar) should now reflect this entry's title.

download pagetitle.py

The CSH Bawls Programming Competition

Yesterday, I participated in a 12-hour coding-binge competition. It started at 7pm Friday night and ran until 7am Saturday morning. It was fueled by Computer Science House and Bawls, both sponsors of the event. Needless to say, I haven't gotten much sleep today.

The competition website is here. Go there if you want to view this year's objectives.

The Dream Team consisted of John Resig, Darrin Mann, Matt Bruce, and myself. Darrin, Resig, and I are all quite proficient at web development, so we decided this year we would represent ourselves as "Team JavaScript" - and do everything possible in javascript. Bruce is not a programmer, but I enlisted his graphical art skills because I figured with our team doing some web-based project, we definitely needed an artist.

After reviewing all the objectives, we came up with a significant modification of the Sudoku objective. The Sudoku objective lacked much room for innovation, so we went further: instead of solving Sudoku, we wrote a web-based version of an extremely popular game in Second Life. The contest organizer approved of our new objective, so we did just that.

Resig worked on game logic, I worked on chat features, Darrin worked on scoring and game generation, and Bruce worked on the interface graphics. Because our tasks were mostly unrelated, we could develop them independently. Most of the game was completed in about 6 hours, and the remainder of the time was spent fixing bugs, refactoring, and doing some minor redesign.

The backends were minimal. The chat backend was only 70 lines of Perl, and the score backend was 9 lines of /bin/sh. Everything else was handled in the browser. We leveraged Resig's jQuery to make development faster. Development went extremely smoothly; a testament to the "Dream Team" nature of our team, perhaps? ;)

The game worked by presenting everyone with the same game - so you can compete for the highest score. You could also chat during and between games, if you wanted to.

A screenshot can be found here. At the end of the competition, we only had one known bug left. That bug didn't affect gameplay, and we were all tired, so it didn't get fixed. There were a few other issues that remained unresolved that may or may not be related to our code. Firefox was having issues with various things we were doing, and we couldn't tell if it was our fault or not.

Despite the fact that I probably shouldn't have attended the competition due to scholastic time constraints, I was glad I went. We had a blast writing the game.

We may get some time in the near future to improve the codebase and put it up online so anyone can play. There are quite a few important features that need to be added before it'll be useful as a public game.

Xvfb + Firefox

Resig has a bunch of unit tests he runs to make sure jQuery works properly in whatever browser. Manually running and checking unit test results is annoying and time-consuming. Let's automate this.

Update (May 2010): See this post for more details on automating xserver startup without having to worry about display numbers (:1, :2, etc).

Combine something simple like Firefox and Xvfb (X Virtual Frame Buffer), and you've got a simple way to run Firefox without a visible display.

Let's start Xvfb:

startx -- `which Xvfb` :1 -screen 0 1024x768x24

# Or with Xvnc (also headless)
startx -- `which Xvnc` :1 -geometry 1024x768 -depth 24

# Or with Xephyr (nested X server, requires X)
startx -- `which Xephyr` :1 -screen 1024x768x24

This starts Xvfb running on :1 with a screen size of 1024x768 and 24 bits/pixel color depth. Now, let's run firefox:

DISPLAY=:1 firefox

# Or, if you run csh or tcsh
env DISPLAY=:1 firefox

Seems simple enough. What now? We want to tell firefox to go to google.com, perhaps:

DISPLAY=:1 firefox-remote http://www.google.com/

Now, let's take a screenshot (requires ImageMagick's import command):

DISPLAY=:1 import -window root googledotcom.png

Let's see what that looks like: googledotcom.png

While this isn't complicated, we could very easily automate lots of magic using something like the Selenium extension, all without requiring the use of a visual display (Monitor). Hopefully I'll find time to work on something cool using this soon.

The problem with screen scraping and other website interaction automation is that it almost always gets done without a browser. For instance, all of my screen scraping adventures have been in Perl. Browsers already know how to speak to the web, so why reinvent the wheel?

Firefox has lots of JavaScript-magic extensions, such as Greasemonkey and Selenium, that let you execute browser-side JavaScript and activity automatically. Combine these with Xvfb, and you can automate lots of things behind the scenes.

Tie this back to unit tests: instead of simply displaying the results of unit tests, have the page also report the results to a CGI script on the web server. This lets you automatically test websites using a real web browser and have the results automatically reported back to a server.
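
As a sketch of what that reporting endpoint could look like (the field names and log path are made up for illustration, not part of any existing harness), a tiny Python CGI script:

#!/usr/bin/env python
# Hypothetical CGI sketch: the browser-side test page POSTs its results here.
import cgi
import time

form = cgi.FieldStorage()
line = "%s suite=%s passed=%s failed=%s\n" % (
    time.strftime("%Y-%m-%d %H:%M:%S"),
    form.getfirst("suite", "unknown"),
    form.getfirst("passed", "0"),
    form.getfirst("failed", "0"),
)

# Append to a log file the test driver can inspect after the run.
results = open("/tmp/unittest-results.log", "a")
results.write(line)
results.close()

print("Content-Type: text/plain")
print("")
print("ok")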

xulrunner

I've been working on various touchscreen-related projects off and on, and I've always wanted to use Firefox (Gecko, really) as the primary interface. HTML and JavaScript are quick and easy to hack together to provide a simple interface for touching and tapping goodness. However, the only touchscreen-based system I have access to has a 133MHz processor, and Firefox takes about 15 minutes to start up on it.

I knew about the GRE (Gecko Runtime Environment), but I had never bothered learning Mozilla's embedding API to write my own application. Someone at Mozilla decided to do just that, and more. I found this project today: it's called XULRunner.

XULRunner is a fantastic application that provides Gecko, XUL, JavaScript, et al., all without a fancy browser wrapping it. This is, in my opinion, perfect for embedded applications that may need/want web-browsing capability.

Applications for XULRunner are devilishly easy to write if you know XUL or have a server to serve web pages. A XULRunner tutorial mentions having to set a "home page" for your XULRunner application; it uses this:

pref("toolkit.defaultChromeURI", 
     "chrome://applicationName/content/startPage.xul");

I didn't want to write a full XUL application. I wanted to use my existing HTML/JavaScript-based web kiosk pages. So, what do I use?

pref("toolkit.defaultChromeURI",
     "http://www.csh.rit.edu/~psionic/projects/kioskweb/demo/");

Start XULRunner, and a window pops up with my webpage in it. Fantastic!

I haven't had a chance to test XULRunner on the touchscreen system yet, but seeing as it lacks much of the code that makes Firefox a bit bulky, I'm hoping it will start up and run much faster. We'll see soon!

If you're looking to write a web-based application that uses XUL or even simply HTML+JavaScript, give XULRunner a look.

javascript animation

(Note: the links in this post no longer work.)

I've made some more cool changes to my javascript effects library thing. You can see a demo of the two new effects here:

http://kenya.csh.rit.edu/static/test.html

Pimp's web interface is coming along smoothly. There are so many layers to this application it hurts! The server is Python with JSON, XML-RPC, SQLite, and a few other pieces. The client is XHTML with JavaScript and CSS. There are so many places something can go wrong, whee!

I'll probably push my javascript stuff to this site as soon as it's finished. In the meantime, you can always look through my subversion repo.

templating with xhtml, xpath, and python

I've been gradually researching interesting ways to go about templating pages for Pimp 4.0 (a rewrite in Python). I've come to the conclusion that regexp replacement is hackish, and using a big templating toolkit is too much effort for now. However, I've come up with a solution I've yet to test thoroughly. The gist of it is:

Use an XML DOM parser to get a DOM-ified version of the web page, use XPath to find the elements I want to modify, and modify them as necessary. Poof, templating. A sample template is layout.html.

The following python will parse it and insert "Testing" into the content div.

#!/usr/local/bin/python

import sys
from xml.dom import minidom
from xml import xpath

if __name__ == '__main__':
        foo = minidom.parse("layout.html")

        # Append a text node to the element with 'id="content"'
        div = xpath.Evaluate("//*[@id='content']", foo.documentElement)
        div[0].appendChild(foo.createTextNode("Testing"))
        foo.writexml(sys.stdout)

It seems pretty simple. I'm probably going to build a simple-ish XML/XPath way of doing templating around this: move the complicated parts (the complex XPath notions) into a templating class with an "insert text" or somesuch method, and poof, simple templating. Even for complex situations where I may need to produce a table, it's easy to provide a default node tree for replicating; the particular DOM implementation I'm using provides a wonderful cloneNode() method with which to do this.
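
Here's a minimal sketch of what that templating class could look like, reusing the same minidom and PyXML xpath modules as above (the class and method names are made up for illustration):

import sys
from xml.dom import minidom
from xml import xpath

class Template:
    def __init__(self, filename):
        self.doc = minidom.parse(filename)

    def insert_text(self, element_id, text):
        # Find the element by id and append a text node to it.
        nodes = xpath.Evaluate("//*[@id='%s']" % element_id,
                               self.doc.documentElement)
        nodes[0].appendChild(self.doc.createTextNode(text))

    def write(self, out):
        self.doc.writexml(out)

# Usage, equivalent to the script above:
# t = Template("layout.html")
# t.insert_text("content", "Testing")
# t.write(sys.stdout)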

Of course, if you know of any other, simpler ways of doing templating in Python (or in general), definitely let me know :)