Sunday, May 13, 2012

Node-Spdy v3 From Scratch

‹prev | My Chain | next›

I am uncertain that I can easily troubleshoot the my Firefox SPDY woes any further, so tonight, I throw a test site up in the hope that someone else can solve my problem. I will make this available at https://test.spdybook.com:3000 (the SSL cert is invalid).

The unstable version of node.js is required for this. Happily, this means that I do not need to bother with SPDY-capable versions of openssl—the unstable 0.7 series is the first that will bundle a proper SPDY-capable version of openssl.

First, I download and install the latest unstable version of node.js:
cd ~/Downloads
wget http://nodejs.org/dist/v0.7.8/node-v0.7.8.tar.gz

cd ~/src
tar zxf ~/Downloads/node-v0.7.8.tar.gz
cd node-v0.7.8
./configure --prefix=$HOME/local/node-v0.7.8
make
make install
Not only does the unstable branch of node.js include the correct openssl, but it also includes npm, making the process of getting up and running incredibly fast.

I already have node-spdy checked out, so I fetch my recent changes and switch to the spdy-v3 branch:
cd ~/repos
cd node-spdy
git fetch origin
git co -b spdy-v3 -t origin/spdy-v3
The rest of the install is similar to when I first got started with spdy/3 on node.js. I link this node-spdy package as "the" node-spdy that npm will install instead of downloading:
cstrom@li97-191> npm link
npm http GET https://registry.npmjs.org/mocha
npm http 200 https://registry.npmjs.org/mocha
npm http GET https://registry.npmjs.org/mocha/-/mocha-0.10.2.tgz
npm http 200 https://registry.npmjs.org/mocha/-/mocha-0.10.2.tgz
npm http GET https://registry.npmjs.org/growl
npm http GET https://registry.npmjs.org/debug
npm http GET https://registry.npmjs.org/commander
npm http 200 https://registry.npmjs.org/debug
npm http GET https://registry.npmjs.org/debug/-/debug-0.7.0.tgz
npm http 200 https://registry.npmjs.org/growl
npm http GET https://registry.npmjs.org/growl/-/growl-1.4.1.tgz
npm http 200 https://registry.npmjs.org/commander
npm http GET https://registry.npmjs.org/commander/-/commander-0.5.2.tgz
npm http 200 https://registry.npmjs.org/growl/-/growl-1.4.1.tgz
npm http 200 https://registry.npmjs.org/debug/-/debug-0.7.0.tgz
npm http 200 https://registry.npmjs.org/commander/-/commander-0.5.2.tgz
mocha@0.10.2 ./node_modules/mocha
├── growl@1.4.1
├── commander@0.5.2
└── debug@0.7.0
../../local/node-v0.7.8/lib/node_modules/spdy -> /home/cstrom/repos/node-spdy
Instead of generating a new express.js site, I copy the one that I have been using locally:
➜  express-spdy-test  scp -r app.js keys package.json public routes views linode:repos/express-spdy-test
There is not too much difference between this and a generated site (npm install -g express; express express-spdy-test), but I do have a couple of large images in there that can push flow control in spdy/3 to the breaking point.

Now I complete the npm link of my local node-spdy package (it's just "spdy" in npm):
cd ~/repos/express-spdy-test
npm link spdy
  ./node_modules/spdy -> /home/cstrom/local/node-v0.7.8/lib/node_modules/spdy -> /home/cstrom/repos/node-spdy
I have to force install express.js (and its companion connect.js) since neither are warranted to run against unstable node.js:
npm install express@2.5.9 --force
npm install connect@2.1.2 --force
Finally, I can fire up the server:
node app
Express server listening on port 3000 in development mode
When I connect with Chrome to https://test.spdybook.com:3000, I am greeted with the invalid SSL certificate as expected:


After clicking "Proceed anyway", my spdy/3 conversation proceeds apace. Except...


The second image—the image incidentally with which Firefox was struggling—fails to load.

And, sure enough, the SPDY tab in chrome://net-internals shows that there is an error in there:
...
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 1300
--> stream_id = 9
SPDY_SESSION_RECV_DATA
--> flags = 0
--> size = 1176
--> stream_id = 9
SPDY_SESSION_CLOSE
--> description = "bytes_read is 0."
--> status = -100
Gah! Did I ever try the most recent patch in Chrome or did I just assume that I worked? I don't know, but I will investigate that tomorrow. It seems at least plausible since the Firefox errors that I have been seeing were also on stream #9 (the second image). It seems likely that Firefox was not wrong after all.


Day #385

6 comments:

  1. Chris, I'm going to throw this one back to you after having done some research. Thanks for making that server available on port 3000.

    I see an error on both images, fwiw.

    According to my analysis of the logs and packet traces the server is generating a TCP reset while downloading the image with stream id of 7.

    Firefox generates 2 window update messages. The SSL decoder actually doesn't quite let me point at which packets they are, but I strongly suspect they are packet #98 (delta 65660) and #167 (delta 66320). That makes sense from FF's algorithm point of view.

    packet #216 in the trace is the TCP reset generated by the server. It has a seq num that points at packet 167 as being the problem. (i.e. your server received 167 (the second window update) and freaked out).

    From there we abandon the connection (due to having been reset by the server) and make a new one. That's why you see the second SETTINGS frame, its on a different TCP connection entirely.

    The packet capture, logfile, and pre-master key information for wireshark are all here http://www.ducksong.com/misc/4strom/

    fwiw, this test did help me find and fix a problem on a spdy/3 error path - so that's awesome!

    separately, we should talk more about appropriate window size. I think we disagree on the purpose of flow control as applied to end user clients and its interaction with tcp.

    ReplyDelete
    Replies
    1. Thanks for the help! FWIW, the SSL certs are taken from node-spdy: https://github.com/indutny/node-spdy/tree/master/keys

      I think that FF is incorrect when it sends those two WINDOW_UPDATEs (and, indeed those are the packets that you suspected). By virtue of the SETTINGS frame, each of FF's initial receive windows should be 256mb. That is, node-spdy should be able to send up to 256 worth of data before flow-control kicks in and WINDOW_UPDATEs are necessary. Since my images are far short of that, FF should not be sending node-spdy a WINDOW_UPDATE.

      The TCP/IP reset occurs when node-spdy sees a WINDOW_UPDATE for a stream that was already closed (#7). Node-spdy probably ought to attempt a RST_STREAM at that point instead.

      Delete
    2. Chris - progress! that's great.

      " That is, node-spdy should be able to send up to 256 worth of data before flow-control kicks in and WINDOW_UPDATEs are necessary. Since my images are far short of that, FF should not be sending node-spdy a WINDOW_UPDATE. "

      the server is certainly entitled to send 256mb of data without an update, that's what the initial window means. but the client should not let the window run down to empty without sending an update - that just ensures a 1rtt pause in data flow every 256mb. Instead, the receiver tops off the window back to its original value once in a while - ensuring uninterrupted data flow. I will eventually probably do it less often, but for now I wanted that code tested :)

      take note that it is perfectly legal to actually issue deltas for an amount greater than has been received - i.e. growing the window to a value greater than the initial size.

      "The TCP/IP reset occurs when node-spdy sees a WINDOW_UPDATE for a stream that was already closed (#7). Node-spdy probably ought to attempt a RST_STREAM at that point instead."

      its actually an unavoidable situation.. so imo you could either just ignore it or send INVALID_STREAM if you like. The reason I say it is unavoidable is that there is a delay between the server closing a stream and the client seeing that fin.. of course during that delay the client will sometimes send a window update.. and then there is a delay in the window update getting back to the server, so even if the server hadn't closed the stream before it might do so while the update is in transit.. none of that is a problem - the only purpose of flow control is to constrain the sending rate of a stream.. if the stream has no more data to send then the relevant messages can be safely ignored without changing the future sending algorithm :)

      make sense?

      Delete
    3. "the server is certainly entitled to send 256mb of data without an update, that's what the initial window means. but the client should not let the window run down to empty without sending an update - that just ensures a 1rtt pause in data flow every 256mb."

      Actually, this is exactly what Chrome does.

      This is not as bad as it might first seem. When one stream pauses for a WINDOW_UPDATE, other streams whose receive windows are non-zero continue (or resume) sending DATA.

      Given that most resources are on the order of the default window (64kb), there really ought to be no more than a handful of RTTs incurred by WINDOW_UPDATEs for a typical SPDY session.


      "Instead, the receiver tops off the window back to its original value once in a while - ensuring uninterrupted data flow. I will eventually probably do it less often, but for now I wanted that code tested :)"

      I would argue that virtually no web page in existence has resources larger that 256mb, obviating the need to keep the receive window above zero. Besides a few RTTs while transferring a 256mb image seems like the proverbial drop in a bucket.

      Still, I believe that you are correct in that this is in accordance with the spec, so I need to handle it better.


      "take note that it is perfectly legal to actually issue deltas for an amount greater than has been received - i.e. growing the window to a value greater than the initial size. "

      Ah, good point.


      "its actually an unavoidable situation.. so imo you could either just ignore it or send INVALID_STREAM if you like. The reason I say it is unavoidable is that there is a delay between the server closing a stream and the client seeing that fin.. of course during that delay the client will sometimes send a window update.. and then there is a delay in the window update getting back to the server, so even if the server hadn't closed the stream before it might do so while the update is in transit.. none of that is a problem - the only purpose of flow control is to constrain the sending rate of a stream.. if the stream has no more data to send then the relevant messages can be safely ignored without changing the future sending algorithm :)"

      Again, a fair point. I will update node-spdy to ignore these rather than resetting the stream.

      FWIW it is not _entirely_ unavoidable — I do not believe that either Chrome or spdylay would do this. Both wait until reception of receive_window bytes before issuing a WINDOW_UPDATE. If the server completes tranmission of a resource before the receive window is exhausted, then no WINDOW_UPDATE need be issued.

      That said, I believe that FF is behaving per the spec so I need to account for it.

      Delete
    4. It might help to think of a spdy receiver window as a tcp "rwin" and a window update as a "tcp ack". The idea is to get a consistent flow of data from sender to receiver and a regularly spaced ack clock going the other way. 0 sized windows just stall things. Much of the point of spdy is to get rid of silly waits like that.

      "When one stream pauses for a WINDOW_UPDATE, other streams whose receive windows are non-zero continue (or resume) sending DATA."

      Any data is better than no data, granted!

      But if there is another stream it is by definition of lower priority to the server sending it (otherwise the scheduler would not have selected the stream with a 0 window) - that's not a good outcome when the transport is dictating priorities instead of informing them. And of course some of the time there simply will be no alternative stream and the bandwidth opportunity is squandered.

      Delete
    5. For posterity's sake, I was wrong about Chrome exhausting the receive window before sending WINDOW_UPDATES. It does seem to get into this state after a few rwin's worth of data, but, at first, it attempts to send WINDOW_UPDATES after receiving only 1/2 an rwin's worth of data.

      Delete