Saturday, May 26, 2012

Understanding HTTP protocol by using telnet

After a few nudges at GeeCON and in other conversations and conferences, I finally decided to dive into the details of HTTP, i.e. to read the HTTP protocol in its raw form: RFC 2616.

Actually I wanted to understand the GET, POST, PUT and DELETE methods, but before I got there I found it pretty amusing to hack a bit using telnet. This post is exactly about that: hacking HTTP with telnet.

Prerequisites

I assume that you have access to:

  • telnet from the command line
  • the Chrome web browser. In principle you could use Firefox with the Firebug plugin installed, but I will focus on Chrome

First hack

On the command line write:

telnet most-recently-used.blogspot.com http
telnet should report that it has connected to a Google server, with something like:
Trying 209.85.148.132...
Connected to blogspot.l.google.com.
Escape character is '^]'.

Then paste this magic into the terminal:
GET /2012/05/geecon-2012-review.html HTTP/1.1
Host: most-recently-used.blogspot.com

Then press Return/Enter once or twice: the empty line tells the server that the request headers are complete.

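This asks the Google server to send us the document called /2012/05/geecon-2012-review.html. The Host line is required in HTTP/1.1: the same Google servers host many blogs, and this header tells them which one we mean.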
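The same conversation can be reproduced without telnet. Below is a minimal Python sketch using only the standard socket module, assuming plain HTTP on port 80; the helpers build_request and fetch are hypothetical names for illustration. It sends the lines typed above plus a Connection: close header (an addition not in the telnet session) so that the server hangs up after answering and reading until end-of-stream terminates.

```python
import socket


def build_request(method, path, host, version="HTTP/1.1", headers=None):
    """Return the raw bytes of a request, as typed into telnet.

    Every line ends with CRLF, and an empty line terminates the header block.
    """
    lines = [f"{method} {path} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, path, port=80):
    """Send a GET over a plain TCP socket and return everything the server sends."""
    # Connection: close asks the server to close the socket after the response,
    # so the read loop below ends instead of waiting on a kept-alive connection.
    request = build_request("GET", path, host, headers={"Connection": "close"})
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("most-recently-used.blogspot.com", "/2012/05/geecon-2012-review.html") returns the raw bytes telnet printed; the response you get today may differ from the 2012 one shown here (for example, a redirect to HTTPS).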

The HTTP protocol specifies that a response starts with a status line followed by headers, like the ones you can see at the very beginning of the server's response:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Set-Cookie: blogger_TID=136dec288ddf5a27; HttpOnly
Expires: Sat, 26 May 2012 15:34:12 GMT
Date: Sat, 26 May 2012 15:34:12 GMT
Cache-Control: private, max-age=0
Last-Modified: Sat, 26 May 2012 15:34:11 GMT
ETag: "cf88fe06-51da-4158-aa3f-9d374ae09058"
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked
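What follows, after an empty line, is the HTML of an article from my blog. Because of Transfer-Encoding: chunked, it arrives in pieces, each preceded by its size in hexadecimal.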
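To see that the response really is just text, here is a simplified Python sketch that takes it apart: status line, headers, and a body that is reassembled when Transfer-Encoding: chunked is used. It assumes the whole response is already in memory as bytes, ignores trailer headers, and keeps only the last value of a repeated header such as Set-Cookie; the function names are hypothetical.

```python
def decode_chunked(data):
    """Reassemble a chunked body.

    Each chunk is its size in hex, CRLF, the chunk bytes, CRLF; a chunk of
    size 0 ends the body. Trailer headers after the last chunk are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; skip it.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            break
        start = line_end + 2
        out += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


def parse_response(raw):
    """Split a raw response into (version, status code, reason, headers, body).

    Header names are case-insensitive, so they are stored in lower case.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so decoding never fails.
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *rest = status_line.split(" ", 2)
    reason = rest[0] if rest else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = decode_chunked(body)
    return version, int(code), reason, headers, body
```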

Go ahead and check what happens if you:
  • tell the Google server that you speak HTTP/1.0, the predecessor of HTTP/1.1 (diff or vimdiff is your friend here)
  • make a typo in the host name
  • change the GET method to HEAD, PUT, DELETE or anything else
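If you capture the responses programmatically instead of copying them from the terminal, Python's standard difflib module can play the role of diff. A minimal sketch, assuming both raw responses are available as bytes (for example from a socket read); header_lines and diff_responses are hypothetical names, and only the header part is compared.

```python
import difflib


def header_lines(raw):
    """Status line and header lines of a raw response: everything before the empty line."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    return head.decode("iso-8859-1").split("\r\n")


def diff_responses(raw_a, raw_b, label_a="HTTP/1.1", label_b="HTTP/1.0"):
    """Unified diff of two responses' headers, like diff on two saved telnet sessions."""
    diff = difflib.unified_diff(
        header_lines(raw_a),
        header_lines(raw_b),
        fromfile=label_a,
        tofile=label_b,
        lineterm="",  # the lines carry no trailing newline of their own
    )
    return "\n".join(diff)
```

Printing diff_responses for the HTTP/1.1 and HTTP/1.0 answers shows only the header lines that changed, prefixed with - and +.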

Second hack

Now, I always wondered what the difference is between refreshing a page with F5 and with Ctrl+F5. Let's start by asking the W3C server for section 9 of RFC 2616, but only if it has been modified since a given date (again, finish with an empty line):
telnet www.w3.org http
GET /Protocols/rfc2616/rfc2616-sec9.html HTTP/1.1
Host: www.w3.org
If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
This means: send us this document only if it was modified after the given date. It wasn't, so the answer is just a status line with headers and no body:
HTTP/1.1 304 Not Modified
Date: Sat, 26 May 2012 15:58:52 GMT
Server: Apache/2
Connection: close
ETag: "40d7-3e3073913b100"
Expires: Sat, 26 May 2012 21:58:52 GMT
Cache-Control: max-age=21600
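The decision behind that 304 can be sketched in a few lines of Python. This is a simplified illustration, not how Apache actually implements it: it assumes the server knows the file's Last-Modified time as a timezone-aware datetime, and status_for_conditional_get is a hypothetical name. The date parsing uses the standard email.utils.parsedate_to_datetime, since HTTP dates share the e-mail date format.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def status_for_conditional_get(last_modified, if_modified_since):
    """Return 304 if the file is unchanged since the client's date, otherwise 200.

    last_modified: timezone-aware datetime of the server's copy of the file.
    if_modified_since: raw If-Modified-Since header value, or None if absent.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored, so the full document is sent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are in GMT
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```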

Browser details

Now that we know browsers do no magic, but talk to servers using a simple text protocol, we may wish to inspect which requests they actually send.

  • open Chrome
  • open a new tab
  • launch the JavaScript Console
  • in the console, open the Network tab
  • point your browser's tab to www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
In the Network tab, click the rfc2616-sec9.html request and have a closer look at its headers: you will see things similar to what we made telnet write for us, although the browser puts much more information into the request. See what changes if you refresh the page using F5 and Ctrl+F5. Did you notice the following lines in the request headers?
If-None-Match: "40d7-3e3073913b100"
If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
The reason is this: when we press F5, the browser believes it may already have the right version of the file and asks the server whether it has changed, sending back the ETag it received earlier (If-None-Match) and the Last-Modified date of its copy (If-Modified-Since). Note that this is the same ETag the server returned in our telnet session. If the file hasn't changed, only a short 304 Not Modified response is sent back; otherwise the whole HTML file is sent. If, however, Ctrl+F5 is pressed, neither of these lines is sent, and therefore the server serves the full HTML content right away.
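The two reload behaviours can be mimicked in a small Python sketch. It assumes the cached response headers are stored in a dict with lower-case names and follows the Chrome behaviour observed above (F5 sends the validators, Ctrl+F5 sends none). It also adds a server-side ETag check using weak comparison, which is what If-None-Match calls for. Both function names are hypothetical, and the ETag check assumes the tags contain no commas.

```python
def reload_headers(cached_headers, hard_reload=False):
    """Validator headers a browser-like client could send when reloading a page.

    cached_headers: headers of the cached response, keyed by lower-case name.
    A plain reload (F5) echoes the cached validators; a hard reload (Ctrl+F5)
    sends none, so the server has to return the full document.
    """
    if hard_reload:
        return {}
    headers = {}
    if "etag" in cached_headers:
        headers["If-None-Match"] = cached_headers["etag"]
    if "last-modified" in cached_headers:
        headers["If-Modified-Since"] = cached_headers["last-modified"]
    return headers


def etag_matches(if_none_match, current_etag):
    """Server side: does any tag listed in If-None-Match match the current ETag?

    Weak comparison: a leading W/ is ignored on both sides. "*" matches anything.
    """
    if if_none_match.strip() == "*":
        return True

    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    return any(opaque(tag) == opaque(current_etag) for tag in if_none_match.split(","))
```

Given a cached copy with ETag "40d7-3e3073913b100" and Last-Modified Wed, 01 Sep 2004 13:24:52 GMT, reload_headers returns the two lines shown above, and etag_matches on that tag is true, which is why a server can answer with 304.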

Summary

Now go ahead and use Chrome's JavaScript Console (Network tab) to see more examples of requests made on your behalf by the browser!
