Creating a PHP Web Proxy Server - Part 1

Aim
This tutorial aims to create a web proxy server from scratch. It is split into 5 sections. Unlike a conventional proxy server, such as those found on corporate firewalls, a web proxy does not require any browser configuration such as a proxy IP address and port. A web proxy is simply a web application, such as a PHP script, that receives a request, opens a new connection to the requested resource, fetches the data and then sends it back to the client.

Assumptions
While coding such a proxy is fairly straightforward, the complexity comes from HTML and the HTTP protocol themselves. You will need basic PHP skills, some basic regular expression skills and a rudimentary understanding of HTML and HTTP.

Versions used in this example
  • Apache 2.2
  • php 5.2.1.2
Links to these files can be found here

How does a PHP web-based proxy work? The proxy is actually just a PHP application (a web page) that carries out these steps:
1. Accept a URL from a web page (as a GET request parameter).
2. Open a cURL connection to the URL you received.
3. Read the HTML.
4. Rewrite all the links in the HTML so that they point to your PHP application.
5. Return the output to the client.
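Steps 2, 3 and 5 above can be sketched with PHP's cURL extension. This is a minimal sketch written for a current PHP (7.1 or later), not the 5.2 release listed above. The function names fetch_url and send_to_client, the timeouts and the user-agent string are hypothetical choices for illustration. They are not the tutorial's own code.

```php
<?php
// Hypothetical helpers for steps 2, 3 and 5; names, timeouts and the
// user-agent string are illustrative assumptions, not the tutorial's code.

// Steps 2-3: fetch a URL with cURL. Returns status, Content-Type and body,
// or null if the transfer failed.
function fetch_url(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects...
        CURLOPT_MAXREDIRS      => 5,      // ...up to a limit
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS, // refuse file://, ftp://, ...
        CURLOPT_CONNECTTIMEOUT => 10,     // seconds
        CURLOPT_TIMEOUT        => 30,     // seconds
        CURLOPT_USERAGENT      => 'php-web-proxy-sketch',
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    // On PHP 8 the handle is freed when $ch goes out of scope, so no
    // explicit curl_close() is needed.
    return [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'type'   => curl_getinfo($ch, CURLINFO_CONTENT_TYPE), // null if none was sent
        'body'   => $body,
    ];
}

// Step 5: send the (possibly rewritten) body back to the browser, passing
// through the upstream Content-Type so CSS and JS are interpreted correctly.
function send_to_client(string $body, ?string $contentType): void
{
    if ($contentType !== null && !headers_sent()) {
        header('Content-Type: ' . $contentType);
    }
    echo $body;
}
```

A typical call in proxy.php would be `$res = fetch_url($url); if ($res !== null) { send_to_client($res['body'], $res['type']); }`. Do not copy the upstream Content-Length header to the client, because rewriting the links changes the size of the body.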

  • As an example, let us assume that a user wishes to connect to a page at http://mysite.com/out.html. The contents of the file are given below.
     <html>
     <head>
       <link rel="stylesheet" type="text/css"
         href="mystyle.css"/>
       <script type="text/javascript"
         src="http://codediaries.com/jslib/ads.js"></script>
     </head>
     <body>
       <h1>PHP Proxy</h1>
       This is a test page to show
       what a php web-page proxy does.
       <br/>
       You can find more information at
       <a href="http://codediaries.blogspot.com">Code diaries</a>
       <br/>
       <a href="help.html">Help</a> |
       <a href="contact.html">Contact</a>
     </body>
     </html>

  • Let us also assume that the PHP proxy is hosted at http://127.0.0.1/php/proxy.php. To view a page through it, you pass the page's URL in the urlP parameter by entering the following in your browser:
    http://127.0.0.1/php/proxy.php?urlP=http://mysite.com/out.html

  • Your proxy.php then receives the URL, opens a cURL connection to the host and downloads the page.

  • Then use regular expressions, or some other method such as an HTML parser, to rewrite every URL in the page so that it points to your proxy. Each original URL is base64 encoded so that it can be passed safely as a single query-string parameter. In the output below, the trailing '=' padding characters have also been replaced with '_'.
     <html>
     <head>
       <link rel="stylesheet" type="text/css"
         href="http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovLzEyNy4wLjAuMS9waHAvYi9teXN0eWxlLmNzcw__"/>
       <script type="text/javascript"
         src="http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovL2NvZGVkaWFyaWVzLmNvbS9qc2xpYi9hZHMuanM_"></script>
     </head>
     <body>
       <h1>PHP Proxy</h1>
       This is a test page to show
       what a php web-page proxy does.
       <br/>
       You can find more information at
       <a href="http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovL2NvZGVkaWFyaWVzLmJsb2dzcG90LmNvbQ__">Code diaries</a>
       <br/>
       <a href="http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovLzEyNy4wLjAuMS9waHAvYi9oZWxwLmh0bWw_">Help</a> |
       <a href="http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovLzEyNy4wLjAuMS9waHAvYi9jb250YWN0Lmh0bWw_">Contact</a>
     </body>
     </html>

    As you will notice, every URL has been replaced by something like
    http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovLzEyNy4wLjAuMS9waHAvYi9teXN0eWxlLmNzcw__
    The string after urlE= contains the fully qualified URL, base64 encoded. Relative links such as mystyle.css and help.html were first expanded into absolute URLs. Take the stylesheet link as an example: its aHR0cDo... string decodes to http://127.0.0.1/php/b/mystyle.css. This is because the sample output appears to have been captured from a local test setup, with the page served from http://127.0.0.1/php/b/ and the proxy at /php/prxprod/proxy rather than /php/proxy.php. For the page at mysite.com, the string would encode http://mysite.com/mystyle.css.

  • Now send this changed output back to the requesting browser.

  • Later, when your script receives a request such as the one below,
    http://127.0.0.1/php/prxprod/proxy?urlE=aHR0cDovLzEyNy4wLjAuMS9waHAvYi9teXN0eWxlLmNzcw__
    it has to base64 decode the urlE string (after turning each '_' back into '=') and make a cURL request for the decoded URL.
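On the receiving side, proxy.php has to work out which URL it was asked for. It is either a plain urlP value, as in the first request above, or a urlE value that must be decoded. The sketch below assumes PHP 7.1+ and the encoding implied by the example links: standard base64 with '=' padding written as '_'. The function name target_from_query is hypothetical. The function also rejects anything that is not an absolute http or https URL, since otherwise the proxy could be asked to fetch file:// or other non-web URLs.

```php
<?php
// Hypothetical entry point: decide which URL this request is asking for.
// urlE is assumed to be base64 with '=' padding written as '_', which is
// what the example links show; urlP is a plain URL.
function target_from_query(array $query): ?string
{
    if (isset($query['urlE']) && is_string($query['urlE'])) {
        // Undo the '_' substitution, then decode in strict mode so that
        // invalid input is rejected instead of silently garbled.
        $url = base64_decode(str_replace('_', '=', $query['urlE']), true);
    } elseif (isset($query['urlP']) && is_string($query['urlP'])) {
        $url = $query['urlP'];   // $_GET values are already URL-decoded
    } else {
        return null;
    }
    if ($url === false || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Only proxy http and https; refuse file://, ftp:// and the like.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}
```

In proxy.php this would be called as `$url = target_from_query($_GET);`, returning an error page when it yields null. Standard base64 output can also contain '+' and '/'. A '+' in a query string arrives in $_GET as a space. The example links happen to contain neither character, but a more robust version would rawurlencode() the encoded value when building each link.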


As you can see, once all the URLs are rewritten, the browser requests every CSS and JavaScript file through your proxy. Every link the user follows also goes through your proxy.
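Step 4, the rewriting that routes CSS, JavaScript and links through the proxy, could look roughly like the sketch below. It makes several assumptions:

  • PHP 7.1+ and double-quoted href and src attributes.
  • The same '=' to '_' base64 encoding as the example links.
  • A deliberately simplified relative-URL resolver: no "../" normalisation, no query-only references, no <base> tag and no url(...) references inside CSS.

The function names and the $proxyBase parameter are hypothetical. A regular expression is used because the tutorial suggests one; PHP's DOMDocument class would be a more robust alternative.

```php
<?php
// Hypothetical, simplified resolver: turn a link found on $base into an
// absolute URL. Ignores "../" segments, query-only links and <base> tags.
function resolve_url(string $base, string $rel): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) {
        return $rel;                                   // already has a scheme
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    if (strpos($rel, '//') === 0) {
        return $scheme . ':' . $rel;                   // protocol-relative
    }
    $port = isset($p['port']) ? ':' . $p['port'] : '';
    $origin = $scheme . '://' . ($p['host'] ?? '') . $port;
    if (strpos($rel, '/') === 0) {
        return $origin . $rel;                         // root-relative
    }
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);  // directory of the page
    return $origin . $dir . $rel;
}

// Point every double-quoted href/src at the proxy as ?urlE=<encoded URL>.
function rewrite_links(string $html, string $pageUrl, string $proxyBase): string
{
    return preg_replace_callback(
        '/\b(href|src)\s*=\s*"([^"]*)"/i',
        function (array $m) use ($pageUrl, $proxyBase): string {
            $value = html_entity_decode($m[2], ENT_QUOTES);  // e.g. &amp; -> &
            if ($value === '' || $value[0] === '#') {
                return $m[0];                          // leave in-page anchors alone
            }
            $abs = resolve_url($pageUrl, $value);
            if (!preg_match('#^https?://#i', $abs)) {
                return $m[0];                          // mailto:, javascript:, data: ...
            }
            $encoded = str_replace('=', '_', base64_encode($abs));
            return $m[1] . '="' . $proxyBase . '?urlE=' . $encoded . '"';
        },
        $html
    );
}
```

Applied to the sample page with $pageUrl set to http://mysite.com/out.html, mystyle.css becomes a proxy link whose urlE value encodes http://mysite.com/mystyle.css. The "#" anchors and mailto: links pass through unchanged.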

In the next part we will look at the base cURL proxy skeleton to get us started.

5 comments:

kobeleysen said...

where are the other parts ??

greetz
kobe

righteous said...

The entire project is now available for download at http://www.tidytutorials.com/2010/12/apachephp-web-proxy.htm

denvi said...

I'd really like to have the complete tutorial. Is it available too, or just the end result (which isn't what I'm looking for, I want the learning experience)?

Rgds,
Dennis

Sander said...

Hello,

The complete script isn't available anymore, can you reupload it?

Sander
