Wednesday, October 12, 2011

Url Rewriting with ASP.NET


People often ask me for guidance on how they can dynamically "re-write" URLs and/or have the ability to publish cleaner URL end-points within their ASP.NET web applications.  This blog post summarizes a few approaches you can take to cleanly map or rewrite URLs with ASP.NET, and have the option to structure the URLs of your application however you want.
Why does URL mapping and rewriting matter?
The most common scenarios where developers want greater flexibility with URLs are:
1) Handling cases where you want to restructure the pages within your web application, and you want to ensure that the old URLs people have bookmarked don't break when you move pages around.  URL rewriting enables you to transparently forward requests to the new page location without breaking those bookmarks.
2) Improving the search relevancy of pages on your site with search engines like Google, Yahoo and Live.  Specifically, URL rewriting can often make it easier to embed common keywords into the URLs of the pages on your sites, which can often increase the chance of someone clicking your link.  Moving from using querystring arguments to instead use fully qualified URLs can also in some cases increase your priority in search engine results.  Using techniques that force referring links to use the same case and URL entrypoint (for example: weblogs.asp.net/scottgu instead of weblogs.asp.net/scottgu/default.aspx) can also avoid diluting your pagerank across multiple URLs, and improve your position in search results.
In a world where search engines increasingly drive traffic to sites, extracting any little improvement in your page ranking can yield very good ROI to your business.  Increasingly this is driving developers to use URL-Rewriting and other SEO (search engine optimization) techniques to optimize sites (note that SEO is a fast moving space, and the recommendations for increasing your search relevancy evolve monthly).  For a list of some good search engine optimization suggestions, I'd recommend reading the SSW Rules to Better Google Rankings, as well as MarketPosition's article on how URLs can affect top search engine ranking.
Sample URL Rewriting Scenario
For the purpose of this blog post, I'm going to assume we are building a set of e-commerce catalog pages within an application, and that the products are organized by categories (for example: books, videos, CDs, DVDs, etc).
Let's assume that we initially have a page called "Products.aspx" that takes a category name as a querystring argument, and filters the products accordingly.  The corresponding URLs to this Products.aspx page look like this:
http://www.store.com/products.aspx?category=books
http://www.store.com/products.aspx?category=DVDs
http://www.store.com/products.aspx?category=CDs
Rather than use a querystring to expose each category, we want to modify the application so that each product category looks like a unique URL to a search engine, and has the category keyword embedded in the actual URL (and not as a querystring argument).  We'll spend the rest of this blog post going over 4 different approaches that we could take to achieve this.
Approach 1: Use Request.PathInfo Parameters Instead of QueryStrings
The first approach I'm going to demonstrate doesn't use Url-Rewriting at all, and instead uses a little-known feature of ASP.NET - the Request.PathInfo property.  To help explain the usefulness of this property, consider the below URL scenario for our e-commerce store:
http://www.store.com/products.aspx/Books
http://www.store.com/products.aspx/DVDs
http://www.store.com/products.aspx/CDs
One thing you'll notice with the above URLs is that they no longer have Querystring values - instead the category parameter value is appended on to the URL as a trailing /param value after the Products.aspx page handler name.  An automated search engine crawler will then interpret these URLs as three different URLs, and not as one URL with three different input values (search engines ignore the filename extension and just treat it as another character within the URL). 
You might wonder how you handle this appended parameter scenario within ASP.NET.  The good news is that it is pretty simple.  Simply use the Request.PathInfo property, which will return the content immediately following the products.aspx portion of the URL.  So for the above URLs, Request.PathInfo would return "/Books", "/DVDs", and "/CDs" (in case you are wondering, the Request.Path property would return "/products.aspx").
You could then easily write a function to retrieve the category like so (the below function strips out the leading slash and returns just "Books", "DVDs" or "CDs"):

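    Function GetCategory() As String

        ' PathInfo is empty when nothing follows products.aspx
        If (Request.PathInfo.Length = 0) Then
            Return ""
        Else
            ' Skip the leading "/" so "/Books" becomes "Books"
            Return Request.PathInfo.Substring(1)
        End If

    End Function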
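For C# projects, the same idea can be sketched as a small pure helper.  This is only an illustration and not part of the sample download: the PathInfoHelper class name is hypothetical, trimming a trailing slash is my own assumption, and in a page you would pass in Request.PathInfo.

```csharp
using System;

// Hypothetical helper (not from the sample download): extracts the category
// from a PathInfo value such as "/Books".
public static class PathInfoHelper
{
    public static string GetCategory(string pathInfo)
    {
        // Nothing followed the page name (for example /products.aspx).
        if (string.IsNullOrEmpty(pathInfo))
        {
            return "";
        }

        // Remove leading and trailing slashes, so "/Books" and "/Books/"
        // both yield "Books" (the trailing-slash handling is an assumption).
        return pathInfo.Trim('/');
    }
}
```

Calling PathInfoHelper.GetCategory("/Books") returns "Books", and an empty PathInfo returns an empty string.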
Sample Download: A sample application that I've built that shows using this technique can be downloaded here.  What is nice about this sample and technique is that no server configuration changes are required in order to deploy an ASP.NET application using this approach.  It will also work fine in a shared hosting environment.
Approach 2: Using an HttpModule to Perform URL Rewriting
An alternative approach to the above Request.PathInfo technique would be to take advantage of the HttpContext.RewritePath() method that ASP.NET provides.  This method allows a developer to dynamically rewrite the processing path of an incoming URL, and for ASP.NET to then continue executing the request using the newly re-written path.
For example, we could choose to expose the following URLs to the public:

http://www.store.com/products/Books.aspx
http://www.store.com/products/DVDs.aspx
http://www.store.com/products/CDs.aspx
This looks to the outside world like there are three separate pages on the site (and will look great to a search crawler).  By using the HttpContext.RewritePath() method we can dynamically re-write the incoming URLs when they first reach the server to instead call a single Products.aspx page that takes the category name as a Querystring or PathInfo parameter instead.  For example, we could use an Application_BeginRequest event in Global.asax like so to do this:
    void Application_BeginRequest(object sender, EventArgs e) {

        // Full URL of the incoming request, before any rewriting
        string fullOriginalPath = Request.Url.ToString();

        // Map each public URL onto the single Products.aspx page
        if (fullOriginalPath.Contains("/Products/Books.aspx")) {
            Context.RewritePath("/Products.aspx?Category=Books");
        }
        else if (fullOriginalPath.Contains("/Products/DVDs.aspx")) {
            Context.RewritePath("/Products.aspx?Category=DVDs");
        }
    }
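If you do keep the mapping in code, one way to cut down on the repetition is to hold the URL pairs in a lookup table.  The following is only a sketch under my own assumptions: RewriteTable and TryGetTarget are hypothetical names, the table matches the whole path exactly rather than using Contains, and it assumes the application sits at the site root.  Inside Application_BeginRequest you would call it with Request.Path and pass a successful result to Context.RewritePath().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table standing in for a chain of if/else checks.
public static class RewriteTable
{
    // Keys are compared case-insensitively, so /products/books.aspx also matches.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "/Products/Books.aspx", "/Products.aspx?Category=Books" },
            { "/Products/DVDs.aspx",  "/Products.aspx?Category=DVDs" },
            { "/Products/CDs.aspx",   "/Products.aspx?Category=CDs" }
        };

    // Returns true and sets target when the requested path has a rewrite entry.
    public static bool TryGetTarget(string requestPath, out string target)
    {
        return Map.TryGetValue(requestPath, out target);
    }
}
```

Adding a new category then means adding one line to the table rather than another branch.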
The downside of manually writing code like the above is that it can be tedious and error prone.  Rather than do it yourself, I'd recommend using one of the pre-built HttpModules available on the web for free to perform this work for you.  Here are a few free ones that you can download and use today:
These modules allow you to declaratively express matching rules within your application's web.config file.  For example, to use the UrlRewriter.Net module within your application's web.config file to map the above URLs to a single Products.aspx page, we could simply add this web.config file to our application (no code is required):

<?xml version="1.0"?>
<configuration>

  <configSections>
    <section name="rewriter"
             requirePermission="false"
             type="Intelligencia.UrlRewriter.Configuration.RewriterConfigurationSectionHandler, Intelligencia.UrlRewriter" />
  </configSections>

  <system.web>

    <httpModules>
      <add name="UrlRewriter" type="Intelligencia.UrlRewriter.RewriterHttpModule, Intelligencia.UrlRewriter"/>
    </httpModules>

  </system.web>

  <rewriter>
    <rewrite url="~/products/books.aspx" to="~/products.aspx?category=books" />
    <rewrite url="~/products/CDs.aspx" to="~/products.aspx?category=CDs" />
    <rewrite url="~/products/DVDs.aspx" to="~/products.aspx?category=DVDs" />
  </rewriter>

</configuration>
The HttpModule URL rewriters above also add support for regular expression and URL pattern matching (to avoid you having to hard-code every URL in your web.config file).  So instead of hard-coding the category list, you could re-write the rules like below to dynamically pull the category from the URL for any "/products/[category].aspx" combination:

  <rewriter>
    <rewrite url="~/products/(.+).aspx" to="~/products.aspx?category=$1" />
  </rewriter>
This makes your code much cleaner and super extensible.
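To get a feel for what a pattern rule like this does, here is a rough stand-alone C# sketch built on System.Text.RegularExpressions.  It only mimics the effect of the rule and is not how UrlRewriter.Net is implemented; the PatternRule class name is hypothetical, and I have anchored the pattern and escaped the dot before "aspx", which the rule above does not do.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: mimics the effect of the pattern rule, not the
// module's real implementation.
public static class PatternRule
{
    // Captures everything between "/products/" and ".aspx" as group 1.
    private static readonly Regex ProductUrl = new Regex(
        @"^/products/(.+)\.aspx$", RegexOptions.IgnoreCase);

    // Returns the rewritten path, or the input unchanged when it does not match.
    public static string Rewrite(string path)
    {
        return ProductUrl.Replace(path, "/products.aspx?category=$1");
    }
}
```

With this sketch, PatternRule.Rewrite("/products/Books.aspx") returns "/products.aspx?category=Books", while a path such as "/products.aspx" is returned untouched.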
Sample Download: A sample application that I've built that shows using this technique with the UrlRewriter.Net module can be downloaded here
What is nice about this sample and technique is that no server configuration changes are required in order to deploy an ASP.NET application using this approach.  It will also work fine in a medium trust shared hosting environment (just ftp/xcopy to the remote server and you are good to go - no installation required).
Approach 3: Using an HttpModule to Perform Extension-Less URL Rewriting with IIS7
The above HttpModule approach works great for scenarios where the URL you are re-writing has a .aspx extension, or another file extension that is configured to be processed by ASP.NET.  When you do this no custom server configuration is required - you can just copy your web application up to a remote server and it will work fine.
There are times, though, when you want the URL to re-write to either have a non-ASP.NET file extension (for example: .jpg, .gif, or .htm) or no file-extension at all.  For example, we might want to expose these URLs as our public catalog pages (note they have no .aspx extension):

http://www.store.com/products/Books
http://www.store.com/products/DVDs
http://www.store.com/products/CDs
With IIS5 and IIS6, processing the above URLs using ASP.NET is not super easy.  IIS 5/6 makes it hard to perform URL rewriting on these types of URLs within ISAPI Extensions (which is how ASP.NET is implemented).  Instead you need to perform the rewriting earlier in the IIS request pipeline using an ISAPI Filter.  I'll show how to do this on IIS5/6 in the Approach 4 section below.
The good news, though, is that IIS 7.0 makes handling these types of scenarios super easy.  You can now have an HttpModule execute anywhere within the IIS request pipeline - which means you can use the URLRewriter module above to process and rewrite extension-less URLs (or even URLs with a .asp, .php, or .jsp extension).  Below is how you would configure this with IIS7:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

  <configSections>
    <section name="rewriter"
             requirePermission="false"
             type="Intelligencia.UrlRewriter.Configuration.RewriterConfigurationSectionHandler, Intelligencia.UrlRewriter" />
  </configSections>

  <system.web>

    <httpModules>
      <add name="UrlRewriter" type="Intelligencia.UrlRewriter.RewriterHttpModule, Intelligencia.UrlRewriter" />
    </httpModules>

  </system.web>

  <system.webServer>

    <modules runAllManagedModulesForAllRequests="true">
      <add name="UrlRewriter" type="Intelligencia.UrlRewriter.RewriterHttpModule" />
    </modules>

    <validation validateIntegratedModeConfiguration="false" />

  </system.webServer>

  <rewriter>
    <rewrite url="~/products/(.+)" to="~/products.aspx?category=$1" />
  </rewriter>

</configuration>
Note the "runAllManagedModulesForAllRequests" attribute that is set to true on the <modules> section within <system.webServer>.  This will ensure that the UrlRewriter.Net module from Intelligencia, which was written before IIS7 shipped, will be called and have a chance to re-write all URL requests to the server (including for folders).  What is really cool about the above web.config file is that:
1) It will work on any IIS 7.0 machine.  You don't need an administrator to enable anything on the remote host.  It will also work in medium trust shared hosting scenarios.
2) Because I've configured the UrlRewriter in both the <httpModules> and IIS7 <modules> section, I can use the same URL Rewriting rules for both the built-in VS web-server (aka Cassini) as well as on IIS7.  Both fully support extension-less URLRewriting.  This makes testing and development really easy.
IIS 7.0 server will ship later this year as part of Windows Longhorn Server, and will support a go-live license with the Beta3 release in a few weeks.  Because of all the new hosting features that have been added to IIS7, we expect hosters to start aggressively offering IIS7 accounts relatively quickly - which means you should be able to start to take advantage of the above extension-less rewriting support soon.  We'll also be shipping a Microsoft supported URL-Rewriting module in the IIS7 RTM timeframe that will be available for free as well that you'll be able to use on IIS7, and which will provide nice support for advanced re-writing scenarios for all content on your web-server.
Sample Download: A sample application that I've built that shows using this extension-less URL technique with IIS7 and the UrlRewriter.Net module can be downloaded here
Approach 4: ISAPIRewrite to enable Extension-less URL Rewriting for IIS5 and IIS6
If you don't want to wait for IIS 7.0 in order to take advantage of extension-less URL Rewriting, then your best bet is to use an ISAPI Filter in order to re-write URLs.  There are two ISAPI Filter solutions that I'm aware of that you might want to check out:
  • Helicon Tech's ISAPI Rewrite: They provide an ISAPI Rewrite full product version for $99 (with 30 day free trial), as well as an ISAPI Rewrite lite edition that is free.
  • Ionic's ISAPI Rewrite: This is a free download (both source and binary available)
I actually don't have any first-hand experience using either of the above solutions - although I've heard good things about them.  Scott Hanselman and Jeff Atwood recently both wrote up great blog posts about their experiences using them, and also provided some samples of how to configure the rules for them.  The rules for Helicon Tech's ISAPI Rewrite use the same syntax as Apache's mod_rewrite.  For example (taken from Jeff's blog post):

[ISAPI_Rewrite]
# fix missing slash on folders
# note, this assumes we have no folders with periods!
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [RP]

# remove index pages from URLs
RewriteRule (.*)/default.htm$ $1/ [I,RP]
RewriteRule (.*)/default.aspx$ $1/ [I,RP]
RewriteRule (.*)/index.htm$ $1/ [I,RP]
RewriteRule (.*)/index.html$ $1/ [I,RP]

# force proper www. prefix on all requests
RewriteCond %HTTP_HOST ^test\.com [I]
RewriteRule ^/(.*) http://www.test.com/$1 [RP]

# only allow whitelisted referers to hotlink images
RewriteCond Referer: (?!http://(?:www\.good\.com|www\.better\.com)).+
RewriteRule .*\.(?:gif|jpg|jpeg|png) /images/block.jpg [I,O]
Definitely check out Scott's post and Jeff's post to learn more about these ISAPI modules, and what you can do with them.
Note: One downside to using an ISAPI filter is that shared hosting environments typically won't allow you to install this component, and so you'll need either a virtual dedicated hosting server or a dedicated hosting server to use them.  But, if you do have a hosting plan that allows you to install the ISAPI, it will provide maximum flexibility on IIS5/6 - and tide you over until IIS7 ships.
Handling ASP.NET PostBacks with URL Rewriting
One gotcha that people often run into when using ASP.NET and URL rewriting has to do with handling postback scenarios.  Specifically, when you place a <form runat="server"> control on a page, ASP.NET will by default output the "action" attribute of the markup to point back to the page it is on.  The problem when using URL rewriting is that the URL that the <form> control renders is not the original URL of the request (for example: /products/books), but rather the re-written one (for example: /products.aspx?category=books).  This means that when you do a postback to the server, the URL will not be your nice clean one.
With ASP.NET 1.0 and 1.1, people often resorted to sub-classing the <form> control and created their own control that correctly output the action to use.  While this works, it ends up being a little messy - since it means you have to update all of your pages to use this alternate form control, and it can sometimes have problems with the Visual Studio WYSIWYG designer.
The good news is that with ASP.NET 2.0, there is a cleaner trick that you can use to rewrite the "action" attribute on the <form> control.  Specifically, you can take advantage of the new ASP.NET 2.0 Control Adapter extensibility architecture to customize the rendering of the <form> control, and override its "action" attribute value with a value you provide.  This doesn't require you to change any code in your .aspx pages.  Instead, just add a .browser file to your /app_browsers folder that registers a Control Adapter class to use to output the new "action" attribute:
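As a rough sketch, a .browser file of this kind could look like the following.  The file name (Form.browser) and the adapter class name FormRewriterControlAdapter are assumptions on my part, so substitute the type name of the Control Adapter class you actually use.

```xml
<!-- App_Browsers/Form.browser (the file name is an assumption) -->
<browsers>
  <browser refID="Default">
    <controlAdapters>
      <!-- adapterType names a hypothetical class; use your own adapter's type name -->
      <adapter controlType="System.Web.UI.HtmlControls.HtmlForm"
               adapterType="FormRewriterControlAdapter" />
    </controlAdapters>
  </browser>
</browsers>
```

Registering the adapter against the "Default" browser definition means it applies to every requesting browser rather than to one specific user agent.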


You can see a sample implementation I created that shows how to implement a Form Control Adapter that works with URLRewriting in my sample here.  It works for both the Request.PathInfo and UrlRewriter.Net module approaches I used in Approach 1 and 2 above, and uses the Request.RawUrl property to retrieve the original, un-rewritten, URL to render.  With the ISAPIRewrite filter in Approach 4 you can retrieve the Request.ServerVariables["HTTP_X_REWRITE_URL"] value that the ISAPI filter uses to save the original URL instead.
My FormRewriter class implementation above should work for both standard ASP.NET and ASP.NET AJAX 1.0 pages (let me know if you run into any issues).
Handling CSS and Image References Correctly
One gotcha that people sometimes run into when using URL rewriting for the very first time is that they find that their image and CSS stylesheet references sometimes seem to stop working.  This is because they have relative references to these files within their HTML pages - and when you start to re-write URLs within an application you need to be aware that the browser will often be requesting files in different logical hierarchy levels than what is really stored on the server.
For example, if our /products.aspx page above had a relative reference to "logo.jpg" in the .aspx page, but was requested via the /products/books.aspx URL, then the browser will send a request for /products/logo.jpg instead of /logo.jpg when it renders the page.  To reference this file correctly, make sure you root qualify CSS and Image references ("/style.css" instead of "style.css").  For ASP.NET controls, you can also use the ~ syntax to reference files from the root of the application (for example: <asp:image imageurl="~/images/logo.jpg" runat="server"/>).
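You can reproduce the browser's resolution rules with System.Uri to see why the relative reference breaks.  This is just a small illustration; the RelativeUrlDemo class name is made up for the example.

```csharp
using System;

// Hypothetical demo class: resolves a reference the way a browser would,
// relative to the URL of the page that was requested.
public static class RelativeUrlDemo
{
    public static string Resolve(string pageUrl, string reference)
    {
        // Uri(baseUri, relative) applies standard relative-reference resolution.
        return new Uri(new Uri(pageUrl), reference).AbsoluteUri;
    }
}
```

Resolving "logo.jpg" against http://www.store.com/products/books.aspx gives http://www.store.com/products/logo.jpg, whereas the root-qualified "/logo.jpg" gives http://www.store.com/logo.jpg regardless of which public URL was requested.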