Posted by: dazarobbo
Posted by: LordOfBlah51
What do file extensions do in URLs anyways? If you can remove them, do they have any purpose?
Technically nothing since it's just a line of text, but it depends on how the server decides to interpret it.
For most web servers, the default action is to map the URL* to the file system, starting from a special directory created for the requested host. In Apache, this is the "htdocs" folder by default, and for IIS it is the "wwwroot" folder. When this mapping occurs, the URL is checked against the file system path. For instance, if the URL was "/test/somefile.html", you could assume there may be a directory in the root directory named "test" and within that there may be a file named "somefile.html".
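To make that default mapping concrete, here's a minimal sketch in Python. The document root "/var/www/htdocs" and the function name are made up for illustration, and a real server does a lot more than this (index files, access checks, rejecting ".." segments), so treat it as a rough picture rather than how Apache or IIS actually implement it.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical document root; real servers read this from their configuration.
DOCUMENT_ROOT = PurePosixPath("/var/www/htdocs")


def url_to_file_path(url_path: str) -> PurePosixPath:
    """Map a request path such as "/test/somefile.html" onto the document root."""
    # Drop any query string, then decode percent-escapes such as %20.
    path_only = unquote(urlsplit(url_path).path)
    # Strip the leading slash so the path is joined relative to the root.
    # Note: this sketch does NOT reject ".." segments.
    relative = path_only.lstrip("/")
    return DOCUMENT_ROOT / relative


print(url_to_file_path("/test/somefile.html"))
```

The server would then check whether a file really exists at the resulting path and serve it, or answer with a 404 if it doesn't.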
So from that perspective, the file extension in the URL does nothing functional. The server just tries to match it against a file in a directory that really does exist with that path, name, and extension.
However, because they do nothing functional (except for any parameters, but that's a different story), they can be manipulated and mapped to different internal directories on the server. So a URL like "/games/halo" (which is what you would see) could potentially be mapped internally to another file like "/gamepages/halo.html". They could even be mapped outside of the root folder (which becomes a security issue).
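A rough sketch of that kind of internal mapping, again in Python. The rewrite table, the document root, and the function name are all hypothetical. The check at the end is one simple way to avoid the "mapped outside of the root folder" problem; it doesn't account for symlinks or anything else a real server has to worry about.

```python
import posixpath

# Hypothetical document root and rewrite table, for illustration only.
DOCUMENT_ROOT = "/var/www/htdocs"
REWRITES = {
    "/games/halo": "/gamepages/halo.html",
}


def resolve(url_path: str) -> str:
    """Return the internal file path for a public URL path."""
    # Apply a rewrite rule if one exists; otherwise use the path unchanged.
    internal = REWRITES.get(url_path, url_path)
    # Join onto the root and collapse "." and ".." segments.
    candidate = posixpath.normpath(
        posixpath.join(DOCUMENT_ROOT, internal.lstrip("/"))
    )
    # Refuse anything that ends up outside the document root.
    if candidate != DOCUMENT_ROOT and not candidate.startswith(DOCUMENT_ROOT + "/"):
        raise PermissionError(f"{url_path!r} maps outside the document root")
    return candidate


print(resolve("/games/halo"))
```

The visitor only ever sees "/games/halo"; the ".html" file behind it never shows up in the URL.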
Without knowledge of this, it's possible to reach a false conclusion about what software the server uses too. For example, on just about every bungie.net page, the URL is usually pointing to a file with a ".aspx" extension. By seeing these, you could come to the conclusion that bungie.net uses ASP.NET (because .aspx is the extension used by ASP.NET web forms). But then, if you look at a website like http://mobile.bungie.co/ which also has ".aspx" in the URL, you could reach the same conclusion, even though it doesn't use ASP.NET at all. It's tricking you.
So in that sense, it's pretty much impossible to know what's actually happening on the server-side. It could also be impersonating another server type altogether...
It's a complicated world.
*This is a regular URL minus the protocol, host, and the hash and anything past it (if it exists). For instance, a URL like "http://www.bungie.net/Forums/posts.aspx?postID=74080344&viewreplies=true#end" gets sent as "/Forums/posts.aspx?postID=74080344&viewreplies=true" in the request line. The http portion disappears and gets mapped to port 80 (well-known port for HTTP), the host gets removed and put in the host header (this is also how multiple domains can exist on the same address), and the hash fragment is only relevant to the client-side.
Well I learned something today!
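For anyone who wants to see the footnote's breakdown in action, here's a small Python sketch that splits a URL the way it describes. The function name and dictionary keys are made up for this example, and it only knows the default ports for http and https, so it's an approximation of what a browser does, not a full client.

```python
from urllib.parse import urlsplit

# Well-known default ports; other schemes are left out of this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_request(url: str) -> dict:
    """Show which parts of a URL go where when a client makes a request."""
    parts = urlsplit(url)
    # The request target is the path plus the query string, if there is one.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        # The scheme only decides the port (unless the URL names one).
        "connect_to": (parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]),
        "request_line": f"GET {target} HTTP/1.1",
        # The host travels in its own header, not in the request line.
        "host_header": f"Host: {parts.netloc}",
        # The fragment never leaves the client.
        "fragment_kept_by_client": parts.fragment,
    }


info = describe_request(
    "http://www.bungie.net/Forums/posts.aspx?postID=74080344&viewreplies=true#end"
)
for key, value in info.items():
    print(f"{key}: {value}")
```

Because the host sits in its own header, a server listening on one address can use it to pick which site's root directory to look in.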