Services by Tectonic Concepts
Over the past ten years the volume of online content, including web pages, images, video, and documents, has grown substantially. Today, more than 200 million searches are performed each day across the top five search engines: Google, Yahoo, MSN, AOL, and Ask.com. Search engines are currently the number one way users discover new web sites. With so many consumer searches being performed, it is imperative that web sites be properly cataloged within the various search indexes.
Many web sites must restrict access to content based on a variety of criteria, such as membership, consumer identification, or age and location validation. Even though interacting with this content requires prior authentication, it may still be necessary to allow the content to be properly indexed by search engines, both to attract potential new users and to make the content easier to find with modern search engine software.
On sites that require authentication, once a user has been authenticated their browser is typically tracked with a browser cookie, a server-side session value, or another type of authentication key. That key grants the user access to their permitted content for as long as it remains valid. The problem with search engine spiders and crawlers is that they provide no authentication information to the web site, so they are immediately blocked from any content behind the authentication page.
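The gating decision described above can be sketched as a small routing function. This is a minimal illustration, not the actual implementation; the crawler tokens, login URL, and function names are assumptions for the example.

```python
from urllib.parse import urlencode

# Illustrative list of User-Agent substrings for well-known crawlers (assumption).
KNOWN_CRAWLER_TOKENS = ["googlebot", "slurp", "msnbot", "ask jeeves"]

def is_crawler(user_agent):
    """Match the request's User-Agent against known crawler signatures."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)

def route_request(user_agent, auth_cookie, path, query):
    """Return the URL the visitor should receive for the requested path."""
    if is_crawler(user_agent):
        return path                       # crawler: allow through for indexing
    if auth_cookie:
        return path                       # already-authenticated visitor
    # Unauthenticated human: redirect to the authentication page,
    # passing along the original path and query string information.
    params = {"return": path}
    params.update(query)
    return "/login?" + urlencode(params)
```

An unauthenticated browser asking for `/members/report?id=7` would be sent to `/login?return=%2Fmembers%2Freport&id=7`, while a recognized crawler receives the page directly.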
Tectonic Concepts has pioneered a technique that allows search engine spiders and crawlers to index web content normally restricted by various authentication methods (see image below). The technique identifies each visitor accessing the content as either a genuine user or a search engine. If the visitor is not identified as a spider or crawler and has not already been authenticated, the visitor is immediately redirected to the authentication page, with any query string information passed along. Once authenticated, the user is tagged with a browser cookie, server-side session, or authentication key and allowed through to the desired content. If, on the other hand, the visitor is identified as a spider or crawler, it is allowed to pass and properly index the available content. The search engine can then display those pages in its results whenever they match a user's search criteria. When a user follows one of those indexed links, they are redirected to the authentication page until they provide the appropriate credentials. The technique also supports an exception list of web pages and other content that is excluded from this process and cannot be indexed without proper authentication.

The biggest challenge with this process is securing content that the web server does not normally parse programmatically, including PDF, Microsoft Word, Excel, PowerPoint, MP3, and other file formats. The best method is to deliver these files through an HTTP handler or another file-download mechanism, such as filedownload.aspx?file=1234. This approach provides several benefits, including the ability to restrict access based on authentication and to track which users have accessed which files.
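A gated download handler of the kind suggested by filedownload.aspx?file=1234 might look like the following sketch. The file IDs, paths, and return conventions here are assumptions made for illustration.

```python
# Hypothetical mapping of file identifiers to files on disk (assumption).
FILES = {"1234": "/var/content/whitepaper.pdf"}

# Record of (user, file_id) pairs, so the site can track who accessed which files.
ACCESS_LOG = []

def download(file_id, user, is_authenticated):
    """Serve a restricted file only to authenticated users, logging each access.

    Returns an (HTTP status, file path) pair; the path is None on failure."""
    if not is_authenticated:
        return (401, None)          # force the visitor to authenticate first
    path = FILES.get(file_id)
    if path is None:
        return (404, None)          # unknown file identifier
    ACCESS_LOG.append((user, file_id))
    return (200, path)
```

Because every download passes through this single handler rather than a direct file URL, both the authentication check and the per-user access tracking happen in one place.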
The core component of this technique is an updatable library of identified spiders and crawlers, tracked by their unique browser identification information. Although the Microsoft .NET Framework versions 1.1 and 2.0 include methods to identify potential crawlers from browser details, that identification is based on an array of known crawlers fixed at the time the framework version was released. New crawler and spider versions are released on a regular basis and don't always produce a positive match with the framework methods. Tectonic's updatable library continuously allows the web site to recognize new spiders and crawlers and admit them so they can properly catalog the available information.
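The difference between a fixed array and an updatable library can be sketched as a small registry class. This is an illustrative sketch, not Tectonic's implementation; the class and method names are assumptions.

```python
class CrawlerLibrary:
    """Updatable registry of crawler User-Agent signatures.

    Unlike a signature list fixed at framework-release time, new signatures
    can be merged in at runtime, e.g. from a periodically refreshed feed."""

    def __init__(self, signatures=None):
        self._signatures = {s.lower() for s in (signatures or [])}

    def update(self, new_signatures):
        # Merge newly published crawler signatures without redeploying the site.
        self._signatures |= {s.lower() for s in new_signatures}

    def matches(self, user_agent):
        """Return True if the User-Agent contains any known crawler signature."""
        ua = (user_agent or "").lower()
        return any(sig in ua for sig in self._signatures)
```

A crawler released after deployment would initially be treated as an ordinary visitor; once its signature is added via update(), it is recognized and allowed through to index the content.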
By applying these SEO-friendly techniques to restricted-access content, sites can greatly improve their reach within search engine results and tap the 200 million daily online searches to attract new users and promote their content and brand.