Search Engine Optimization

Search engines are among the most important and frequently used tools for finding information on the World Wide Web. Search engine optimization (SEO) refers to a variety of techniques and strategies for improving the ranking and visibility of websites within search engine listings.

In this tutorial you will learn about:
Why optimizing websites for search engines is important
 
  • On average, at least 45% of visitors to utah.edu use the search engine for information retrieval or site navigation.
    source: University of Utah Office of Information Technology
 
  • 62% of search engine users click on a result within the first page of results.
  • 90% of users click on a result within the first three pages of results.
  • 41% of users attempt to refine their search after the first results page if the initial search is unsuccessful.
  • 88% of users attempt to refine their search after the first three pages if the initial search is unsuccessful.
  • 72% of Google users favor organic results over paid listings.
    source: iProspect Search Engine User Behavior Study (April 2006)
 
  • The first organic result receives 28% of all user clicks.
  • Users tend to click on one of the top three results with little analysis, but
    pay closer attention to title and description information on results 4-10.
    source: Enviro
 
How Search Engines Work
 
  • Search engines run automated programs, called "robots" or "spiders", that use the hyperlink structure of the web to "crawl" the pages and documents that make up the World Wide Web.
  • Once a page has been crawled, its contents can be indexed - stored in a giant database of documents that makes up a search engine's "index".
  • When a request for information comes into the search engine, the engine retrieves all the documents from its index that match the query.
  • Once the search engine has determined which results are a match for the query, the engine's algorithm (a mathematical equation commonly used for sorting) runs calculations on each of the results to determine which is most relevant to the given query.
  • [For more information, see http://www.seomoz.org/articles/bg2.php ]
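
The crawl, index, retrieve, and rank steps above can be sketched in a few lines of Python. This is a toy illustration only: the page URLs and text are hypothetical, the "crawl" is replaced by a hard-coded dictionary, and the relevance score (a simple count of query words) is an assumption standing in for a real engine's far more elaborate algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical mini "web": URL -> page text. A real spider would fetch
# these pages by following hyperlinks instead of reading a dictionary.
PAGES = {
    "http://example.edu/a": "campus map and parking map",
    "http://example.edu/b": "parking permits and parking fees",
    "http://example.edu/c": "library hours",
}


def build_index(pages):
    """Index step: map each word to the URLs that contain it, with counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def search(index, query):
    """Retrieve pages containing every query word, then rank them."""
    words = query.lower().split()
    if not words:
        return []
    # Retrieval: keep only the URLs that contain all of the query words.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    # Ranking: toy relevance score = total occurrences of the query words.
    scored = [(sum(index[w][url] for w in words), url) for url in matches]
    # Highest score first; ties are broken alphabetically by URL.
    return [url for _, url in sorted(scored, key=lambda p: (-p[0], p[1]))]


index = build_index(PAGES)
print(search(index, "parking"))
print(search(index, "map"))
```

Page b ranks above page a for "parking" because the word appears there twice, which is the kind of calculation the ranking step performs on every matching document.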
 
Definitions of Common Terms
 
  • Algorithm – A mathematical formula or equation for solving problems such as sorting large data sets.
  • Directory – A categorized, descriptive list of website links, usually created and compiled by human editors.
  • Domain name - A hostname (such as www.utah.edu) that provides a more memorable identifier than an IP address.
  • Index - A search engine's database of web page content.
  • IP (Internet Protocol) address - A unique number (such as 155.99.1.3) that web browsers use to identify and communicate with web servers.
  • Keyword – A term or phrase entered as a query into a search engine.
  • Organic results - Search results compiled from the search engine index.
  • Robots & spiders – Automated programs which crawl the internet collecting and indexing web pages.
  • Paid placement results - Sponsored search results paid for by commercial entities and placed near the top of the organic results.
  • SEO – Search Engine Optimization
  • SEM – Search Engine Marketing
  • SERP – Search Engine Results Pages
  • URL (Uniform Resource Locator) - The address of a website or web page, which contains a domain name or IP address.
 
Accessibility
  In order to add a website URL to its index, a search engine must be able to access the site. Accessibility roadblocks are technologies or page elements past which a search engine spider cannot crawl. Robots exclusion and redirects are also important ways to manage how search engines access and index websites.
 
  • Dynamic URLs and query strings: URLs that use query string elements such as & or ? to retrieve data dynamically may not be accessible to search engine crawlers, so the content of web pages with dynamically generated URLs may not be searchable.
  • Secure sockets layer (SSL): Search engine crawlers are unable to access web pages encrypted using SSL protocols.
  • JavaScript: Search engine crawlers do not follow links or page navigation written using JavaScript.
  • Cookies and session IDs: Search engine crawlers do not accept cookies or work with session identifiers. Web pages requiring cookies or session IDs for access will not be searchable.
  • Roadblocks (Lynx browser/viewer, Firefox browser)
  • Spider limits: Most search engine crawlers limit the page size or number of characters they will crawl. Decrease the size of large web pages by moving JavaScript and CSS to external files.
  • Broken links: Search engine crawlers don't crawl past broken links.
  • Sitemaps: A sitemap is a web page that lists and links to all of the pages of a site. Search engine crawlers can easily and effectively index a site using a sitemap. Sitemaps are especially useful for sites with content that is otherwise inaccessible due to dynamic URLs, SSL, or other roadblocks. Limit the number of links in a sitemap to fewer than 100 or build sitemaps around groups of pages.
  • Non-HTML documents: Documents such as Word, Excel, PowerPoint, and Adobe PDF can be indexed by search engine crawlers. Assign a metadata title to Adobe and Microsoft documents using the File>Properties dialog.
  • Canonical URLs: A search engine will treat http://utah.edu and http://www.utah.edu as different websites. If both URL forms serve the same pages, search engines will consider them duplicate content and sharply reduce the relevance score of both. Use server-side (301) redirects to point alternate URL forms to the canonical URL without a relevance penalty.
  • Redirects: Redirect instructions tell web browsers and crawlers to move on to a new or revised URL. 301 redirects are server-side permanent redirects, which search engine spiders will follow. 302 redirects are server-side temporary redirects, and most search engines will ignore them. Meta-refresh and JavaScript redirects are often used unethically to "cloak" content, and most search engine crawlers ignore them.
  • Robots exclusion: The Robots Exclusion Protocol is a method that allows site administrators to indicate to visiting robots which parts of their site should not be visited. Robots can be specifically admitted or excluded on a site-wide, directory by directory, or page by page basis, using the robots.txt file or robots meta tag.
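
As a rough illustration of robots exclusion, the sketch below uses Python's standard urllib.robotparser module to check URLs against a robots.txt file, much as a well-behaved crawler would. The robots.txt content, the paths, and the "ExampleBot" agent name are hypothetical and chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is excluded from one directory
# and from one individual page; everything else may be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/index.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the given robot may crawl the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.edu/index.html"))
print(allowed("http://www.example.edu/private/report.html"))
print(allowed("http://www.example.edu/drafts/index.html"))
```

The robots meta tag works at the level of a single page instead, for example <meta name="robots" content="noindex, nofollow"> placed in that page's HTML header.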
 
Indexes and Directories
 
  • Getting a site indexed into a crawler-based search engine is best accomplished by being well linked to from other sites in that search engine's index. Some search engines charge to be manually indexed. Submitting to most directories will also result in indexing with the major search engines.
  • Registering with Google Webmaster Tools (formerly Google SiteMaps) is another way to ensure inclusion in the Google index.
  • Getting indexed into the university search engine requires registering with the university webmaster and being compliant with university web policy. In order for any web page to be indexed by the university's Ultraseek search engine, the root URL for that site must be included in one of the search engine collections.
  • Directories are search engines powered by human beings: human editors compile all of a directory's listings. The two principal directories are Yahoo! Directory and the Open Directory Project (DMOZ). Thousands of smaller specialty directories also exist.
  • Submitting to the Yahoo! and DMOZ directories is useful for sites with highly competitive target content, and for sites without quality inbound links. Submitting to specialty directories is especially useful for sites with unique content matched specifically to the target audience of a particular directory.
  • Before being submitted to a directory, a site should be complete, written in correct HTML with no broken links, and viewable on a range of browsers, operating systems, and screen resolutions. Prepare a description of twenty-five words or fewer that includes the two or three keywords the site is intended to target and avoids marketing-style language.
 
Usability & Search Friendly Design
  Usability is simply how easy a site is to understand and navigate. Many of the same principles that contribute to the usability of a site for human visitors also make a site more accessible to search engines. Defining your target content and target audience is an important step in optimizing your website.
 
  • Target audience: The potential visitors and users of your site who may benefit from the content provided on your site.
  • Target content: The information, products or services you offer, and the processes required to provide them to your users.
  • Landing page: Any point of entry into your site.
  • Conversion: The end goal of a visit to your site, or what you hope the users of your site will do. This may be a purchase, course registration, file download, or simply a page view.
  • Conversion path: The process or steps required to guide a visitor from the point of entry to your site to the point of conversion.
  Visitors to websites often enter at pages other than the site's home page. Understanding where within your site potential visitors may land is crucial to optimizing your site for their visit. Usability includes structuring your site so that your target users can easily move from landing page to conversion.
 
Meta-data
 

Meta-data refers to information about a website contained in the website code but not displayed by browsers. Meta-tags are the individual HTML elements which help search engines classify and rank web page content.

 
  • HTML page title: The HTML page title is one of the most critical factors for optimizing a web page for top search engine results. A unique page title should be crafted for each landing page within a site. The page title should correspond closely to the target content of the page.
  • Description tag: The HTML description tag contains information about a web page and its contents. HTML descriptions are often displayed as part of a search engine's results page. Most search engines place low importance on description tags. The description tag should correspond closely to the target content of the page.
  • Keywords tag: The keywords tag allows for additional placement of keywords into the HTML header. Most search engines place little or no importance on the keywords tag.
  • Other HTML elements: Other HTML elements include the H1 header tag, image alt tags, and body text. Very little importance is given to these elements, but including them could help to slightly increase placement in search results pages for highly competitive terms.
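
To check what meta-data a page actually exposes, a short script can pull out the title and the named meta tags. The sketch below uses Python's standard html.parser module; the sample page and the MetaDataParser class name are hypothetical, and a real crawler's parsing is certainly more forgiving than this.

```python
from html.parser import HTMLParser


class MetaDataParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            # Store e.g. meta["description"] and meta["keywords"].
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Campus Parking Permits</title>
<meta name="description" content="How to buy a campus parking permit.">
<meta name="keywords" content="parking, permits, campus">
</head><body><h1>Parking Permits</h1></body></html>
"""

page = MetaDataParser()
page.feed(SAMPLE_PAGE)
print("Title:      ", page.title.strip())
print("Description:", page.meta.get("description", "(missing)"))
print("Keywords:   ", page.meta.get("keywords", "(missing)"))
```

Running a check like this over every landing page is one way to spot pages that share a title or lack a description.
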
Inbound Links and URL Popularity
  The popularity of a site on the web is an important factor for establishing ranking within search engine results pages. Every link to your site from another site is considered a "vote." The more links to your site, the more votes you receive, and the higher your site is ranked by the search engine.
 

Search engine algorithms consider the quality of every inbound link, as well as the quantity. A few links from highly ranked sites will count more than many links from low ranked or dubious sites.
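
One way to picture links as votes weighted by quality is a simplified PageRank-style calculation, sketched below. This is an illustration under stated assumptions, not the formula any particular search engine uses: the link graph is hypothetical, and the damping value of 0.85 and the iteration count are conventional choices made for the example.

```python
# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "hub": ["target", "other"],
    "blog": ["target"],
    "target": ["hub"],
    "other": ["hub"],
}


def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scoring by repeatedly passing votes."""
    pages = list(links)
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            # A page with no outbound links spreads its vote evenly.
            targets = outlinks or pages
            share = damping * score[page] / len(targets)
            for dest in targets:
                # A vote is worth more when the linking page scores higher.
                new[dest] += share
        score = new
    return score


scores = link_scores(LINKS)
for page, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {value:.3f}")
```

Here "blog" has no inbound links and ends up with only the baseline score, while "target" outranks "other" because it collects a vote from "blog" in addition to its share of the vote from "hub".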

 

Remote anchor text refers to the words used to describe where the link is pointing. Search engines understand the relationship between the source and destination of the link, and the text contained in the link. Links whose text doesn't match the content found at the destination are ignored or discounted.

  • An example of effective remote anchor text might be:
    Follow this link for quality running shoes (link text: "quality running shoes")
  • A poorly worded remote anchor link would be:
    For quality running shoes click here (link text: "click here")
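
The difference between the two examples shows up when the anchor text is extracted from the underlying HTML. The sketch below assumes the link in the first example wraps the words "quality running shoes" and the link in the second wraps "click here"; the URL, the list of generic phrases, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed list of generic phrases that say nothing about the destination.
GENERIC_ANCHORS = {"click here", "here", "this link", "read more"}

GOOD = 'Follow this link for <a href="http://shoes.example.com/">quality running shoes</a>'
POOR = 'For quality running shoes <a href="http://shoes.example.com/">click here</a>'


class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from the links in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an <a href="..."> element counts as anchor text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_texts(html):
    """Return the anchor text of every link in the fragment."""
    parser = AnchorTextParser()
    parser.feed(html)
    return [text for _, text in parser.links]


def is_descriptive(anchor_text):
    """Treat anchor text as descriptive unless it is a generic phrase."""
    return anchor_text.strip().lower() not in GENERIC_ANCHORS


for fragment in (GOOD, POOR):
    for text in anchor_texts(fragment):
        print(repr(text), "descriptive" if is_descriptive(text) else "generic")
```
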
  The age and reliability of a website domain name may also play a role in how well a site is ranked by the search engines. Minimally optimized pages within older, more popular sites may rank higher than well optimized pages within new sites.
"Black Hat" Search Engine Optimization
  Unscrupulous optimization techniques designed to gain an unfair advantage, or for purposes other than legitimate information retrieval, are called "Black Hat" techniques, and are usually penalized by the search engines.
 
  • Invisible text: Using white text on white backgrounds or other methods to hide keywords in order to increase rank or mislead crawlers.
  • Keyword stuffing: Adding extra keywords to meta-data or alt-tags.
  • Duplicate pages: Serving the same content from multiple hostnames.
  • Domain cloaking: Serving up different content to search engines than is served up to ordinary users. This technique is often used by the adult entertainment industry.
  • Link farms: Publishing web pages containing hundreds of links to your site in order to inflate inbound link popularity.
  Most search engines, including the university's Ultraseek search engine, set a spam detection threshold for keyword repetition. The following table shows the number of allowable repeated keywords for different HTML elements:
 
  Element                      Weight   Spam Detection Threshold (per 100 words)
  <title>                      8        2
  <meta name="description">    4        4
  <meta name="keywords">       4        4
  <img alt>                    1        2
  <body> text                           8
 
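
A keyword repetition check against these thresholds might look like the sketch below. It covers four of the elements listed in the table and rests on two assumptions that the source does not confirm: that "per 100 words" means the allowance applies to each block of up to 100 words, and that a keyword is flagged only when its count goes over the allowance. The function name is hypothetical, and only single-word keywords are handled.

```python
import math
import re

# Thresholds taken from the table above (allowed repetitions per 100 words).
SPAM_THRESHOLDS = {"title": 2, "description": 4, "keywords": 4, "img alt": 2}


def exceeds_threshold(element, text, keyword):
    """Return True if a single-word keyword is repeated too often."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    count = words.count(keyword.lower())
    # Assumption: each started block of 100 words gets the full allowance.
    blocks = max(1, math.ceil(len(words) / 100))
    return count > SPAM_THRESHOLDS[element] * blocks


print(exceeds_threshold("title", "Campus Parking Permits", "parking"))
print(exceeds_threshold("title", "Parking parking parking permits", "parking"))
```
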
Resources for Webmasters
  University Webmaster
 
  Online Resources
 
  Books & Periodicals
 
  • The ABC of SEO - David George
  • Search Engine Optimization - Grappone & Couzin
  • Search Engine Optimization for Dummies - Peter Kent