There is a lot of confusion out there surrounding how search engines like Google actually work. Some people think that the search engine looks at the individual words contained in a web page and ranks pages according to that. The truth is there is a lot more that goes into it.

I will try to give you a precise answer on how they do it – and I will fail, because a truly precise answer would take volumes of text and some very elaborate mathematical formulas and algorithms. But let’s give it a try. To start…

What Are Search Engines?

Search engines are basically just super-complicated databases that have been programmed with the ability to learn from their mistakes and improve their results as they go along. We use search engines by typing in keywords to describe what we are looking for and our engine of choice will search its database to find pages that contain those keywords. 

Google makes small changes to its algorithm several times a day, fixing little things here and there. Additionally, they roll out major core algorithm updates several times a year.

Web Pages = Library & Search Engine = Librarian

A simple way to think about it is to compare search engines with librarians – they both sort information and provide you with a list of resources according to what best suits your needs.

How do librarians work? They ask you what topics interest you, then try to find books on those topics and put them in order, starting with the most relevant material. The task of a librarian is relatively simple – it boils down to:

1) Listen to the query

2) Retrieve a list of books on the topic

3) Sort the list and present it to the user

How do search engines work? They ask you what topics interest you and try to give you a list of pages that they know contain information on those topics. Sounds simple, but the devil is in the details. The key difference between search engines and librarians, beyond sheer scale, is that while a librarian sorts through the books in a single collection, a search engine sorts through billions of web pages and presents them in the order that best matches your interests, using information accumulated over time.
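To make that pipeline concrete, here is a toy sketch in Python. The pages, their contents, and the scoring are invented for illustration – a real engine works at an entirely different scale – but the listen/retrieve/sort shape is the same:

```python
# Toy "librarian" pipeline. The pages and their contents below are
# made up for this example.
PAGES = {
    "dog-breeds.html": "popular dog breeds and how to choose a dog",
    "cat-care.html": "caring for your cat at home",
    "dog-training.html": "basic dog training tips for new owners",
}

def search(query):
    terms = query.lower().split()                   # 1) listen to the query
    matches = [
        (page, sum(text.count(t) for t in terms))   # 2) retrieve pages on the topic
        for page, text in PAGES.items()
        if any(t in text for t in terms)
    ]
    matches.sort(key=lambda m: m[1], reverse=True)  # 3) sort and present the list
    return [page for page, _ in matches]

search("dog breeds")  # → ['dog-breeds.html', 'dog-training.html']
```

Everything a real engine adds – spelling correction, synonyms, personalization, quality signals – is refinement layered onto those same three steps.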

In order to serve up what they believe to be the most relevant information, search engines need to be able to scan through potentially billions of pages and sort them according to what they think is most relevant *for you*. You and I can enter the same search term and get different results depending on a multitude of factors. What factors affect search results? Well, let’s take a look.

What is the Meaning of Your Query?

First, the algorithm tries to understand the meaning behind your query. That is why it’s better to search with a descriptive phrase than a single word. For example, if you search for “dog,” the results will likely just serve up information on what a dog is – not necessarily what you were looking for. The algorithm will also try to guess your intent: instead of just matching the keyword, it will surface other relevant media such as images and videos, and provide suggestions to help further narrow your search.

On the other hand, if your search is “dog breeds,” you will see a list of articles on different dog breeds. The difference between the two searches is the meaning behind them, which poses a problem to search engines – they need to figure out what you mean by “dog” or “dog breeds.”

How Do Search Engines Decide What You Mean?

In order to determine that, search engines look at web pages and try to figure out which words on those pages are most relevant. Let’s take a step back for a second – how do search engines know which words matter more than others? Keep in mind that search engines index billions of web pages. To create an effective index, they have developed algorithms and techniques that reduce the amount of data that needs to be stored and let them sort through it quickly with minimal computing power.
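One of those core techniques is the inverted index: rather than scanning every page for every query, the engine stores, for each word, the list of pages that contain it. A minimal sketch, with a made-up three-page corpus:

```python
from collections import defaultdict

# Hypothetical mini-corpus; a real index covers billions of pages.
docs = {
    1: "golden retriever dog breeds",
    2: "dog training for puppies",
    3: "popular cat breeds",
}

# Build the inverted index: word -> set of page ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def lookup(query):
    # Intersect the posting lists of each query word.
    postings = [index.get(word, set()) for word in query.split()]
    return set.intersection(*postings) if postings else set()

lookup("dog breeds")  # → {1}: the only page containing both words
```

Answering a query then becomes a cheap set intersection instead of a scan of every stored page – which is what makes indexing billions of pages tractable.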

So What Does the Index Do?

Indexes don’t contain a lot of information – just a list of words, some identifying information about each web page, and key metrics such as PageRank. The basics of search engines revolve around the way they measure how important a page is based on the number and quality of links that point to it. You can think of it as a popularity contest, but instead of people voting directly for a contestant, a page earns votes through how many other pages link to it, how many people visit it, share it, and so on.
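The original PageRank formulation makes this “voting” precise: each page distributes its score among the pages it links to, with a damping factor mixed in, and the process is iterated until the scores settle. A bare-bones sketch, using an invented three-page link graph:

```python
# Simplified PageRank power iteration. The link graph is invented:
# page "a" links to "b" and "c", "b" links to "c", "c" links to "a".
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            # Each page splits its current score among its outgoing links.
            for target in outgoing:
                new[target] += damping * rank[page] / len(outgoing)
        rank = new
    return rank

ranks = pagerank(links)
# "c" collects links from both "a" and "b", so it earns the highest score.
```

Notice that a link from a highly ranked page is worth more than one from an obscure page – that is the “quality of links” part in action.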

What is the Relevance of the Pages?

The next thing an algorithm tries to determine is whether a page is relevant to your search. If the page doesn’t seem to contain helpful information related to what you’re looking for, it probably belongs at the bottom of your results. The algorithm has a number of ways to determine how relevant something is including:

  • Keywords – How many times does a word appear on the page? What percentage of words on that page are those keywords? Does it appear near other words that are relevant to the topic of the page?
  • Page Layout – Is the relevant information easy to find within the page? If the algorithm can’t find it easily then it may not hold much value.
  • Popularity – How many people link to it? Does that page receive a lot of traffic? Is it shared on social media and is someone important endorsing it?
  • User Feedback – A site’s ranking can be affected by how long a user stays on the page and whether they click through to other areas of the site (visitors who leave after viewing only one page count toward what is called the Bounce Rate), among several other factors.
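As a rough illustration, a toy scoring function might combine keyword frequency, keyword density, and a popularity term. The weights and the example page below are entirely invented – real ranking functions weigh hundreds of signals:

```python
import re

def relevance(page_text, query_terms, inbound_links):
    words = re.findall(r"[a-z]+", page_text.lower())
    hits = sum(words.count(term) for term in query_terms)  # keyword frequency
    density = hits / len(words) if words else 0.0          # share of words that match
    popularity = inbound_links ** 0.5                      # diminishing returns on links
    return hits + 10 * density + popularity

relevance(
    "Guide to dog breeds: small dog breeds and large dog breeds",
    ["dog", "breeds"],
    inbound_links=25,
)
```

The square root on the link count is one simple way to model diminishing returns: the thousandth link should matter far less than the tenth.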

What If There Are No Relevant Results?

It’s not uncommon to see some sites at the top of your results that aren’t necessarily relevant to your search. They’re often there because something on their site is related to a word in your query. It may not be a direct match, but search engines use a number of factors to determine how relevant something is.

However, algorithms try to maintain an objective view of the subject matter and avoid the influence of subjective opinions. Because they are based almost entirely on math, they behave consistently and are largely free of human editorial bias.

Does the Site Offer Quality Content?

Quality matters not only for search engines but also for users. Sites that offer useful and entertaining information will get a higher ranking than those that don’t meet this threshold. The search engine algorithm will try to understand what makes a site valuable and informative by looking at:

  • Are the pages on your site similar in content to one another? For example, if you have ten pages on your site all talking about the same subject, it probably indicates that you don’t have much to say on the topic.
  • Are there grammatical or spelling errors? If so, does this make the information seem less valuable and reliable?
  • Does the content provide a different perspective than other results? Or is it simply a ripoff of existing information from other sources?
  • Are there a lot of ads on the page? Does this make it seem spammy and therefore less valuable to users?
  • Is the website engaging in deceptive practices? Google’s webmaster guidelines lay out a number of tactics that they consider to be spammy. This includes tricking the search engine in order to get a higher ranking, using hidden text and links, cloaking, etc.

These are just a few questions an algorithm might ask when assessing site quality, but you can imagine how complex this can become. Because algorithms are limited in their ability to understand information the way a human does, Google and other search engines also employ humans who manually review sites. These evaluators are highly trained and follow a search quality evaluator guidelines manual that runs nearly 200 pages.

Does the Site Offer a Good User Experience?

User experience (UX) refers to how easily a user can navigate your site, the aesthetics of the site, and how quickly they can find what they’re looking for. Of course, search engines have built their own algorithms to assess a site’s UX.

Google places a heavy emphasis on UX both in design and content. Some of the questions the algorithm seeks to answer are:

  • Does the site automatically adjust to properly accommodate different screen sizes and browsers?
  • Does the page load quickly, or is it bogged down with bloated code?
  • Does the site have an effective mobile presence?
  • Is it easy to navigate the site through internal links, or is there a lot of clicking between different pages that ultimately lead back to each other?
  • Is it easy to find what you’re looking for on the site (and does this content seem relevant)?
  • Does the site engage users? Good UX includes a number of user engagement signals such as whether people are clicking on links, using the back button, bookmarking your site, etc.

Tailoring Search Results For Users

Finally, your browsing history plays a pivotal role in how the algorithm tries to predict the best results for you. Google is constantly trying to improve its ability to guess what you’re looking for.

If you’re logged into Google, they are able to tailor your search experience based on data they have about you including:

  • Your location
  • Your past searches
  • Any contextual clues from a variety of sources

While this is helpful, it can also be cause for concern. Since search engines are using so much information about users to tailor results, it’s important for them to anonymize the data they collect. This way, an individual person’s browsing history doesn’t affect how they’re shown results. It also prevents someone you know (who may appear in your browsing history) from affecting the way you’re served results.
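Tailoring can be pictured as a re-ranking step on top of the base results. In this sketch, a page whose region matches the searcher’s gets a small boost – the URLs, scores, and boost value are all hypothetical:

```python
# Hypothetical base results; "base_score" stands in for the engine's
# un-personalized relevance score.
results = [
    {"url": "pizza-global.example", "base_score": 0.9, "region": None},
    {"url": "pizza-berlin.example", "base_score": 0.8, "region": "Berlin"},
]

def personalize(results, user_region, boost=0.25):
    def score(r):
        # Boost pages whose region matches the searcher's location.
        bonus = boost if r["region"] == user_region else 0.0
        return r["base_score"] + bonus
    return sorted(results, key=score, reverse=True)

personalize(results, user_region="Berlin")
# The Berlin page now outranks the globally stronger one for this user.
```

A searcher in another city would see the original ordering – same index, same base scores, different final ranking.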

The Future Of Search Engines – MUM is the Word

While search engine algorithms are constantly changing and becoming more sophisticated, there is still a lot of room for improvement. In fact, researchers believe that in the future we will be able to use AI technology to ask questions directly to our computers and get answers.

We already use natural language processing for search queries. BERT (Bidirectional Encoder Representations from Transformers) was developed by Google, and its successor, MUM (Multitask Unified Model), is – according to Google – roughly 1,000 times more powerful than BERT. MUM will also help Google branch out from understanding only the text on a page to analyzing text within images.

As technologies such as AI and quantum computing continue to evolve, search algorithms as we know them are going to evolve alongside them. In fact, Google has already started to integrate AI into their search algorithms by using it to predict what users are searching for when they’re typing queries in the search bar. The future certainly looks bright for search algorithms.

Need Help Understanding How Your Site Fits Into Search Algorithms?

If you’re unsure of how well your site is performing in search results, it’s a good idea to connect with a Search Engine Optimization specialist who understands the ins and outs of search engines. I can help you understand which aspects of your site may not be working optimally when it comes to search, and how you can make changes to increase your rankings. To get started, contact me today!