Semantic SEO Metrics & How to Use Them

It's been almost two decades since Google and other search engines moved away from the purely lexical "bag of words" model of search. Search today is "things not strings" and relies heavily on vectors, embeddings, and entities. These are the building blocks of what we call Semantic Search, but what does that even mean? Lexical metrics essentially count the words on the page: keyword density, TF-IDF, and BM25 all fall into this category. Semantic metrics instead use embeddings to measure the meaning of text. AI models and GEO take advantage of these metrics too, so it's important to understand them.

This guide explains modern lexical and semantic metrics in three ways:

- Plain English: what it actually measures.

- Technical: the algorithm under the hood.

- What to do with it: how you should change your content for both classic search engine ranking (SEO) and GEO (Generative Engine Optimization, i.e. Google AI Overviews, Claude, Grok, Perplexity, ChatGPT, Bing, Copilot).


SERPrecon scores your site against each URL on the SERP across all of these signals (some semantic, some lexical, some structural, some peer-relative) and then creates an easy-to-understand report to help you optimize your content for today's search engines. The goal isn't to measure any single number. It's to let you see where you compete and where you're being beaten.

Semantic SEO success starts with understanding the metrics.

Lexical & Semantic SEO Metric Definitions:

Cosine Similarity

Cosine similarity measures how closely the meaning of the query matches the meaning of the page's content, even when the words aren't the same. This is what people are talking about when you hear words like vectors or embeddings. Every phrase is mapped to a point in a multi-dimensional space (like a graph), and two phrases are scored on how close together they sit in that space. SERPrecon calculates the cosine similarity of your title tag, each chunk of your webpage, and your whole page.

Technical definition: We embed the query with Gemini's embedding model, chunk the page into passages, embed each chunk, and take the maximum cosine similarity over all chunks, as well as the average.
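Here is a minimal sketch of the max-and-average scoring described above, assuming the query and each chunk have already been turned into vectors by an embedding model. The three-number vectors and the function names cosine_similarity and score_page are made up for illustration; real embeddings have hundreds or thousands of dimensions, and this is not SERPrecon's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_page(query_vec, chunk_vecs):
    """Return (max, mean) cosine similarity of the query against each chunk."""
    sims = [cosine_similarity(query_vec, c) for c in chunk_vecs]
    return max(sims), sum(sims) / len(sims)

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
chunks = [
    [1.0, 0.0, 0.0],  # points the same way as the query
    [0.0, 1.0, 0.0],  # unrelated (orthogonal) passage
    [1.0, 1.0, 0.0],  # partially related passage
]
best, average = score_page(query, chunks)
print(f"max={best:.3f} mean={average:.3f}")  # prints: max=1.000 mean=0.569
```

The gap between the two numbers is the useful part: a high maximum with a low average suggests one strong passage surrounded by copy that drifts off topic.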

How to Use Cosine Similarity in SEO & GEO:
Modern search relies heavily on semantic match and vectors. Google's neural retrieval (RankBrain, MUM, BERT) and every AI answer engine work this way, using cosine similarity or a version of it. If your cosine similarity is under 80%, it might as well be 0. To improve cosine similarity, write a passage that directly and fully addresses the query in the searcher's actual language. For GEO this is doubly important: AI engines pull and cite specific passages, not whole pages.

BM25

BM25 is the classic keyword match score: how often the query terms appear in the page, weighted by how rare those terms are across the SERP. It's the successor to keyword density and TF-IDF and is still the standard first step in search engines and retrieval systems because it's quick and easy to calculate (relative to other scores). SERPrecon (like a search engine) computes the BM25 of your title and body text separately.

Technical Definition:
BM25 measures importance, density, and length of a word within content. BM25 calculates a score by looking at how rare a search term is, how often it appears in your post, and how long that post is compared to everything else on the site. Crucially, it realizes that mentioning a keyword 50 times isn't 50 times better than mentioning it once, and it gives a "bonus" to shorter, more concise pages that get straight to the point.

In math terms, it's BM25(D, Q) = Σ over each query term q of [ IDF(q) × ( f × (k1 + 1) ) / ( f + k1 × (1 - b + b × (L / avgL)) ) ]

  • IDF(q): Inverse document frequency, i.e. how rare the query term q is across the documents being scored.

  • f: How many times the word appears on the page.

  • L / avgL: The length of this page compared to the site average.

  • k1 & b: Settings that control keyword saturation and length penalties.

A fun fact of this math is that there's no fixed minimum or maximum BM25 score to target. All scores are relative to the other documents being scored, so there's no real number to aim for. Just be better than your competitors.
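The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SERPrecon's implementation: k1 = 1.2 and b = 0.75 are common defaults rather than values from this guide, the IDF uses a smoothed variant that never goes negative, tokenization is a plain whitespace split, and the function name bm25_scores and the sample documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    docs = [doc.lower().split() for doc in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    query_terms = query.lower().split()

    # Document frequency: how many documents contain each query term.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}

    scores = []
    for d in docs:
        counts = Counter(d)
        score = 0.0
        for t in query_terms:
            f = counts[t]
            if f == 0:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log(1 + (n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            length_norm = 1 - b + b * (len(d) / avg_len)
            score += idf * (f * (k1 + 1)) / (f + k1 * length_norm)
        scores.append(score)
    return scores

pages = [
    "semantic seo guide for beginners",
    "semantic seo semantic seo semantic seo semantic seo tips",
    "how to bake sourdough bread at home",
]
scores = bm25_scores("semantic seo", pages)
print([round(s, 3) for s in scores])
```

In this toy set the second page repeats the query four times but scores well under four times the first page, which is the saturation effect at work. The third page never mentions the query terms and scores zero.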

How to use BM25 in SEO & GEO:
Lexical metrics still matter. Combined with cosine similarity, BM25 is the first step of retrieval from the index. If you aren't retrieved, you can't be ranked. To optimize BM25, match users' actual query phrasing in your body copy at least a few times, sprinkled naturally. Don't keyword-stuff: BM25's saturation curve means the 10th occurrence helps almost nothing, and the machine learning models that come after BM25 will surely penalize the awkward writing.


Title Trigrams

Title trigram coverage comes directly from the Yandex source code leak. It's a measure they literally use to gauge title relevance. This metric measures what percentage of the query's character-level snippets (3-letter overlapping pieces called n-grams) actually appear in your page title. It catches partial matches, plurals, and morphology that exact-word matching misses.

Technical Definition:
Character trigrams of the query are intersected with character trigrams of the title. The math is (matched / total_query_trigrams), expressed as a percentage. Unlike titleBM25, which is word-level, trigram coverage is character-level, so the words "ranking" and "rankings" would overlap heavily.
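A short sketch of that calculation follows. It assumes lowercasing, collapsing whitespace, keeping spaces inside trigrams, and counting each distinct trigram once; the guide doesn't specify those details, and the function names are hypothetical.

```python
def char_trigrams(text):
    """Set of overlapping 3-character pieces of the lowercased text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}

def title_trigram_coverage(query, title):
    """Percentage of the query's character trigrams that appear in the title."""
    query_grams = char_trigrams(query)
    if not query_grams:
        return 0.0
    matched = query_grams & char_trigrams(title)
    return 100.0 * len(matched) / len(query_grams)

coverage = title_trigram_coverage("seo rankings", "How SEO Ranking Factors Work")
print(coverage)  # prints: 90.0
```

The query "seo rankings" has ten trigrams, and only "ngs" is missing from the title, so the singular "Ranking" still earns 90% coverage. An exact word match on "rankings" would have found nothing.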

How to Use Title Character Trigrams in SEO & GEO:
Get the query or close lexical variants into your title. Where titleBM25 needs full word matches, trigram coverage rewards partial overlap. Both metrics agreeing means your title is well-aligned. Trigram high but titleBM25 low usually means you have the characters but not the words - a slightly different (and often weaker) form of relevance.

Title & Copy Intent

SERPrecon uses 3 main search intents based on what the searcher is trying to do:

  1. learn something (Informational)

  2. buy/do something (Transactional)

  3. decide between options (Commercial)

While other tools use a basic method of "does the query contain one of these terms?" we try to do better. SERPrecon's approach is based on cosine similarity, machine learning, and actual queries. That means SERPrecon can measure intent for content that it's never seen before, even across languages! SERPrecon measures the intent of your title and body copy. They should match.

Technical Definition:
SERPrecon uses a proprietary method involving embeddings and machine learning to classify intent - so we can't share many of the details here. In testing, it aligns much better than the "query contains" methods other tools use.

How to use Intent in SEO & GEO:
Match your page format to the intent of both the query and the search result. If Google is showing all informational results, your transactional page has no chance of ranking. Informational queries deserve guides and explainers; Commercial queries deserve comparisons and "best X" tables; Transactional queries deserve product/checkout pages. Example: A blog post ranking for a Transactional query is fragile. It'll lose ground the moment a competitor publishes something more directly purchase-oriented.

Spam Score

SERPrecon's spam score is another proprietary metric that's based on real information retrieval research. This score is a rough heuristic for how repetitive or template-y the text feels, based on how easy it is to compress.

Technical Explanation:
We do some math to your content to detect synonyms and such, then "compress" the text to see how small it can get. Text that compresses a lot is very repetitive, and likely spammy. This is based on some research published by search engines.
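The compression idea can be illustrated with Python's built-in zlib module. This is only a sketch of the general principle, not SERPrecon's proprietary score: it skips the synonym-handling step entirely, zlib is a stand-in for whatever compressor is actually used, and the compression_ratio function and both sample texts are made up for the example.

```python
import zlib

def compression_ratio(text):
    """Original size divided by zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw, 9))

varied = (
    "Cosine similarity compares meaning, BM25 counts matching words, "
    "and entity extraction identifies the people and products a page names."
)
repetitive = "Best cheap widgets. Buy cheap widgets today. " * 10

print(round(compression_ratio(varied), 2))
print(round(compression_ratio(repetitive), 2))
```

The repetitive sample shrinks far more than the varied one, so its ratio is much higher. Very short texts compress poorly no matter what they say, so a ratio like this is only meaningful when comparing pages of similar length.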

How to use Spam Score in SEO & GEO:
Modern search is hostile to AI-generated boilerplate, repeated phrasing, and template-y output. If your spam score is unusually high vs. ranking peers, your content reads as machine-generated even if it isn't. Vary sentence structure, replace stock phrases ("In today's fast-paced world…"), cut redundant intros, and avoid sections that all start the same way. Also, ease up on the keyword stuffing and synonym spinning.

Bigrams NPMI

Bigrams are two-word phrases. The ones that matter here are the pairs that occur together in your text more often than statistical chance says they should. It's a way to measure the page's signature phrases. Bigrams are used all over the place in search algorithms.

Technical Explanation:
NPMI stands for Normalized Pointwise Mutual Information, and we score it over all body bigrams. NPMI quantifies the strength of association between two terms by comparing how often they occur together against how often each occurs on its own, normalized to a range from -1 to 1.
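As a rough sketch, NPMI for a bigram (x, y) is log(p(x, y) / (p(x) × p(y))) divided by -log(p(x, y)). The Python below applies that to a short text. The whitespace tokenization, the single shared token count used for every probability, the min_count cutoff, and the function name bigram_npmi are all simplifying assumptions made for this example.

```python
import math
from collections import Counter

def bigram_npmi(text, min_count=2):
    """Return {(word1, word2): npmi} for bigrams seen at least min_count times."""
    tokens = text.lower().split()
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n          # probability of the pair
        p_x = unigrams[w1] / n    # probability of the first word
        p_y = unigrams[w2] / n    # probability of the second word
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(w1, w2)] = pmi / -math.log(p_xy)
    return scores

sample = (
    "vector search beats keyword search when vector search understands "
    "meaning and keyword search only counts the words the page uses"
)
npmi = bigram_npmi(sample)
for pair, value in sorted(npmi.items()):
    print(pair, round(value, 3))
```

Only "vector search" and "keyword search" repeat in the sample, so they are the only bigrams scored, and both land at roughly 0.7. A score near 1 means the two words almost always appear together, and a score near 0 means they co-occur about as often as chance predicts.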

How to use Bigrams NPMI in SEO & GEO:
Bigrams are useful for identifying the terminology of a topic. Your high-NPMI bigrams should match the topic field's vocabulary. If you're writing about retrieval, you'd expect "vector search", "semantic match", "rank fusion". If those phrases aren't bubbling up, you're either writing too generally or using the wrong jargon. AI engines pick up entity-bearing bigrams as strong topic indicators. We present the bigrams from all your competitors so you can easily incorporate them into your writing.

Entities

Entities are just named things (people, places, organizations, products, concepts) that the page mentions, with a measure of how central each is to the content. SERPrecon uses Google's own APIs to extract entities from content - so no technical explanation is given here. The data comes right from the search engine itself!

How to use Entities in SEO & GEO:
Entity coverage is the single most undervalued GEO signal. AI engines build internal knowledge graphs around entities and cite content that helps them connect and confirm entity facts. If your competitors mention 30 entities and you mention 5, you're not in the conversation. Add proper nouns, name specific products and organizations, cite the people involved. Don't just copy the same entities your competitors use - going above and beyond can help you with other semantic scoring metrics.

PageRank

We'd be remiss if we pretended that only on-page signals matter. That's why we've included PageRank in our reports. PageRank (named after Larry Page) was Google's foundational algorithm: a page is important if other important pages link to it. A vote of confidence, weighted by the voter's own authority. These days Google and other search engines use variants of PageRank like Nearest Seed, but since we can't calculate that without their seed data, PageRank is the best we have.

While other tools calculate domain authority or other proprietary metrics, SERPrecon does its best to calculate a value that's close to the actual PageRank formula.
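To make the "votes weighted by the voter's authority" idea concrete, here is a toy version of the classic PageRank iteration on a four-page link graph. It is a sketch under stated assumptions and not how SERPrecon's values are produced: the damping factor of 0.85 is the commonly cited default, and the graph, page names, and pagerank function are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {
    "home": ["guide", "pricing"],
    "guide": ["home"],
    "pricing": ["home"],
    "blog": ["guide"],
}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(value, 3))
```

In this graph "home" comes out on top because both "guide" and "pricing" pass it all of their rank. "guide" edges out "pricing" thanks to the extra link from "blog", and "blog" itself, with no inbound links, keeps only the baseline share.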

Technical Explanation:
SERPrecon's PageRank comes from a 3rd party vendor. SERPrecon does not crawl the web.

How to use PageRank in SEO & GEO:
Earn links from authoritative, topically-related sites through original research worth citing, tools/data others quote, and genuinely useful content (not link farms). The best way to increase PageRank is to do something newsworthy or so cool that others can't stop talking about it. In other words, real marketing. AI engines also factor in source authority when deciding what to cite, often using their own internal authority scores that correlate strongly with link-based signals like PageRank. A page that ranks well on content alone but has no inbound links will lose to a peer that has both.

Conclusion

No single signal wins the SERP. The pattern that matters is consistency across signals: a page that wins on cosine similarity, BM25, title BM25, coherence, and title-body alignment is genuinely well-aimed. A page that wins on one and loses on the rest is fragile: it's exploiting one channel and will collapse if the SERP shifts.

The most actionable signals for most pages, in rough order:

1. Your overall SERPrecon Content score. This number is a fusion ranking of all our metrics (a technique often called Reciprocal Rank Fusion, or RRF) that approximates how a search engine combines all the data.

2. Top matching passage: read it and ask "would I cite this? Does it match the user intent?"

3. Cosine Similarity: If you aren't retrieved, you can't be ranked. You have to be relevant to be retrieved.
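The rank fusion mentioned in item 1 can be sketched as follows. In Reciprocal Rank Fusion, each metric ranks the competing URLs, and every URL earns 1 / (k + position) from each ranking. This is a generic illustration, not the exact way SERPrecon computes its Content score: k = 60 is a commonly used default assumed here, and the three rankings and domain names are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of URLs into one score per URL.

    rankings: list of lists, each ordered best-first for one metric.
    k: smoothing constant; 60 is a commonly used default (assumed here).
    """
    scores = {}
    for ranking in rankings:
        for position, url in enumerate(ranking, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + position)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-metric rankings of three competing pages, best first.
by_cosine = ["a.com", "b.com", "c.com"]
by_bm25 = ["b.com", "a.com", "c.com"]
by_trigrams = ["b.com", "c.com", "a.com"]

fused = reciprocal_rank_fusion([by_cosine, by_bm25, by_trigrams])
print([url for url, _ in fused])  # prints: ['b.com', 'a.com', 'c.com']
```

Here b.com wins overall without leading every list. Fusion by rank rewards consistency across signals rather than a single standout number.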

All of these semantic metrics matter. Take a look at the competition report in SERPrecon and see where you're lacking compared to who ranks, and then use this guide to improve your scores.

Insights-powered Solutions

Get Started with SERPrecon

Evolve your SEO. Leave outdated metrics behind and start analyzing your website the way a search engine does. Start your zero-risk 7-day trial today.