A vulnerability involving Google search results embedded in your web site has recently appeared in the news. The exploit has been around for a while, and there are ways both to prevent it and to clean up an existing problem.
Any web site that displays a user’s unfiltered search term may be inadvertently hosting nefarious content and allowing Google to index it.
Let’s say you have a web site that lets users search your pages using Google. Your search result page displays results from pages in your own site that Google has indexed. This page also has a “you searched for” block above the results that echoes the user’s search term.
This is innocuous when the search term pertains to your web site (e.g., “where can I adopt a puppy” or “where can I find cancer treatment”).
But it is harmful when the search term contains explicit content that a bad actor wants Google to index. To Googlebot, it appears that your site is hosting this content, and your site will show up as its source in Google search results.
Note that the indexable content is the search term. The search itself does not have to be successful or return any results.
I don’t know the exact mechanism a bad actor would use to get such content indexed. It is likely done by publishing a web page that links to search result URLs on other sites, so that Google finds those URLs when it crawls that page.
The most effective way to prevent search result pollution is to not write out a user’s search term. Even Google does not show a user’s search term outside of the search input element.
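As a minimal sketch of this idea, assume a Python-based site with a hypothetical `render_search_page` helper (the function name, the `q` parameter, and the markup are illustrative, not taken from any particular framework). The search term is written back only into the input element, escaped, and there is no “you searched for” block:

```python
from html import escape


def render_search_page(term: str, results: list[str]) -> str:
    """Build the search page HTML without echoing the term in the page body.

    Hypothetical helper: the name and markup are illustrative only.
    """
    # The term appears only as the input's value attribute, escaped so that
    # quotes and angle brackets cannot break out of the attribute.
    safe_term = escape(term, quote=True)
    # Result titles come from our own pages, but escape them anyway.
    items = "".join(f"<li>{escape(title)}</li>" for title in results)
    return (
        '<form action="/my-search-page/" method="get">'
        f'<input type="search" name="q" value="{safe_term}">'
        "</form>"
        f"<ul>{items}</ul>"
    )
```

Escaping is not what prevents the indexing problem; leaving the term out of the visible page text is. The escaping is there because the term is still untrusted input.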
Another way to prevent the problem is to add a robots.txt file to your web site that tells Google not to crawl the search results page:
User-agent: *
Disallow: /my-search-page/
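To sanity-check a rule like this before deploying it, one option is Python’s standard `urllib.robotparser` module. This is only a rough check: the module uses simple prefix matching and is not Google’s own parser, and `example.com` and the `q` parameter are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt file above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /my-search-page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder domain.
search_url = "https://example.com/my-search-page/?q=anything"
home_url = "https://example.com/"

# can_fetch() reports whether the named crawler may request the URL.
search_allowed = parser.can_fetch("Googlebot", search_url)
home_allowed = parser.can_fetch("Googlebot", home_url)

print("search page crawlable:", search_allowed)
print("home page crawlable:", home_allowed)
```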
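You can use an alternative to a robots.txt file, too: a robots meta tag with a noindex value in the page’s head, or an X-Robots-Tag: noindex HTTP response header. Google has to crawl the page to see either one, so they will not work on a page that robots.txt already blocks.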
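One such alternative is an `X-Robots-Tag` response header with a `noindex` value. As a rough sketch, assuming a plain WSGI application (the `app` and `robots_headers` names and the path prefix are illustrative), the header could be added to search page responses like this:

```python
def robots_headers(path: str, search_prefix: str = "/my-search-page/") -> list[tuple[str, str]]:
    """Return extra response headers for the given request path.

    Hypothetical helper: only search page responses get the noindex header.
    """
    if path.startswith(search_prefix):
        return [("X-Robots-Tag", "noindex")]
    return []


def app(environ, start_response):
    """Minimal WSGI app that marks the search page as not indexable."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    headers += robots_headers(path)
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]
```

Most frameworks and web servers have their own way to set response headers, so in practice this would likely be a one-line configuration change rather than custom code.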
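The robots.txt approach is no guarantee, though: it relies on Google to honor the disallow rule. If Google decides to ignore your disallow rule, your site will be vulnerable. Also, a disallow rule only stops crawling; Google may still list a blocked URL if other pages link to it.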
After blocking the Google bot from your search page and removing the element that writes out a user’s search term, send a page removal request to Google. This part is a moving target, as Google changes the tools and rules around such requests now and then. Good luck!