Well, I'm not 100% happy with the solution I've come up with, but it at least reduces the problem. I cobbled this together from several places, so here are the suggestions merged in one place for the benefit of others having the same issue.
1) The Google crawler looks for a robots.txt file when accessing your site. My first thought was to modify my existing robots.txt to block https. In researching how to do that, I learned Google treats http and https as two separate sites and you need separate robots.txt files for each of them.
I created a file called robots_ssl.txt to hold the rules for https links. It contains this text, which tells every crawler to stay away from every URL:
Code:
User-agent: *
Disallow: /
2) The next step is to direct crawlers to the robots file specific to https links instead of the one used for http. I found this recommended code to place in the .htaccess file to do so:
Code:
# REDIRECT SEARCH ENGINES TO robots.txt FILE FOR SECURE PAGES
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [NC,L]
Unfortunately, this command doesn't seem to work on Bluehost. When I try accessing
Code:
https://exampleSiteHere.com/robots.txt
the link ends up here:
Code:
https://exampleSiteHere.com/~example1/~example1/robots.txt
which produces a 404 error.
I managed to get around this problem by giving the rewrite rule a full URL as its target instead of a bare file name:
Code:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ http://www.exampleSiteHere.com/robots_ssl.txt [NC,L]
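A note on why the full URL behaves differently: because the target points at a different scheme (http rather than https), Apache answers with an external redirect (a temporary 302 when no status is given) instead of quietly serving the file. The doubled ~example1 path looks like what happens when a relative target has the directory prefix added back on, so a root-relative target might avoid both the 404 and the redirect. This is only a sketch and has not been confirmed on Bluehost; it assumes the .htaccess file sits in the site's document root. Setting RewriteBase / is the other commonly suggested fix for relative targets.

```apache
# Untested alternative; assumes this .htaccess lives in the document root
RewriteEngine On

# Only act on requests that arrived over https (port 443)
RewriteCond %{SERVER_PORT} ^443$
# The leading slash makes the target a root-relative URL path,
# so the request is rewritten internally and no redirect is sent
RewriteRule ^robots\.txt$ /robots_ssl.txt [NC,L]
```

If this form works on your host, a crawler asking for https://exampleSiteHere.com/robots.txt gets the blocking rules directly rather than being sent to an http address.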
3) My hope is that the above steps will stop crawlers from fetching https links and eventually drop the existing ones from their search results. However, I still have hundreds of existing bad links to contend with. Since I don't want my visitors clicking on links full of errors, I found this tip to redirect them from https to http using .htaccess. There are lots of ways to do this, and either of these seems to work:
Code:
# REDIRECT HTTPS TO HTTP (method 1)
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^(.*)$ http://www.exampleSiteHere.com/$1 [R=301,L]
# REDIRECT HTTPS TO HTTP (method 2)
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.exampleSiteHere.com/$1 [R=301,L]
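One thing to watch when combining steps 2 and 3: mod_rewrite runs rules from top to bottom, so the robots.txt rule has to come before the blanket redirect. Otherwise an https request for robots.txt is redirected to the regular http robots.txt before the special rule ever sees it. Here is a sketch of how the pieces might fit together in one .htaccess file, using method 2 for the redirect and assuming rewriting is not already switched on elsewhere in the file:

```apache
# Turn on mod_rewrite (skip this line if the file already has it)
RewriteEngine On

# Step 2: https requests for robots.txt are sent to the blocking file.
# The L flag ends rule processing here, so the redirect below
# never applies to this request.
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ http://www.exampleSiteHere.com/robots_ssl.txt [NC,L]

# Step 3: every other https request is permanently redirected to http
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.exampleSiteHere.com/$1 [R=301,L]
```

The follow-up request for http://www.exampleSiteHere.com/robots_ssl.txt arrives on port 80, so neither condition matches and the file is served normally.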
=====
I'm still getting some warnings and errors in Google Webmaster Tools that make me wonder if all this will work long-term, but at least it seems to help.