
A Little SERP Love from the Google Mother Ship

December 17th, 2006 by metapilot

I noticed a couple of hits coming from Google’s direction over the past week, and since I haven’t paid much attention to metapilot.com’s indexation for some time, I figured it was time to take a look.

Google
site:www.metapilot.com shows 4700+ results, and it looks like about 35 of them are my actual pages, not the previous owner’s. Of those 35, about 15 are from the blog.

Yahoo
The site: command shows something interesting. Back in September, Yahoo’s results for this search showed only pages from my site; now it shows results in the thousands, the same as Google does. Unlike with Google, however, no cached versions of those old pages are available. Of the 6,400+ results, about 65 are mine and about 63 of those are from the blog. Another interesting point is that while the home page is listed as the first result for this search (thank goodness, at least for that), pages from the old site populate results 2 through 33, with some of the blog pages kicking in between 34 and 39. This is interesting because Yahoo is known to list the pages in this search in order of importance.

MSN
MSN shows 69 results for the site:www.metapilot.com search and 39 of them are from the blog.

Waiting………….

September 2nd, 2006 by metapilot

With Google steady at 9,000 to 12,000 (old-site, supplemental) pages showing for the site:metapilot.com search, and up until today showing only two pages from my site (at the very bottom of all the other supplemental pages), and Yahoo upping the page count of the new site by a page or two a week, I’m getting fidgety.

Google had shown one of the blog pages in that search, but today that is gone and only the home page, indexed Aug. 21, is showing. I’m not at all sure how Google found that blog page, since it wasn’t until at least a week later that I put up any sort of public link to the blog. The only thing I can think of is that while messing around during the installation and working on making a static home page (coming), I had an index.php page in the root along with the index.html page, and somehow that got picked up? Thing is, it wasn’t even the index.php page that was indexed; it was an archive page. Anyhow, it’s gone, at least for the moment.

Since I am getting fidgety, I re-crawled the site with my trusty dusty sitemap tool (I use the one over at auditmypc.com) so that my sitemaps included all the pages currently in the blog. I have two sitemaps, urllist.txt and sitemap.txt, because urllist.txt used to be the only filename that Yahoo looked for when you submitted a sitemap as a feed. That was before Yahoo Site Explorer, which doesn’t clearly define a specific file name. Sitemap.txt is the file name Google suggests if you are using a text file as the sitemap you submit to Google Webmaster Tools, and rather than telling Google to look for a file created specifically for Yahoo sitemaps, I make one with that name too. For the time being it is faster and easier to have these two sitemaps than to figure out the ideal way to have a single one. (Did you get all that?)
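
For anyone who would rather not maintain the two files by hand, here is a minimal sketch in Python that writes the same list of URLs to both names. It is only an illustration, not what the auditmypc.com tool does: the function name write_sitemaps and the idea of passing in a ready-made list of URLs are assumptions made for the example.

```python
from pathlib import Path


def write_sitemaps(urls, root, names=("urllist.txt", "sitemap.txt")):
    """Write one URL per line to each file name under `root`.

    `write_sitemaps` is a hypothetical helper for illustration only.
    Blank entries are skipped and duplicates are dropped, keeping the
    first occurrence of each URL.
    """
    unique = dict.fromkeys(u.strip() for u in urls if u.strip())
    body = "\n".join(unique) + "\n"

    written = []
    for name in names:
        path = Path(root) / name
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

Calling write_sitemaps(urls, "/path/to/site/root") leaves two identical text files that can then be uploaded and submitted separately to each search engine.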

So the new sitemaps were submitted to Google and Yahoo yesterday, and the domain was submitted to MSN as well. I’m hoping this jogs some changes in the index in the next week or so. Of course, I won’t be able to be sure that this is what caused them, but at least it gives the feeling of doing something to help push the process along.

How Things Look in Google Sitemaps & Yahoo Site Explorer

September 1st, 2006 by metapilot

Following Google’s lead, Yahoo has come out with its own version of a sitemaps tool and rolled it into the Yahoo Site Explorer Beta tool. There is a lot of debate over the value of Google’s sitemap tool, and Yahoo doesn’t really make any advances over it. One nice thing about it, though, is that you can now easily get the last-crawled date for any page that you have listed in “My Sites” (you have to have a free Yahoo account in order to access My Sites information).
In order to set up a new site in Yahoo Site Explorer, enter the site’s URL and click “Add My Site”, after which you’re presented with links to Manage and Authenticate your new site. The coolest thing about the tool is that you can communicate to Yahoo that you want it to visit your site, and it will go there and grab your feed in “real time”. Your feed can be RSS, Atom, a txt file or a compressed text file (.gz only), and by real time, I mean that you might have to wait a few minutes for it to refresh your screen and show you that it’s verified your feed exists.

The feed is the conduit through which you tell Yahoo about the URLs you want it to crawl. Google Sitemaps adds an additional conduit, or feed choice: an .xml file, with which you can make your list of URLs dynamic by running a Python script on your web server, and I expect to see something of that nature coming from Yahoo in the near future. Very basically, though, all you need for Yahoo Site Explorer is a .txt file named urllist.txt with a list of all the URLs on your site that you want crawled (one URL per line). Upload it to your root directory, and after you click on “Manage Site” in Yahoo Site Explorer, type “urllist.txt” into the field and off goes the bot to check it out.
Before you can get to the “good” information about your site, you have to authenticate your site. This lets Yahoo know that you currently have access to the site’s root directory, which means you’re likely to be worthy of knowing any little insights Yahoo Site Explorer might provide you. When you click on the “Authenticate” link, you can choose to download an authentication file (which you can save directly to your root directory, if you want) or make your own authentication file with the file name and contents presented. Once the file is placed in your root directory, click “Authenticate” and your site gets put into a pending authentication queue until Yahoo crawls the feed. Within 24 hours, I could see that my status was no longer “Pending” but rather, I was now a “processed” site.
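
Since the whole format is just one URL per line, a quick sanity check before uploading is easy to sketch in Python. The rules below (each line must be a single absolute http or https URL on the expected host) are my own assumptions about what a sensible list looks like, not anything Yahoo documents here, and check_urllist is a made-up name.

```python
from urllib.parse import urlparse


def check_urllist(text, expected_host):
    """Return (line_number, problem) pairs for suspicious lines.

    Hypothetical helper: the checks are assumptions for illustration,
    not Yahoo's actual validation rules. Blank lines are ignored.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue
        if len(entry.split()) != 1:
            problems.append((lineno, "more than one item on the line"))
            continue
        parts = urlparse(entry)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append((lineno, "not an absolute http(s) URL"))
        elif parts.hostname != expected_host:
            problems.append((lineno, "different host"))
    return problems
```

An empty list back from check_urllist(open("urllist.txt").read(), "www.metapilot.com") would mean every line passed these particular checks.
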

On to the real business: Yahoo has ratcheted up the number of indexed pages to 12. It’s good to see things filling in there. Over on the marginally more useful Google Webmaster Tools, I can see that the index is ranking the old site for some odd keywords; however, zero traffic comes from any of the previous owner’s old link partners or search engine listings, at least not from anyone directly clicking on a link.

Google Maintains Cache of Banned Domain’s Content but Doesn’t Show it During Ban

August 26th, 2006 by metapilot

For some reason, I thought 1300 pages was the max size of that spam site that was at my domain name before me but I see that as of today, the count’s up to over 8000 pages–

Google Screenshot

and growing every time I go back to look (10 minutes after I wrote the previous sentence, the count is over 11,000 pages):

Google Screenshot

(Note the new link to Google’s video search)

Now that I think about it though, these different numbers are most likely due to results coming from different data centers.

Obviously, Google maintains all of this page information in the index even though the domain’s been banned for some time. It appears that when Google turns the domain back on, the results start back up as though the ban hadn’t existed. What I can’t tell, though, is whether Google continued to crawl that previous site while the ban was in place or stopped crawling it once the ban kicked in.

In any case, as those old pages continue to populate the site:domain search in Google, I’m seeing that it is showing the cache for these pages, and it shows the “View as HTML” link for the couple of dozen PDF files that had been indexed from the site.

Here’s an example of the fine writing style that filled up those (now) 12,000-plus pages:
example of text from spam site

I’d guess this is from early or very low quality content auto-generation software that was probably built into the spam site creation software used to deploy the site.

When Is Being Listed in Google Supplementals a Good Thing?

August 22nd, 2006 by metapilot

I’ll tell you: it’s when your page is in transition from a page that is not listed in the regular index to one that is listed in the regular index.

It was just two days ago that I was sorta thinking that I might start seeing some action for the “site:www.metapilot.com” search within a week or so, and voila!, here it is, 5 days early.
google08-21-2006-2.gif

Google Reinclusion - Not Banned In Google Either

August 20th, 2006 by metapilot

As of today, there is evidence that there is not an all-out ban on the new domain name with Google. I like to set up alerts in Google (I like the real-time option) so that I know when there has been a change in the output from the Google algos for sites I’m working on, and today I received an email saying there was a change in the output for metapilot.com. Since the alert is for “metapilot”, which covers both domains, I have to check the source code of the alert emails that I receive to see which domain the included link is referring to (I probably should just change the alert to “metapilot.com”).
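
Checking the email source by eye works, but the same check can be sketched in a few lines of Python: pull every link out of the alert’s HTML and list the host names they point at. This is only a rough illustration that assumes the alert is an HTML email with plain links; if the links are wrapped in a redirect URL, the host shown would be the redirector’s rather than the target’s. LinkCollector and link_hosts are made-up names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_hosts(email_html):
    """Return the distinct host names linked from the email, in order."""
    parser = LinkCollector()
    parser.feed(email_html)
    hosts = []
    for href in parser.hrefs:
        host = urlparse(href).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```

Feeding the saved source of an alert email to link_hosts() gives a short list of hosts, which is enough to tell at a glance which domain the alert is about.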

Here’s the alert:

Read the rest of this entry »