#
# robots.txt 2013.04.25
# disallow ip-web-crawler.com. It crawls way too fast and while
# it claims to obey robots.txt directives, it does not.
# If it doesn't obey the disallow, then an iptables drop for
# 50.31.96.6 - 50.31.96.12 could be used.
#
# robots.txt 2013.04.17
# add some disallow directives for specific file extensions.
# Somehow I missed them before.
#
# robots.txt 2013.04.04
# disallow Sosospider. Any web crawler that is too stupid to know the
# difference between upper and lower case is not worthy.
#
# robots.txt 2013.02.28
# disallow Exabot. I wonder if the resulting search engine
# database is the reason I get so many forged referrer
# hits.
#
# robots.txt 2012.10.08
# disallow WBSearchBot.
#
# robots.txt 2012.09.02
# disallow SearchmetricsBot. It is mentally challenged.
#
# robots.txt 2012.05.03
# disallow TurnitinBot. It is mentally challenged.
#
# robots.txt 2012.03.29
# disallow EC2LinkFinder. I do not know if it obeys robots.txt, but I will try.
# For sure it ignores most robots.txt directives. It copies everything, hogging
# bandwidth.
# It is time to think of a generic deny, to cover all these new bots.
#
# robots.txt 2012.03.13
# disallow SWEBot. It is not polite and disobeys the robots.txt file.
#
# robots.txt 2012.01.29
# disallow aiHitBot.
# Try a user-agent "InfoPath" and "InfoPath.2" disallow. (Another MS thing.)
# I am trying to get rid of what appears to be a tracking site.
# 80.40.134.103, .104, .120, seem to track 92.9.131.199 and 92.9.150.29 and ...
# 80.40.134.XXX does read the robots.txt file.
#
# robots.txt 2012.01.04
# The SISTRIX crawler does not behave well. It ignores meta tags and some robots.txt directives.
# Disallow it.
#
# robots.txt 2011.12.01
# Try to get rid of the Ezooms bot, although it is not clear what its exact user agent name is.
# (Days later: "User-agent: Ezooms" seems to work, but it takes a few days.)
# It ignores meta tags, and has become generally annoying.
#
# robots.txt 2011.09.26
# Until now I have allowed Baiduspider.
# But it has gone mental and also ignores some meta tags.
# Disallow it.
# A new robot, AhrefsBot, does not behave or obey meta tags.
# Disallow it.
#
# robots.txt 2011.06.19
#
# robots.txt 2011.04.12
# Googlebot is so very very severely mentally challenged.
# It ignores the NOFOLLOW meta tag.
# Try to block useless content from being indexed via, yet another,
# block command.
#
# It is still looking for pages that haven't been there for over a year now.
# (see 2010.04.29)
#
# robots.txt 2010.10.14
# Eliminate the crawl delay for Yahoo Slurp (see 2007.03.13).
#
# robots.txt 2010.09.20
# TwengaBot is severely mentally challenged. Try a global disallow for it.
# Googlebot is still annoying and accessing pages it shouldn't.
#
# robots.txt 2010.04.29
# Googlebot is very severely mentally challenged.
# Add disallow directives for directories that are not even there,
# and haven't been for over 5 weeks now.
# This is merely to try to get around having my request to delete the
# non-existent directories from the search database being denied.
#
# robots.txt 2010.04.16
# Add specific directives for Exabot, including a crawl delay.
# Reduce the Slurp (Yahoo) crawl delay (which it doesn't seem to obey anyhow).
# Disallow Googlebot-Image.
#
# robots.txt 2010.04.13
# disallow taptubot, the mobile device crawler.
#
# robots.txt 2010.04.01
# Yet another attempt to get web crawlers not to index old versions of index.html files.
# All old versions are called index_0???.html.
#
# robots.txt 2010.03.19
# Archives have been moved to a separate directory. Add a disallow directive.
#
# robots.txt 2010.02.10
# The Yandex web crawler behaves in a very strange manner. Block it.
# Ask robots not to copy PDF files.
#
# robots.txt 2009.12.07
# Fix some syntax based on feedback from http://tool.motoricerca.info/robots-checker.phtml
#
# robots.txt 2009.12.04
# There are still issues with googlebot.
# I don't want old versions of index.html
# type pages indexed, but I do want the Photoshop Elements generated pages indexed.
# Try some new directives.
#
# robots.txt 2009.09.09
# Googlebot is not ignoring the rebuilt directory and is obtaining .MOV videos.
# Add some more googlebot specific directives.
#
# robots.txt 2009.07.27
# Googlebot directives are case sensitive. Add .JPG to the .jpg ignore directives.
# Googlebot is not ignoring old index pages as the global directive indicates to. Try a googlebot
# specific directive.
#
# robots.txt 2009.04.12
# Some robots, for example googlebot, obey global directives as well as googlebot specific directives.
# Other robots, for example Slurp (Yahoo) and msnbot, only obey their specific directives.
# The robots.txt standard is rather weak, incomplete, and generally annoying.
# Add tons of the same specific directives to each robot area.
# Try to change the no-index Christmas pages directives to include a wildcard.
#
# robots.txt 2008.12.03
# Block the Cuil (twiceler) robot entirely.
#
# robots.txt 2008.11.23
# The majestic robot comes in bursts at a high rate. Just block it.
# The Cuil robot comes too much. Try to slow it down.
#
# robots.txt 2008.07.03
# Now msnbot has started to grab images. Try to stop it.
# Googlebot is grabbing PNG files. Try to stop it.
#
# robots.txt 2007.11.20
# Try to disallow the panscient.com web crawler.
#
# robots.txt 2007.08.23
# Still search engine pages do not agree with the contents of the robots.txt file.
# Add a specific disallow for ~doug/rebuilt.
# - put global user agent lines after specific ones.
# - next will be to repeat global lines in each specific agent area.
#
# robots.txt 2007.05.03
# Now Googlebot has started to grab images. Try to stop it.
# For whatever reason, google is mainly showing my re-built directory. It
# never seems to go back to the higher level page that now has meta tags
# telling it not to index those pages. Put in a global disallow.
# Add some other global disallows that I got behind on.
#
# robots.txt 2007.03.13
# Stupid Yahoo Slurp comes all the time now. It supports a non-standard delay command,
# so add the command. The web site doesn't state the units of measure.
#
# robots.txt 2007.02.11
# Yahoo Slurp seems to now obey the non-standard "ignore this type of file" wildcard usage.
# Try it.
#
# robots.txt 2006.12.29
# Delete instructions for directories that don't exist anymore.
#
# robots.txt 2004:12:21
# Try to eliminate yahoo.com grabbing images.
# Can only think of a global deny.
# Can not find the Yahoo name, try the one shown below.
#
# robots.txt 2004:11:16
# Try to eliminate alexa.com grabbing images.
# InkTomi comes too often; ban them entirely.
#
# robots.txt 2004:07:16
# Try to eliminate picsearch.com grabbing images.
#
# robots.txt 2004:07:09
# Try to eliminate altavista grabbing images.
#
# robots.txt for www.smythies.com 2003:12:21
#

User-agent: panscient.com
Disallow: /

User-agent: vscooter
Disallow: /

User-agent: psbot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: taptubot
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: TwengaBot
Disallow: /

User-agent: sitebot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: sistrix
Disallow: /

User-agent: aiHitBot
Disallow: /

User-agent: InfoPath
Disallow: /

User-agent: InfoPath.2
Disallow: /

User-agent: swebot
Disallow: /

User-agent: EC2LinkFinder
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: SearchmetricsBot
Disallow: /

User-agent: WBSearchBot
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: Sosospider
Disallow: /

User-agent: ip-web-crawler.com
Disallow: /

User-agent: Slurp
Disallow: /*.jpg
Disallow: /*.JPG
Disallow: /*.png
Disallow: /*.PDF
Disallow: /*.pdf
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/

User-agent: Googlebot
Disallow: /*.jpg$
Disallow: /*.JPG$
Disallow: /*.png$
Disallow: /*.PDF$
Disallow: /*.pdf$
Disallow: /index_0*$
Disallow: /*index_0*$
Disallow: /xmas_*
Disallow: /~doug/archives/
Disallow: /~doug/2010.01.23/
Disallow: /~doug/2007.11.20/
Disallow: /~doug/2004.06.26/
Disallow: /digital_camera/
Disallow: /old_pages.html
Disallow: /unused_link.html
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /about_smythies.html
Disallow: /poweredby.html
Disallow: /*.MOV
Disallow: /*.mov
Disallow: /*.AVI
Disallow: /*.avi
Disallow: /DSCN*.htm
Disallow: /lectures/
Disallow: /library/
Disallow: /join/
Disallow: /alpineclub/
Disallow: /publications/
Disallow: /notices/
Disallow: /*.csi

User-agent: msnbot
Disallow: /*.jpg$
Disallow: /*.JPG
Disallow: /*.png$
Disallow: /*.PDF$
Disallow: /*.pdf$
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/

User-agent: *
Disallow: /*.jpg
Disallow: /*.JPG
Disallow: /*.png
Disallow: /*.PDF
Disallow: /*.pdf
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/
Disallow: /_mm/
Disallow: /_notes/
Disallow: /_baks/
Disallow: /MMWIP/
Disallow: /*.LCK
Disallow: /*.bak
Disallow: /*.csi
Disallow: /*.mno
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
Crawl-delay: 2
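Rules like the ones above can be sanity-checked with Python's standard-library `urllib.robotparser`. A caveat: that parser implements only the original exclusion protocol, so `Crawl-delay` is honored but wildcard patterns such as `/*.jpg` are not; this sketch therefore exercises only a blanket `Disallow: /` ban and a literal path prefix. The agent name `SomeOtherBot` is purely an illustrative stand-in.

```python
from urllib.robotparser import RobotFileParser

# A reduced excerpt of the file: one blanket per-bot ban, plus the
# default (*) section with a literal path prefix and a crawl delay.
rules = """\
User-agent: Ezooms
Disallow: /

User-agent: *
Disallow: /~doug/archives/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Ezooms is banned outright; other agents fall through to the * section.
print(rp.can_fetch("Ezooms", "http://www.smythies.com/"))                  # False
print(rp.can_fetch("SomeOtherBot", "http://www.smythies.com/index.html"))  # True
print(rp.crawl_delay("SomeOtherBot"))                                      # 2
```

Because user-agent matching is a case-insensitive substring test, `can_fetch` here agrees with what a compliant crawler reporting `Ezooms/1.0` would decide for itself.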