# The FULL URL to the DSpace sitemaps
# The https://library.oapen.org will be auto-filled with the value in dspace.cfg
# XML sitemap is listed first as it is preferred by most search engines
Sitemap: https://library.oapen.org/sitemap
Sitemap: https://library.oapen.org/htmlmap

##########################
# Default Access Group
# (NOTE: blank lines are not allowable in a group record)
##########################
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
Disallow: /handle/*/*/discover
Disallow: /handle/*/*/search-filter
Disallow: /mapping
#
# Optionally uncomment the following line ONLY if sitemaps are working
# and you have verified that your site is being indexed correctly.
Disallow: /browse
Disallow: /handle/20.500.12657/*/browse
#
# If you have configured DSpace (Solr-based) Statistics to be publicly
# accessible, then you may not want this content to be indexed
Disallow: /statistics
#
# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content
Disallow: /contact
Disallow: /feedback
Disallow: /forgot
Disallow: /login
Disallow: /register
Crawl-delay: 10

##############################
# Section for misbehaving bots
# The following directives to block specific robots were borrowed from Wikipedia's robots.txt
##############################

# advertising-related bots:
User-agent: Mediapartners-Google*
Disallow: /

# Crawlers that are kind enough to obey, but which we'd rather not have
# unless they're feeding search engines.
User-agent: UbiCrawler
Disallow: /

User-agent: DOC
Disallow: /

User-agent: Zao
Disallow: /

# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Zealbot
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: linko
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: Microsoft.URL.Control
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww
Disallow: /

User-agent: ZyBORG
Disallow: /

User-agent: Download Ninja
Disallow: /

# Misbehaving: requests much too fast:
User-agent: fast
Disallow: /

#
# If your DSpace is going down because of someone using recursive wget,
# you can activate the following rule.
#
# If your own faculty is bringing down your dspace with recursive wget,
# you can advise them to use the --wait option to set the delay between hits.
#
#User-agent: wget
#Disallow: /
#
# The 'grub' distributed client has been *very* poorly behaved.
#
User-agent: grub-client
Disallow: /
#
# Doesn't follow robots.txt anyway, but...
#
User-agent: k2spider
Disallow: /
#
# Hits many times per second, not acceptable
# http://www.nameprotect.com/botinfo.html
User-agent: NPBot
Disallow: /

# A capture bot, downloads gazillions of pages with no public benefit
# http://www.webreaper.net/
User-agent: WebReaper
Disallow: /

# CLOCKSS system has permission to ingest, preserve, and serve this open access Archival Unit
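
The file above is consumed by crawlers rather than by DSpace itself. As an illustration of how these directives are read, the sketch below shows a well-behaved crawler checking a few of the rules with Python's standard urllib.robotparser. It parses only a small excerpt of the file to stay self-contained, and the item handle used in the second check is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A short, self-contained excerpt of the rules above.
excerpt = """\
User-agent: *
Disallow: /discover
Disallow: /statistics
Crawl-delay: 10

User-agent: HTTrack
Disallow: /
"""

rules = RobotFileParser()
rules.parse(excerpt.splitlines())

# Discovery search pages are off-limits to the default group (User-agent: *).
print(rules.can_fetch("*", "https://library.oapen.org/discover"))                   # False
# Ordinary item pages remain crawlable (the handle below is hypothetical).
print(rules.can_fetch("*", "https://library.oapen.org/handle/20.500.12657/12345"))  # True
# Site copiers such as HTTrack are blocked from the whole site.
print(rules.can_fetch("HTTrack", "https://library.oapen.org/"))                     # False
# Generic crawlers are asked to wait 10 seconds between requests.
print(rules.crawl_delay("*"))                                                        # 10
```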