# MetarCentral Aviation Weather - Robots.txt
# Optimized for crawl budget and content quality
# Last updated: 2026-02-16

User-agent: *
# === ALLOW: Quality Content ===
Allow: /
Allow: /airport/
Allow: /airports/
Allow: /learn/
Allow: /calculator/
Allow: /region/
Allow: /country/
Allow: /about
Allow: /privacy
Allow: /disclaimer
Allow: /fir/
Allow: /aircraft/
Allow: /sitemap*.xml
# === DISALLOW: API & System Files ===
Disallow: /api/
Disallow: /admin/
Disallow: /scripts/
Disallow: /includes/
Disallow: /setup/
Disallow: /vendor/
Disallow: /cache/
Disallow: /public/
Disallow: /docs/
Disallow: /sql/
Disallow: /tests/
Disallow: /*.php$
Disallow: /*.log$
Disallow: /*.sql$
# === DISALLOW: Display/Utility Pages ===
Disallow: /display/
Disallow: /display?
# === DISALLOW: Query Parameters (Crawl Budget) ===
# Nearby weather parameter creates duplicate content
Disallow: /airport/*?metar=
Disallow: /*?metar=
# Historical date parameters create infinite crawl paths
Disallow: /airport/*/historical?
Disallow: /weather/*/historical?
Disallow: /*?date=
Disallow: /*?time=
# Language parameters (only English supported)
Disallow: /*?lang=
# Multiple query parameters
Disallow: /*?*&*&*
# Print/mobile/format variations
Disallow: /*?print=
Disallow: /*?mobile=
Disallow: /*?format=
Disallow: /*?debug=
Disallow: /*?preview=
# === DISALLOW: Low-Value Sub-Pages Without Weather ===
# TAF/historical/NOTAM pages depend on weather data availability
# Individual pages set noindex headers, but block crawling to save budget
Disallow: /airport/*/taf?
Disallow: /airport/*/charts?
Disallow: /airport/*/notam?
# === LEGACY URLs: Redirect to canonical ===
# /weather/metar/ and /weather/taf/ redirect to /airport/ pages
Disallow: /weather/metar/
Disallow: /weather/taf/

# === SEARCH ENGINE SPECIFIC RULES ===
# Note: a crawler obeys only the most specific group that matches its name,
# so the User-agent: * rules above do not apply to the bots named below.

User-agent: Googlebot
# No Crawl-delay for Googlebot; let it set its own crawl rate
Allow: /airport/
Allow: /learn/
Allow: /calculator/

User-agent: Bingbot
Crawl-delay: 2
Allow: /airport/
Allow: /learn/

User-agent: YandexBot
Crawl-delay: 3

# === THROTTLE OR BLOCK: Aggressive SEO Crawlers ===

User-agent: AhrefsBot
Crawl-delay: 30

User-agent: SemrushBot
Crawl-delay: 30

User-agent: DotBot
Crawl-delay: 30

User-agent: MJ12bot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

# === AI CRAWLERS: Allow Content, Block API ===
# Reference: https://metarcentral.com/llms.txt

User-agent: GPTBot
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: ChatGPT-User
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: Claude-Web
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: anthropic-ai
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: PerplexityBot
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: Applebot-Extended
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: cohere-ai
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

User-agent: Google-Extended
Allow: /
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /.well-known/
Disallow: /api/

# === SITEMAP ===
Sitemap: https://metarcentral.com/sitemap-index.xml