There is a new file emerging alongside robots.txt and sitemap.xml that forward-thinking businesses are adding to their websites: llms.txt. While robots.txt tells search engine crawlers what they can and cannot access, llms.txt is designed to communicate directly with large language models, providing structured information about your organisation that helps AI systems understand and accurately represent your business.
This guide explains what llms.txt is, how to create one, whether your business should implement it, and the specific considerations for UK regulated sectors.
What Is llms.txt?
llms.txt is a proposed standard for a plain text file placed at the root of a website (example.com/llms.txt) that provides structured, machine-readable information about an organisation specifically for consumption by large language models. The proposal was published by Jeremy Howard of Answer.AI in September 2024 and has gained traction among AI researchers and the GEO community.
The idea is straightforward. Search engines have robots.txt to understand crawling permissions. AI models need a similar mechanism to understand what an organisation is, what it does, and how it wants to be represented. llms.txt fills that gap.
A typical llms.txt file contains key information about the organisation in a format that is easy for AI models to parse: the company name, a concise description, key services, location, regulatory status, and links to the most important pages on the site. It is written in Markdown format, making it both human-readable and easily consumed by language models.
The file is not a formal standard; no standards body such as the W3C or IETF has ratified it. It is, however, gaining adoption among technology companies, professional services firms, and organisations that take AI visibility seriously. Several major AI labs have acknowledged the concept, and there is growing consensus that some form of website-to-LLM communication protocol will become standard.
How to Create an llms.txt File
Creating an llms.txt file is straightforward. Here is the practical structure that works for most UK businesses.
Start with the header. The file begins with a Markdown H1 heading containing your organisation name, followed by a brief description written as a Markdown blockquote (a line starting with >). Further details go under H2 headings.
```markdown
# MarGen

> MarGen is a UK-based AI visibility agency specialising in Generative Engine Optimisation (GEO) for B2B brands in regulated sectors.

## About

- Founded: 2022
- Location: United Kingdom
- Sectors: Financial Services, Legal, Healthcare, Technology
- Services: GEO Strategy, AI Citation Auditing, Entity Authority Building

## Key Pages

- [Homepage](https://www.margen.net/)
- [What is GEO](https://www.margen.net/what-is-generative-engine-optimisation-geo-the-complete-guide-for-uk-businesses/)
- [AI Visibility Audit](https://www.margen.net/#audit)
- [GEO for Financial Services](https://www.margen.net/the-definitive-guide-to-geo-for-financial-services/)

## Contact

- Website: https://www.margen.net
- Email: [email protected]
```
Keep it concise and factual. The purpose of llms.txt is not marketing copy. It is structured information that helps AI models build an accurate entity model of your business. Stick to verifiable facts, avoid superlatives, and focus on clarity.
Include regulatory information. For UK regulated businesses, include relevant regulatory body registrations, licence numbers, and compliance frameworks. This helps AI models associate your business with the trust signals that matter for regulated sector queries.
Link to your most authoritative pages. The key pages section should link to your most comprehensive, highest-authority content. These are the pages you want AI models to prioritise when learning about your organisation.
Deploy the file at your domain root. Place the file at yourdomain.com/llms.txt so it is discoverable. Some implementations also include an llms-full.txt that provides more detailed information for models that want deeper context.
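If you would rather generate the file from structured data than write it by hand, the steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Organisation` class and `build_llms_txt` function are hypothetical names invented for this example, and the section titles simply mirror the sample file shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Organisation:
    """Facts to publish. The field names are assumptions made for this sketch."""
    name: str
    summary: str
    about: dict[str, str] = field(default_factory=dict)
    key_pages: list[tuple[str, str]] = field(default_factory=list)  # (title, url)


def build_llms_txt(org: Organisation) -> str:
    """Render an H1 title, a blockquote summary, then optional H2 sections."""
    lines = [f"# {org.name}", "", f"> {org.summary}", ""]
    if org.about:
        lines += ["## About", ""]
        lines += [f"- {label}: {value}" for label, value in org.about.items()]
        lines.append("")
    if org.key_pages:
        lines += ["## Key Pages", ""]
        lines += [f"- [{title}]({url})" for title, url in org.key_pages]
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


org = Organisation(
    name="MarGen",
    summary=(
        "MarGen is a UK-based AI visibility agency specialising in "
        "Generative Engine Optimisation (GEO) for B2B brands in regulated sectors."
    ),
    about={"Founded": "2022", "Location": "United Kingdom"},
    key_pages=[("Homepage", "https://www.margen.net/")],
)
llms_txt = build_llms_txt(org)
print(llms_txt)
```

The resulting string can then be saved as llms.txt in your web root, for example with `Path("llms.txt").write_text(llms_txt, encoding="utf-8")` from Python's `pathlib` module. Keeping the facts in one structure makes it easier to update the file when your services or key pages change.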
Why Regulated Businesses Should Pay Attention
For UK businesses operating in regulated sectors, llms.txt addresses a specific and growing risk: AI models misrepresenting your services, qualifications, or regulatory status.
Financial services firms regulated by the FCA have a particular interest. If an AI model incorrectly states your firm’s permissions, misrepresents your services, or confuses you with another entity, the consequences extend beyond lost business into potential regulatory territory. An llms.txt file that clearly states your FCA registration number, your permitted activities, and your service boundaries helps AI models represent you accurately.
Legal practices regulated by the SRA face similar concerns. Incorrect AI citations about practice areas, jurisdictional coverage, or partner qualifications could create professional liability issues. Providing structured, authoritative information directly to AI models reduces this risk.
Healthcare providers registered with the CQC benefit from clearly stating their registration status, service scope, and clinical specialisms. In a sector where AI hallucinations about medical capabilities could have serious consequences, proactive information provision is a sensible risk management step.
Professional services firms more broadly benefit from establishing clear entity boundaries. If your firm has a common name or operates in a sector with many similarly named competitors, llms.txt helps AI models disambiguate your entity from others.
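For regulated firms, the regulatory details can be rendered as their own section of the file. The sketch below is illustrative only: the `regulatory_section` function, the "Regulatory Status" heading, and the bullet layout are assumptions made for this example rather than part of the llms.txt proposal, and the reference `000000` is a placeholder, not a real registration.

```python
def regulatory_section(registrations: list[dict[str, str]]) -> str:
    """Render one bullet per registration under a '## Regulatory Status' heading."""
    lines = ["## Regulatory Status", ""]
    for reg in registrations:
        # Refuse to publish an incomplete entry: a missing reference number
        # is worse than no entry at all.
        missing = [key for key in ("regulator", "reference", "scope") if not reg.get(key)]
        if missing:
            raise ValueError(f"registration is missing: {', '.join(missing)}")
        lines.append(f"- {reg['regulator']} reference {reg['reference']}: {reg['scope']}")
    return "\n".join(lines) + "\n"


section = regulatory_section([
    {
        "regulator": "FCA",
        "reference": "000000",  # placeholder, replace with your real reference
        "scope": "Placeholder description of permitted activities",
    },
])
print(section)
```

Whatever layout you choose, the wording should match the public register entry exactly, so that an AI model comparing the two sources sees consistent facts.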
Does llms.txt Actually Work?
This is the practical question every business owner asks, and the honest answer is: it is early days, but the signals are positive.
AI crawlers do fetch root-level files. In the same way that search engines learned to look for robots.txt and sitemap.xml, AI crawlers and training data pipelines can pick up llms.txt. Some AI platforms are reported to process the file when it is available, although public documentation from the major providers is still limited.
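One way to check this for your own site is to look at your server access logs. The sketch below counts requests for /llms.txt from the AI crawlers named in this guide. It assumes logs in the common "combined" format used by Apache and Nginx; the function name and the sample log lines are made up for illustration, and real user-agent strings are longer than the shortened ones shown here.

```python
import re
from collections import Counter

# User-agent tokens for the AI crawlers mentioned in this guide.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Matches the request, status, size, referrer and user-agent fields
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_fetches(log_lines):
    """Count requests for /llms.txt per known AI crawler."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Ignore unparseable lines and requests for other paths.
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        for crawler in AI_CRAWLERS:
            if crawler.lower() in match["agent"].lower():
                counts[crawler] += 1
    return counts


# Invented sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Oct/2024:14:01:02 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [10/Oct/2024:14:05:10 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/130.0"',
]
fetches = count_llms_txt_fetches(sample_log)
print(dict(fetches))
```

A fetch in the logs shows only that a crawler requested the file, not how the content was used afterwards, so treat the counts as a signal of interest rather than proof of impact.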
The downside risk is essentially zero. Creating and deploying an llms.txt file takes an hour at most. The file does not interfere with any existing website functionality. If it helps even marginally with AI accuracy, the return on investment is excellent.
Early adopters may gain a compounding advantage. If AI models encounter your llms.txt across multiple training cycles, the structured information it contains is likely to become more firmly embedded in their understanding of your entity. Businesses that implement now could have several training cycles of advantage over those that wait.
It signals sophistication to AI systems. The presence of an llms.txt file indicates that a business is actively managing its AI presence. While we cannot confirm this directly influences citation probability, it is consistent with the broader pattern of AI models favouring well-structured, well-maintained web presences.
Complementary Files to Consider
llms.txt works best as part of a broader AI-readiness strategy for your website.
robots.txt should be reviewed to ensure AI crawlers like GPTBot, ClaudeBot, and PerplexityBot can access your content. Blocking these crawlers while deploying llms.txt sends contradictory signals.
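A quick way to review this is to test your robots.txt rules against each crawler name. The sketch below uses the `urllib.robotparser` module from Python's standard library; the `blocked_ai_crawlers` function and the sample rules are hypothetical, written only to illustrate the check.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the AI crawlers mentioned above.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_ai_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the AI crawlers that the given robots.txt rules bar from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Sample rules for illustration: GPTBot is blocked, everyone else is allowed.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
blocked = blocked_ai_crawlers(sample_robots, "https://www.example.com/llms.txt")
print(blocked)
```

Running the same check against your live robots.txt content and your most important URLs shows quickly whether any of these crawlers are being turned away.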
Schema.org structured data on your pages provides a different but complementary form of machine-readable information. While llms.txt provides organisation-level context, Schema.org provides page-level context.
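As a rough illustration of the organisation-level counterpart in Schema.org, the sketch below builds a minimal JSON-LD description using the `Organization` type. The `organisation_jsonld` function is a hypothetical helper, and the three properties shown are only a starting point; a real deployment would normally carry more detail.

```python
import json


def organisation_jsonld(name: str, url: str, description: str) -> str:
    """Build a minimal Schema.org Organization description as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
    }
    return json.dumps(data, indent=2)


jsonld = organisation_jsonld(
    name="MarGen",
    url="https://www.margen.net/",
    description="UK-based AI visibility agency specialising in Generative Engine Optimisation (GEO).",
)
print(jsonld)
```

The output is placed on the page inside a `<script type="application/ld+json">` element. Using the same name and description in llms.txt and in the structured data keeps the two sources consistent.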
sitemap.xml ensures all your important pages are discoverable. AI crawlers use sitemaps just as search engine crawlers do.
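It is also worth confirming that every page linked from llms.txt appears in the sitemap. The sketch below compares the two using only the standard library. The function name and the sample inputs are hypothetical, and it assumes a plain sitemap file rather than a sitemap index.

```python
import re
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def pages_missing_from_sitemap(llms_txt: str, sitemap_xml: str) -> list[str]:
    """Return URLs linked in llms.txt that are not listed in the sitemap."""
    # Collect the targets of Markdown links of the form [title](url).
    linked = re.findall(r"\]\((https?://[^)\s]+)\)", llms_txt)
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    # Compare without the #fragment, since sitemaps list whole pages.
    return [url for url in linked if url.split("#")[0] not in listed]


# Invented sample inputs, for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
</urlset>"""
sample_llms = (
    "- [Homepage](https://www.example.com/)\n"
    "- [Services](https://www.example.com/services/)\n"
)
missing = pages_missing_from_sitemap(sample_llms, sample_sitemap)
print(missing)
```

Any URL the check reports is either missing from the sitemap or written differently in the two files, for example with and without a trailing slash.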
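security.txt, defined in RFC 9116, is another file that demonstrates organisational maturity. It is served from the /.well-known/ path rather than the domain root, and it tells security researchers how to report vulnerabilities to you. It is not aimed at AI systems, but it contributes to the well-maintained web presence that both search engines and AI systems appear to favour.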
Together, these files create a comprehensive communication layer between your website and the AI systems that are increasingly determining how your brand is discovered and represented.
Not sure whether your website is properly configured for AI discovery? Request your free AI Visibility Audit and we will review your technical setup alongside your citation performance across every major AI platform.