Last week, Atlassian quietly flipped a switch. Starting August 2026, Jira and Confluence will collect customer data by default to train the company’s AI products. Free and Standard tier users cannot opt out of metadata collection at all. Even if you do opt out, previously collected data may persist in training datasets for up to seven years.
Atlassian is not alone. Microsoft, Salesforce, and Google have all adopted similar policies in recent months. The pattern is clear: your SaaS vendors are building AI products, and your data is the fuel. The question is whether you know it — and whether you have a plan.
TL;DR
- Atlassian will collect Jira and Confluence data by default from August 2026 to train its AI models — Free and Standard tier users cannot opt out of metadata collection
- This is part of a broader industry pattern: Microsoft, Salesforce, and Google are all training AI on customer data with varying degrees of transparency
- Metadata alone — story points, sprint dates, SLA values, workflow names — can reveal sensitive project structures and business performance patterns
- CTOs need a SaaS data governance audit now: review every vendor’s AI training policies, opt out where possible, and classify data sensitivity by platform
- The EU AI Act’s high-risk obligations take effect in August 2026, adding regulatory urgency to getting your data house in order
What Atlassian Is Actually Collecting
The policy splits collection into two buckets. Metadata covers de-identified signals: readability scores, task classifications, semantic similarity metrics, story points, sprint end dates, and Jira Service Management SLA values. In-app data covers the content itself: Confluence page titles and bodies, Jira issue titles, descriptions, comments, custom emoji names, status names, and workflow names.
The tiering is where it gets interesting — and frustrating. Enterprise customers get both types turned off by default with full opt-out controls. Premium users get in-app data off by default but cannot opt out of metadata. Free and Standard users? Metadata collection is mandatory. No toggle. No appeal.
This creates a tiered privacy system where the level of control over your own data is directly tied to how much you pay. For startups and SMEs on lower tiers — exactly the teams likely to have sensitive early-stage product data in Jira — the options are limited.
Why Metadata Is Not Harmless
The instinctive reaction might be: “It’s just metadata, it’s de-identified, it’s fine.” That reaction is wrong.
Metadata like story points, sprint velocity, and SLA response times can reveal how your team operates, how fast you ship, where your bottlenecks are, and how you prioritise work. Workflow names and custom statuses expose your internal processes. Semantic similarity metrics hint at the nature of your product. Taken together, this telemetry paints a detailed picture of your organisation’s engineering culture, capacity, and priorities — all without touching a single line of your actual code or content.
For companies building competitive products, or for agencies handling client work across multiple organisations, this is not a theoretical risk. It is a data governance gap that most teams have not even considered.
The Broader Pattern
Atlassian’s move is not an outlier. It is the latest in a cascade of SaaS vendors quietly adjusting their data policies to feed AI ambitions:
- Microsoft has been training Copilot features using data from Microsoft 365 tenants, with controls buried in admin settings that many organisations have never reviewed
- Salesforce updated its terms to allow customer data to be used for AI model improvement, with opt-out mechanisms that require navigating multiple admin panels
- Google Workspace introduced Gemini features trained on workspace data, with enterprise controls available only on specific plan tiers
- Notion, Slack, and countless others have updated privacy policies with similar AI training clauses
The common thread is that controls exist — but they are not enabled by default, they are often tier-gated, and they require active discovery by someone on your team who knows to look for them. This is design friction as a data collection strategy.
What CTOs and Founders Should Do Right Now
1. Audit Your SaaS Stack’s AI Policies
Start with your core tools: project management, communication, documentation, CRM, and cloud infrastructure. For each vendor, answer three questions:
- Does this vendor use customer data to train AI models?
- Is data collection on or off by default?
- Can we opt out — and is opt-out available on our current plan tier?
Document the answers. You will be surprised how many vendors have changed their policies in the last twelve months without a prominent notification.
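One way to keep those answers actionable rather than buried in a spreadsheet is to record them as structured data and flag the vendors that need attention. The sketch below is illustrative — the vendor names and answers are placeholders, not real policy data — but the three flags map directly to the three questions above:

```python
from dataclasses import dataclass

@dataclass
class VendorAIPolicy:
    """One row of the SaaS AI-policy audit (fields mirror the three questions)."""
    vendor: str
    trains_on_customer_data: bool
    collection_on_by_default: bool
    opt_out_available_on_our_tier: bool

def needs_action(v: VendorAIPolicy) -> bool:
    # Flag any vendor that trains on our data and either collects it by
    # default or offers no opt-out on our current plan tier.
    return v.trains_on_customer_data and (
        v.collection_on_by_default or not v.opt_out_available_on_our_tier
    )

# Hypothetical audit rows — replace with the answers from your own review.
audit = [
    VendorAIPolicy("project-tracker", True, True, False),
    VendorAIPolicy("docs-wiki", True, False, True),
    VendorAIPolicy("ci-provider", False, False, True),
]

action_list = [v.vendor for v in audit if needs_action(v)]
print(action_list)  # → ['project-tracker']
```

The output is your escalation list: every vendor on it needs either an opt-out exercised, a plan change, or a documented risk acceptance.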
2. Classify Data Sensitivity by Platform
Not all SaaS data carries the same risk. Your Jira board describing a stealth product launch is more sensitive than your company’s shared recipe channel on Slack. Map each platform to the type of data it holds and the business impact of that data being used in training sets.
Pay special attention to tools that hold client data. If you are an agency or consultancy, your clients’ information flowing into a vendor’s AI training pipeline is a contractual and reputational risk, not just a privacy one.
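A classification like this can be as simple as a lookup table. The sketch below is a minimal example — the platform names and sensitivity classes are hypothetical, and your own scheme will likely have more levels — but it shows the shape of the exercise: assign each platform the most sensitive class of data it holds, then review the riskiest ones first.

```python
# Ordinal sensitivity classes: higher means more damaging if the data
# ends up in a vendor's training set. Classes are illustrative.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2, "client": 3}

# Map each platform to the most sensitive class of data it holds.
platforms = {
    "chat": "internal",
    "wiki": "confidential",
    "issue-tracker": "confidential",
    "crm": "client",
}

def review_priority(platform: str) -> int:
    return SENSITIVITY[platforms[platform]]

# Review the highest-risk platforms first.
ordered = sorted(platforms, key=review_priority, reverse=True)
print(ordered)  # → ['crm', 'wiki', 'issue-tracker', 'chat']
```

Note that client data outranks everything else here — which matches the agency point above: a breach of client confidentiality carries contractual consequences beyond your own privacy exposure.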
3. Opt Out Where You Can — and Escalate Where You Cannot
For each tool where opt-out is available, exercise it now. Do not wait for the policy to take effect. For tools where opt-out is tier-gated, evaluate whether the upgrade cost is justified by the data risk — or whether it is time to evaluate alternatives.
For Atlassian specifically: if you are on a Free or Standard plan and the metadata collection concerns you, your options are to upgrade to Enterprise, migrate to a self-hosted or alternative platform, or accept the risk and document that decision.
4. Update Your Vendor Assessment Process
Add AI data training policies to your vendor evaluation checklist. When evaluating new SaaS tools, ask:
- What data do you collect for AI training?
- Is collection on or off by default?
- What are the opt-out mechanisms and retention periods?
- How do you handle data deletion requests for training datasets?
Make these questions as routine as asking about uptime SLAs and GDPR compliance.
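One way to make the checklist routine is to turn it into a procurement gate: a vendor is not approved until every AI-training question has a documented answer. The sketch below assumes a simple dict of answers keyed by question — the data structure and gate logic are illustrative, not a prescribed process:

```python
# The four AI-training questions from the checklist above.
AI_TRAINING_QUESTIONS = [
    "What data do you collect for AI training?",
    "Is collection on or off by default?",
    "What are the opt-out mechanisms and retention periods?",
    "How do you handle data deletion requests for training datasets?",
]

def unanswered(answers: dict) -> list:
    """Return the checklist questions the vendor has not yet answered."""
    return [q for q in AI_TRAINING_QUESTIONS if not answers.get(q)]

def approved(answers: dict) -> bool:
    # The gate: no approval until every question has a documented answer.
    return not unanswered(answers)

# Hypothetical partially completed assessment.
assessment = {AI_TRAINING_QUESTIONS[0]: "Metadata only, de-identified."}
print(approved(assessment))  # → False (three questions still open)
```

The point is not the code — it is that an unanswered question blocks the deal by default, rather than being discovered after the contract is signed.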
5. Prepare for the EU AI Act
The EU AI Act’s high-risk obligations take effect in August 2026 — the same month Atlassian’s new policy activates. If your organisation operates in the EU or serves EU customers, the regulatory stakes for understanding how your data flows into AI systems just got significantly higher. Ignorance of your vendors’ AI training practices will not be a defence.
Self-Hosting Is Not Always the Answer — But It Is Worth Revisiting
The reflexive response to these policies is "just self-host everything." That is rarely practical for small teams. But the calculus has shifted. Tools like Plane (an open-source Jira alternative), Outline (a Confluence alternative), and Gitea (a self-hosted Git service) offer credible self-hosted options that did not exist, or were not mature enough, even two years ago.
The right approach is selective self-hosting: identify which tools hold your most sensitive data and evaluate whether a self-hosted alternative is viable for those specific use cases. You do not need to self-host everything — just the tools where data sovereignty matters most.
The Bottom Line
The SaaS industry’s pivot to AI has fundamentally changed the implicit bargain of cloud software. You are no longer just paying for a tool — you are contributing training data to your vendor’s AI products. For some organisations that trade-off is acceptable. For others, particularly those handling client data, competitive intelligence, or regulated information, it is a risk that demands active management.
The worst position to be in is not knowing. Audit your stack, exercise your opt-outs, and make deliberate decisions about where your data goes. The vendors will not do this for you.
At REPTILEHAUS, we help teams navigate these decisions — from SaaS architecture reviews and self-hosted infrastructure setup to data governance frameworks that scale with your business. If you are unsure where your data is going, get in touch. We will help you find out.



