Safe Web Research

Public-source research must be permission-aware.

This page outlines a plan for a Crawl4AI-style adapter that will handle future user-authorized and public-source research intake. Live crawling stays disabled until dependencies, hosting, queueing, robots.txt handling, storage, and legal review are complete.

allowed with permission

User-owned websites

Allowed when the user confirms ownership or authorization and rate limits are respected.

allowed with permission

Public docs and READMEs

Allowed for research summaries when robots.txt, license terms, and provider terms allow access.

allowed with permission

Public product pages

Allowed for market research notes when collection is transparent and rate limited.

blocked

Paywalled or private data

Blocked. Do not bypass paywalls, logins, platform rules, or access controls.

blocked

Social platform mass scraping

Blocked. Use official APIs, public datasets, or user-provided exports where permitted.

needs review

Government or public data

Review source terms, attribution, freshness, and permitted use before collection.
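One of the conditions above, the robots.txt check for public docs and READMEs, can be illustrated offline. The following is a minimal sketch, not the adapter's implementation. It uses Python's standard urllib.robotparser and parses robots.txt text that has already been supplied, so it makes no network request. The user agent name, the function name, and the sample rules are assumptions made for illustration. License terms and provider terms still need a separate review that this check does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name; the plan does not define one.
USER_AGENT = "ResearchIntakeBot"

# Sample rules for illustration only, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def robots_allows(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the supplied robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so nothing is downloaded here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/readme",
        "https://example.com/private/report",
    ):
        print(candidate, robots_allows(SAMPLE_ROBOTS, candidate))
```

A passing robots.txt check is necessary but not sufficient: a source in a blocked category stays blocked regardless of what its robots.txt says.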

Research source intake placeholder

URL review

User-provided URL required

Permission checkbox

Owner confirms permission before collection

Crawl depth

1-2 levels until reviewed

Rate limit

Slow, respectful, and source-specific

Storage

Summary placeholders only until the database and row-level security (RLS) review is complete

Live crawling

Disabled
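The intake fields above could be enforced by a small validation gate. This is a sketch under stated assumptions: the names IntakeRequest, review_intake, and can_crawl_now are hypothetical, and the depth cap of 2 is read from "1-2 levels until reviewed". Rate limiting and storage are left out because the plan does not yet give concrete values for them.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Mirrors the plan: live crawling stays off until the reviews are complete.
LIVE_CRAWLING_ENABLED = False
# Assumed cap, taken from "1-2 levels until reviewed".
MAX_CRAWL_DEPTH = 2


@dataclass(frozen=True)
class IntakeRequest:
    """Hypothetical shape of one submitted research source."""
    url: str
    permission_confirmed: bool
    crawl_depth: int = 1


def review_intake(request: IntakeRequest) -> list[str]:
    """Return the reasons a request fails review; an empty list means it passed."""
    problems = []
    parsed = urlparse(request.url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("a user-provided http(s) URL is required")
    if not request.permission_confirmed:
        problems.append("owner must confirm permission before collection")
    if not 1 <= request.crawl_depth <= MAX_CRAWL_DEPTH:
        problems.append(f"crawl depth must be 1-{MAX_CRAWL_DEPTH} until reviewed")
    return problems


def can_crawl_now(request: IntakeRequest) -> bool:
    """A request may be crawled only if it passes review and live crawling is on."""
    return LIVE_CRAWLING_ENABLED and not review_intake(request)
```

Keeping the global switch separate from per-request validation means a request can be reviewed and recorded now, while actual collection remains impossible until the flag is deliberately changed.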

User-owned websites

Default: allowed with permission

company homepage, owned landing page, owned help docs

Public docs

Default: allowed with permission

official documentation, public README, release notes

Public product pages

Default: allowed with permission

pricing page, feature page, public changelog

Restricted/private sources

Default: blocked

paywalled pages, private accounts, login-required dashboards

Public data sites

Default: needs review

government data, public statistics, open datasets
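The category defaults above map naturally onto a lookup table. The sketch below is illustrative only: the Decision enum, the CATEGORY_DEFAULTS dictionary, and default_decision are hypothetical names. Falling back to "needs review" for an unlisted category is an assumption, since the plan does not say how unknown categories should be treated.

```python
from enum import Enum


class Decision(Enum):
    ALLOWED_WITH_PERMISSION = "allowed with permission"
    BLOCKED = "blocked"
    NEEDS_REVIEW = "needs review"


# Defaults copied from the category list above; keys are lowercased for lookup.
CATEGORY_DEFAULTS = {
    "user-owned websites": Decision.ALLOWED_WITH_PERMISSION,
    "public docs": Decision.ALLOWED_WITH_PERMISSION,
    "public product pages": Decision.ALLOWED_WITH_PERMISSION,
    "restricted/private sources": Decision.BLOCKED,
    "public data sites": Decision.NEEDS_REVIEW,
}


def default_decision(category: str) -> Decision:
    """Look up the default decision for a source category.

    Assumption: categories not listed in the plan fall back to NEEDS_REVIEW
    rather than being allowed.
    """
    return CATEGORY_DEFAULTS.get(category.strip().lower(), Decision.NEEDS_REVIEW)


if __name__ == "__main__":
    for name in ("Public docs", "Restricted/private sources", "Forum archives"):
        print(f"{name}: {default_decision(name).value}")
```

Defaulting unknown categories to review rather than to "allowed" keeps the table conservative: a new kind of source cannot be collected merely because nobody has classified it yet.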