Reviewed-on: #1

## Merged

The Issue #1 foundation documentation PR has been merged into `develop`.

## Summary

This merge completes the first foundation pass for Arkive. It adds the initial project identity, contribution workflow, monorepo-first architecture decision, architecture overview, roadmap, and first-draft policies for source acceptance, licensing, and AI answer behavior.

## Included

* `README.md`
* `CONTRIBUTING.md`
* `CHANGELOG.md`
* `governance/decision_records/ADR-0001-use-monorepo-first.md`
* `docs/architecture/overview.md`
* `docs/roadmap/initial-roadmap.md`
* `meta/policies/source_acceptance_policy.md`
* `meta/policies/license_policy.md`
* `meta/policies/ai_answer_policy.md`

## Notes

This merge intentionally does not add production code, ingestion pipelines, metadata schemas, document collections, generated indexes, model files, deployment bundles, or `AGENTS.md`. `AGENTS.md` remains reserved for a separate follow-up issue so LLM-agent rules can be reviewed independently.

Issue #1 can now be closed as completed.

| Path |
|---|
| docs |
| governance/decision_records |
| meta/policies |
| CHANGELOG.md |
| CONTRIBUTING.md |
| README.md |
Arkive
Arkive is an open-source project to build a curated, legally redistributable, multilingual, offline-first knowledge archive for survival, disaster recovery, and long-term civilization rebuilding.
The long-term goal is to preserve practical knowledge that can help people survive, recover, repair, grow food, restore infrastructure, teach others, and rebuild essential capabilities when normal systems are damaged, unavailable, or unreliable.
Arkive is intended to combine structured source curation, offline access, searchable documentation, and eventually citation-backed AI assistance.
What Arkive is
Arkive is intended to become:
- a curated practical knowledge archive;
- a legally redistributable offline documentation collection;
- a source-transparent research and recovery tool;
- a multilingual access layer over trusted source material;
- a foundation for citation-backed retrieval-augmented generation;
- a long-term project for preserving useful knowledge in durable formats.
The archive may eventually cover fields such as water purification, sanitation, first aid, medicine, food preservation, agriculture, seed saving, animal husbandry, shelter, carpentry, mechanics, electricity, radio, chemistry, materials science, textiles, education, governance, disaster response, mapping, forestry, manufacturing, toolmaking, mining, metallurgy, childbirth and women’s health, and community resilience.
What Arkive is not
Arkive is not a random dump of survival PDFs.
Arkive is not intended to treat the AI model as the source of truth.
Arkive is not a replacement for trained professionals, emergency services, medical care, engineering review, legal advice, or local expertise.
Arkive is not a place to bundle copyrighted documents just because they are available online.
Arkive is not trying to build every possible feature at once. The project starts with clear structure, policies, and boundaries before implementation work.
Core principles
Arkive is guided by these principles:
- Accuracy over quantity.
- Practical knowledge over theory.
- Redundancy for critical topics.
- Source transparency.
- Reproducibility.
- Offline-first design.
- Legal redistribution.
- Long-term maintainability.
- Modularity.
- Human survival and recovery.
Source documents are authoritative
Arkive treats source documents as the authority.
AI systems may eventually help users search, summarize, translate, compare, and navigate the archive, but the AI must not replace the original source material. Users should be able to inspect the documents and passages behind any generated answer.
For safety-critical topics, answers should be cautious, source-backed, and clear about uncertainty.
Offline-first philosophy
Arkive should remain useful without internet access.
The project should support direct browsing, search, document access, and future local AI-assisted workflows. AI may improve usability, but the archive must not become dependent on cloud services or remote APIs.
Offline access is especially important for disaster recovery, rural communities, developing regions, emergency preparedness, and low-connectivity environments.
Legal redistribution requirement
Arkive should only bundle documents when their license or public-domain status allows redistribution.
A document being available online does not automatically mean Arkive can redistribute it.
When a source cannot be redistributed, Arkive may still track metadata, references, acquisition notes, or external links when appropriate, but unknown-license documents must not be bundled into redistributable releases.
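The bundling rule above can be sketched as a small gate over source records. This is a minimal illustration, not an Arkive schema: the `SourceRecord` fields, the `LicenseStatus` values, and the `plan_release` helper are all hypothetical names chosen for this example, since no metadata schema exists yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class LicenseStatus(Enum):
    # Hypothetical status values; the real license policy may define others.
    PUBLIC_DOMAIN = "public_domain"
    REDISTRIBUTABLE = "redistributable"          # license explicitly permits redistribution
    NOT_REDISTRIBUTABLE = "not_redistributable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SourceRecord:
    # Hypothetical record shape used only for this sketch.
    title: str
    license_status: LicenseStatus
    external_url: Optional[str] = None


def can_bundle(record: SourceRecord) -> bool:
    """Return True only when redistribution is clearly allowed."""
    return record.license_status in {
        LicenseStatus.PUBLIC_DOMAIN,
        LicenseStatus.REDISTRIBUTABLE,
    }


def plan_release(
    records: Iterable[SourceRecord],
) -> Tuple[List[SourceRecord], List[SourceRecord]]:
    """Split records into (bundled, reference_only).

    Unknown-license and non-redistributable sources are never bundled;
    they are kept as metadata-only references instead.
    """
    bundled: List[SourceRecord] = []
    reference_only: List[SourceRecord] = []
    for record in records:
        (bundled if can_bundle(record) else reference_only).append(record)
    return bundled, reference_only
```

The gate defaults to exclusion: any status that is not explicitly permissive, including `UNKNOWN`, lands in the reference-only list.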
AI and RAG direction
Arkive is expected to support retrieval-augmented generation in the future.
The intended AI role is:
- retrieve relevant source passages;
- summarize source-backed information;
- provide citations;
- translate when useful;
- explain material in simpler language;
- refuse or qualify answers when no reliable source is available.
The assistant should follow a no-source-no-answer principle for factual and safety-critical claims.
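One way to picture the no-source-no-answer principle is a function that refuses whenever retrieval returns nothing sufficiently relevant. The sketch below is an assumption-laden illustration: `Passage`, `Answer`, `answer_with_sources`, and the `min_score` threshold are hypothetical, and the "answer" is simply the best passage quoted verbatim, standing in for a future summarization step.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Passage:
    # Hypothetical retrieval result: a source identifier, the passage text,
    # and a relevance score where higher means more relevant.
    source_id: str
    text: str
    score: float


@dataclass(frozen=True)
class Answer:
    text: str
    citations: Tuple[str, ...]
    refused: bool


def answer_with_sources(
    question: str,
    passages: Sequence[Passage],
    min_score: float = 0.5,  # hypothetical relevance threshold
) -> Answer:
    """Answer only when at least one passage clears the threshold."""
    supporting = [p for p in passages if p.score >= min_score]
    if not supporting:
        # No reliable source: refuse rather than generate an unsourced claim.
        return Answer(
            text=f"No reliable source was found in the archive for: {question}",
            citations=(),
            refused=True,
        )
    # Placeholder for summarization: quote the highest-scoring passage.
    best = max(supporting, key=lambda p: p.score)
    # Cite every supporting source once, preserving retrieval order.
    citations = tuple(dict.fromkeys(p.source_id for p in supporting))
    return Answer(text=best.text, citations=citations, refused=False)
```

Every non-refused answer carries the identifiers of the passages behind it, so a user can open the original documents and check the claim.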
Monorepo-first structure
Arkive starts as a structured monorepo.
The repository is organized with clean top-level boundaries so parts of the project can be split later if needed. Early development prioritizes stable policies, metadata contracts, source handling, pipeline design, runtime boundaries, and documentation over premature repository splitting.
Current top-level areas may include:
- `meta/`: policies, metadata definitions, and source governance;
- `data/`: curated source records, manifests, and future dataset organization;
- `pipeline/`: future ingestion, extraction, validation, and indexing tools;
- `runtime/`: future offline search and AI-assisted runtime components;
- `deploy/`: future packaging and deployment support;
- `packages/`: future shared packages or reusable components;
- `docs/`: architecture, roadmap, usage, and design documentation;
- `governance/`: decision records and project governance notes;
- `tests/`: future validation and test coverage.
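The top-level areas listed above could be checked with a short script as the repository grows. This is only a sketch: the `missing_areas` helper is hypothetical, and because the areas are described as ones the repository "may include", the function reports absent directories rather than treating them as errors.

```python
from pathlib import Path
from typing import List

# Directory names taken from the list of top-level areas above.
EXPECTED_TOP_LEVEL = (
    "meta",
    "data",
    "pipeline",
    "runtime",
    "deploy",
    "packages",
    "docs",
    "governance",
    "tests",
)


def missing_areas(repo_root: Path) -> List[str]:
    """Return the expected top-level directories not yet present."""
    return [name for name in EXPECTED_TOP_LEVEL if not (repo_root / name).is_dir()]


if __name__ == "__main__":
    # Report against the current working directory.
    for name in missing_areas(Path.cwd()):
        print(f"not yet present: {name}/")
```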
Current status
Arkive is in its earliest foundation stage.
The current work focuses on defining the project identity, contribution workflow, architectural direction, source acceptance policy, license policy, and AI answer policy.
No production archive, ingestion pipeline, search runtime, or AI assistant should be assumed to exist yet.
Implementation direction
Arkive is expected to start as a Python-first project for ingestion, processing, validation, indexing, and runtime prototyping because of Python’s strong document-processing, data, machine-learning, and retrieval-augmented generation ecosystem.
The project should avoid unnecessary language lock-in. Stable data contracts, documented boundaries, and transparent source handling are more important than tying every future component to one programming language.
Performance-sensitive or deployment-focused components may later be implemented in Rust or another compiled language if a real need appears.
License
The project license and content licensing model will be defined as the project foundation matures.
Code, metadata, documentation, and source documents may require different license handling. Source document licenses must be tracked separately from the Arkive project code license.