Jack Frostbyte 668c3cf367 Merge pull request 'Add Arkive project foundation documents' (#1) from feature/issue-1-project-foundation into main
Reviewed-on: #1

## Merged

The Issue #1 foundation documentation PR has been merged into `main`.

## Summary

This merge completes the first foundation pass for Arkive.

It adds the initial project identity, contribution workflow, monorepo-first architecture decision, architecture overview, roadmap, and first-draft policies for source acceptance, licensing, and AI answer behavior.

## Included

* `README.md`
* `CONTRIBUTING.md`
* `CHANGELOG.md`
* `governance/decision_records/ADR-0001-use-monorepo-first.md`
* `docs/architecture/overview.md`
* `docs/roadmap/initial-roadmap.md`
* `meta/policies/source_acceptance_policy.md`
* `meta/policies/license_policy.md`
* `meta/policies/ai_answer_policy.md`

## Notes

This merge intentionally does not add production code, ingestion pipelines, metadata schemas, document collections, generated indexes, model files, deployment bundles, or `AGENTS.md`.

`AGENTS.md` remains reserved for a separate follow-up issue so LLM-agent rules can be reviewed independently.

Issue #1 can now be closed as completed.
2026-06-03 19:15:17 -04:00
docs Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00
governance/decision_records Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00
meta/policies Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00
CHANGELOG.md Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00
CONTRIBUTING.md Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00
README.md Add Arkive project foundation documents 2026-06-03 18:56:19 -04:00

Arkive

Arkive is an open-source project to build a curated, legally redistributable, multilingual, offline-first knowledge archive for survival, disaster recovery, and long-term civilization rebuilding.

The long-term goal is to preserve practical knowledge that can help people survive, recover, repair, grow food, restore infrastructure, teach others, and rebuild essential capabilities when normal systems are damaged, unavailable, or unreliable.

Arkive is intended to combine structured source curation, offline access, searchable documentation, and eventually citation-backed AI assistance.

What Arkive is

Arkive is intended to become:

  • a curated practical knowledge archive;
  • a legally redistributable offline documentation collection;
  • a source-transparent research and recovery tool;
  • a multilingual access layer over trusted source material;
  • a foundation for citation-backed retrieval-augmented generation;
  • a long-term project for preserving useful knowledge in durable formats.

The archive may eventually cover fields such as water purification, sanitation, first aid, medicine, food preservation, agriculture, seed saving, animal husbandry, shelter, carpentry, mechanics, electricity, radio, chemistry, materials science, textiles, education, governance, disaster response, mapping, forestry, manufacturing, toolmaking, mining, metallurgy, childbirth and women's health, and community resilience.

What Arkive is not

Arkive is not a random dump of survival PDFs.

Arkive is not intended to treat the AI model as the source of truth.

Arkive is not a replacement for trained professionals, emergency services, medical care, engineering review, legal advice, or local expertise.

Arkive is not a place to bundle copyrighted documents just because they are available online.

Arkive is not trying to build every possible feature at once. The project starts with clear structure, policies, and boundaries before implementation work.

Core principles

Arkive is guided by these principles:

  1. Accuracy over quantity.
  2. Practical knowledge over theory.
  3. Redundancy for critical topics.
  4. Source transparency.
  5. Reproducibility.
  6. Offline-first design.
  7. Legal redistribution.
  8. Long-term maintainability.
  9. Modularity.
  10. Human survival and recovery.

Source documents are authoritative

Arkive treats source documents as the authority.

AI systems may eventually help users search, summarize, translate, compare, and navigate the archive, but the AI must not replace the original source material. Users should be able to inspect the documents and passages behind any generated answer.

For safety-critical topics, answers should be cautious, source-backed, and clear about uncertainty.

Offline-first philosophy

Arkive should remain useful without internet access.

The project should support direct browsing, search, document access, and future local AI-assisted workflows. AI may improve usability, but the archive must not become dependent on cloud services or remote APIs.

Offline access is especially important for disaster recovery, rural communities, developing regions, preparedness use, and low-connectivity environments.

Legal redistribution

Arkive should only bundle documents when their license or public-domain status allows redistribution.

A document being available online does not automatically mean Arkive can redistribute it.

When a source cannot be redistributed, Arkive may still track metadata, references, acquisition notes, or external links when appropriate, but unknown-license documents must not be bundled into redistributable releases.
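The bundling rule above could be sketched as a small check. This is only an illustration under stated assumptions: Arkive has no metadata schema yet, so `LicenseStatus`, `SourceRecord`, `may_bundle`, and `plan_release` are hypothetical names invented for this example, not project APIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LicenseStatus(Enum):
    """Hypothetical license categories; not an Arkive-defined schema."""
    PUBLIC_DOMAIN = "public_domain"
    REDISTRIBUTABLE = "redistributable"  # license explicitly permits redistribution
    NOT_REDISTRIBUTABLE = "not_redistributable"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SourceRecord:
    """Illustrative metadata record for one source document."""
    title: str
    license_status: LicenseStatus
    external_url: Optional[str] = None  # kept for reference-only sources


def may_bundle(record: SourceRecord) -> bool:
    """Return True only when redistribution is clearly allowed."""
    return record.license_status in {
        LicenseStatus.PUBLIC_DOMAIN,
        LicenseStatus.REDISTRIBUTABLE,
    }


def plan_release(records):
    """Split records into bundled documents and metadata-only references."""
    bundled = [r for r in records if may_bundle(r)]
    reference_only = [r for r in records if not may_bundle(r)]
    return bundled, reference_only
```

In this sketch an unknown license is treated the same as a non-redistributable one: the record stays trackable as metadata but is never placed in the bundled list.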

AI and RAG direction

Arkive is expected to support retrieval-augmented generation in the future.

The intended AI role is:

  • retrieve relevant source passages;
  • summarize source-backed information;
  • provide citations;
  • translate when useful;
  • explain material in simpler language;
  • refuse or qualify answers when no reliable source is available.

The assistant should follow a no-source-no-answer principle for factual and safety-critical claims.
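One way the no-source-no-answer principle might look in code is sketched below. No assistant exists yet, so everything here is an assumption made for illustration: the `Passage` and `Answer` shapes, the `answer_with_sources` function, the refusal wording, and the caller-supplied `summarize` callable are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A retrieved excerpt plus where it came from (hypothetical shape)."""
    source_id: str
    locator: str  # for example a page or section reference
    text: str


@dataclass(frozen=True)
class Answer:
    """Generated text together with the citations that back it."""
    text: str
    citations: tuple = ()
    refused: bool = False


REFUSAL_TEXT = "No reliable source was found in the archive for this question."


def answer_with_sources(question, passages, summarize):
    """Answer only when at least one source passage backs the response.

    `summarize` is a caller-supplied function (for example a wrapper around
    a local model) that turns the question and passages into text.
    """
    if not passages:
        # No source, no answer: refuse instead of generating unsupported text.
        return Answer(text=REFUSAL_TEXT, refused=True)
    citations = tuple((p.source_id, p.locator) for p in passages)
    return Answer(text=summarize(question, passages), citations=citations)
```

Because every non-refused `Answer` carries its `(source_id, locator)` pairs, a user could trace the generated text back to the passages it was built from.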

Monorepo-first structure

Arkive starts as a structured monorepo.

The repository is organized with clean top-level boundaries so parts of the project can be split later if needed. Early development prioritizes stable policies, metadata contracts, source handling, pipeline design, runtime boundaries, and documentation over premature repository splitting.

Current top-level areas may include:

  • meta/ — policies, metadata definitions, and source governance;
  • data/ — curated source records, manifests, and future dataset organization;
  • pipeline/ — future ingestion, extraction, validation, and indexing tools;
  • runtime/ — future offline search and AI-assisted runtime components;
  • deploy/ — future packaging and deployment support;
  • packages/ — future shared packages or reusable components;
  • docs/ — architecture, roadmap, usage, and design documentation;
  • governance/ — decision records and project governance notes;
  • tests/ — future validation and test coverage.

Current status

Arkive is in its earliest foundation stage.

The current work focuses on defining the project identity, contribution workflow, architectural direction, source acceptance policy, license policy, and AI answer policy.

No production archive, ingestion pipeline, search runtime, or AI assistant should be assumed to exist yet.

Implementation direction

Arkive is expected to start as a Python-first project for ingestion, processing, validation, indexing, and runtime prototyping because of Python's strong document-processing, data, machine-learning, and retrieval-augmented generation ecosystem.

The project should avoid unnecessary language lock-in. Stable data contracts, documented boundaries, and transparent source handling are more important than tying every future component to one programming language.

Performance-sensitive or deployment-focused components may later be implemented in Rust or another compiled language if a real need appears.

License

The project license and content licensing model will be defined as the project foundation matures.

Code, metadata, documentation, and source documents may require different license handling. Source document licenses must be tracked separately from the Arkive project code license.