Academic Wiki
Process academic materials (lectures, tutorials, finals) into a structured, interlinked Obsidian wiki. This skill combines the LLM Wiki pattern with academic study workflows.
Related Skills
- problem-set-synthesis — Create interleaved problem sets for exam preparation (use when user wants practice problems, study sets, or mastery materials)
Core Pattern
Raw sources (immutable) → Wiki (LLM-generated markdown) → Schema (this skill)
The LLM owns the wiki layer — creating source pages, concept explanations, and entity profiles. You curate sources and guide emphasis.
Directory Structure
wiki/
├── inbox/ # New files awaiting processing
├── raw/ # Original PDFs (immutable, processed)
│ ├── lectures/
│ ├── tutorials/
│ └── finals/
├── wiki/
│ ├── index.md # Content catalog
│ ├── log.md # Chronological record
│ ├── sources/ # Source summaries (lectures, tutorials, finals)
│ ├── concepts/ # Topic explanations
│ ├── entities/ # People, courses, institutions
│ └── synthesis/ # Cross-cutting analyses, problem sets, study materials
└── assets/ # Images, diagrams
The Four Layers
1. Sources (wiki/sources/)
Summaries of individual lectures, tutorials, and exams.
- One page per source
- Links to course entity and concepts covered
- Example:
FAD1014 L21 — Introduction to Series.md
2. Concepts (wiki/concepts/)
Topic explanations that may span multiple sources.
- One page per concept
- Links to all sources that mention it
- Example:
Taylor & Maclaurin Series.md
3. Entities (wiki/entities/)
People, courses, institutions.
- Course pages list all lectures and concepts
- Lecturer pages list courses taught
- Example:
FAD1014 - Mathematics II.md
4. Synthesis (wiki/synthesis/)
Cross-cutting analyses, problem sets, exam prep, comparative studies.
- Combines multiple concepts
- Study materials and practice sets
- Example:
FAD1014 Mastery Set — Interleaved Mathematics II.md
Workflow
Phase 1: Survey
When the user provides academic materials:
- List all files — use
globorbash findto discover PDFs, ZIPs, Markdown files - Check inbox — look in
inbox/folder for new materials - Extract ZIPs — unzip to a staging area
- Sample content — read first page of key PDFs to identify subjects
- Categorize — sort by subject (Maths, Physics, Chemistry, etc.)
- Handle empty files — remove or flag empty/corrupted files
Phase 2: Content Extraction (Image-First)
PRIMARY METHOD: Direct Image Processing
Read PDFs directly as images using the file read tool. The read tool can process PDF files and return them as visual content.
Read: /path/to/lecture.pdf
Why image-first:
- Preserves mathematical notation, diagrams, and formatting exactly as presented
- Captures lecturer signatures, slide layouts, and visual annotations
- Handles scanned documents and complex layouts better than text extraction
- No loss of structural context (bullet points, tables, equations rendered visually)
Process:
- Read the PDF file directly — the tool will render pages as images
- Process content visually — transcribe equations, diagrams, and text
- For multi-page PDFs, read sequentially if needed
- Capture all visual information including:
- Slide headers and titles
- Mathematical equations (transcribe to LaTeX)
- Diagrams and graphs (describe or note for asset extraction)
- Lecturer signatures and attributions
- Important highlights and annotations
FALLBACK METHOD: Text Extraction
Only use text extraction when direct image processing is unavailable or fails:
# Fallback 1: pdftotext
pdftotext "/path/to/lecture.pdf" -
# Fallback 2: Python
python3 -c "
from pypdf import PdfReader
reader = PdfReader('/path/to/lecture.pdf')
for page in reader.pages:
print(page.extract_text())
"
Tips for image processing:
- Read all pages — academic content often spans multiple pages
- Transcribe math symbols carefully to LaTeX ($...$ inline, $...$ display)
- Look for lecturer signatures at end of slides
- Note any diagrams that should be extracted to assets/
- Handle complex layouts by describing structure
Phase 3: Parallel Ingestion
For multiple files in inbox: Process concurrently using subagents.
Spawn one subagent per file (or per subject group):
Task per lecture:
- Read existing wiki/index.md for structure
- Create source page for the lecture
- Create/update concept pages for key topics
- Create/update entity pages (lecturers, courses)
- Update course entity with links
Each subagent should:
- Use Obsidian Flavored Markdown
- Include YAML frontmatter with title, date, tags, course
- Add wikilinks to related pages
- Follow existing wiki conventions
- Use LaTeX for mathematical formulas ($...$ for inline, $$...$$ for display)
Phase 4: Index & Log
After subagents complete:
- Update index.md — add new sources, concepts, entities
- Append to log.md — record what was ingested with format:
[date] ingest | description - Verify links — ensure all wikilinks resolve
- Archive sources — move processed PDFs from
inbox/toraw/lectures/
Page Templates
Source Page (Lecture)
---
title: FAD1014 L1-L2 — Integration
date: 2026-04-27
tags:
- source/lecture
- subject/mathematics
- status/seedling
course: "<a href="/pasum/entities/fad1014-mathematics-ii/">FAD1014 - Mathematics II</a>"
lecturer: "<span class="broken-link" title="Page not found: Dr Name">Dr Name</span>"
---
# FAD1014 L1-L2 — Integration
Brief summary of lecture content.
## Key Points
- Point 1
- Point 2
## Key Equations (for math lectures)
$$E = hf = \frac{hc}{\lambda}$$
## Links
- <a href="/pasum/concepts/integration-techniques/">Integration Techniques</a>
- <a href="/pasum/entities/fad1014-mathematics-ii/">FAD1014 - Mathematics II</a>
Concept Page
---
title: Integration Techniques
date: 2026-04-27
tags:
- concept
- subject/mathematics
- status/seedling
aliases:
- Integration
---
# Integration Techniques
Explanation of the concept.
## Methods
1. Substitution
2. By Parts
## Related
- <a href="/pasum/entities/fad1014-mathematics-ii/">FAD1014 - Mathematics II</a>
Entity Page (Course)
---
title: FAD1014 - Mathematics II
date: 2026-04-27
tags:
- entity/course
- subject/mathematics
institution: "<span class="broken-link" title="Page not found: University Name">University Name</span>"
---
# FAD1014 — Mathematics II
Course description.
## Lectures
- <span class="broken-link" title="Page not found: FAD1014 L1-L2 — Integration">FAD1014 L1-L2 — Integration</span>
## Concepts
- <a href="/pasum/concepts/integration-techniques/">Integration Techniques</a>
Synthesis Page (Problem Set)
---
title: "FAD1014 Mastery Set — Interleaved Topics"
date: 2026-04-28
course: FAD1014 Mathematics II
tags: [mathematics, interleaved-practice, synthesis, mastery]
---
# [Course]: Interleaved Mastery Problem Set
Cross-cutting practice problems combining multiple topics.
## Problem 1: [Name]
(a) [Part using concept A]
(b) [Part using concept B]
(c) [Part connecting A and B]
## Related
- <span class="broken-link" title="Page not found: Concept A">Concept A</span>
- <span class="broken-link" title="Page not found: Concept B">Concept B</span>
- <span class="broken-link" title="Page not found: Source Lecture">Source Lecture</span>
Karpathy Guidelines Applied
- Surgical changes — each subagent touches only their subject area
- Simplicity first — minimal viable wiki structure, expand as needed
- Goal-driven — define success: all sources catalogued, all links resolve
- Surface assumptions — if a PDF's subject is unclear, ask the user
- Concurrent processing — use parallel subagents for multiple files
Subagent Prompt Template
You are building the [SUBJECT] section of the academic wiki at [WIKI_PATH]
READ FIRST:
1. Read wiki/index.md to understand structure
2. Read existing source pages for format
3. Check for existing course/concept pages
LECTURE CONTENT:
Read the PDF file(s) directly to process them as images. Do NOT extract text via pdftotext or Python scripts. Process the visual content directly, transcribing equations to LaTeX and noting diagrams.
YOUR TASK:
Create pages for [COURSE CODE] [LECTURE TITLE]:
**Source Pages** (in wiki/sources/):
- Create: <span class="broken-link" title="Page not found: Course L## — Topic">Course L## — Topic</span>
- YAML frontmatter with title, date, tags, course link, lecturer
- Brief summary of lecture
- Key points with wikilinks to concepts
- Key equations in LaTeX
- Example problems if present
- Lecturer attribution (identify from signature)
**Concept Pages** (in wiki/concepts/):
- Create/update concept pages for key topics covered
- Include definitions, formulas, examples
- Link back to source page
**Entity Pages** (in wiki/entities/):
- Update course page to add lecture link
- Create/update lecturer entity if new
**Log Update**:
- Append to wiki/log.md with ingestion record
Return summary of pages created/modified.
Example Execution
Scenario 1: Processing Inbox with Multiple Files
User: "Do the same for the current inbox. Do it concurrently"
Agent:
- List inbox contents — find 3 files (2 PDFs, 1 MD)
- Check MD file — empty, remove it
- Extract text from both PDFs concurrently
- Spawn 2 subagents in parallel (one per PDF)
- Each subagent creates source page + concept pages
- Update index.md with all new pages
- Archive PDFs to raw/lectures/
- Append to log.md: "[date] ingest | 2 lecture PDFs → 2 source pages, 4 concept pages"
Scenario 2: Bulk Import from Downloads
User: "I have a bunch of lecture PDFs in ~/Downloads/lectures/ for my PASUM courses"
Agent:
- Survey ~/Downloads/lectures/ — find 50 PDFs across Maths, Physics, Chemistry
- Spawn 3 subagents (one per subject) to process in parallel
- Each subagent creates ~15 source pages + 5 concept pages + 2 entity pages
- Update index.md with all new pages
- Append to log.md: "[date] ingest | 50 lecture PDFs → 45 source pages, 12 concept pages"
Scenario 3: Creating Synthesis Materials
User: "I want to study Mathematics II for 5 days. Create interleaved problems covering Geometry, Series, Taylor-Maclaurin, and Trigonometric Integration."
Agent:
- Load problem-set-synthesis skill
- Read relevant tutorials to understand question formats
- Read concept pages for formulas
- Design 15 interleaved problems
- Create file in
wiki/synthesis/ - Update index.md synthesis section
- Append to log.md: "[date] create | Interleaved mastery problem set"
Tips
- Lecture Note Method: If the user's lecturer emphasizes a specific note-taking method, follow it strictly
- Tutorials are gold: Transcribe tutorial questions fully — they're primary study material
- Cross-reference: Link concept pages to multiple courses when topics overlap
- Status tags: Use
status/seedlingfor new pages,status/evergreenfor mature pages - Parallelize: Always use subagents for multi-file ingestion — it's much faster
- Math Formulas: Use LaTeX ($...$ for inline, $$...$$ for display) for all mathematical expressions
- Lecturer Identification: Look for signatures at end of slides (e.g., "Dr Name", "Thank you, Dr X")
- Empty Files: Remove empty or corrupted files from inbox after noting them
- Archive After Processing: Always move processed files from inbox to raw/ to maintain clean workflow
- Wiki Health: Update the health stats in index.md after each ingestion
- Synthesis Layer: Use synthesis/ for cross-cutting materials like problem sets, exam prep, comparative analyses