Prompt Engineering for KGGSum

This page shows how to combine KGGSum documentation with generative AI in a professional, efficient way. The goal is simple: use better prompts, get better explanations, and build real KGGSum skills step by step.

Recommended first step before prompting AI

Before writing detailed prompts, export the Full Document page as a PDF and provide that PDF to your AI assistant as reference context. This lets the AI learn KGGSum terminology, workflow, and option details first, so later answers are more accurate, consistent, and practical. After the AI has read the PDF, ask it to summarize the key modules (association, causation, annotation) before moving to task-specific prompts.

Why this works

Generative AI is strongest when you give it:

  • clear context (your input files, goals, module choice)
  • precise constraints (what you want, and what you do not want)
  • expected output format (checklist, command draft, interpretation table)

If you ask vague questions, you get vague answers.
If you ask structured questions, you get reusable workflows.

Core prompt framework (use this every time)

Use this structure for most KGGSum tasks:

Role: You are an expert in GWAS post-analysis using KGGSum.
Goal: [What I want to achieve]
Context:
- Module: [association / causation / annotation]
- Inputs I have: [sum-file, ref-gty-file, optional xQTL/gene score files]
- Genome build: [hg19 / hg38]
- Constraints: [compute resources, time, reproducibility requirements]
What I need from you:
1) Recommend the best KGGSum strategy and why.
2) Give a runnable command template with placeholders.
3) Explain key output files and a scan order.
4) List common mistakes and how to avoid them.
Output format:
- Decision summary
- Command block
- Output interpretation checklist
- Risk control checklist
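The framework above can also be filled programmatically, so every request you send carries the same structure. A minimal Python sketch (the function and field names are ours for illustration, not part of KGGSum):

```python
# Fill the core KGGSum prompt framework from a few fields, so every
# AI request stays structurally identical and reproducible.

CORE_TEMPLATE = """Role: You are an expert in GWAS post-analysis using KGGSum.
Goal: {goal}
Context:
- Module: {module}
- Inputs I have: {inputs}
- Genome build: {build}
- Constraints: {constraints}
What I need from you:
1) Recommend the best KGGSum strategy and why.
2) Give a runnable command template with placeholders.
3) Explain key output files and a scan order.
4) List common mistakes and how to avoid them.
Output format:
- Decision summary
- Command block
- Output interpretation checklist
- Risk control checklist"""

def build_prompt(goal, module, inputs, build, constraints):
    # Guard against typos in the module name before wasting an AI round-trip.
    if module not in {"association", "causation", "annotation"}:
        raise ValueError(f"unknown KGGSum module: {module}")
    return CORE_TEMPLATE.format(goal=goal, module=module, inputs=inputs,
                                build=build, constraints=constraints)

prompt = build_prompt(
    goal="prioritize genes for a lipid GWAS",
    module="association",
    inputs="sum-file (tab-separated), ref-gty-file",
    build="hg38",
    constraints="single node, reproducible commands",
)
print(prompt.splitlines()[0])  # the Role line is always fixed
```

Keeping the template in one place means a change to the framework propagates to every prompt you generate.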

Phase 1: Beginner (understand and run safely)

Learning target

  • Understand mandatory inputs and command skeletons.
  • Successfully run one end-to-end example per module.
  • Learn to read top-level outputs without over-interpreting.

Prompt template

I am a beginner in KGGSum.
Please teach me [association/causation/annotation] with a minimal safe example.

My current files:
- sum-file: [path + columns]
- ref-gty-file: [path + build]
- optional: [none or list]

Please provide:
1) the minimum command I can run first,
2) what each argument means in one line,
3) which output file to inspect first,
4) 3 common beginner errors.

Exit criteria

  • You can explain what --sum-file, --ref-gty-file, and --output do.
  • You can run one module and identify its primary result file.
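As a concrete starting point, a minimal run built from the three options named above might look like the sketch below. The invocation form (`java -jar kggsum.jar`) and module spelling are assumptions for illustration; only `--sum-file`, `--ref-gty-file`, and `--output` are taken from this guide, so confirm the exact syntax against the KGGSum documentation before running.

```
java -jar kggsum.jar [module] \
  --sum-file [path to GWAS summary statistics] \
  --ref-gty-file [path to LD reference genotypes, same genome build] \
  --output [output prefix]
```

Inspect the primary result file for the module first, then the log, before adding any optional flags.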

Phase 2: Intermediate (choose methods correctly)

Learning target

  • Select the right method based on scientific question.
  • Compare options within each module.
  • Produce reproducible analysis notes.

Prompt template

Help me choose among KGGSum methods for this task.

Scientific question:
[e.g., prioritize genes vs infer causality vs annotate/filter variants]

Data conditions:
- sample ancestry: [EUR/EAS/...]
- available resources: [xQTL, gene-score, db resources]
- target output: [gene list / causal effect / filtered variants]

Please return:
1) method comparison table (recommended vs alternatives),
2) command template for the recommended method,
3) sensitivity-analysis suggestions,
4) interpretation do/don't list.
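The mapping from scientific question to module can be made explicit before you even prompt the AI. A small sketch mirroring the examples in the template above (the keyword heuristics are ours, not part of KGGSum):

```python
# Rough triage from a scientific question to a KGGSum module.
# The keyword rules are illustrative heuristics only.

def suggest_module(question: str) -> str:
    q = question.lower()
    if "causal" in q or "causality" in q:
        return "causation"
    if "annotate" in q or "filter" in q:
        return "annotation"
    if "prioritize" in q or "gene" in q:
        return "association"
    return "unclear: state the target output (gene list / causal effect / filtered variants)"

print(suggest_module("prioritize genes for height"))          # association
print(suggest_module("infer causality between LDL and CAD"))  # causation
print(suggest_module("annotate and filter rare variants"))    # annotation
```

If the function falls through to "unclear", that is a signal your prompt's scientific question needs sharpening before asking for a method comparison.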

Exit criteria

  • You can justify why a method is chosen.
  • You can produce a reproducible command + interpretation checklist.

Phase 3: Advanced (robust interpretation and troubleshooting)

Learning target

  • Diagnose failed runs and suspicious results.
  • Evaluate robustness (thresholds, assumptions, consistency).
  • Build reusable prompt+command playbooks.

Prompt template

I ran KGGSum and got unexpected results.
Please do a structured diagnosis.

Run context:
- module + command: [paste]
- logs/errors: [paste]
- key output snapshot: [paste 10-30 lines]

Please provide:
1) probable root causes ranked by likelihood,
2) exact command-level fixes,
3) verification tests for each fix,
4) an interpretation safety checklist to avoid false claims.
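Before pasting logs into a diagnosis prompt, a quick local scan for the usual suspects can shorten the round-trip. A sketch with hypothetical error patterns (the wording KGGSum actually logs will differ; adapt the regexes to your version):

```python
import re

# Hypothetical triage patterns: regex -> likely root cause.
# These are illustrative, not KGGSum's real log messages.
PATTERNS = [
    (r"hg19|grch37", "possible genome-build mismatch: confirm all inputs use one build"),
    (r"column|header", "summary-file column mapping issue: re-check cp12Cols/pbsCols"),
    (r"outofmemory|heap", "insufficient memory: rerun with a larger Java heap"),
    (r"no such file|not found", "bad path: verify --sum-file / --ref-gty-file locations"),
]

def triage(log_text: str) -> list:
    low = log_text.lower()
    hits = [advice for pattern, advice in PATTERNS if re.search(pattern, low)]
    return hits or ["no known pattern matched: paste the full log into the diagnosis prompt"]

sample = "ERROR: header line missing required column 'BP'"
print(triage(sample))
```

Ranked guesses from a scan like this slot directly into the "probable root causes" section of the prompt above.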

Exit criteria

  • You can debug command/data mismatches quickly.
  • You can distinguish “no signal” from “pipeline/setup issue.”

Module-specific prompt starters

Association

For KGGSum association, help me decide between gene-level analysis only vs adding cell/drug/spatial enrichment.
Return:
1) decision logic,
2) exact options to add,
3) output scan order from gene-level to enrichment-level.

Causation

For KGGSum causation, help me choose between EMIC and PCMR based on my exposure/outcome design.
Return:
1) method choice criteria,
2) command template,
3) how to interpret pleiotropy-related outputs safely.

Annotation

For KGGSum annotation, design a practical filtering strategy using gene features + frequency + functional annotations.
Return:
1) recommended annotation databases/fields,
2) filter thresholds with rationale,
3) quality-control checks for retained variants.
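A filtering strategy like this can be prototyped on a handful of records before committing to thresholds. A sketch of one such filter: the field names (`af`, `func`), category labels, and the 1% frequency cutoff are illustrative assumptions, not KGGSum output columns:

```python
# Keep rare variants with a damaging functional prediction.
# Field names and thresholds are hypothetical examples.

DAMAGING = frozenset({"missense", "stopgain", "splicing"})

def keep_variant(variant: dict, max_af: float = 0.01) -> bool:
    # Rare (allele frequency at or below the cutoff) AND predicted damaging.
    return variant["af"] <= max_af and variant["func"] in DAMAGING

variants = [
    {"id": "rs1", "af": 0.30,  "func": "intronic"},
    {"id": "rs2", "af": 0.004, "func": "missense"},
    {"id": "rs3", "af": 0.002, "func": "synonymous"},
]
retained = [v["id"] for v in variants if keep_variant(v)]
print(retained)  # ['rs2']
```

Recording the thresholds and their rationale alongside the filter, as the prompt above requests, is what makes the retained-variant list defensible later.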

Professional prompt writing rules

  • Be explicit about genome build (hg19 vs hg38).
  • Always provide column mappings for summary files (cp12Cols, pbsCols).
  • Ask AI to separate “facts from docs” and “assumptions/suggestions.”
  • Force structured outputs (table/checklist) for reproducibility.
  • Ask for “failure modes + mitigations” before running large jobs.

Common anti-patterns (avoid these)

  • “Give me one best command” without data context.
  • Ignoring genome build consistency across resources.
  • Reading only one p-value column without effect-size context.
  • Asking AI to interpret results without sharing method and thresholds.

Weekly mastery routine (4-week plan)

Week 1

  • Run one minimal example for each module.
  • Build a personal glossary of 20 core options.

Week 2

  • For one real project, ask AI for method selection rationale.
  • Produce one reproducible command sheet.

Week 3

  • Run sensitivity checks (thresholds, optional flags, alternative methods).
  • Record differences and interpretation boundaries.

Week 4

  • Build your own “KGGSum Copilot Prompt Library”:
      • task scoping prompt
      • command drafting prompt
      • output interpretation prompt
      • troubleshooting prompt

By the end of week 4, you should be able to design, execute, interpret, and debug KGGSum analyses with high confidence.

Final checklist: Are you at power-user level?

  • You can map research questions to the right KGGSum module quickly.
  • You can generate clean commands with correct inputs/build/options.
  • You can interpret outputs in a staged, method-aware way.
  • You can troubleshoot failed runs systematically.
  • You maintain reusable prompt templates for repeated analysis tasks.