[Week4] Self-RAG-based troubleshooting RAG - 6-question comparison against baseline by latteeea · Pull Request #19 · ApptiveDev/rag-agent-study

latteeea · 2026-06-19T07:59:42Z

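Implementation summary

I refactored the Domain RAG (2-step) built in week3 into a Self-RAG-style Agentic RAG. The Week3 baseline was a single retrieve -> generate pipeline, so it generated an answer as-is even when retrieval missed or irrelevant documents were mixed in. In addition, the qa.py prompt told the model to fill any empty intermediate step with [추측] ("guess"), so causal questions could produce hallucinated or disconnected chains.

This week I added the following correction loops with LangGraph:

Query Rewrite - expands a natural-language question into the technical keywords used in the records, then retrieves again
Retrieval Grader - filters retrieved documents by relevance
Hallucination Checker - verifies that the generated answer is grounded in the documents
Answer Grader - verifies that the answer actually resolves the question (especially A->B->C causal questions)

(BM25/RRF and the Cohere reranker are out of scope this week, to avoid tuning the retrieval layer and adding agentic correction at the same time.)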

Agentic RAG Pipeline

question
-> retrieve (FAISS, markdown chunking)
-> grade_documents (retrieval grader)
    ├─ no relevant docs → transform_query (query rewrite) → retrieve
    └─ relevant docs found → generate
        -> grade_generation (hallucination + answer grader)
            ├─ hallucination → generate (regenerate)
            ├─ not relevant → transform_query → retrieve
            └─ relevant → END
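
The diagram above can be read as a transition table. The sketch below is not the PR's graph.py (which is not shown here); it only encodes the edges from the diagram in plain Python so the loops are easy to trace. The node names come from the diagram, while the outcome labels (`done`, `no_relevant_docs`, `has_relevant_docs`) and the `walk` helper are hypothetical names chosen for illustration. In LangGraph, edges like these would typically be wired with `add_edge` and `add_conditional_edges` on a `StateGraph`.

```python
# Transition table for the diagram: (current node, outcome) -> next node.
# Node names follow the diagram; outcome labels are illustrative assumptions.
END = "END"

EDGES = {
    ("retrieve", "done"): "grade_documents",
    ("grade_documents", "no_relevant_docs"): "transform_query",
    ("grade_documents", "has_relevant_docs"): "generate",
    ("transform_query", "done"): "retrieve",
    ("generate", "done"): "grade_generation",
    ("grade_generation", "hallucination"): "generate",
    ("grade_generation", "not_relevant"): "transform_query",
    ("grade_generation", "relevant"): END,
}


def walk(outcomes, start="retrieve"):
    """Follow the edges for a scripted list of outcomes and return the visited nodes."""
    path = [start]
    node = start
    for outcome in outcomes:
        node = EDGES[(node, outcome)]
        path.append(node)
        if node == END:
            break
    return path
```

For example, `walk(["done", "no_relevant_docs", "done", "done", "has_relevant_docs", "done", "relevant"])` traces one retrieval miss followed by a successful second pass.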

Components added

Query Rewrite

Expands emotional/figurative expressions into technical terms
e.g. 'Cursor is dumb' -> 'surface-level fix'
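
A minimal sketch of what such a rewrite step could look like, assuming the LLM is available as a plain callable that maps a prompt string to a completion string. The prompt wording, `REWRITE_PROMPT`, and the fallback to the original question are assumptions for illustration, not the prompt used in this PR.

```python
from typing import Callable

# Hypothetical prompt; the PR's actual rewrite prompt is not shown.
REWRITE_PROMPT = (
    "Rewrite the question below for retrieval over troubleshooting notes.\n"
    "Replace emotional or figurative wording with the technical terms the notes "
    "are likely to use, and return only the rewritten query.\n\n"
    "Question: {question}"
)


def transform_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a keyword-expanded query; keep the original if it returns nothing."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question
```

Falling back to the original question keeps the loop from re-retrieving with an empty query if the model returns a blank response.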

Retrieval Grader

Grades each retrieved chunk yes/no on whether it helps answer the question
Drops irrelevant documents; if every chunk is rejected, triggers a query rewrite
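
The filtering rule above could be sketched as follows, assuming the grader is a callable that returns the string "yes" or "no" for a (question, chunk) pair. The function signature and the `needs_rewrite` flag are hypothetical; the PR's nodes.py may structure this differently.

```python
from typing import Callable, List, Tuple


def grade_documents(
    question: str,
    chunks: List[str],
    grader: Callable[[str, str], str],
) -> Tuple[List[str], bool]:
    """Keep chunks graded 'yes'; flag a query rewrite when none survive."""
    kept = [c for c in chunks if grader(question, c).strip().lower() == "yes"]
    needs_rewrite = not kept  # every chunk rejected -> rewrite and retrieve again
    return kept, needs_rewrite
```

Normalizing the verdict with `strip().lower()` is a small guard against LLM outputs such as "Yes" or "yes\n"; anything other than "yes" is treated as a rejection.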

Hallucination Checker

Verifies that the generated answer is grounded in the retrieved documents

Answer Grader

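For causal questions, evaluates whether the A->B->C chain connects and whether cause, fix, and insight are distinguished
If the answer falls short, goes back to query rewrite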
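
The Hallucination Checker and Answer Grader together produce the three-way decision at grade_generation. A minimal sketch, assuming the grounding check runs first and that both graders are boolean callables; `is_grounded` and `resolves_question` are hypothetical names, and the returned labels mirror the branches in the pipeline diagram.

```python
from typing import Callable, List


def grade_generation(
    question: str,
    documents: List[str],
    answer: str,
    is_grounded: Callable[[List[str], str], bool],
    resolves_question: Callable[[str, str], bool],
) -> str:
    """Return the branch label for a generated answer."""
    # Assumption: grounding is checked before usefulness, so an ungrounded
    # answer is regenerated without spending an answer-grader call on it.
    if not is_grounded(documents, answer):
        return "hallucination"
    if not resolves_question(question, answer):
        return "not relevant"
    return "relevant"
```

Checking grounding first means a failed answer is retried against the same documents before the query itself is rewritten.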

Causal Reconstruction

Reconstructs the flow as A->B->C rather than just summarizing

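Baseline vs Agentic RAG comparison

Query	Baseline sources	Agentic sources	Improved?	Interpretation
Explain causally why Cursor looked dumb	cursor_claude.md, long_polling.md	cursor_claude.md, long_polling.md	-	Natural-language questions tend to drift toward the pipeline document. Expected to converge on cursor_claude.md after rewrite
Relationship between the X-Username authentication problem and surface-level fix	X-Username.md	X-Username.md	-	Baseline answers even when only some keywords match. Agentic requires a causal link between the two concepts
Case where long polling took the server down	long_polling.md	long_polling.md	-	A case where markdown chunking helped preserve the flow in Week3. The grader is expected to remove additional noise chunks
canonical mapping / fragmentation problem	claude_cursor.md, long_polling.md	claude_cursor.md	-	Technical-term questions are retrievable without rewrite, but the hallucination check is expected to block ungrounded explanations
Case where an external API change caused an outage	X-Username.md, spotify_api.md	X-Username.md, spotify_api.md	-	A case where X-Username was retrieved well in the Week3 comparison. Agentic is expected to separate cause and fix more clearly
Android build/deployment error case	android_network_security2.md, spotify_api.md	android_network_security2.md	-	In Week3, recursive chunking had pgadmin_killed noise. Checking whether the grader filters out irrelevant documents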

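Failures targeted for reduction

Failure type	Week3 baseline	Week4 Agentic
Retrieval miss	Single search with the question as-is	query rewrite → re-retrieval (up to 2 times)
Irrelevant chunks in context	All k=4 chunks passed into the context	Filtered by the retrieval grader
Hallucination / guessing	Empty steps filled with [추측] ("guess")	hallucination checker + no-guessing prompt
Broken causal chain	Answer merely lists the retrieved chunks	causal reconstruction + answer grader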
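
The "up to 2 times" cap on re-retrieval in the table could be enforced with a small bounded loop. This is a sketch under assumptions: `retrieve`, `grade`, and `rewrite` are hypothetical callables standing in for the graph nodes, and the cap is read here as at most two rewrites after the initial search.

```python
from typing import Callable, List, Tuple


def retrieve_with_rewrites(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], List[str]],
    rewrite: Callable[[str], str],
    max_rewrites: int = 2,
) -> Tuple[List[str], int]:
    """Retrieve, filter, and rewrite the query until docs survive or the budget runs out.

    Returns the surviving docs and the number of rewrites used.
    """
    max_rewrites = max(max_rewrites, 0)
    query = question
    for attempt in range(max_rewrites + 1):
        # Grade against the original question so rewrites cannot drift the intent.
        docs = grade(question, retrieve(query))
        if docs or attempt == max_rewrites:
            return docs, attempt
        query = rewrite(query)
    return [], max_rewrites  # unreachable; kept so every path returns a tuple
```

An empty result after the budget is spent signals that the caller should stop looping rather than generate from irrelevant context.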


latteeea added 5 commits May 8, 2026 16:33

feat: add mock_data, 3 tools, and response schema

6f2a90b

feat: add tokenizer and tool descriptions

426b8b3

Merge branch 'main' of https://github.com/ApptiveDev/rag-agent-study …

a51fdff

…into latteeea/week1-react-graph

chore: bring over required files from week3

98760c5

feat: replace nodes.py with Agentic RAG nodes and change graph.py to a Self-RAG graph

cbadb7d

latteeea wants to merge 5 commits into main from latteeea/week4-agentic-rag
