Trustworthy AI Legal and Governmental Content Validator
This project is part of CS5374 Software Verification and Validation at Texas Tech University, Department of Computer Science. The project builds a Trustworthy AI validation pipeline that verifies legal and governmental content against authoritative Texas open data before any AI system presents it to users.
Project Personnel
| Role | Name | Contact | Link |
| --- | --- | --- | --- |
| Student | Scott Weeden | sweeden@ttu.edu | LinkedIn |
| Instructor | Dr. Akbar S. Namin | akbar.namin@ttu.edu | TTU CS Faculty |
Course: CS 5374 - Software Verification and Validation | Spring 2026
Repository: CS5374 Software V&V on GitHub
The Problem: AI Hallucination in Legal Research
Large language models and retrieval-augmented generation (RAG) systems are increasingly used to answer questions about legal and governmental matters, yet they frequently hallucinate or return outdated information. Invented judge names, non-existent laws, fabricated election details, or unverified court documents can cause serious harm: incorrect legal advice, misrepresentation of officials, and invalid citations presented as binding authority.
Notable Cases & Studies
| Reference | Description |
| --- | --- |
| Stanford Law - “Hallucination-Free?” | Assessing the reliability of leading AI legal research tools (link) |
| Stanford Law - “Large Legal Fictions” | Profiling legal hallucinations in large language models (link) |
| Mata v. Avianca, Inc. | Court sanctions for AI-generated fake citations (link) |
Content Verification Architecture
The pipeline verifies content across seven domains using authoritative Texas sources:
| Content Type | Authoritative Texas Source | Verification Approach |
| --- | --- | --- |
| Legal/Government News | Trust lists, NewsGuard, AllSides | URL and domain checks; cross-check with Texas agency press releases |
| Judges | Texas judicial directories, court rosters | Name and court match against official rosters |
| Elected Officials | data.texas.gov, data.capitol.texas.gov | Match names, offices, and terms to official datasets |
| Elections & Opponents | Capitol Data Portal (116+ datasets) | Certified filings and results; candidate/race verification |
| Laws & Ordinances | Texas Legislature, agency sites | Citation and text match against official code/statute datasets |
| Court Documents | Texas court datasets, e-filing metadata | Docket/case ID and document metadata validation |
| Legal Templates | Texas court form registries | Checksum and version validation against known good templates |
Note: Federal sources (CourtListener, PACER, FEC) are not used as primary authorities; the focus is on Texas legal and governmental sources via the Texas Open Data Portal and Capitol Data Portal.
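The checksum and version check listed for legal templates could look roughly like the sketch below. It is only an illustration: the registry contents, the `example-form` ID, and the `verify_template` name are hypothetical, and a real registry would be populated from a Texas court form source rather than hard-coded.

```python
import hashlib

# Hypothetical registry: template ID -> (version, SHA-256 of the known good file).
# The entry below is a placeholder, not a real Texas court form.
KNOWN_GOOD = {
    "example-form": ("1.0", hashlib.sha256(b"official template text").hexdigest()),
}


def verify_template(template_id: str, version: str, content: bytes) -> bool:
    """Return True only if the ID is known and both version and checksum match."""
    entry = KNOWN_GOOD.get(template_id)
    if entry is None:
        return False  # unknown template: cannot be verified
    known_version, known_digest = entry
    digest = hashlib.sha256(content).hexdigest()
    return version == known_version and digest == known_digest
```

A template whose bytes differ from the registered copy, or whose version is out of date, fails the check and would not be indexed.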
LangGraph Validation Pipeline
The system uses LangChain and LangGraph to implement validator agents that ingest, parse, and verify content at each stage.
Pipeline Stages
- Content Extraction - Parse and normalize input content
- Schema Validation - Verify required fields and data types
- Source Authority Check - Validate against allowlist of authoritative domains
- Temporal Validation - Verify timestamps are valid and current
- Content Verification - Cross-reference with authoritative Texas databases
- Provenance Attribution - Attach verification metadata to all outputs
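The stages above can be sketched in plain Python, without the LangGraph API, to show how each gate feeds the next. Everything here is an assumption made for illustration: the record fields (`claim`, `source_url`, `retrieved_at`), the one-year freshness window, and the in-memory `ground_truth` set standing in for a Texas dataset lookup. Only the two allowlisted domains come from the table of sources.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"data.texas.gov", "data.capitol.texas.gov"}
REQUIRED_FIELDS = {"claim": str, "source_url": str, "retrieved_at": str}  # assumed schema
MAX_AGE = timedelta(days=365)  # hypothetical freshness threshold


@dataclass
class Result:
    record: dict
    passed: bool = True
    failed_stage: Optional[str] = None
    provenance: dict = field(default_factory=dict)


def extract(raw: dict) -> dict:
    """Content extraction: normalize by stripping whitespace from strings."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}


def schema_ok(rec: dict) -> bool:
    return all(isinstance(rec.get(k), t) for k, t in REQUIRED_FIELDS.items())


def source_ok(rec: dict) -> bool:
    # Exact hostname match, so "data.texas.gov.evil.example" is rejected.
    host = (urlparse(rec["source_url"]).hostname or "").lower()
    return host in ALLOWED_DOMAINS


def temporal_ok(rec: dict, now: datetime) -> bool:
    try:
        ts = datetime.fromisoformat(rec["retrieved_at"])
    except ValueError:
        return False
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return timedelta(0) <= now - ts <= MAX_AGE


def content_ok(rec: dict, ground_truth: set) -> bool:
    return rec["claim"] in ground_truth


def validate(raw: dict, ground_truth: set, now: Optional[datetime] = None) -> Result:
    now = now or datetime.now(timezone.utc)
    rec = extract(raw)
    result = Result(record=rec)
    stages = [
        ("schema", lambda: schema_ok(rec)),
        ("source_authority", lambda: source_ok(rec)),
        ("temporal", lambda: temporal_ok(rec, now)),
        ("content", lambda: content_ok(rec, ground_truth)),
    ]
    for name, check in stages:
        if not check():  # stop at the first failing gate
            result.passed, result.failed_stage = False, name
            break
    status = "verified" if result.passed else f"failed:{result.failed_stage}"
    result.provenance = {
        "source": rec.get("source_url"),
        "date": rec.get("retrieved_at"),
        "verification_status": status,
    }
    return result
```

Running the schema check first means later stages can index into the record safely, and the provenance block is attached whether or not the record passes.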
Key Features
- Schema validation at every stage
- Source grounding requirements before indexing
- Pass/fail routing with retry or escalation
- Provenance metadata on all outputs (source, date, verification status)
- Only content that passes verification is indexed and made available to downstream AI systems
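One way to read "pass/fail routing with retry or escalation" is sketched below. The policy is an assumption, not the project's documented behavior: a transient lookup failure (modeled as `TimeoutError`) is retried, a definitive failure is escalated for review, and only a pass is routed to indexing. The names `route` and `make_flaky` are hypothetical.

```python
from typing import Callable


def route(check: Callable[[], bool], max_retries: int = 2) -> str:
    """Return 'index' on a pass; 'escalate' on a failure or exhausted retries."""
    for _ in range(max_retries + 1):
        try:
            passed = check()
        except TimeoutError:
            continue  # transient lookup failure: try again
        return "index" if passed else "escalate"
    return "escalate"  # every attempt timed out


def make_flaky(failures: int, outcome: bool) -> Callable[[], bool]:
    """Build a demo check that times out `failures` times, then returns `outcome`."""
    remaining = {"n": failures}

    def check() -> bool:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TimeoutError("simulated source timeout")
        return outcome

    return check
```

With the default of two retries, `route(make_flaky(2, True))` reaches the source on the third attempt and returns `"index"`, while `route(make_flaky(3, True))` runs out of attempts and returns `"escalate"`.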
AI Agent Design Patterns
The project leverages 21 AI agent design patterns documented in the DesignPatterns repository:
| Pattern | Application in Project |
| --- | --- |
| 01 - Prompt Chaining | Sequential validation steps where output of one step feeds the next |
| 02 - Routing | Content-type classification directing to appropriate validators |
| 03 - Parallelization | Concurrent checking of multiple authoritative sources |
| 04 - Reflection | Self-verification of validator outputs before acceptance |
| 05 - Tool Use | Integration with Texas Open Data APIs |
| 06 - Planning | Multi-step validation workflows for complex content |
| 07 - Multi-Agent Collaboration | Distributed validators for different content types |
| 08 - Memory Management | Preservation of verification context across pipeline |
| 09 - Learning & Adaptation | Pattern learning from verification results |
| 10 - Model Context Protocol | Standardized context passing between agents |
| 11 - Goal Setting | Defining verification thresholds and targets |
| 12 - Exception Handling | Graceful handling of API failures and timeouts |
| 13 - Human-in-the-Loop | Escalation paths for ambiguous verifications |
| 14 - RAG (Retrieval-Augmented Generation) | Ground truth retrieval from Texas databases |
| 15 - Inter-Agent Communication | Coordination between validator nodes |
| 16 - Resource-Aware Optimization | Efficient API usage and rate limiting |
| 17 - Reasoning Techniques | Logical inference for complex content types |
| 18 - Guardrails & Safety | Input sanitization and output validation |
| 19 - Evaluation & Monitoring | Metrics tracking with LangSmith/Phoenix |
| 20 - Prioritization | Queue management for verification tasks |
| 21 - Exploration & Discovery | New source identification and validation |
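As an illustration of the Parallelization pattern (03), the sketch below checks one name against several sources concurrently using the standard library. The two "sources" are in-memory sets with made-up names standing in for real Texas dataset lookups, and `check_all` is a hypothetical helper, not part of the project's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Placeholder data; a real checker would query an authoritative Texas source.
COURT_ROSTER = {"jane doe"}
OFFICIALS_DATASET = {"jane doe", "john roe"}

SOURCES: Dict[str, Callable[[str], bool]] = {
    "court_roster": lambda name: name.lower() in COURT_ROSTER,
    "officials_dataset": lambda name: name.lower() in OFFICIALS_DATASET,
}


def check_all(name: str, sources: Dict[str, Callable[[str], bool]]) -> Dict[str, bool]:
    """Run every source checker in its own thread and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {src: pool.submit(fn, name) for src, fn in sources.items()}
        return {src: fut.result() for src, fut in futures.items()}
```

Threads suit this case because real checkers would spend most of their time waiting on network responses; how disagreeing sources are reconciled is left open here.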
Experiments & Evaluation
Experiment 1: Baseline Hallucination Rate
- Objective: Establish the baseline hallucination rate of an LLM on Texas legal citation tasks without verification
- Data: Held-out set of legal questions with ground-truth citations from data.texas.gov
- Metrics: Proportion of generated citations that do not exist, are misattributed, or have incorrect holdings
- Tools: LangSmith, Ragas, DeepEval, promptfoo
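The metric above reduces to a simple proportion once each generated citation has been labeled. A minimal sketch follows, assuming a hypothetical labeling scheme in which every citation is tagged `correct`, `nonexistent`, `misattributed`, or `incorrect_holding`; how those labels are produced is outside the sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple

CORRECT = "correct"
ERROR_LABELS = {"nonexistent", "misattributed", "incorrect_holding"}  # assumed label names


def hallucination_rate(labels: List[str]) -> Tuple[float, Dict[str, int]]:
    """Return (share of citations with any error label, count per error label)."""
    if not labels:
        raise ValueError("no citations to score")
    unknown = set(labels) - ERROR_LABELS - {CORRECT}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    errors = sum(counts[label] for label in ERROR_LABELS)
    breakdown = {label: counts[label] for label in sorted(ERROR_LABELS)}
    return errors / len(labels), breakdown
```

Keeping the per-label breakdown alongside the overall rate makes it possible to see whether errors are mostly invented citations or misread holdings.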
Experiment 2: Verification Pipeline Effectiveness
- Objective: Measure impact of Texas-data-backed validator on hallucination and citation quality
- Setup: The same Texas legal citation tasks are passed through the LLM, then through the validator
- Metrics: Precision, recall, and hallucination rate reduction
- Tools: Ragas, LangSmith, Phoenix, DeepEval
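These metrics can be computed as sketched below, under one stated assumption: the validator is treated as a binary detector whose positive class is "flagged as hallucinated", and the reduction is reported relative to the baseline rate. The source does not fix either definition, so both are illustrative choices.

```python
from typing import List, Tuple


def precision_recall(flagged: List[bool], hallucinated: List[bool]) -> Tuple[float, float]:
    """Precision and recall of validator flags against ground-truth labels."""
    if len(flagged) != len(hallucinated):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(flagged, hallucinated))
    tp = sum(f and h for f, h in pairs)        # flagged and truly hallucinated
    fp = sum(f and not h for f, h in pairs)    # flagged but actually correct
    fn = sum(h and not f for f, h in pairs)    # hallucinated but not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def relative_reduction(baseline_rate: float, validated_rate: float) -> float:
    """Fraction of the baseline hallucination rate removed by the validator."""
    if baseline_rate == 0:
        return 0.0
    return (baseline_rate - validated_rate) / baseline_rate
```

For example, a baseline rate of 0.4 that drops to 0.1 after validation is a relative reduction of about 0.75.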
Experiment 3: Validator Nodes vs Post-Hoc Verification
- Objective: Compare LangGraph with validator nodes (reject/retry on failure) vs simple RAG with post-hoc filtering
- Metrics: End-to-end accuracy and latency
- Tools: LangSmith, promptfoo, TruLens, Phoenix
Experiment 4: Security Red-Team Evaluation
- Objective: Apply adversarial testing to the validator pipeline
- Tests: Prompt injection, data exfiltration, source spoofing
- Tools: GARAK (NVIDIA), LLM Canary, TextAttack, OpenAttack
- Deliverable: Documented vulnerabilities and mitigations
LLM / AI Evaluation & Testing
| Tool | Role |
| --- | --- |
| DeepEval | LLM evaluation metrics (faithfulness, answer relevancy) |
| promptfoo | Local testing of LLM application behavior; regression tests |
| Ragas | RAG evaluation using Texas-sourced context and ground truth |
| LangSmith | Tracing and evaluation of LangChain/LangGraph runs |
| TruLens | LLM evaluation framework for monitoring pipeline |
| Phoenix (Arize) | Observability and hallucination detection |
| Langfuse | Open-source LLM engineering platform |
Adversarial & Robustness Testing
| Tool | Role |
| --- | --- |
| GARAK (NVIDIA) | Red-teaming and vulnerability scanning |
| LLM Canary | Security benchmarking test suite |
| TextAttack | Adversarial attacks on validator inputs |
| OpenAttack | Textual adversarial attack toolkit |
Systematic Testing & Error Analysis
| Tool | Role |
| --- | --- |
| Azimuth | Dataset and error analysis for classifiers |
| CheckList | Behavioral NLP testing for validator logic |
| Deepchecks | Validation of ML/data components |
Project Deliverables
First Round
- Design document and threat model for validation pipeline
- Implemented validator modules:
  - Legal news source verification
  - Judge name verification against Texas court rosters
  - Elected official verification against Texas data portals
- LangGraph prototype with validator nodes
- Unit and integration tests with documented coverage
Final Round
- Full validator suite (7 content types)
- Integration with at least one authoritative Texas source per content type
- End-to-end RAG pipeline with validation gates
- Security review report (GARAK red-team results)
- Evaluation metrics report (Experiments 1-3)
Texas Open Data Resources
| Category | Links |
| --- | --- |
| Core Frameworks | LangChain, LangGraph, LangSmith |
| Evaluation | DeepEval, Ragas, TruLens, Phoenix, Langfuse |
| Testing | promptfoo, CheckList, Deepchecks |
| Security | GARAK, LLM Canary, TextAttack, OpenAttack |
Course Alignment
| Syllabus Week | Course Topic | Project Alignment |
| --- | --- | --- |
| Week 1 | Introduction to V&V | Problem definition; verification vs. validation |
| Week 2 | Adequacy criterion | Defining “verified” criteria (Texas source, schema, provenance) |
| Week 4 | Black-box testing | Black-box validation of LLM outputs against Texas data |
| Week 12 | Formal verification | Formal spec for verification contracts |
| Week 13 | Model checking | Model checking for validator correctness |
| Week 16 | LangSmith + hands-on | LangSmith tracing and evaluation |
| Week 17 | AI/LLM/RL evaluation | LLM evaluation and hallucination detection |
About This Project
This validation pipeline ensures that information about legal news, judges, elected officials, elections, laws, court documents, and legal templates is grounded in verifiable data from Texas government open data portals, with clear provenance on every output.
Disclaimer: This is an academic project for CS5374 Software Verification and Validation at Texas Tech University. The validation pipeline is designed to reduce hallucination rates but should not be used as the sole source for legal research or advice.
Sources & Verification
Verified: 2026-03-21