Aug 2025 - Present
AI & Data Engineering Student Developer
Developing data pipelines and retrieval systems for an AI support platform that transforms unstructured archives into searchable, citation-backed knowledge.
- Built ETL pipelines in Python (BeautifulSoup, regex) to scrape, parse, and clean 20+ years of AMBER mailing list archives and tutorials, producing 42,000+ structured documents for downstream vector indexing.
- Developed automated data ingestion workflows in Apache Airflow to schedule daily scraping, HTML-to-JSON conversion, metadata extraction, and ChromaDB indexing tasks.
- Implemented a RAG pipeline combining ChromaDB vector search with FAISS-based PDF retrieval to generate responses with automated citation links, served via a Llama 3.1 inference server on Linux HPC infrastructure.


